Hi, I'm trying to replicate some dimension reduction results computed with Matlab's plsregress using sklearn's PLSRegression. However, I'm finding that the output of the transform method in sklearn's PLSRegression differs from the Matlab results by a constant scale factor for each component (constant across features, but different across components).
I used some dummy data that I could load in Matlab to test this. I found that if I normalized the sklearn and Matlab outputs (with z-scores), I got the same results (see attached figures). I have attached the code that can replicate this. The whole test can be run from testPLS.m (you need Matlab 2014 or later). I'm using Python 3.5 64-bit on Windows with the Anaconda environment and sklearn 0.17.1-np110py35_1.

Thanks
- Fernando
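To make the z-score observation concrete, here is a minimal sketch in Python with NumPy only. It uses synthetic data rather than the spectra data set, and the names (zscore, column_scale_factors, scores_a, scores_b, true_factors) are hypothetical and not part of the attached code. It only illustrates that two score matrices differing by a positive per-component scale factor become identical after column-wise z-scoring; a negative factor would additionally flip the sign of that column.

```python
import numpy as np


def zscore(a):
    """Column-wise z-score using the sample standard deviation (ddof=1)."""
    return (a - a.mean(axis=0)) / a.std(axis=0, ddof=1)


def column_scale_factors(a, b):
    """Least-squares factor s[j] such that b[:, j] is approximately s[j] * a[:, j]."""
    return (a * b).sum(axis=0) / (a * a).sum(axis=0)


# Synthetic stand-in for two sets of centered scores (60 samples, 6 components).
rng = np.random.default_rng(0)
scores_a = rng.standard_normal((60, 6))
scores_a -= scores_a.mean(axis=0)

# Assumed per-component scale factors, chosen arbitrarily for illustration.
true_factors = np.array([0.5, 2.0, 3.5, 10.0, 0.1, 7.0])
scores_b = scores_a * true_factors

# Recover the factor for each component and compare the z-scored matrices.
factors = column_scale_factors(scores_a, scores_b)
same_after_zscore = np.allclose(zscore(scores_a), zscore(scores_b))

print("estimated factors:", factors)
print("identical after z-scoring:", same_after_zscore)
```

Applying column_scale_factors to the two real score matrices would give one number per component, which may make it easier to see what the factor depends on.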
clear classes; clear; clc; close all;

% you should have python installed
pyversion;

% adds current folder to MATLAB's python search path (kludge: current
% folder must contain testPLS.py)
if count(py.sys.path,'') == 0
    insert(py.sys.path,int32(0),'');
end

% Reload python module
mod = py.importlib.import_module('testPLS');
py.importlib.reload(mod);

% Load dummy data
load spectra
X = NIR;
y = octane;

% Choose the number of components
nc = 6;

% Apply MATLAB's plsregress with SIMPLS
[~,~,XS_matlab,~,~,~,~,stats] = plsregress((X),(y),nc);
% XS_matlab = stats.W;

% Apply sklearn pls
XS_sklearn = py.testPLS.testPLS(toggleNumpy(X),toggleNumpy(y),int32(nc));
XS_sklearn = toggleNumpy(XS_sklearn);

% Center each component
XS_matlab = bsxfun(@minus,XS_matlab, mean(XS_matlab,1));
XS_sklearn = bsxfun(@minus,XS_sklearn, mean(XS_sklearn,1));

% Compute z-scores
XS_sklearn_norm = zscore(XS_sklearn);
XS_matlab_norm = zscore(XS_matlab);

% Plot chosen components
for idxC = 1:nc
    figure(idxC)

    subplot(2,1,1)
    plot(XS_sklearn(:,idxC))
    hold on
    plot(XS_matlab(:,idxC), '--')
    hold off
    title(['(may be inverted) Unnormalized reduced data for component ' num2str(idxC)])
    legend('sklearn','matlab')
    xlabel('feature')
    ylabel('amplitude')

    subplot(2,1,2)
    plot(XS_sklearn_norm(:,idxC))
    hold on
    plot(XS_matlab_norm(:,idxC), '--')
    hold off
    title(['(may be inverted) z-scores for reduced data for component ' num2str(idxC)])
    legend('sklearn','matlab')
    xlabel('feature')
    ylabel('amplitude')
end
Attachment: testPLS.py (binary data)
function outArray = toggleNumpy(inArray, varargin)
% matlab is a bit lame when it comes to converting to numpy, it only takes
% vectors:
%
% http://www.mathworks.com/help/matlab/matlab_external/passing-data-to-python.html
%
% this function toggles an array between a numpy and MATLAB state

p = inputParser;
p.addParameter('verboseFlag', true, @islogical);
p.parse(varargin{:});

if isnumeric(inArray)
    % MATLAB input given, build 2d python numpy array
    outArray = py.numpy.array(inArray(:)');
    outArray = outArray.reshape(size(inArray), pyargs('order','F'));
    return
end

% Python numpy array given, convert to MATLAB array
dim = double(py.len(inArray.shape));
assert(ismember(dim, 1:2), 'only 1 or 2 dimensional array supported (empty?)');
% we may also support 3d+ arrays ... just haven't tested yet

% http://www.mathworks.com/matlabcentral/answers/157347-convert-python-numpy-array-to-double
% d is for double, see link below on types
outArray = double(py.array.array('d', py.numpy.nditer(inArray)));
shape = double(py.array.array('d', py.list(inArray.shape)));

switch dim
    case 1
        % python has 1d arrays (all of MATLABs arrays are at least 2d),
        % for this reason there is ambiguity as to whether we should
        % make a row or column vector out of a 1d array ...
        if p.Results.verboseFlag
            warning('1d numpy array passed, building col vector (maybe input was row?)');
        end
        outArray = outArray(:);
    case 2
        outArray = reshape(outArray, fliplr(shape)).';
    otherwise
        error('not supported')
end
end
_______________________________________________ Scikit-learn-general mailing list Scikit-learn-general@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/scikit-learn-general