Hi

I'm trying to replicate some dimension reduction results computed with
Matlab's plsregress using sklearn's PLSRegression. However, I'm finding
that the output of the transform method in sklearn's PLSRegression differs
from the Matlab results by a scale factor that is constant within each
component (constant across features but different across components).

I used some dummy data that I could load in Matlab to test this. I found
that if I normalized both the sklearn and the Matlab outputs (with
z-scores), I got the same results (see attached figures). I have attached
the code to replicate this. The whole test can be run from testPLS.m (you
need Matlab 2014+).
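
For illustration, here is a minimal Python sketch of why z-scoring makes the two outputs agree: it removes any per-component scale factor and leaves at most a sign flip. The numbers are made up (they are not taken from the spectra data), and the scale factor is an arbitrary assumption. The sample standard deviation (n-1) is used, which should match the default of Matlab's zscore.

```python
from statistics import mean, stdev


def zscore(column):
    """Center a column and scale it to unit sample standard deviation."""
    mu = mean(column)
    sigma = stdev(column)  # n-1 normalization
    return [(v - mu) / sigma for v in column]


# Hypothetical scores for one component (made-up values).
scores_a = [0.5, -1.25, 2.0, -0.75, 0.5]

# The same component as another implementation might report it:
# rescaled by an arbitrary factor and sign-flipped (assumed value).
scale = -3.7
scores_b = [scale * v for v in scores_a]

z_a = zscore(scores_a)
z_b = zscore(scores_b)

# After z-scoring, the two columns agree up to the sign of the scale factor.
sign = -1.0 if scale < 0 else 1.0
max_diff = max(abs(a - sign * b) for a, b in zip(z_a, z_b))
print("largest difference after z-scoring:", max_diff)
```

This only shows that matching z-scores are consistent with a pure scaling difference; it says nothing about which scaling convention each library uses.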

I'm using Python 3.5 (64-bit) on Windows with the Anaconda environment and
sklearn 0.17.1-np110py35_1.

Thanks

- Fernando

Attachment: testPLS.m

clear classes;
clear;
clc;
close all;

% you should have python installed
pyversion;

% adds current folder to MATLAB's python search path (kludge: current
% folder must contain testPLS.py)
if count(py.sys.path,'') == 0
    insert(py.sys.path,int32(0),'');
end

% Reload python module
mod = py.importlib.import_module('testPLS');
py.importlib.reload(mod);

% Load dummy data
load spectra
X = NIR;
y = octane;

% Choose 6 components
nc = 6;

% Apply MATLAB's plsregress (which uses SIMPLS)
[~,~,XS_matlab,~,~,~,~,stats] = plsregress(X,y,nc);

% XS_matlab = stats.W;

% Apply sklearn pls
XS_sklearn = py.testPLS.testPLS(toggleNumpy(X),toggleNumpy(y),int32(nc));
XS_sklearn = toggleNumpy(XS_sklearn);

% Center each component (column) before comparing
XS_matlab = bsxfun(@minus,XS_matlab, mean(XS_matlab,1));
XS_sklearn = bsxfun(@minus,XS_sklearn, mean(XS_sklearn,1));

% Compute z-scores
XS_sklearn_norm = zscore(XS_sklearn);
XS_matlab_norm = zscore(XS_matlab);

% Plot chosen components
for idxC = 1:nc
    figure(idxC)
    subplot(2,1,1)
    plot(XS_sklearn(:,idxC))
    hold on
    plot(XS_matlab(:,idxC), '--')
    hold off
    title(['(may be inverted) Unnormalized reduced data for component ' ...
        num2str(idxC)])
    legend('sklearn','matlab')
    xlabel('feature')
    ylabel('amplitude')
    
    subplot(2,1,2)
    plot(XS_sklearn_norm(:,idxC))
    hold on
    plot(XS_matlab_norm(:,idxC), '--')
    hold off
    title(['(may be inverted) z-scores for reduced data for component ' ...
        num2str(idxC)])
    legend('sklearn','matlab')
    xlabel('feature')
    ylabel('amplitude')
end

Attachment: testPLS.py
Description: Binary data

Attachment: toggleNumpy.m

function outArray = toggleNumpy(inArray, varargin)
% matlab is a bit lame when it comes to converting to numpy, it only takes
% vectors:
%
% http://www.mathworks.com/help/matlab/matlab_external/passing-data-to-python.html
%
% this function toggles an array between a numpy and MATLAB state

p = inputParser;
p.addParameter('verboseFlag', true, @islogical);
p.parse(varargin{:});

if isnumeric(inArray)
    % MATLAB input given, build 2d python numpy array
    outArray = py.numpy.array(inArray(:)');
    outArray = outArray.reshape(size(inArray), pyargs('order','F'));
    return
end

% Python numpy array given, convert to MATLAB array

dim = double(py.len(inArray.shape));
assert(ismember(dim, 1:2), 'only 1 or 2 dimensional array supported (empty?)');
% we may also support 3d+ arrays ... just haven't tested yet

% http://www.mathworks.com/matlabcentral/answers/157347-convert-python-numpy-array-to-double
% d is for double, see link below on types
outArray = double(py.array.array('d', py.numpy.nditer(inArray)));
shape = double(py.array.array('d', py.list(inArray.shape)));

switch dim
    case 1
        % python has 1d arrays (all of MATLABs arrays are at least 2d),
        % for this reason there is ambiguity as to whether we should
        % make a row or column vector out of a 1d array ...
        if p.Results.verboseFlag
            warning(['1d numpy array passed, building col vector ' ...
                '(maybe input was row?)']);
        end
        outArray = outArray(:);
    case 2        
        outArray = reshape(outArray, fliplr(shape)).';
    otherwise
        error('not supported')
end

end
_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
