Hi! I've noticed that PLSRegression seems to cross-validate incredibly poorly when scale=True. Could there be a bug here, or is there something I'm not getting this time, too? I first noticed the very small (i.e. large negative) cross-validation scores on a dataset that was far from unit variance; there, too, cross-validation was extremely poor: a score of around 0.4 when scaling was disabled, but (for example) -54422617.41005663 when scaling was enabled!
In [1]: import numpy as np

In [2]: from sklearn import cross_decomposition

In [3]: x = np.random.random((10, 17))

In [4]: y = np.random.random((10, 3))

In [5]: pls = cross_decomposition.PLSRegression(scale=True)

In [6]: pls.fit(x, y)
Out[6]: PLSRegression(copy=True, max_iter=500, n_components=2, scale=True, tol=1e-06)

In [7]: from sklearn import model_selection

In [8]: model_selection.cross_val_score(pls, x, y)
Out[8]: array([-10.1680294 , -12.94229352, -13.39506559])

In [9]: pls = cross_decomposition.PLSRegression(scale=False)

In [10]: model_selection.cross_val_score(pls, x, y)
Out[10]: array([-0.5904095 , -1.16551493, -1.71555855])

Cheers
Paul
_______________________________________________ scikit-learn mailing list [email protected] https://mail.python.org/mailman/listinfo/scikit-learn
