Hi!

I've noticed that PLSRegression seems to cross-validate incredibly poorly when
scale=True. Could there be a bug here, or is there something I'm not getting
this time, too? I noticed the very small (i.e. large negative) cross-validation
scores on a dataset that was far from unit variance: the score was around 0.4
with scaling disabled, but (for example) -54422617.41005663 with scaling
enabled! The same pattern shows up on random data:

In [1]: import numpy as np

In [2]: from sklearn import cross_decomposition

In [3]: x = np.random.random((10,17))

In [4]: y = np.random.random((10, 3))

In [5]: pls = cross_decomposition.PLSRegression(scale=True)

In [6]: pls.fit(x,y)
Out[6]: PLSRegression(copy=True, max_iter=500, n_components=2, scale=True, tol=1e-06)

In [7]: from sklearn import model_selection

In [8]: model_selection.cross_val_score(pls, x, y)
Out[8]: array([-10.1680294 , -12.94229352, -13.39506559])

In [9]: pls = cross_decomposition.PLSRegression(scale=False)

In [10]: model_selection.cross_val_score(pls, x, y)
Out[10]: array([-0.5904095 , -1.16551493, -1.71555855])
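
For context on how negative these numbers can get: as far as I understand, the
default score that cross_val_score reports for a regressor is the coefficient
of determination R^2, which is at most 1 but has no lower bound. Below is a
minimal NumPy sketch of that formula for a single output. The helper name r2
and the toy arrays are mine, purely for illustration, and are not taken from
scikit-learn or from my real dataset. It only shows that predictions on the
wrong scale produce scores of this magnitude; it is not a diagnosis of what
PLSRegression actually does.

```python
import numpy as np


def r2(y_true, y_pred):
    """Coefficient of determination, 1 - SS_res / SS_tot, for a single output.

    Hypothetical helper for illustration; not scikit-learn's implementation.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)         # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # spread around the mean
    return 1.0 - ss_res / ss_tot


y_true = np.array([0.0, 1.0, 2.0, 3.0])

# Predictions that match exactly give the best possible score.
perfect = r2(y_true, y_true)

# Always predicting the mean of y_true gives a score of zero.
mean_only = r2(y_true, np.full(y_true.shape, y_true.mean()))

# Predictions that are off by a large factor give a huge negative score.
badly_scaled = r2(y_true, 1000.0 * y_true)

print(perfect, mean_only, badly_scaled)
```

So a score like -54422617 would be consistent with predictions that are off by
several orders of magnitude relative to the spread of the targets, though that
alone does not say where the mismatch comes from.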

Cheers
Paul
