Thanks, Arthur and Michael!

Following Michael's suggestion, I used linear regression to train on the
dataset, which achieves what I want. But the CCA result deviates completely
from the linear regression result.

Here's what I do.

# training stage for linear regression
from sklearn import linear_model
regr = linear_model.LinearRegression()
regr.fit(features, labels)

# training stage for CCA
from sklearn.cross_decomposition import CCA
cca = CCA(n_components=1, max_iter=500000000)
cca.fit(features, labels)

Then the predict method is used to evaluate the final results:

regr.predict(feature)
cca.predict(feature)

But the results vary quite a lot between the two regression methods, and
the linear regression predictions are much closer to the labels.


e.g. (CCA prediction vs. LR prediction):

        CCA          LR
-472.411445   31.447136
-164.174335   32.793054
-108.513509   33.019758
-143.083823   20.058607
   2.047881   35.981544
-335.829902   30.801075
-341.004312   30.299629
-340.211106   26.244057
-165.895824   32.845650

May I know what is wrong here?

Thanks a lot in advance.


2015-10-20 14:22 GMT+08:00 Michael Eickenberg <michael.eickenb...@gmail.com>
:

> Also, using CCA on a 1-D Y is the same as linear regression, so you could
> probably do that instead.
>
>
> On Tuesday, October 20, 2015, Arthur Mensch <arthur.men...@inria.fr>
> wrote:
>
>> Hi Dai,
>>
>> CCA finds the vectors in X space and Y space that maximize the
>> correlation corr(u' X, v' Y), and continues finding such vectors under the
>> constraint that the (u_i)_i and (v_i)_i are orthogonal.
>>
>> As in your case dim Y = 1, you can only set n_components = 1: the vector
>> v will be [1], and u will be the linear combination that maximizes
>> corr(u' X, Y). I guess it should be in the doc.
>>
>> You cannot find more than one pair of vectors (u, v), as v is already a
>> basis of the Y space. The Y variability is entirely explained by (u, v)
>> alone, hence the warning.
>> On 20 Oct 2015 at 06:52, "Dai Yan" <kanshu...@gmail.com> wrote:
>>
>>> Hello,
>>>
>>>
>>> I hope to use CCA (canonical correlation analysis) to fit a problem set
>>> of size (35000, 117) to its labels (35000, 1), where 35000 is the number
>>> of samples and 117 is the feature dimension per sample.
>>>
>>> Now I have the following two problems.
>>>
>>> 1) How do I choose an appropriate CCA n_components parameter to fit my samples?
>>>
>>>
>>>
>>> 2) When fitting with n_components = 1 or n_components = 2, the fit
>>> procedure quits with the following warning:
>>>
>>> /usr/local/lib/python2.7/dist-packages/sklearn/cross_decomposition/pls_.py:277:
>>> UserWarning: Y residual constant at iteration 1
>>>   warnings.warn('Y residual constant at iteration %s' % k)
>>>
>>> And here I paste some of my code to show the CCA initialization parameters.
>>>
>>> cca = CCA(max_iter=500000, tol=1e-05)
>>> cca.fit(features, labels)  # features.shape = (35000, 117), labels.shape = (35000, 1)
>>>
>>>
>>> Could you give me some hints on this?
>>>
>>>
>>> Thanks,
>>>
>>> Yan
>>>
>>>
>>> ------------------------------------------------------------------------------
>>>
>>> _______________________________________________
>>> Scikit-learn-general mailing list
>>> Scikit-learn-general@lists.sourceforge.net
>>> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>>>
>>>
>
