Hi,
As the author of the quoted sentence in the documentation, I must say that
it is somewhat biased by my own experience and goals, and it should be
rephrased with more objective facts (e.g. about the computational
complexity, as Mathieu mentioned). By "efficiency" I meant predictive power
in terms of score. But this much is a fact: regression in high-dimensional
spaces is hard in any case, or it requires many samples.
I agree that the computational complexity actually suffers more from
n_samples than from n_features, which makes sense. However, I think that
with anisotropic kernels (i.e. componentwise tensor-product kernels),
n_features does have an influence on the prediction time, doesn't it?
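To make the point concrete, here is a rough numpy sketch of GP prediction
with an anisotropic squared-exponential kernel (this is just the textbook
formulation, not the scikit-learn implementation, and the data and values
are made up for illustration):

    import numpy as np
    from scipy.linalg import cho_factor, cho_solve

    def aniso_sq_exp(XA, XB, theta):
        # k(x, x') = exp(-sum_d theta_d * (x_d - x'_d)^2)
        # Every kernel evaluation sums over n_features length scales,
        # so prediction time grows with n_features as well.
        diff = XA[:, np.newaxis, :] - XB[np.newaxis, :, :]
        return np.exp(-np.sum(theta * diff ** 2, axis=2))

    rng = np.random.RandomState(0)
    n_samples, n_features = 200, 30
    X = rng.randn(n_samples, n_features)
    y = np.sin(X[:, 0]) + 0.1 * rng.randn(n_samples)
    theta = np.ones(n_features)   # one length scale per feature (anisotropy)

    # Training cost is dominated by factorizing the n_samples x n_samples
    # covariance matrix: O(n_samples^3), as Mathieu said.
    K = aniso_sq_exp(X, X, theta) + 1e-8 * np.eye(n_samples)
    L = cho_factor(K)
    alpha = cho_solve(L, y)

    # Prediction then costs O(n_samples * n_features) per test point.
    X_new = rng.randn(5, n_features)
    y_pred = aniso_sq_exp(X_new, X, theta).dot(alpha)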
So, Tao, do not hesitate to use GPML for high-dimensional problems! But do
not expect better (or worse) performance than Support Vector Regression.
IMHO, with a good fitting technique for both predictors you can achieve
similar scores, except that GPML's kernels have this anisotropy feature,
which may make the difference on your data. Tell us!
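If you want to check quickly, something along these lines should do. This
is only a hedged sketch: it assumes the GaussianProcess estimator and SVR
from a recent scikit-learn, and the toy data and hyperparameter values are
invented, so expect to adapt them to your problem.

    import numpy as np
    from sklearn.gaussian_process import GaussianProcess
    from sklearn.svm import SVR

    rng = np.random.RandomState(0)
    n_features = 10
    X = rng.rand(300, n_features)              # toy data, replace with yours
    y = np.sin(3 * X[:, 0]) + 0.05 * rng.randn(300)
    X_train, X_test = X[:200], X[200:]
    y_train, y_test = y[:200], y[200:]

    # Anisotropic GP: one theta per feature, tuned by maximum likelihood
    # between the bounds thetaL and thetaU.
    gp = GaussianProcess(theta0=[1e-1] * n_features,
                         thetaL=[1e-3] * n_features,
                         thetaU=[1.0] * n_features,
                         nugget=1e-8)
    gp.fit(X_train, y_train)

    svr = SVR(kernel='rbf', C=10.0, epsilon=0.01).fit(X_train, y_train)

    y_gp = np.ravel(gp.predict(X_test))
    y_svr = svr.predict(X_test)
    print("GP  MSE:", np.mean((y_test - y_gp) ** 2))
    print("SVR MSE:", np.mean((y_test - y_svr) ** 2))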
Cheers,
Vincent
2012/3/26 Tao-wei Huang <[email protected]>
> Hi Mathieu,
>
> Thank you for your reply. If it's expensive in terms of sample size, that
> totally makes sense to me. However, I am still confused by this statement
> in the scikit-learn documentation:
>
> "It loses efficiency in high dimensional spaces – namely when the number
> of features exceeds a few dozens. It might indeed give poor performance and
> it loses computational efficiency."
> http://scikit-learn.org/stable/modules/gaussian_process.html
>
> Even if 'the number of features' here refers to the sample size, I don't
> think the model would become inefficient with only a few dozen samples.
> Could you or anyone else clarify this for me, please? Thanks!
>
> Cheers,
> Tao
>
> On Mon, Mar 26, 2012 at 10:24 AM, Mathieu Blondel <[email protected]> wrote:
>
>> If I'm not mistaken, Gaussian Processes are expensive for large
>> n_samples, not for large n_features. The reason is that the kernel
>> matrix (called the covariance matrix in the GP literature) needs to be
>> inverted, which takes O(n_samples^3) time with a Cholesky
>> decomposition. That said, kernel methods like SVMs or Gaussian Processes
>> are usually not used much with high-dimensional data. Kernels are useful
>> for implicitly projecting low-dimensional data into higher (even
>> infinite) dimensional spaces. If your data is already high-dimensional,
>> there's nothing to gain from using kernels. A good example is text
>> classification, where everyone uses linear kernels.
>>
>> HTH,
>> Mathieu
>>
>>
------------------------------------------------------------------------------
This SF email is sponsored by:
Try Windows Azure free for 90 days Click Here
http://p.sf.net/sfu/sfd2d-msazure
_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general