Hi Tao,

I have a fair bit of experience using GPs with 300-400 features and
100-150 samples for regression, and generally they cope quite well in
these scenarios. You do need to choose your kernel carefully: if there
are lots of irrelevant features, you should use an anisotropic kernel,
as mentioned elsewhere in this thread. It learns an individual
length-scale (weight) for each feature, which usually leads to better
predictive accuracy.
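To make the distinction concrete, here is a rough numpy sketch of a
squared-exponential kernel in both flavours (my own untested
illustration, not scikit's or GPML's actual code; in GPML terms these
are covSEard and covSEiso):

    import numpy as np

    def se_kernel(X1, X2, length_scales, sigma_f=1.0):
        # Squared-exponential kernel between the rows of X1 and X2.
        # Passing a vector with one length-scale per feature gives the
        # anisotropic (ARD) form; a single scalar, broadcast over all
        # features, recovers the isotropic form.
        diff = (X1[:, None, :] - X2[None, :, :]) / length_scales
        return sigma_f ** 2 * np.exp(-0.5 * np.sum(diff ** 2, axis=-1))

The learned length-scales then double as relevance indicators: a
feature that ends up with a very large length-scale is effectively
ignored by the kernel.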
In contrast, if your features are all relatively reliable, an isotropic
kernel will sometimes give better results.

With regard to computational complexity, GPs scale very poorly with
n_samples, so they are only applicable in certain contexts. Scaling
with n_features depends on your choice of kernel: an anisotropic kernel
has to learn a hyperparameter for each feature, and as such takes
longer to train than an isotropic kernel. In my work I've found this
difference to be very significant! Prediction time, however, should be
about the same.
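To see where the n_samples cost comes from, here is a minimal sketch of
the standard Cholesky-based predictive mean (following Algorithm 2.1 of
Rasmussen & Williams; again my own illustration, not scikit's code):

    import numpy as np
    from scipy.linalg import cho_solve, cholesky

    def gp_predictive_mean(K, y, K_star, noise=1e-6):
        # Factorising the n_samples x n_samples training covariance
        # dominates the cost: O(n_samples^3) time, O(n_samples^2) memory.
        L = cholesky(K + noise * np.eye(len(y)), lower=True)
        # Two triangular solves, O(n_samples^2).
        alpha = cho_solve((L, True), y)
        # Mean at the test points: O(n_test * n_samples).
        return K_star.dot(alpha)

n_features only enters through the cost of filling in K (and, for an
anisotropic kernel, the number of hyperparameters to optimise), which
is why it is the sample count that really hurts.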
Just experiment with different kernels and see what works well.
(Disclaimer: I've not used scikit's implementation; my experience comes
mainly from the GPML Matlab package.)

Hope this helps,
Martin

On 26 March 2012 17:13, Aman Thakral <[email protected]> wrote:
>
> On Mon, Mar 26, 2012 at 12:02 PM, Vincent Dubourg
> <[email protected]> wrote:
>>
>> Hi,
>>
>> As the author of the quoted sentence from the documentation, I must
>> say that it is a bit personal wrt my own experience/goals and needs
>> to be corrected with more objective facts (e.g. about the
>> computational complexity, as Mathieu mentioned). By "efficiency", I
>> meant prediction power in terms of score... But this is a fact:
>> regression in high-dimensional spaces is hard anyway. Or it requires
>> many samples.
>>
>> I agree that the computational complexity actually suffers more from
>> n_samples than n_features. It makes sense. However, I think that for
>> anisotropic kernels (i.e. when you use componentwise tensor-product
>> kernels), n_features does have an influence on the prediction time,
>> doesn't it?
>>
>> So, Tao, do not hesitate to use GPML for high-dimensional
>> problems!... But do not expect better (or worse) performance than
>> Support Vector Regression... IMHO, with a good fitting technique for
>> both predictors you can achieve similar scores... except GPML's
>> kernels have this anisotropy feature, which can make the difference
>> on your data!? Tell us!
>>
>> Cheers,
>> Vincent
>>
>> 2012/3/26 Tao-wei Huang <[email protected]>
>>>
>>> Hi Mathieu,
>>>
>>> Thank you for your reply. If it's expensive in terms of sample
>>> size, that totally makes sense to me. However, I am still confused
>>> by this statement in the scikit-learn documentation:
>>>
>>> "It loses efficiency in high dimensional spaces – namely when the
>>> number of features exceeds a few dozens. It might indeed give poor
>>> performance and it loses computational efficiency."
>>> http://scikit-learn.org/stable/modules/gaussian_process.html
>>>
>>> Even if 'the number of features' here refers to the sample size, I
>>> don't think the model would become inefficient with only a few
>>> dozen samples. Could you or anyone else clarify this for me please?
>>> Thanks!
>>>
>>> Cheers,
>>> Tao
>>>
>>> On Mon, Mar 26, 2012 at 10:24 AM, Mathieu Blondel
>>> <[email protected]> wrote:
>>>>
>>>> If I'm not mistaken, Gaussian Processes are expensive for large
>>>> n_samples, not for large n_features. The reason is that the kernel
>>>> matrix (called the covariance matrix in the GP literature) needs
>>>> to be inverted, which takes O(n_samples^3) time with a Cholesky
>>>> decomposition. That said, kernel methods like SVMs or Gaussian
>>>> Processes are usually not used much with high-dimensional data.
>>>> Kernels are useful to implicitly project low-dimensional data into
>>>> higher (even infinite) dimensional spaces. If your data is already
>>>> high-dimensional, there's nothing to gain from using kernels. A
>>>> good example is text classification, where everyone is using
>>>> linear kernels.
>>>>
>>>> HTH,
>>>> Mathieu
>>>>
>>>
>>
>
> Hi,
>
> I have experience using PLS for high-dimensional regression (which is
> now part of scikit-learn), with relatively few observations, and my
> results have been promising. I've also written a PLS algorithm using
> pandas, which I have used to solve several problems in my domain
> (examining the effects of weather on crop disease and crop yield).
> PLS has been used a lot in chemometrics, as well as for analysing DNA
> microarray data (which is very high-dimensional with very few
> observations), and in some applications in neuroscience. If you're
> interested, I can try to dig up some resources.
>
> Cheers,
> Aman
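PS: to make Aman's PLS pointer concrete, something like the following
should work (a hedged sketch; I haven't used scikit's PLS myself, and
the import path below is the one in recent scikit-learn releases):

    import numpy as np
    from sklearn.cross_decomposition import PLSRegression

    rng = np.random.RandomState(0)
    # Few samples, many features, as in my GP experiments above.
    X = rng.randn(120, 350)
    y = X[:, :5].sum(axis=1) + 0.1 * rng.randn(120)

    # PLS regresses through a handful of components chosen for their
    # covariance with y, so n_features >> n_samples is not a problem.
    pls = PLSRegression(n_components=5)
    pls.fit(X, y)
    print("train R^2:", pls.score(X, y))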
