I completely agree that the current API goes against the philosophy of SVMs (sparse solutions). When kernel="precomputed", we should accept both n_test x n_train and n_test x n_sv arrays in predict. Can you file a bug report?
In the meantime, you can use an n_test x n_train matrix but compute only the values for the pairs that involve support vectors, as done here: https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/svm/tests/test_svm.py#L137

Mathieu

On Wed, Nov 9, 2011 at 8:11 PM, Matt Henderson <[email protected]> wrote:
> Hi,
> That's true; in a way I may as well precompute them all, because I never
> know which ones are going to come up. In general, though, it would make
> sense not to require such a big calculation.
> Cheers,
> Matt
>
> On 9 November 2011 11:07, Andreas Müller <[email protected]> wrote:
>>
>> Hi Matt.
>> I once had a setup similar to yours. As my kernel was very slow, it
>> helped a lot, though I precomputed all kernel values.
>> I'm pretty sure the underlying libsvm supports providing the kernel
>> values only at the support vectors. I'm not sure why this is not
>> supported by sklearn at the moment.
>>
>> On the other hand, that wouldn't really help in your use case.
>> If you use different classifiers and parameter combinations, each of
>> the classifiers will have a different set of support vectors, so you
>> could not precompute the kernel for all of them.
>>
>> Cheers,
>> Andy
>>
>> On 11/09/2011 11:47 AM, Matt Henderson wrote:
>>
>> Hi Andy,
>> Thanks for the example. I actually started experimenting with defining
>> my own Python-function kernel, which caches its results so that it is
>> fast once it has already been called with the same input. (Useful since
>> I am training multiple classifiers on the same data and comparing
>> different parameters.)
>> I noticed that at test time, the kernel gets called with the test data
>> and ALL of the training data, as you mentioned.
>> That could make things a lot slower than they need to be, right? I
>> thought one of the main advantages of SVMs is the sparse representation
>> of the training set that they derive, and this is apparently being lost.
>> Cheers,
>> Matt
>>
>> On 9 November 2011 10:25, Andreas Müller <[email protected]> wrote:
>>>
>>> Hi Matt.
>>> Did you figure it out yet?
>>>
>>> Here is an example:
>>> https://gist.github.com/1351047
>>> It seems that at the moment, you have to use the whole training set to
>>> generate the kernel at test time.
>>> Not sure why, maybe for ease of use.
>>>
>>> Can anyone comment on that?
>>>
>>> Cheers,
>>> Andy

------------------------------------------------------------------------------
RSA(R) Conference 2012
Save $700 by Nov 18
Register now
http://p.sf.net/sfu/rsa-sfdev2dev1
_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
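For reference, here is a minimal sketch of the workaround Mathieu describes above: fit an SVC on a precomputed Gram matrix, then at predict time fill in only the columns of the n_test x n_train kernel matrix that correspond to support vectors, leaving the rest at zero (predict never reads those entries). The random data and the linear kernel are illustrative assumptions; substitute your own expensive kernel.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.RandomState(0)
X_train = rng.randn(40, 5)
y_train = (X_train[:, 0] > 0).astype(int)
X_test = rng.randn(10, 5)

# Fit on the full n_train x n_train Gram matrix.
K_train = np.dot(X_train, X_train.T)
clf = SVC(kernel="precomputed")
clf.fit(K_train, y_train)

# At test time, compute kernel values only against the support vectors;
# the other columns are never used by the decision function, so zeros suffice.
K_test = np.zeros((X_test.shape[0], X_train.shape[0]))
sv = clf.support_  # indices of the support vectors within X_train
K_test[:, sv] = np.dot(X_test, X_train[sv].T)

pred_sparse = clf.predict(K_test)

# Sanity check: identical to predicting with the fully computed kernel.
pred_full = clf.predict(np.dot(X_test, X_train.T))
assert np.array_equal(pred_sparse, pred_full)
```

If the kernel is expensive, this cuts the test-time cost from n_test * n_train evaluations down to n_test * n_sv, which is exactly the sparsity advantage discussed in the thread.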
