Hello Ian,

Can you show a snippet of the code you use to train your SVM, and give us
the dimensions of your problem (number of samples and features)?
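
For example, something along these lines would already help (purely an
illustrative sketch with stand-in data and placeholder parameters, not a
guess at what your code actually does):

import numpy as np
from sklearn.svm import SVC

# stand-in data just so the snippet runs; in your case X would be the real
# design matrix and y the corresponding labels
X = np.random.randn(1000, 50)                     # float64, C-contiguous
y = (np.random.randn(1000) > 0).astype(np.int32)

print(X.shape, X.dtype, X.flags['C_CONTIGUOUS'])  # dimensions, dtype, layout

clf = SVC(kernel='linear', C=1.0)                 # placeholder parameters
clf.fit(X, y)

plus the shape/dtype that gets printed for your real data, so we can see
exactly what goes into fit().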

Alex

On Mon, Dec 5, 2011 at 9:51 PM, Ian Goodfellow <[email protected]> wrote:
> On Fri, Dec 2, 2011 at 3:36 AM, Olivier Grisel <[email protected]> 
> wrote:
>> 2011/12/2 Ian Goodfellow <[email protected]>:
>>> On Fri, Oct 7, 2011 at 5:14 AM, Olivier Grisel <[email protected]> 
>>> wrote:
>>>> 2011/10/7 Ian Goodfellow <[email protected]>:
>>>>> Thanks. Yes, it does appear that liblinear uses only a 64-bit dense format,
>>>>> so this memory usage is expected and caused by the liblinear implementation.
>>>>>
>>>>> You may want to update the documentation hosted at this site:
>>>>> http://scikit-learn.sourceforge.net/modules/svm.html#
>>>>>
>>>>> It has a section on "avoiding data copy" which only says that the data
>>>>> should be C contiguous.
>>>>
>>>> Thanks for the report, this should be fixed:
>>>>
>>>>  https://github.com/scikit-learn/scikit-learn/commit/bf68b538e8fe251303fc0f7469aad6e8bf56a1d0
>>>>
>>>>> It looks like there's a different implementation of libsvm that uses a dense
>>>>> format, so I'll look into using that.
>>>>
>>>> Yes, the libsvm in scikit-learn can use both dense and sparse inputs
>>>> (actually we embed both the original sparse implementation and a dense
>>>> fork of it).
>>>
>>> How can I access the dense version of libsvm via scikit-learn?
>>
>> Those are the classes at the sklearn.svm level as opposed to the
>> classes at the sklearn.svm.sparse level.
>
> This would seem to invalidate the explanation of the excessive memory
> consumption of the sklearn.svm classes earlier in this e-mail thread.
> If I've been using the dense version all along, why is the memory
> consumption so high?
>
> If I train on an 11 GB design matrix, the process ends up getting killed
> on a machine with 64 GB of RAM. If the only issue were the conversion to
> 64-bit floats, it ought to use on the order of 33 GB of RAM (11 to hold
> the original data and 22 to hold the converted copy). Does the training
> algorithm itself construct very large data structures for intermediate
> results? Is there a way to verify that sklearn is using the dense libsvm
> under the hood? The conversion to sparse matrices sounded like a fairly
> plausible explanation for the memory consumption I was seeing.
>
>>
>> --
>> Olivier
>> http://twitter.com/ogrisel - http://github.com/ogrisel
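
Also, regarding your question about verifying that the dense libsvm is used:
a rough check is simply which class you instantiate and what you pass to
fit(). A sketch, assuming the sklearn.svm vs. sklearn.svm.sparse split that
Olivier described (module paths as in the current releases; they may change
in later versions):

import numpy as np
import scipy.sparse as sp
from sklearn.svm import SVC                        # dense libsvm wrapper
from sklearn.svm.sparse import SVC as SparseSVC    # sparse libsvm wrapper

X = np.random.randn(100, 20)                       # small stand-in data
y = (np.random.randn(100) > 0).astype(np.int32)

# dense path: contiguous float64 array into sklearn.svm.SVC
SVC(kernel='linear').fit(X, y)

# sparse path: CSR matrix into sklearn.svm.sparse.SVC
SparseSVC(kernel='linear').fit(sp.csr_matrix(X), y)

If you are already instantiating sklearn.svm.SVC and passing a contiguous
float64 array, you should be on the dense code path.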
