Hello Ian, can you show us a snippet of the code you use to train your SVM? And give us the dimensions of your problem?
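For instance, something along these lines would already help (a hypothetical sketch, not your actual code; the random X and y below are placeholders for your data):

    # Hypothetical minimal example of the kind of snippet that would help.
    import numpy as np
    from sklearn.svm import SVC

    rng = np.random.RandomState(0)
    X = rng.randn(1000, 50)           # placeholder: n_samples x n_features
    y = rng.randint(0, 2, size=1000)  # placeholder binary labels

    print X.shape, X.dtype, X.flags['C_CONTIGUOUS']

    clf = SVC(kernel='linear', C=1.0)
    clf.fit(X, y)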
Alex

On Mon, Dec 5, 2011 at 9:51 PM, Ian Goodfellow <[email protected]> wrote:
> On Fri, Dec 2, 2011 at 3:36 AM, Olivier Grisel <[email protected]> wrote:
>> 2011/12/2 Ian Goodfellow <[email protected]>:
>>> On Fri, Oct 7, 2011 at 5:14 AM, Olivier Grisel <[email protected]> wrote:
>>>> 2011/10/7 Ian Goodfellow <[email protected]>:
>>>>> Thanks. Yes, it does appear that liblinear uses only a 64-bit dense
>>>>> format, so this memory usage is normal and caused by the
>>>>> implementation of liblinear.
>>>>>
>>>>> You may want to update the documentation hosted at this site:
>>>>> http://scikit-learn.sourceforge.net/modules/svm.html#
>>>>>
>>>>> It has a section on "avoiding data copy" which only says that the
>>>>> data should be C-contiguous.
>>>>
>>>> Thanks for the report, this should be fixed:
>>>>
>>>> https://github.com/scikit-learn/scikit-learn/commit/bf68b538e8fe251303fc0f7469aad6e8bf56a1d0
>>>>
>>>>> It looks like there's a different implementation of libsvm that
>>>>> uses a dense format, so I'll look into using that.
>>>>
>>>> Yes, the libsvm in scikit-learn can use both dense and sparse inputs
>>>> (actually we embed both the original sparse implementation and a
>>>> dense fork of it).
>>>
>>> How can I access the dense version of libsvm via scikit-learn?
>>
>> Those are the classes at the sklearn.svm level, as opposed to the
>> classes at the sklearn.svm.sparse level.
>
> This would seem to invalidate the explanation of the excessive memory
> consumption of the sklearn.svm classes given earlier in this e-mail
> thread. If I've been using the dense version all along, why is the
> memory consumption so high?
>
> If I train on an 11 GB design matrix, the process ends up getting
> killed on a machine with 64 GB of RAM. If the only issue were the
> conversion to 64-bit, it ought to use on the order of 33 GB of RAM
> (11 to hold the original data and 22 to hold the converted copy).
> Does the training algorithm itself construct very large data
> structures for intermediate results? Is there a way to verify that
> sklearn is using dense libsvm under the hood? The conversion to
> sparse matrices sounded like a fairly plausible explanation for the
> memory consumption I was seeing.
>
>> --
>> Olivier
>> http://twitter.com/ogrisel - http://github.com/ogrisel
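Also, regarding the question at the end of the quoted thread ("is there a way to verify that sklearn is using dense libsvm under the hood?"), here is a quick, untested sketch of a sanity check. It assumes only numpy and the public sklearn classes; the random X below is a stand-in for the real design matrix:

    # Untested sketch: check which SVM wrapper is instantiated and prepare
    # the data so that fit() should not need to make its own float64 copy.
    import numpy as np
    from sklearn import svm

    clf = svm.SVC()
    # The dense wrapper lives under sklearn.svm, the sparse one under
    # sklearn.svm.sparse, so the defining module is one quick indicator:
    print type(clf).__module__

    X = np.random.randn(100, 10).astype(np.float32)  # stand-in for real data
    X = np.asarray(X, dtype=np.float64, order='C')   # convert once, up front
    print X.dtype, X.flags['C_CONTIGUOUS']

If doing the float64/C-contiguous conversion up front does not change the peak memory, that would point at large intermediate structures inside the training code rather than at an input copy.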
