Re: [Scikit-learn-general] Memory consumption of LinearSVC.fit

2011-12-06 Thread Olivier Grisel
2011/12/6 Andreas Mueller : > On 12/06/2011 04:55 AM, Gael Varoquaux wrote: >> On Mon, Dec 05, 2011 at 10:54:42PM +0100, Olivier Grisel wrote: >>> - libsvm uses SMO (a dual solver) and supports non-linear kernels and >>> has complexity ~ n_samples^3 hence cannot scale to large n_samples >>> (e.g. m

Re: [Scikit-learn-general] Memory consumption of LinearSVC.fit

2011-12-06 Thread Andreas Mueller
On 12/06/2011 04:55 AM, Gael Varoquaux wrote: > On Mon, Dec 05, 2011 at 10:54:42PM +0100, Olivier Grisel wrote: >> - libsvm uses SMO (a dual solver) and supports non-linear kernels and >> has complexity ~ n_samples^3 hence cannot scale to large n_samples >> (e.g. more than 50k). >> - liblinear uses

Re: [Scikit-learn-general] Memory consumption of LinearSVC.fit

2011-12-05 Thread Gael Varoquaux
On Mon, Dec 05, 2011 at 10:54:42PM +0100, Olivier Grisel wrote: > - libsvm uses SMO (a dual solver) and supports non-linear kernels and > has complexity ~ n_samples^3 hence cannot scale to large n_samples > (e.g. more than 50k). > - liblinear uses some kind of fancy coordinate descent (primal or du

Re: [Scikit-learn-general] Memory consumption of LinearSVC.fit

2011-12-05 Thread Olivier Grisel
2011/12/5 Alexandre Gramfort : > look at > > sklearn.multiclass Indeed, these tools allow the user to build a meta-learner with any multiclass logic on top of a binary classifier implementation (hence both LinearSVC and SVC can be used as the underlying binary classifier implementation). htt
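
As a rough illustration of the `sklearn.multiclass` suggestion, the sketch below wraps a binary `SVC` in a one-vs-rest meta-learner. It assumes a present-day scikit-learn install and its current import paths, which may differ from the 2011 release discussed in this thread. The toy data, the variable names, and the choice of a linear kernel are made up for the example.

```python
import numpy as np
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import SVC

# Toy 3-class problem (illustrative data only): three well-separated 2-D blobs.
rng = np.random.RandomState(0)
centers = np.array([[0, 0], [6, 0], [0, 6]])
X = np.vstack([c + rng.randn(30, 2) for c in centers])
y = np.repeat([0, 1, 2], 30)

# One-vs-rest: one binary SVC is trained per class.
ovr = OneVsRestClassifier(SVC(kernel="linear"))
ovr.fit(X, y)

n_binary_models = len(ovr.estimators_)
train_accuracy = ovr.score(X, y)
print(n_binary_models, train_accuracy)
```

The same wrapper should accept `LinearSVC` in place of `SVC`, since it only needs a binary classifier with `fit` and a decision function.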

Re: [Scikit-learn-general] Memory consumption of LinearSVC.fit

2011-12-05 Thread Olivier Grisel
2011/12/5 Ian Goodfellow : > > ok, I was using LinearSVC, so I guess I am still not using the dense > implementation. > > Is there a way to use one-against-rest rather than one-against-many > classification with the SVC class? What is one-against-many? SVC multiclass support comes directly from th

Re: [Scikit-learn-general] Memory consumption of LinearSVC.fit

2011-12-05 Thread Alexandre Gramfort
look at sklearn.multiclass Alex On Mon, Dec 5, 2011 at 10:37 PM, Ian Goodfellow wrote: > On Mon, Dec 5, 2011 at 4:24 PM, Olivier Grisel > wrote: >> 2011/12/5 Ian Goodfellow : >>> On Fri, Dec 2, 2011 at 3:36 AM, Olivier Grisel >>> wrote: 2011/12/2 Ian Goodfellow : > On Fri, Oct 7, 2

Re: [Scikit-learn-general] Memory consumption of LinearSVC.fit

2011-12-05 Thread Ian Goodfellow
On Mon, Dec 5, 2011 at 4:24 PM, Olivier Grisel wrote: > 2011/12/5 Ian Goodfellow : >> On Fri, Dec 2, 2011 at 3:36 AM, Olivier Grisel >> wrote: >>> 2011/12/2 Ian Goodfellow : On Fri, Oct 7, 2011 at 5:14 AM, Olivier Grisel wrote: > 2011/10/7 Ian Goodfellow : >> Thanks. Yes it d

Re: [Scikit-learn-general] Memory consumption of LinearSVC.fit

2011-12-05 Thread Olivier Grisel
2011/12/5 Ian Goodfellow : > On Fri, Dec 2, 2011 at 3:36 AM, Olivier Grisel > wrote: >> 2011/12/2 Ian Goodfellow : >>> On Fri, Oct 7, 2011 at 5:14 AM, Olivier Grisel >>> wrote: 2011/10/7 Ian Goodfellow : > Thanks. Yes it does appear that liblinear uses only a 64 bit dense format, >

Re: [Scikit-learn-general] Memory consumption of LinearSVC.fit

2011-12-05 Thread Alexandre Gramfort
hello ian, can you show a snippet of the code you use to train your svm? and give us the dimensions of your problem? Alex On Mon, Dec 5, 2011 at 9:51 PM, Ian Goodfellow wrote: > On Fri, Dec 2, 2011 at 3:36 AM, Olivier Grisel > wrote: >> 2011/12/2 Ian Goodfellow : >>> On Fri, Oct 7, 2011 at 5:

Re: [Scikit-learn-general] Memory consumption of LinearSVC.fit

2011-12-05 Thread Ian Goodfellow
On Fri, Dec 2, 2011 at 3:36 AM, Olivier Grisel wrote: > 2011/12/2 Ian Goodfellow : >> On Fri, Oct 7, 2011 at 5:14 AM, Olivier Grisel >> wrote: >>> 2011/10/7 Ian Goodfellow : Thanks. Yes it does appear that liblinear uses only a 64 bit dense format, so this memory usage is normal/caused

Re: [Scikit-learn-general] Memory consumption of LinearSVC.fit

2011-12-02 Thread Olivier Grisel
2011/12/2 Ian Goodfellow : > On Fri, Oct 7, 2011 at 5:14 AM, Olivier Grisel > wrote: >> 2011/10/7 Ian Goodfellow : >>> Thanks. Yes it does appear that liblinear uses only a 64 bit dense format, >>> so this memory usage is normal/caused by the implementation of liblinear. >>> >>> You may want to u

Re: [Scikit-learn-general] Memory consumption of LinearSVC.fit

2011-12-01 Thread Ian Goodfellow
On Fri, Oct 7, 2011 at 5:14 AM, Olivier Grisel wrote: > 2011/10/7 Ian Goodfellow : >> Thanks. Yes it does appear that liblinear uses only a 64 bit dense format, >> so this memory usage is normal/caused by the implementation of liblinear. >> >> You may want to update the documentation hosted at thi

Re: [Scikit-learn-general] Memory consumption of LinearSVC.fit

2011-10-10 Thread Fabian Pedregosa
On Fri, Oct 7, 2011 at 5:07 AM, Olivier Grisel wrote: > 2011/10/7 Ian Goodfellow : >> I understand that LinearSVC is implemented using liblinear, which I thought >> should work well with large datasets. However, when I pass LinearSVC.fit a >> design matrix of size 40,000 x 14,400 (in float32 forma

Re: [Scikit-learn-general] Memory consumption of LinearSVC.fit

2011-10-07 Thread Olivier Grisel
2011/10/7 Ian Goodfellow : > Thanks. Yes it does appear that liblinear uses only a 64 bit dense format, > so this memory usage is normal/caused by the implementation of liblinear. > > You may want to update the documentation hosted at this site: > http://scikit-learn.sourceforge.net/modules/svm.htm

Re: [Scikit-learn-general] Memory consumption of LinearSVC.fit

2011-10-07 Thread Olivier Grisel
2011/10/7 Mathieu Blondel : > By the way, I suspect that the predict method is also sub-optimal > because, since the support vectors and the coefficients are stored in > numpy arrays or scipy matrices, predict has to make the conversion to > liblinear's model structure at every call. This is the p

Re: [Scikit-learn-general] Memory consumption of LinearSVC.fit

2011-10-07 Thread Mathieu Blondel
By the way, I suspect that the predict method is also sub-optimal because, since the support vectors and the coefficients are stored in numpy arrays or scipy matrices, predict has to make the conversion to liblinear's model structure at every call. This is the price that we currently pay for pickl

Re: [Scikit-learn-general] Memory consumption of LinearSVC.fit

2011-10-07 Thread Mathieu Blondel
For dense data, I recommend SGDClassifier, or SVC if you want to use a kernel. I'm thinking that in the mid-term we may want to ship our own Cython implementation of liblinear (from what I saw, it didn't seem that hard to implement). Mathieu

Re: [Scikit-learn-general] Memory consumption of LinearSVC.fit

2011-10-07 Thread Olivier Grisel
2011/10/7 Gael Varoquaux : > On Fri, Oct 07, 2011 at 08:44:53AM +, [email protected] wrote: >> I just wanted to say that we have similar problems in our lab >> which we "solved" by buying more RAM. >> It would be great to have single precision implementations >> of both SGDClassifier and Linear

Re: [Scikit-learn-general] Memory consumption of LinearSVC.fit

2011-10-07 Thread Olivier Grisel
2011/10/7 : > We wrestled with exactly this issue for decision trees, so it's clear now that > a general solution would be very beneficial to scikit-learn. For liblinear it might be a bit complicated since both the C++ code and the Cython wrapper would have to be rewritten to generate the two ver

Re: [Scikit-learn-general] Memory consumption of LinearSVC.fit

2011-10-07 Thread Gael Varoquaux
On Fri, Oct 07, 2011 at 08:44:53AM +, [email protected] wrote: > I just wanted to say that we have similar problems in our lab > which we "solved" by buying more RAM. > It would be great to have single precision implementations > of both SGDClassifier and LinearSVC in scikits.learn. SGDClassif

Re: [Scikit-learn-general] Memory consumption of LinearSVC.fit

2011-10-07 Thread bdholt1
> However I am pretty sure that it will force a copy of your data to be > double precision (64bit). As you suggested, this is the case for both LinearSVC and SGDClassifier. > If you install cython you can patch the > source c

Re: [Scikit-learn-general] Memory consumption of LinearSVC.fit

2011-10-07 Thread Andreas Mueller
> However I am pretty sure that it will force a copy of your data to be > double precision (64bit). As you suggested, this is the case for both LinearSVC and SGDClassifier. > If you install cython you can patch the > source code to force single precision instead. > > We might want to add support

Re: [Scikit-learn-general] Memory consumption of LinearSVC.fit

2011-10-06 Thread Ian Goodfellow
Thanks. Yes it does appear that liblinear uses only a 64 bit dense format, so this memory usage is normal/caused by the implementation of liblinear. You may want to update the documentation hosted at this site: http://scikit-learn.sourceforge.net/modules/svm.html# It has a section on "avoiding da
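
To make the 64-bit point concrete, here is a small NumPy sketch of what a forced conversion from single to double precision does to memory. It shows general NumPy behaviour only and is not the actual liblinear wrapper code; the array shape and the variable names are arbitrary choices for the example.

```python
import numpy as np

# A small float32 array standing in for the real design matrix.
X32 = np.ones((1000, 100), dtype=np.float32)

# Requesting float64 cannot reuse the float32 buffer, so NumPy allocates a new array.
X64 = np.asarray(X32, dtype=np.float64)

ratio = X64.nbytes / X32.nbytes              # the copy is twice as large as the input
shares_memory = np.shares_memory(X32, X64)   # False: original and copy coexist in RAM
print(ratio, shares_memory)
```

Because the original array stays alive while the copy exists, peak memory during such a conversion is roughly three times the size of the float32 input.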

Re: [Scikit-learn-general] Memory consumption of LinearSVC.fit

2011-10-06 Thread Olivier Grisel
If your data is really dense, then you should try to use the SGDClassifier model instead of LinearSVC. It has an implementation for dense NumPy arrays and hence will use half as much memory as a sparse representation. However I am pretty sure that it will force a copy of your data to be double precis
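
A minimal sketch of the suggested swap, assuming a present-day scikit-learn install (the import path and defaults may not match the 2011 release discussed here). The hinge loss gives a linear SVM objective; the synthetic data and all variable names are illustrative only.

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

# Small dense float32 dataset (illustrative): the label depends on two features.
rng = np.random.RandomState(0)
X = rng.randn(200, 20).astype(np.float32)
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# loss="hinge" trains a linear SVM by stochastic gradient descent.
clf = SGDClassifier(loss="hinge", random_state=0)
clf.fit(X, y)

train_accuracy = clf.score(X, y)
coef_shape = clf.coef_.shape
print(train_accuracy, coef_shape)
```

Whether the input is copied to double precision internally depends on the library version, so this sketch makes no claim about memory use.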

Re: [Scikit-learn-general] Memory consumption of LinearSVC.fit

2011-10-06 Thread Olivier Grisel
2011/10/7 Olivier Grisel : > > It would fix your issue though... I meant: It would *not* fix your memory issue though... -- Olivier http://twitter.com/ogrisel - http://github.com/ogrisel

Re: [Scikit-learn-general] Memory consumption of LinearSVC.fit

2011-10-06 Thread Olivier Grisel
2011/10/7 Ian Goodfellow : > I understand that LinearSVC is implemented using liblinear, which I thought > should work well with large datasets. However, when I pass LinearSVC.fit a > design matrix of size 40,000 x 14,400 (in float32 format, so 2.3 gigabytes) > > it ends up using at least 8 additio

Re: [Scikit-learn-general] Memory consumption of LinearSVC.fit

2011-10-06 Thread xinfan meng
I don't know if it's relevant. But you should really try the newest version, which is 0.9. On Fri, Oct 7, 2011 at 10:52 AM, Ian Goodfellow wrote: > I understand that LinearSVC is implemented using liblinear, which I thought > should work well with large datasets. However, when I pass LinearSVC.fi

[Scikit-learn-general] Memory consumption of LinearSVC.fit

2011-10-06 Thread Ian Goodfellow
I understand that LinearSVC is implemented using liblinear, which I thought should work well with large datasets. However, when I pass LinearSVC.fit a design matrix of size 40,000 x 14,400 (in float32 format, so 2.3 gigabytes) it ends up using at least 8 additional gigabytes of RAM! I know that the
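
The 2.3 gigabyte figure can be checked with a few lines of arithmetic. The float64 line is an assumption for comparison, based on replies in this thread saying the data is converted to double precision. It does not by itself account for the full 8 GB reported.

```python
# Design matrix from the post: 40,000 samples x 14,400 features.
n_samples, n_features = 40_000, 14_400

bytes_float32 = n_samples * n_features * 4   # float32: 4 bytes per value
bytes_float64 = n_samples * n_features * 8   # float64: 8 bytes per value

gb_float32 = bytes_float32 / 1e9
gb_float64 = bytes_float64 / 1e9
print(f"float32 input: {gb_float32:.1f} GB")   # 2.3 GB
print(f"float64 copy:  {gb_float64:.1f} GB")   # 4.6 GB
```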