Re: [Scikit-learn-general] Online learning

2013-11-07 Thread Jim
Andy writes: > I would venture that which one is better would depend on the nature of > your data. > Do you know the number of types beforehand? And do all types have 1000 > categories? The number of types is fixed; however, the number of categories keeps increasing... but as I see it is un

Re: [Scikit-learn-general] Online learning

2013-11-06 Thread Andy
On 11/06/2013 05:00 PM, Jim wrote: >> Then you don't need a OneVsRestClassifier as OvR is the default >> strategy for SGD. You do need to put a maximum on the number of >> classes before you start learning, though. > I see. Thank you for the advice. This was an initial novice iteration of the > so

Re: [Scikit-learn-general] Online learning

2013-11-06 Thread Jim
[Mistakenly posted as a separate thread before; please ignore the previous post] I see. Thank you for the advice. This was an initial novice iteration of the solution and needs improvement, of course. On that note, in order to keep the behaviour of the classifier consistent, instead of a single cla

Re: [Scikit-learn-general] Online learning

2013-11-06 Thread Jim
> Then you don't need a OneVsRestClassifier as OvR is the default > strategy for SGD. You do need to put a maximum on the number of > classes before you start learning, though. I see. Thank you for the advice. This was an initial novice iteration of the solution and needs improvement, of course. In t

Re: [Scikit-learn-general] Online learning

2013-11-06 Thread Lars Buitinck
2013/11/6 Jim <[email protected]>: > No I am primarily working on multiclass classification with constantly > increasing number of classes Then you don't need a OneVsRestClassifier as OvR is the default strategy for SGD. You do need to put a maximum on the number of classes before you start learni
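A minimal sketch of the advice above, assuming a recent scikit-learn and synthetic data: the full label set is declared on the first `partial_fit` call through the `classes` argument, so later batches may contain labels the model has not yet seen. The label values, feature count, and batch sizes are hypothetical and chosen only for illustration.

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.RandomState(0)

# Hypothetical label set: every class that may ever appear is listed up front.
all_classes = np.array([0, 1, 2, 3])

clf = SGDClassifier(random_state=0)

# The first batch only contains labels drawn from {0, 1}.
X1 = rng.randn(20, 5)
y1 = rng.randint(0, 2, size=20)
clf.partial_fit(X1, y1, classes=all_classes)

# A later batch introduces labels from {2, 3}; `classes` can be omitted now.
X2 = rng.randn(20, 5)
y2 = rng.randint(2, 4, size=20)
clf.partial_fit(X2, y2)

# One weight vector per declared class (one-vs-rest), regardless of batch content.
print(clf.classes_)
print(clf.coef_.shape)
```

Labels outside the set passed on the first call are rejected by later `partial_fit` calls, which is why the maximum set of classes has to be fixed before learning starts.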

Re: [Scikit-learn-general] Online learning

2013-11-05 Thread Jim
> > > 2) Will there be an interface for online learning in OneVsRestClassifier? > > If you care to implement it, then we're happy to accept a patch. Are > you doing multi-label classification? No, I am primarily working on multiclass classification with a constantly increasing number of classes --

Re: [Scikit-learn-general] Online learning

2013-11-05 Thread Lars Buitinck
2013/11/5 Jim <[email protected]>: > 1) This might sound like a basic question, but when performing a partial_fit > in SGD Classifiers, does the new data to train on have to be in the > categories that are already in the set? Or conversely, if I come across/ want > to add a few documents(in a docum

[Scikit-learn-general] Online learning

2013-11-05 Thread Jim
1) This might sound like a basic question, but when performing a partial_fit in SGD classifiers, does the new training data have to belong to the categories that are already in the set? Or conversely, if I come across / want to add a few documents (in a document classifier example) in a new categor

Re: [Scikit-learn-general] Online learning

2012-08-03 Thread Abhi
Olivier Grisel writes: > I don't know if there is any maximum file size on the gists. Just try > and if it fails use something else such as dropbox public folder or > Amazon S3 if you have an account. > Adding all the files to https://github.com/abhirk/LinearSVC. Thanks.

Re: [Scikit-learn-general] Online learning

2012-07-31 Thread Olivier Grisel
2012/7/31 Abhi : > Abhi writes: > >> >> Olivier Grisel writes: >> > >> > Could you please try to come up with one or two minimalistic >> > reproduction scripts for the ch2.fit_transform and LinearSVC.fit >> > segfaults? Is it just that it is exhausting memory on your system? Are >> > you running

Re: [Scikit-learn-general] Online learning

2012-07-30 Thread Abhi
Abhi writes: > > Olivier Grisel writes: > > > Could you please try to come up with one or two minimalistic > > reproduction scripts for the ch2.fit_transform and LinearSVC.fit > > segfaults? Is it just that it is exhausting memory on your system? Are > > you running a 32bit or a 64bit OS? How

Re: [Scikit-learn-general] Online learning

2012-07-30 Thread Abhi
Olivier Grisel writes: > > 2012/7/25 Abhi : > > > > Hello, > > Sorry for getting back late..I originally had experimented with different > > classifiers including SGDClassifier, it seemed faster but much less accurate, > > about 93% for 3 emails[and decreasing as the number of emails

Re: [Scikit-learn-general] Online learning

2012-07-24 Thread Olivier Grisel
2012/7/25 Abhi : > > Hello, > Sorry for getting back late..I originally had experimented with different > classifiers including SGDClassifier, it seemed faster but much less accurate, > about 93% for 3 emails[and decreasing as the number of emails increases], > but have not tried with the i

Re: [Scikit-learn-general] Online learning

2012-07-24 Thread Abhi
ers, > Andy > > - Original Message - > From: "Fred Mailhot" gmail.com> > To: scikit-learn-general lists.sourceforge.net > Sent: Saturday, 14 July 2012 22:14:51 > Subject: Re: [Scikit-learn-general] Online learning > > On 14 July 2012 04:22,

Re: [Scikit-learn-general] Online learning

2012-07-14 Thread Andreas Müller
Mailhot" To: [email protected] Sent: Saturday, 14 July 2012 22:14:51 Subject: Re: [Scikit-learn-general] Online learning On 14 July 2012 04:22, Olivier Grisel < [email protected] > wrote: 2012/7/13 Abhi < [email protected] >: > Hell

Re: [Scikit-learn-general] Online learning

2012-07-14 Thread Fred Mailhot
On 14 July 2012 04:22, Olivier Grisel wrote: > 2012/7/13 Abhi : > > Hello, > >My problem is to classify a set of 200k+ emails into approx. 2800 > categories. > > Currently the method I am using is calculating tfidf and using > LinearSVC() > > [with a good accuracy of 98%] for classification

Re: [Scikit-learn-general] Online learning

2012-07-14 Thread Mathieu Blondel
On Sat, Jul 14, 2012 at 8:22 PM, Olivier Grisel wrote: > > LinearSVC is based on liblinear that only implements batch > optimization. Instead you can use SGDClassifier that features > partial_fit method that you can call several consecutive times on > chunks of data for incremental learning. You
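A rough sketch of the incremental approach described above, assuming a recent scikit-learn: the data is fed to `SGDClassifier.partial_fit` one chunk at a time instead of in a single batch fit. The synthetic dataset, the chunk size, and the `iter_chunks` helper are hypothetical stand-ins; in practice the chunks would be read from disk or a stream.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier

# Synthetic stand-in for a dataset too large to fit in one batch.
X, y = make_classification(
    n_samples=1000, n_features=20, n_informative=10, n_classes=3, random_state=0
)
classes = np.unique(y)


def iter_chunks(X, y, chunk_size):
    """Hypothetical helper: yield consecutive (X, y) slices of the data."""
    for start in range(0, X.shape[0], chunk_size):
        stop = start + chunk_size
        yield X[start:stop], y[start:stop]


# The hinge loss gives a linear SVM trained by stochastic gradient descent.
clf = SGDClassifier(loss="hinge", random_state=0)

n_seen = 0
for X_chunk, y_chunk in iter_chunks(X, y, chunk_size=100):
    # Each call updates the existing weights rather than refitting from scratch.
    clf.partial_fit(X_chunk, y_chunk, classes=classes)
    n_seen += X_chunk.shape[0]

print(n_seen, clf.coef_.shape)
```

Only one chunk needs to be held in memory at a time, which is the main difference from the batch optimisation in LinearSVC.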

Re: [Scikit-learn-general] Online learning

2012-07-14 Thread Olivier Grisel
2012/7/13 Abhi : > Hello, >My problem is to classify a set of 200k+ emails into approx. 2800 > categories. > Currently the method I am using is calculating tfidf and using LinearSVC() > [with a good accuracy of 98%] for classification. The training time is ~30-60 > min [~16 GB of memory, and dou

[Scikit-learn-general] Online learning

2012-07-13 Thread Abhi
Hello, My problem is to classify a set of 200k+ emails into approx. 2800 categories. Currently the method I am using is calculating tfidf and using LinearSVC() [with a good accuracy of 98%] for classification. The training time is ~30-60 min [~16 GB of memory, and doubles every 75000 mails]. I wa
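For reference, the batch setup described in this message can be sketched as a TF-IDF plus LinearSVC pipeline. This is an illustration under assumptions, not the poster's actual code: the toy emails and category names are invented, and the vectorizer and classifier use default settings.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Hypothetical toy corpus standing in for the 200k+ emails.
emails = [
    "invoice attached for your last payment",
    "payment overdue, please send the invoice",
    "team meeting moved to friday morning",
    "agenda for the friday team meeting",
]
labels = ["billing", "billing", "meetings", "meetings"]

# TF-IDF features followed by a linear SVM, fitted in one batch.
model = make_pipeline(TfidfVectorizer(), LinearSVC())
model.fit(emails, labels)

pred = model.predict(["please find the invoice for the payment"])
print(pred)
```

Because `fit` needs the whole corpus at once, both the vocabulary and the model have to be rebuilt whenever new mail arrives, which is the cost the question is about.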

Re: [Scikit-learn-general] Online Learning API

2012-03-11 Thread Alexandre Gramfort
Hi Andy, we've already had a lot of discussion on the online learning problem and partial_fit. Maybe someone feels like summarizing, but otherwise you should search the mailing list archive for "online" and "partial_fit". Alex On Sun, Mar 11, 2012 at 2:02 PM, Andreas Mueller wrote: > Hi ever

[Scikit-learn-general] Online Learning API

2012-03-11 Thread Andreas Mueller
Hi everybody. There has been some talk about an online learning API on the list, but I am not really sure what is meant by that. Could someone please clarify what applications you have in mind and what features you would like to have? In my mind, online learning is pretty close to the "pa