Re: [Scikit-learn-general] Online learning

2012-07-14 Thread Andreas Müller
Hi Fred. As each sample is used individually and the weights are updated after each sample, it doesn't matter. If you pass very small "batches", the overhead of calling the fitting function is probably bigger, though. Cheers, Andy - Original Message - From: "Fred Mailhot" To: scikit-learn
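Andy's point can be illustrated with a toy learner. The following is a minimal sketch, not scikit-learn's implementation: `partial_fit`, `train_in_chunks`, the learning rate, and the six samples are all hypothetical. It assumes a perceptron-style rule that updates the weights once per sample, in the order given, with a fixed learning rate.

```python
def partial_fit(weights, chunk, lr=0.1):
    """Toy perceptron-style step: visit samples in order, one update each."""
    for features, label in chunk:  # label is +1 or -1
        score = sum(w * x for w, x in zip(weights, features))
        if label * score <= 0:  # misclassified or on the boundary
            weights = [w + lr * label * x for w, x in zip(weights, features)]
    return weights


def train_in_chunks(samples, chunk_size):
    """Feed the same samples, in the same order, in chunks of chunk_size."""
    weights = [0.0, 0.0]
    for start in range(0, len(samples), chunk_size):
        weights = partial_fit(weights, samples[start:start + chunk_size])
    return weights


# Hypothetical two-feature samples with +1/-1 labels.
samples = [([1.0, 2.0], 1), ([2.0, -1.0], -1), ([-1.0, 1.5], 1),
           ([0.5, -2.0], -1), ([-2.0, -0.5], 1), ([1.5, 0.5], -1)]

weights_one_by_one = train_in_chunks(samples, chunk_size=1)
weights_in_threes = train_in_chunks(samples, chunk_size=3)
weights_all_at_once = train_in_chunks(samples, chunk_size=len(samples))
```

Because the updates happen in the same order regardless of where the chunk boundaries fall, the three weight vectors are identical. A real estimator that shuffles within each call or adapts its learning rate may differ in detail, so treat this only as an illustration of the per-sample update argument.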

Re: [Scikit-learn-general] Online learning

2012-07-14 Thread Fred Mailhot
On 14 July 2012 04:22, Olivier Grisel wrote:
> 2012/7/13 Abhi:
> > Hello,
> > My problem is to classify a set of 200k+ emails into approx. 2800 categories.
> > Currently the method I am using is calculating tfidf and using LinearSVC() [with a good accuracy of 98%] for classification

Re: [Scikit-learn-general] Online learning

2012-07-14 Thread Mathieu Blondel
On Sat, Jul 14, 2012 at 8:22 PM, Olivier Grisel wrote:
> LinearSVC is based on liblinear that only implements batch optimization. Instead you can use SGDClassifier that features partial_fit method that you can call several consecutive times on chunks of data for incremental learning. You
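The incremental approach quoted above might look roughly like the following. This is a minimal sketch assuming a current scikit-learn release; the synthetic data, the chunk size of 100, and the variable names are made up for illustration and are not from the thread.

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

# Synthetic stand-in for the real feature matrix (assumption, not thread data).
rng = np.random.RandomState(0)
X = rng.randn(300, 5)
y = (X[:, 0] + X[:, 1] > 0).astype(int)

clf = SGDClassifier(random_state=0)

# partial_fit needs the full set of labels on its first call, because a
# single chunk is not guaranteed to contain every class.
classes = np.unique(y)

# Feed the data in consecutive chunks instead of one large fit() call.
for start in range(0, len(X), 100):
    chunk = slice(start, start + 100)
    clf.partial_fit(X[chunk], y[chunk], classes=classes)

accuracy = clf.score(X, y)
```

In a real setting each chunk would be read from disk or a stream, so the whole dataset never has to sit in memory at once.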

Re: [Scikit-learn-general] Norm of SVR weight vector

2012-07-14 Thread Olivier Grisel
Maybe you could use the number of support vectors as a measure of the complexity instead. -- Olivier
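One way to read this suggestion is sketched below, assuming a current scikit-learn release. The noisy sine data, the RBF kernel, the epsilon value, and the three candidate values of C are illustrative assumptions, not taken from the thread.

```python
import numpy as np
from sklearn.svm import SVR

# Synthetic 1-D regression problem (assumption, for illustration only).
rng = np.random.RandomState(0)
X = np.sort(rng.uniform(0, 5, size=(80, 1)), axis=0)
y = np.sin(X).ravel() + 0.1 * rng.randn(80)

# Count the support vectors of the fitted model for each candidate C.
support_counts = {}
for C in (0.01, 1.0, 100.0):
    model = SVR(kernel="rbf", C=C, epsilon=0.1).fit(X, y)
    # support_ holds the indices of the training samples kept as support vectors.
    support_counts[C] = len(model.support_)
```

The counts in `support_counts` can then be compared across values of C as a rough complexity indicator, alongside or instead of the weight-vector norm.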

Re: [Scikit-learn-general] Online learning

2012-07-14 Thread Olivier Grisel
2012/7/13 Abhi:
> Hello,
> My problem is to classify a set of 200k+ emails into approx. 2800 categories.
> Currently the method I am using is calculating tfidf and using LinearSVC() [with a good accuracy of 98%] for classification. The training time is ~30-60 min [~16g of mem, and dou

Re: [Scikit-learn-general] Norm of SVR weight vector

2012-07-14 Thread Charanpal Dhanjal
Thanks Andy for your response, and sorry for being unclear. Here is the idea behind what I wish to study: Let's say we have a set S of examples and I wish to find the best SVR penalty parameter C from a set T of Cs. As you probably know, one way of doing this is to use, for example, 5 fold cros