Hi Fred.
Since each sample is used individually and the weights are updated after
every sample, the batch size doesn't affect the result.
If you pass very small "batches", though, the overhead of calling the
fitting function will probably dominate.
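Here is a rough sketch of how that per-call overhead shows up (synthetic
data, chunk sizes picked arbitrarily for illustration):

# Sketch: time SGDClassifier.partial_fit with different chunk sizes on
# synthetic data. The model sees every sample either way; only the number
# of Python-level calls (and hence the overhead) changes.
import time
import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.RandomState(0)
X = rng.randn(10000, 20)
y = (X[:, 0] > 0).astype(int)

for chunk_size in (1, 100, 1000):
    clf = SGDClassifier(random_state=0)
    t0 = time.time()
    for start in range(0, X.shape[0], chunk_size):
        clf.partial_fit(X[start:start + chunk_size],
                        y[start:start + chunk_size],
                        classes=[0, 1])
    print("chunk size %4d: %.2fs" % (chunk_size, time.time() - t0))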
Cheers,
Andy
----- Original Message -----
From: "Fred Mailhot"
To: scikit-learn
On 14 July 2012 04:22, Olivier Grisel wrote:
> 2012/7/13 Abhi :
> > Hello,
> > My problem is to classify a set of 200k+ emails into approx. 2800
> > categories.
> > Currently the method I am using is calculating tfidf and using
> > LinearSVC() [with a good accuracy of 98%] for classification. The
> > training time is ~30-60 min [~16g of mem, and dou
>
> LinearSVC is based on liblinear, which only implements batch
> optimization. Instead you can use SGDClassifier, which features a
> partial_fit method that you can call several consecutive times on
> chunks of data for incremental learning.
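A rough sketch of that pattern (the chunk generator and the two-class toy
labels below are placeholders for the real 200k emails and ~2800
categories; a stateless HashingVectorizer is assumed so that each chunk
can be transformed independently):

# Sketch: out-of-core text classification with a stateless hashing
# vectorizer and SGDClassifier.partial_fit. Data and labels are toy
# placeholders, not the poster's actual corpus.
from sklearn.feature_extraction.text import HashingVectorizer
from sklearn.linear_model import SGDClassifier

vectorizer = HashingVectorizer(n_features=2 ** 18)  # stateless: no fit step
clf = SGDClassifier(loss="hinge")                   # linear SVM trained by SGD

def iter_chunks():
    # Placeholder: in practice, stream (texts, labels) chunks from disk.
    yield ["free offer inside", "meeting at noon"], [0, 1]
    yield ["cheap offer now", "agenda for the meeting"], [0, 1]

for texts, labels in iter_chunks():
    X = vectorizer.transform(texts)
    # classes must list all categories and be passed on the first call.
    clf.partial_fit(X, labels, classes=[0, 1])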
Maybe you could use the number of support vectors as a measure of the
complexity instead.
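A minimal sketch of reading that number off a fitted model (synthetic
data; SVR's support_ attribute holds the indices of the support vectors):

# Sketch: the number of support vectors of a fitted SVR is the length
# of its support_ attribute.
import numpy as np
from sklearn.svm import SVR

rng = np.random.RandomState(0)
X = rng.randn(200, 3)
y = X[:, 0] + 0.1 * rng.randn(200)

model = SVR(C=1.0).fit(X, y)
print("number of support vectors:", model.support_.shape[0])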
--
Olivier
Thanks Andy for your response, and sorry for being unclear.
Here is the idea behind what I wish to study: let's say we have a set S
of examples, and I wish to find the best SVR penalty parameter C from a
set T of candidate Cs. As you probably know, one way of doing this is to
use, for example, 5-fold cross-validation.
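For concreteness, a minimal sketch of that setup (synthetic data; the
import path assumes a recent scikit-learn where grid search lives in
sklearn.model_selection):

# Sketch: pick the best SVR penalty C from a candidate set T via
# 5-fold cross-validation with GridSearchCV.
import numpy as np
from sklearn.svm import SVR
from sklearn.model_selection import GridSearchCV

rng = np.random.RandomState(0)
X = rng.randn(100, 4)                      # the set S of examples
y = X[:, 0] - 2 * X[:, 1] + 0.1 * rng.randn(100)

param_grid = {"C": [0.1, 1, 10, 100]}      # the set T of candidate Cs
search = GridSearchCV(SVR(), param_grid, cv=5)
search.fit(X, y)
print("best C:", search.best_params_["C"])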