2011/11/17 SK Sn <[email protected]>:
> @Olivier, the quick reproduction of the error using 20Newsgroups -
> https://gist.github.com/1372557
> Also, does it mean, actually, for text classification problems, trees are
> used less often?

Probably yes: simple linear models are often much faster to train and
more scalable, and most text classification problems are approximately
linearly separable. Non-linear models such as Gaussian kernels risk
over-fitting and take much longer to train.

It would be interesting to try the new Random Forest once it's
merged, though.

> @Mathieu, is this the case only for Ridge? kNN, NB, linearSVC do not have
> such a behavior.
> If for Ridge, different solvers are used, which result should I refer to as
> result from Ridge?

OK, so if I understand correctly, the real issue is:

# with .toarray(), results: f1:0.99634, precision 0.99637
# only X (sparse), results: f1:0.99524, precision 0.99526
# All other classifiers (kNN, NB, etc.) give consistent results
# with or without toarray().

I wonder whether this is just due to rounding errors. Still, an f1
score > 0.995 is excellent. I would not call that a bug :P
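
To illustrate how two solvers can disagree slightly on the same ridge
problem, here is a minimal pure-Python sketch. It assumes, as the
quoted question suggests, that the dense and sparse code paths end up
in different solvers. The helpers ridge_system, solve_direct and
solve_cg are hypothetical stand-ins written for this example, not
scikit-learn code, and the data is synthetic.

```python
import random

def ridge_system(X, y, alpha):
    """Build A = X^T X + alpha * I and b = X^T y (ridge normal equations)."""
    n = len(X[0])
    A = [[sum(row[i] * row[j] for row in X) + (alpha if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    b = [sum(row[i] * t for row, t in zip(X, y)) for i in range(n)]
    return A, b

def solve_direct(A, b):
    """Gaussian elimination with partial pivoting (direct solver stand-in)."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    w = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * w[j] for j in range(i + 1, n))
        w[i] = (M[i][n] - tail) / M[i][i]
    return w

def solve_cg(A, b, tol=1e-3, max_iter=1000):
    """Conjugate gradient, stopped once the relative residual drops below tol."""
    n = len(b)
    w = [0.0] * n
    r = b[:]          # residual b - A w, with w = 0 initially
    p = r[:]          # search direction
    rs = sum(v * v for v in r)
    b_norm2 = rs
    for _ in range(max_iter):
        if rs <= tol * tol * b_norm2:
            break
        Ap = [sum(A[i][j] * p[j] for j in range(n)) for i in range(n)]
        step = rs / sum(pi * api for pi, api in zip(p, Ap))
        w = [wi + step * pi for wi, pi in zip(w, p)]
        r = [ri - step * api for ri, api in zip(r, Ap)]
        rs_new = sum(v * v for v in r)
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
        rs = rs_new
    return w

# Synthetic regression data, made up for this illustration only.
random.seed(0)
X = [[random.gauss(0, 1) for _ in range(8)] for _ in range(50)]
true_w = [random.gauss(0, 1) for _ in range(8)]
y = [sum(xi * wi for xi, wi in zip(row, true_w)) + random.gauss(0, 0.1)
     for row in X]

A, b = ridge_system(X, y, alpha=1.0)
w_direct = solve_direct(A, b)
w_cg = solve_cg(A, b)
max_diff = max(abs(u - v) for u, v in zip(w_direct, w_cg))
print("largest coefficient difference:", max_diff)
```

The two coefficient vectors agree only up to the stopping tolerance of
the iterative solver. A gap of that size could plausibly move a few
borderline predictions and shift f1 in the third decimal, but whether
this is what happens in the gist above is a guess, not something I
have checked.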

-- 
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel

_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general