The differences are normally about 0.1% to 0.5%. The largest difference I
have seen is about 1%.
If different solvers are used, as Mathieu mentioned, that is quite
understandable.
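
To make the solver point concrete, here is a rough pure-Python sketch. It is
not scikit-learn's code, and it assumes (based only on Mathieu's remark) that
one code path solves the ridge system directly while the other uses an
iterative solver stopped at a tolerance. The toy data, the helper names and
the tolerances are all made up for illustration.

```python
import random

# Toy ridge problem: minimise ||Xw - y||^2 + alpha * ||w||^2.
# Both solvers below target the same system (X^T X + alpha * I) w = X^T y.

def normal_equations(X, y, alpha):
    """Build A = X^T X + alpha * I and b = X^T y as plain lists."""
    n = len(X[0])
    A = [[sum(row[i] * row[j] for row in X) + (alpha if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    b = [sum(row[i] * t for row, t in zip(X, y)) for i in range(n)]
    return A, b

def solve_direct(A, b):
    """Gaussian elimination with partial pivoting (stand-in for a direct solver)."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    w = [0.0] * n
    for i in reversed(range(n)):
        w[i] = (M[i][n] - sum(M[i][j] * w[j] for j in range(i + 1, n))) / M[i][i]
    return w

def solve_cg(A, b, tol, max_iter=1000):
    """Conjugate gradient, stopped once the residual norm drops below tol."""
    n = len(b)
    w = [0.0] * n
    r = b[:]          # residual b - A w, with w = 0 initially
    p = r[:]          # search direction
    rs = sum(v * v for v in r)
    for _ in range(max_iter):
        if rs ** 0.5 < tol:
            break
        Ap = [sum(A[i][j] * p[j] for j in range(n)) for i in range(n)]
        step = rs / sum(pi * api for pi, api in zip(p, Ap))
        w = [wi + step * pi for wi, pi in zip(w, p)]
        r = [ri - step * api for ri, api in zip(r, Ap)]
        rs_new = sum(v * v for v in r)
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
        rs = rs_new
    return w

# Made-up data: 50 samples, 5 features, labels in {-1, +1}.
rng = random.Random(0)
X = [[rng.gauss(0, 1) for _ in range(5)] for _ in range(50)]
true_w = [1.0, -2.0, 0.5, 0.0, 3.0]
y = [1.0 if sum(a * c for a, c in zip(row, true_w)) + rng.gauss(0, 0.5) > 0
     else -1.0 for row in X]

A, b = normal_equations(X, y, alpha=1.0)
w_direct = solve_direct(A, b)
w_loose = solve_cg(A, b, tol=1e-1)   # stops early
w_tight = solve_cg(A, b, tol=1e-8)   # runs close to convergence

gap_loose = max(abs(u - v) for u, v in zip(w_direct, w_loose))
gap_tight = max(abs(u - v) for u, v in zip(w_direct, w_tight))
print("max coefficient gap, loose tolerance:", gap_loose)
print("max coefficient gap, tight tolerance:", gap_tight)
```

If the two paths really do differ like this, slightly different coefficients
can flip a few borderline predictions, which would be enough to move f1 by a
fraction of a percent without either result being wrong.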

What I was wondering is why it is only RidgeClassifier that shows such
abnormal behavior.
I would love to try out Random Forest once it is merged. ;)
On 17 November 2011 16:40, Olivier Grisel <[email protected]> wrote:

> 2011/11/17 SK Sn <[email protected]>:
> > @Olivier, the quick reproduction of the error using 20Newsgroups -
> > https://gist.github.com/1372557
> > Also, does it mean, actually, for text classification problems, trees are
> > used less often?
>
> Probably yes, as simple linear models are often much faster to train
> and more scalable and most text classification problems are
> approximately linearly separable (e.g. using non-linear models such as
> gaussian kernels results in potential over-fitting and much longer
> training times).
>
> It would be interesting to try the new Random Forest once it's
> merged, though.
>
> > @Mathieu, is this the case only for Ridge? kNN, NB, linearSVC do not have
> > such a behavior.
> > If different solvers are used for Ridge, which result should I refer to
> > as the result from Ridge?
>
> OK, so if I understand correctly, the real issue is:
>
> # with .toarray(), results: f1:0.99634, precision 0.99637
> # only X (sparse), results: f1:0.99524, precision 0.99526
> # All other classifiers (kNN, NB, etc.) have consistent results
> # with or without toarray().
>
> I wonder if this is not just about rounding errors. Still, an f1 score >
> 0.995 is excellent. I would not call that a bug :P
>
> --
> Olivier
> http://twitter.com/ogrisel - http://github.com/ogrisel
>
>
> ------------------------------------------------------------------------------
> All the data continuously generated in your IT infrastructure
> contains a definitive record of customers, application performance,
> security threats, fraudulent activity, and more. Splunk takes this
> data and makes sense of it. IT sense. And common sense.
> http://p.sf.net/sfu/splunk-novd2d
> _______________________________________________
> Scikit-learn-general mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>
