Re: [Scikit-learn-general] Possible bug about RidgeClassifier and a question about Tree

2011-11-17 Thread SK Sn
The differences are normally about 0.1% to 0.5%. The largest difference I have seen is about 1%. If different solvers are used, as Mathieu mentioned, that is quite understandable. What I was wondering is why I get such abnormal behavior only with RidgeClassifier. Would love to try

Re: [Scikit-learn-general] Possible bug about RidgeClassifier and a question about Tree

2011-11-17 Thread Olivier Grisel
2011/11/17 SK Sn :
> @Olivier, the quick reproduction of the error using 20Newsgroups -
> https://gist.github.com/1372557
> Also, does it mean, actually, for text classification problems, trees are
> used less often?

Probably yes, as simple linear models are often much faster to train and more sca

Re: [Scikit-learn-general] Possible bug about RidgeClassifier and a question about Tree

2011-11-17 Thread SK Sn
Thanks guys, for the detailed explanation. It is clear to me now.

But, just to clarify the original problem: the results (f1 etc.) from X.todense() and X.toarray() are the same, and both differ from the results with X (scipy.sparse).

Cheers.

On 17 November 2011 16:33, Olivier Grisel wrote:
> 2011/11/17 Lars B

Re: [Scikit-learn-general] Possible bug about RidgeClassifier and a question about Tree

2011-11-17 Thread Olivier Grisel
2011/11/17 Lars Buitinck :
> 2011/11/16 Olivier Grisel :
>> You should never use dense matrices: either scipy.sparse or numpy
>> arrays. For text data, you should probably stick to estimators that
>> work on scipy.sparse input.
>
> In the current release.
>
>> Always use X.toarray() if you really n

Re: [Scikit-learn-general] Possible bug about RidgeClassifier and a question about Tree

2011-11-17 Thread Lars Buitinck
2011/11/16 Olivier Grisel :
> You should never use dense matrices: either scipy.sparse or numpy
> arrays. For text data, you should probably stick to estimators that
> work on scipy.sparse input.

In the current release.

> Always use X.toarray() if you really need to materialize a dense
> represen
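To illustrate the X.toarray() versus X.todense() advice quoted in this message, here is a minimal sketch. The toy matrix and variable names are made up for illustration and are not from the thread. It assumes X is a scipy.sparse matrix (such as csr_matrix), for which todense() returns a numpy.matrix while toarray() returns a plain numpy.ndarray.

```python
import numpy as np
import scipy.sparse as sp

# Hypothetical toy data standing in for a document-term matrix.
X = sp.csr_matrix(np.array([[1.0, 2.0],
                            [0.0, 3.0]]))

as_matrix = X.todense()  # numpy.matrix: always 2-D, '*' is a matrix product
as_array = X.toarray()   # numpy.ndarray: ordinary array semantics

# The same values are stored either way ...
same_values = np.array_equal(np.asarray(as_matrix), as_array)

# ... but the two types behave differently under common operations.
prod_matrix = np.asarray(as_matrix * as_matrix)  # matrix product
prod_array = as_array * as_array                 # elementwise product

row_matrix_shape = as_matrix[0].shape  # indexing a row keeps two dimensions
row_array_shape = as_array[0].shape    # indexing a row drops to one dimension

print(type(as_matrix).__name__, type(as_array).__name__)
print(row_matrix_shape, row_array_shape)
```

Code written for plain arrays can silently compute something else when handed a numpy.matrix, which is one reason to prefer toarray() whenever a dense copy is really needed.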

Re: [Scikit-learn-general] Possible bug about RidgeClassifier and a question about Tree

2011-11-17 Thread Mathieu Blondel
On Thu, Nov 17, 2011 at 4:07 PM, SK Sn wrote:
> @Mathieu, is this the case only for Ridge? kNN, NB, linearSVC do not have
> such a behavior.
> If for Ridge, different solvers are used, which result should I refer to as
> result from Ridge?

Since you're doing text classification, I would report t

Re: [Scikit-learn-general] Possible bug about RidgeClassifier and a question about Tree

2011-11-16 Thread SK Sn
@Olivier, the quick reproduction of the error using 20Newsgroups - https://gist.github.com/1372557 Also, does it mean, actually, for text classification problems, trees are used less often? @Mathieu, is this the case only for Ridge? kNN, NB, linearSVC do not have such a behavior. If for Ridge, dif

Re: [Scikit-learn-general] Possible bug about RidgeClassifier and a question about Tree

2011-11-16 Thread Mathieu Blondel
On Thu, Nov 17, 2011 at 1:54 AM, SK Sn wrote:
> The difference of results (f1/precision/recall) between X sparse and
> (X.todense() or X.toarray()) are about -0.5% to +1.0%.

The difference comes from the fact that different solvers are used for sparse matrices and numpy arrays.

Mathieu
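As a rough illustration of why two solvers can give slightly different answers for the same ridge problem, the sketch below solves one regularized least-squares system twice: once directly on a dense array and once iteratively on the sparse matrix. This is an assumption-laden toy, not scikit-learn's actual code path: the thread does not say which solvers the library used, and the data, alpha, and the choice of conjugate gradient here are all made up for illustration.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import LinearOperator, cg

# Hypothetical random data; sizes and alpha are arbitrary.
rng = np.random.RandomState(0)
X_sparse = sp.random(200, 50, density=0.1, format="csr", random_state=rng)
y = rng.randn(200)
alpha = 1.0
n_features = X_sparse.shape[1]

# Dense path: direct solve of (X^T X + alpha * I) w = X^T y.
X_dense = X_sparse.toarray()
A = X_dense.T @ X_dense + alpha * np.eye(n_features)
w_direct = np.linalg.solve(A, X_dense.T @ y)

# Sparse path: conjugate gradient on the same system, never densifying X.
def matvec(w):
    return X_sparse.T @ (X_sparse @ w) + alpha * w

op = LinearOperator((n_features, n_features), matvec=matvec, dtype=np.float64)
b = X_sparse.T @ y

w_full, info_full = cg(op, b)               # run to the default tolerance
w_early, info_early = cg(op, b, maxiter=5)  # stop after a few iterations

gap_full = np.max(np.abs(w_direct - w_full))
gap_early = np.max(np.abs(w_direct - w_early))
print("converged CG vs direct:", gap_full)
print("truncated CG vs direct:", gap_early)
```

An iterative solver only approximates the direct solution up to its stopping criterion, so coefficients can differ slightly, and samples close to a decision boundary may then be classified differently.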

Re: [Scikit-learn-general] Possible bug about RidgeClassifier and a question about Tree

2011-11-16 Thread Olivier Grisel
2011/11/16 Olivier Grisel :
> I don't think the current code base supports sparse data
> as input (as is the case for dense data).

Sorry I meant: "as is the case for *text* data".

--
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel

Re: [Scikit-learn-general] Possible bug about RidgeClassifier and a question about Tree

2011-11-16 Thread Olivier Grisel
2011/11/16 SK Sn :
> Hi there,
>
> I experienced abnormal behaviors of RidgeClassifier in context of text
> classification.
>
> Test setup: ~800 documents, ~2500 features, 15 classes, scikit-learn dev
> version (version few days ago), classification with KFold.
> Problem:
> When RidgeClassifier is

[Scikit-learn-general] Possible bug about RidgeClassifier and a question about Tree

2011-11-16 Thread SK Sn
Hi there,

I experienced abnormal behavior from RidgeClassifier in the context of text classification.

*Test setup:* ~800 documents, ~2500 features, 15 classes, scikit-learn dev version (from a few days ago), classification with KFold.

*Problem:* When RidgeClassifier is tested, different results (f1,