The differences are normally about 0.1%-0.5%. The largest difference I
have seen is about 1%.
If different solvers are used, as Mathieu mentioned, that is quite
understandable.
What I was wondering is why it is only RidgeClassifier that shows this
abnormal behavior.
Would love to try.
2011/11/17 SK Sn :
> @Olivier, here is a quick reproduction of the error using 20Newsgroups:
> https://gist.github.com/1372557
> Also, does that mean that, in practice, trees are used less often for text
> classification problems?
Probably yes, as simple linear models are often much faster to train and
more scalable.
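For illustration only, a rough timing sketch of that point: a linear classifier fitted directly on a sparse TF-IDF matrix versus a tree ensemble on the same data. The dataset and classifier choices here are my own, not from the thread, and timings are machine-dependent.

import time
from sklearn.datasets import fetch_20newsgroups
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

data = fetch_20newsgroups(subset="train", categories=["sci.space", "rec.autos"])
X = TfidfVectorizer().fit_transform(data.data)   # high-dimensional scipy.sparse CSR

# Both accept sparse input in current releases; the linear model is
# typically far faster on high-dimensional text features.
for clf in (LinearSVC(), RandomForestClassifier(n_estimators=100)):
    t0 = time.time()
    clf.fit(X, data.target)
    print(type(clf).__name__, "fit in %.1fs" % (time.time() - t0))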
Thanks guys, for the detailed explanation. It is clear to me now.
But, just to clarify the original problem: the results (f1 etc.) from
X.todense() and X.toarray() are identical to each other, and both differ
from the results with X as scipy.sparse.
Cheers.
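For concreteness, a minimal sketch of that comparison (not the original code): synthetic data roughly matching the shapes mentioned in the thread, module paths from current scikit-learn rather than the 2011 dev version, same estimator and split, and only the input representation changing.

import numpy as np
from scipy.sparse import random as sparse_random
from sklearn.linear_model import RidgeClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for a document-term matrix: ~800 docs, ~2500 features.
X = sparse_random(800, 2500, density=0.02, format="csr", random_state=0)
y = np.random.RandomState(0).randint(0, 15, size=800)
Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)

# Same estimator, same split; only the input representation changes.
for name, cast in (("sparse", lambda A: A),
                   ("toarray", lambda A: A.toarray()),
                   ("todense", lambda A: np.asarray(A.todense()))):
    clf = RidgeClassifier().fit(cast(Xtr), ytr)
    pred = clf.predict(cast(Xte))
    print(name, f1_score(yte, pred, average="macro"))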
On 17 November 2011 16:33, Olivier Grisel wrote:
> 2011/11/17 Lars Buitinck :
2011/11/17 Lars Buitinck :
> 2011/11/16 Olivier Grisel :
>> You should never use dense matrices (numpy.matrix): use either
>> scipy.sparse or numpy arrays. For text data, you should probably stick to
>> estimators that work on scipy.sparse input.
>
> In the current release.
>
>> Always use X.toarray() if you really need to materialize a dense
>> representation.
2011/11/16 Olivier Grisel :
> You should never use dense matrices (numpy.matrix): use either
> scipy.sparse or numpy arrays. For text data, you should probably stick to
> estimators that work on scipy.sparse input.
In the current release.
> Always use X.toarray() if you really need to materialize a dense
> representation.
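A small sketch of the distinction being drawn here (the example matrix is arbitrary): todense() yields a numpy.matrix, while toarray() yields a plain ndarray.

import numpy as np
from scipy.sparse import csr_matrix

X = csr_matrix(np.eye(3))
print(type(X.todense()))   # <class 'numpy.matrix'>  -- the "dense matrix" to avoid
print(type(X.toarray()))   # <class 'numpy.ndarray'> -- use this if you must go dense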
On Thu, Nov 17, 2011 at 4:07 PM, SK Sn wrote:
> @Mathieu, is this the case only for Ridge? kNN, NB, and LinearSVC do not
> show this behavior.
> If different solvers are used for Ridge, which result should I report as
> the Ridge result?
Since you're doing text classification, I would report t
@Olivier, here is a quick reproduction of the error using 20Newsgroups:
https://gist.github.com/1372557
Also, does that mean that, in practice, trees are used less often for text
classification problems?
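The gist itself is not reproduced here; a guess at what a minimal 20Newsgroups reproduction could look like with current scikit-learn APIs (the category list and max_features value are assumptions of mine, not from the gist):

from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import RidgeClassifier
from sklearn.metrics import f1_score

cats = ["sci.space", "rec.autos", "comp.graphics"]
train = fetch_20newsgroups(subset="train", categories=cats)
test = fetch_20newsgroups(subset="test", categories=cats)

vec = TfidfVectorizer(max_features=2500)
Xtr, Xte = vec.fit_transform(train.data), vec.transform(test.data)

# Fit once on the sparse matrix and once on its dense ndarray copy.
for label, cast in (("sparse", lambda A: A), ("dense", lambda A: A.toarray())):
    clf = RidgeClassifier().fit(cast(Xtr), train.target)
    print(label, f1_score(test.target, clf.predict(cast(Xte)), average="macro"))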
@Mathieu, is this the case only for Ridge? kNN, NB, and LinearSVC do not
show this behavior.
If different solvers are used for Ridge, which result should I report as
the Ridge result?
On Thu, Nov 17, 2011 at 1:54 AM, SK Sn wrote:
> The difference in results (f1/precision/recall) between X sparse and
> (X.todense() or X.toarray()) is about -0.5% to +1.0%.
The difference comes from the fact that different solvers are used for
sparse matrices and numpy arrays.
Mathieu
-
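In current scikit-learn releases the solver can also be pinned explicitly, which removes the implicit sparse-vs-dense solver switch; parameter names below follow today's API, not the 2011 dev version.

from sklearn.linear_model import RidgeClassifier

clf_auto = RidgeClassifier(solver="auto")      # solver chosen from the input type
clf_cg = RidgeClassifier(solver="sparse_cg")   # same conjugate-gradient solver for
                                               # both sparse and dense X
clf_lsqr = RidgeClassifier(solver="lsqr")      # scipy.sparse.linalg.lsqr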
2011/11/16 Olivier Grisel :
> I don't think the current code base supports sparse data
> as input (as is the case for dense data).
Sorry I meant: "as is the case for *text* data".
--
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel
-
2011/11/16 SK Sn :
> Hi there,
>
> I experienced abnormal behavior of RidgeClassifier in the context of text
> classification.
>
> Test setup: ~800 documents, ~2500 features, 15 classes, scikit-learn dev
> version (from a few days ago), classification with KFold.
> Problem:
> When RidgeClassifier is
Hi there,
I experienced abnormal behavior of RidgeClassifier in the context of text
classification.
*Test setup:* ~800 documents, ~2500 features, 15 classes, scikit-learn dev
version (from a few days ago), classification with KFold.
*Problem:*
When RidgeClassifier is tested, different results (f1, precision, recall)
are obtained depending on whether X is passed as scipy.sparse or as a dense
representation (X.todense() / X.toarray()).
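For reference, a sketch of a setup with that shape (synthetic data standing in for the real corpus, and current scikit-learn module paths, so this only mirrors the structure of the experiment, not its results):

import numpy as np
from scipy.sparse import random as sparse_random
from sklearn.linear_model import RidgeClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import KFold

# Synthetic stand-in with the shapes described above: 800 x 2500, 15 classes.
X = sparse_random(800, 2500, density=0.02, format="csr", random_state=0)
y = np.random.RandomState(0).randint(0, 15, size=800)

scores = {"sparse": [], "dense": []}
for train_idx, test_idx in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
    for label, cast in (("sparse", lambda A: A),
                        ("dense", lambda A: A.toarray())):
        clf = RidgeClassifier().fit(cast(X[train_idx]), y[train_idx])
        pred = clf.predict(cast(X[test_idx]))
        scores[label].append(f1_score(y[test_idx], pred, average="macro"))

for label, vals in scores.items():
    print(label, "mean macro-f1: %.4f" % np.mean(vals))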