2013/3/27 Tom Fawcett <[email protected]>:
> I’ve identified a bug/inconsistency in sklearn.feature_extraction.text.
> TfidfVectorizer returns a matrix of type scipy.sparse.csr.csr_matrix, whereas
> CountVectorizer returns scipy.sparse.coo.coo_matrix, which doesn’t support
> multiple (array) indexing.
>
> Below is a short (silly) example that demonstrates the problem.  It took a 
> while to figure out why (in a larger program) I was getting this error.  I am 
> using sklearn.cross_validation.StratifiedKFold which returns an index array 
> for each fold, and the program broke when I started using CountVectorizer.

There's already a pull request that speeds up CountVectorizer and
returns a csr_matrix. I think we should merge it in soon.
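
Until that is merged, one possible workaround is to convert the matrix to
CSR yourself before indexing it with a fold's index array. The following
is only a minimal sketch, not code from the pull request: rows_by_index is
a hypothetical helper name, and the toy matrix and index array stand in for
real vectorizer output and a real cross-validation fold.

```python
import numpy as np
from scipy.sparse import coo_matrix, issparse


def rows_by_index(X, idx):
    """Select rows of X with an integer index array.

    Hypothetical helper: COO matrices do not support array indexing,
    so any sparse matrix that is not already CSR is converted first.
    """
    if issparse(X) and X.format != "csr":
        X = X.tocsr()
    return X[idx]


# Toy term-count matrix in COO format (assumed stand-in for vectorizer output).
X = coo_matrix(np.array([[1, 0, 2],
                         [0, 3, 0],
                         [4, 0, 0],
                         [0, 0, 5]]))

# Index array of the kind a cross-validation fold would supply (assumed values).
train_idx = np.array([0, 2, 3])

X_train = rows_by_index(X, train_idx)
```

Converting once up front and reusing the CSR matrix across folds avoids
repeating the conversion for every split.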

-- 
Lars Buitinck
Scientific programmer, ILPS
University of Amsterdam

_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
