2013/3/27 Tom Fawcett <[email protected]>:
> I’ve identified a bug/inconsistency in sklearn.feature_extraction.text.
> TfidfVectorizer returns a matrix of type scipy.sparse.csr.csr_matrix; whereas
> CountVectorizer returns scipy.sparse.coo.coo_matrix, which don’t support
> multiple (array) indexing.
>
> Below is a short (silly) example that demonstrates the problem. It took a
> while to figure out why (in a larger program) I was getting this error. I am
> using sklearn.cross_validation.StratifiedKFold which returns an index array
> for each fold, and the program broke when I started using CountVectorizer.
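
For reference, a minimal sketch of a possible workaround for the indexing problem described above: convert the matrix to CSR with tocsr() before selecting rows with a fold's index array. It uses only numpy and scipy; the toy matrix X and the index array train_idx are made up for illustration and merely stand in for the CountVectorizer output and a StratifiedKFold fold.

```python
import numpy as np
from scipy.sparse import coo_matrix

# Toy document-term counts in COO format (an assumed stand-in for the
# matrix a vectorizer would return; not real data).
X = coo_matrix(np.array([[1, 0, 2],
                         [0, 3, 0],
                         [4, 0, 0],
                         [0, 0, 5]]))

# Hypothetical training indices, like those a cross-validation
# iterator yields for one fold.
train_idx = np.array([0, 2, 3])

# CSR supports row selection with an integer index array,
# so convert once and index the converted matrix.
X_csr = X.tocsr()
X_train = X_csr[train_idx]

print(X_train.toarray())
```

The conversion is done once up front, so the same X_csr can be reused for every fold.
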
There's already a pull request that speeds up CountVectorizer and returns a csr_matrix. I think we should merge it in soon.

-- 
Lars Buitinck
Scientific programmer, ILPS
University of Amsterdam

_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
