Hi,

I realized that the fit_transform method of TfidfVectorizer returns a CSR
matrix, which supports array indexing, while CountVectorizer returns a COO
matrix, which doesn't. I have always liked how clean and interchangeable the
pieces of sklearn are, so I wondered whether it would break anything else if
CountVectorizer returned a CSR matrix as well. Or is performance a concern
here?
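In the meantime, a caller can convert the COO result to CSR with .tocsr()
before indexing. A minimal sketch, assuming SciPy; the small count matrix
below is made-up toy data standing in for CountVectorizer output, not real
fit_transform results:

```python
import numpy as np
import scipy.sparse as sp

# Toy document-term counts in COO format (hypothetical data standing in
# for the COO matrix returned by CountVectorizer.fit_transform).
rows = np.array([0, 0, 1, 2, 2])
cols = np.array([0, 2, 1, 0, 2])
counts = np.array([1, 2, 1, 3, 1])
X_coo = sp.coo_matrix((counts, (rows, cols)), shape=(3, 3))

# Convert once to CSR, which supports row slicing and array indexing.
X_csr = X_coo.tocsr()

# Select documents 0 and 2 with an integer index array.
subset = X_csr[[0, 2]]
print(subset.toarray())
```

The conversion is a single pass over the nonzeros, so doing it inside
CountVectorizer would presumably cost about the same as users doing it
themselves.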

CountVectorizer's fit_transform:
https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/feature_extraction/text.py#L530

TfidfVectorizer's fit_transform:
https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/feature_extraction/text.py#L942

Thanks,
wr
_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
