Hi,

I noticed that TfidfVectorizer returns a CSR matrix, which supports array
indexing, while CountVectorizer returns a COO matrix, which doesn't. I
have always liked the clean and interchangeable nature of sklearn, so I
wonder whether it would break anything if CountVectorizer returned a CSR
matrix as well. Or is performance a concern here?
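
For reference, here is a minimal sketch of the conversion that callers
currently have to do by hand. It uses scipy.sparse directly with a
made-up toy count matrix (the values and variable names are illustrative
assumptions, not actual CountVectorizer output):

```python
import numpy as np
from scipy.sparse import coo_matrix

# Toy term-count matrix in COO format: 3 documents x 4 terms.
# The values are invented purely for illustration.
rows = np.array([0, 0, 1, 2, 2])
cols = np.array([0, 2, 1, 0, 3])
counts = np.array([1, 2, 1, 3, 1])
X_coo = coo_matrix((counts, (rows, cols)), shape=(3, 4))

# Converting to CSR gives a format that supports row slicing and
# indexing with a list of row indices.
X_csr = X_coo.tocsr()
subset = X_csr[[0, 2]]  # select documents 0 and 2
print(subset.toarray())
```

If CountVectorizer returned CSR itself, the tocsr() call would become
unnecessary and both vectorizers could be indexed the same way.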

CountVectorizer's fit_transform:
https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/feature_extraction/text.py#L530

TfidfVectorizer's fit_transform:
https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/feature_extraction/text.py#L942

Thanks,
wr
_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
