Hi Willi.
I'd think this is an oversight.
Feel free to open an issue on GitHub.
Best,
Andy
On 01/04/2013 12:14 PM, Willi Richert wrote:
Hi,
I realized that the fit_transform method of TfidfVectorizer returns a
CSR matrix, which supports array indexing, while CountVectorizer
returns a COO matrix, which doesn't. I have always liked the clean and
interchangeable nature of sklearn, so I wondered whether it would
break other pieces if CountVectorizer returned a CSR matrix as well.
Or is performance a concern here?
CountVectorizer's fit_transform:
https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/feature_extraction/text.py#L530
TfidfVectorizer's fit_transform:
https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/feature_extraction/text.py#L942
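To illustrate the difference described above, here is a minimal sketch that uses scipy.sparse directly instead of the vectorizers. The toy matrix and variable names are made up for illustration, and whether COO indexing fails may depend on the installed SciPy version, so the check is wrapped in a try/except rather than assumed.

```python
import numpy as np
from scipy import sparse

# Toy document-term counts (hypothetical data, 3 documents x 3 terms).
dense = np.array([[1, 0, 2],
                  [0, 0, 3],
                  [4, 0, 0]])

coo = sparse.coo_matrix(dense)  # the format reported for CountVectorizer
csr = coo.tocsr()               # the format reported for TfidfVectorizer

# Row indexing on the COO matrix; in SciPy releases from around the time
# of this thread it raises TypeError, newer releases may behave differently.
try:
    coo[0]
    coo_indexable = True
except (TypeError, NotImplementedError):
    coo_indexable = False

# CSR supports row indexing and fancy indexing.
first_row = csr[0].toarray()
selected_rows = csr[[0, 2]].toarray()

print("COO indexable:", coo_indexable)
print("first row:", first_row)
print("rows 0 and 2:", selected_rows)
```

As a workaround until both vectorizers agree, calling .tocsr() on the result of CountVectorizer's fit_transform gives an indexable matrix with the same contents.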
Thanks,
wr
_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general