[Scikit-learn-general] lowercase option at CountVectorizer

Florian Lindner Fri, 29 Nov 2013 02:47:29 -0800

Hello,

The CountVectorizer documentation at

http://scikit-learn.org/stable/modules/generated/sklearn.feature_extraction.text.CountVectorizer.html

says

lowercase : boolean, default True
Convert all characters to lowercase before tokenizing.

But after using the vectorizer like this:

from sklearn.feature_extraction.text import CountVectorizer

vectorizer = CountVectorizer(
    input='filename', decode_error='replace',
    strip_accents='unicode', preprocessor=mail_preprocessor,
    stop_words=stop_words, lowercase=True)

vectors = vectorizer.fit_transform(files)

vectorizer.get_feature_names() still gives me both lowercase and uppercase words. Why?

Is something wrong with the documentation, the code, or my understanding? ;-)

Regards,
Florian

_______________________________________________
Scikit-learn-general mailing list
scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
