Hello,
http://scikit-learn.org/stable/modules/generated/sklearn.feature_extraction.text.CountVectorizer.html
says
lowercase : boolean, default True
Convert all characters to lowercase before tokenizing.
But after using the vectorizer like this:
vectorizer = CountVectorizer(
    input='filename', decode_error='replace',
    strip_accents='unicode', preprocessor=mail_preprocessor,
    stop_words=stop_words, lowercase=True)
vectors = vectorizer.fit_transform(files)
vectorizer.get_feature_names() still gives me both lower- and uppercase words.
Is anything wrong with the documentation, the code, or my perception? ;-)
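A minimal sketch that reproduces the behaviour on in-memory strings instead of files. The mail_preprocessor below is a hypothetical stand-in (my real one is not shown here), and the two sample documents are made up. It assumes, as the docs for the preprocessor parameter describe, that a custom preprocessor overrides the built-in strip_accents and lowercase stage, so lowercase=True may have no effect unless the preprocessor lowercases the text itself:

```python
from sklearn.feature_extraction.text import CountVectorizer


def mail_preprocessor(text):
    # Hypothetical stand-in: cleans whitespace but does NOT lowercase.
    return text.strip()


docs = ["Hello World", "hello Again"]

# With a custom preprocessor, the built-in lowercasing step is replaced,
# so mixed-case tokens survive even though lowercase=True is passed.
with_custom = CountVectorizer(preprocessor=mail_preprocessor, lowercase=True)
with_custom.fit(docs)
vocab_custom = sorted(with_custom.vocabulary_)

# Without a custom preprocessor, lowercase=True is applied as documented.
default = CountVectorizer(lowercase=True)
default.fit(docs)
vocab_default = sorted(default.vocabulary_)

print(vocab_custom)
print(vocab_default)
```

If that is the cause, one possible workaround would be to call text.lower() inside the preprocessor itself.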
Regards,
Florian
_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general