Hi Florian,
The documentation should be more explicit. What you missed was that:
    preprocessor : callable or None (default)
        Override the preprocessing (string transformation) stage while
        preserving the tokenizing and n-grams generation steps.
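means that setting this parameter overrides lowercase: a custom
preprocessor replaces the whole string transformation stage, so
lowercase=True no longer has any effect. These dependencies between
parameters should be documented more explicitly, and you are welcome to
submit a PR to do so.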
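
In the meantime, one workaround is to do the lowercasing inside the
callable you pass as preprocessor. Here is a minimal sketch, not the
only way to do it: the body of mail_preprocessor is a made-up stand-in
for yours (I have not seen it), and with_lowercase is a hypothetical
helper name, not part of scikit-learn.

```python
def mail_preprocessor(text):
    # Hypothetical stand-in for the preprocessor from the question:
    # drop everything up to the first blank line (the mail headers)
    # and keep the body. Text without a blank line is returned as-is.
    head, sep, body = text.partition("\n\n")
    return body if sep else text


def with_lowercase(preprocessor):
    """Wrap a preprocessor so that its output is also lowercased."""
    def wrapped(text):
        return preprocessor(text).lower()
    return wrapped


lowercasing_preprocessor = with_lowercase(mail_preprocessor)

# Pass the wrapped callable in place of the original one, e.g.:
# vectorizer = CountVectorizer(input='filename', decode_error='replace',
#                              preprocessor=lowercasing_preprocessor,
#                              stop_words=stop_words)
```

As far as I can tell, strip_accents belongs to the same overridden stage,
so if you rely on it you would need to handle accents in the callable as
well.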
Cheers,
- Joel
On Fri, Nov 29, 2013 at 9:45 PM, Florian Lindner <[email protected]> wrote:
> Hello,
>
>
> http://scikit-learn.org/stable/modules/generated/sklearn.feature_extraction.text.CountVectorizer.html
>
> says
>
> lowercase : boolean, default True
> Convert all characters to lowercase before tokenizing.
>
> But after using the vectorizer like:
>
> vectorizer = CountVectorizer(
>     input='filename', decode_error='replace',
>     strip_accents='unicode', preprocessor=mail_preprocessor,
>     stop_words=stop_words, lowercase=True)
>
> vectors = vectorizer.fit_transform(files)
>
> vectorizer.get_feature_names() still gives me both lower- and uppercase words.
>
> Is anything wrong with the documentation, the code, or my perception? ;-)
>
> Regards,
> Florian
>
>
> ------------------------------------------------------------------------------
> Rapidly troubleshoot problems before they affect your business. Most IT
> organizations don't have a clear picture of how application performance
> affects their revenue. With AppDynamics, you get 100% visibility into your
> Java,.NET, & PHP application. Start your 15-day FREE TRIAL of AppDynamics
> Pro!
> http://pubads.g.doubleclick.net/gampad/clk?id=84349351&iu=/4140/ostg.clktrk
> _______________________________________________
> Scikit-learn-general mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>