Re: [Scikit-learn-general] lowercase option at CountVectorizer

2013-11-30 Thread Joel Nothman
Hi Florian, The documentation should be more explicit. What you missed was that: "preprocessor : callable or None (default) Override the preprocessing (string transformation) stage while preserving the tokenizing and n-grams generation steps." means setting this parameter will
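
To illustrate the quoted parameter description, here is a minimal pure-Python sketch of how a custom preprocessor can replace the whole string-transformation stage, lowercasing included. It is a simplified model and not scikit-learn's actual code: `make_preprocessor`, `strip_digits` and `strip_digits_and_lower` are hypothetical names introduced only for this example.

```python
def make_preprocessor(lowercase=True, preprocessor=None):
    """Hypothetical helper: return the callable used for string transformation."""
    if preprocessor is not None:
        # A user-supplied callable replaces the whole stage, so the
        # `lowercase` flag is never consulted in this branch.
        return preprocessor
    if lowercase:
        return str.lower
    # No transformation requested: return the document unchanged.
    return lambda doc: doc


def strip_digits(doc):
    """Example custom preprocessor (assumption): drop digits, keep case."""
    return "".join(ch for ch in doc if not ch.isdigit())


def strip_digits_and_lower(doc):
    """Same as above, but lowercases explicitly."""
    return strip_digits(doc).lower()


text = "Hello World 2013"

default = make_preprocessor(lowercase=True)
custom = make_preprocessor(lowercase=True, preprocessor=strip_digits)
fixed = make_preprocessor(lowercase=True, preprocessor=strip_digits_and_lower)

print(repr(default(text)))  # lowercased by the built-in stage
print(repr(custom(text)))   # case preserved: lowercase=True had no effect
print(repr(fixed(text)))    # lowercased because the callable does it itself
```

Under this model, anyone passing their own preprocessor to `CountVectorizer` who still wants case folding would call `.lower()` inside that callable instead of relying on `lowercase=True`.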

[Scikit-learn-general] lowercase option at CountVectorizer

2013-11-29 Thread Florian Lindner
Hello, http://scikit-learn.org/stable/modules/generated/sklearn.feature_extraction.text.CountVectorizer.html says lowercase : boolean, default True Convert all characters to lowercase before tokenizing. But after using the vectorizer like: vectorizer = CountVectorizer( input='filename', dec