Hi Florian,
The documentation should be more explicit. What you missed was that:
preprocessor : callable or None (default)
Override the preprocessing (string transformation) stage while
preserving the tokenizing and n-grams generation steps.
means setting this parameter will
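To illustrate the docstring quoted above, here is a small sketch. The two sample documents and the helper names `keep_case` and `lower_then_strip` are made up for this example. It assumes the usual scikit-learn behaviour: a callable passed as `preprocessor` replaces the built-in string transformation, so `lowercase=True` no longer has any effect.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Toy documents, invented for this illustration.
docs = ["Hello World", "HELLO again"]


def keep_case(text):
    # Hypothetical custom preprocessor: trims whitespace, does NOT lowercase.
    return text.strip()


def lower_then_strip(text):
    # Hypothetical preprocessor that does the lowercasing itself.
    return text.lower().strip()


# Default vectorizer: the built-in preprocessing lowercases the text.
default_vec = CountVectorizer()
default_vec.fit(docs)
default_vocab = sorted(default_vec.vocabulary_)

# Custom preprocessor: lowercase=True is still set, but it is not applied,
# because the callable replaces the whole preprocessing stage.
custom_vec = CountVectorizer(preprocessor=keep_case, lowercase=True)
custom_vec.fit(docs)
custom_vocab = sorted(custom_vec.vocabulary_)

# Lowercasing inside the custom preprocessor restores the expected vocabulary.
fixed_vec = CountVectorizer(preprocessor=lower_then_strip)
fixed_vec.fit(docs)
fixed_vocab = sorted(fixed_vec.vocabulary_)

print(default_vocab)  # ['again', 'hello', 'world']
print(custom_vocab)   # ['HELLO', 'Hello', 'World', 'again']
print(fixed_vocab)    # ['again', 'hello', 'world']
```

So if you need both a custom preprocessor and lowercasing, call `.lower()` inside the preprocessor yourself.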
Hello,
http://scikit-learn.org/stable/modules/generated/sklearn.feature_extraction.text.CountVectorizer.html
says
lowercase : boolean, default True
Convert all characters to lowercase before tokenizing.
But after using the vectorizer like:
vectorizer = CountVectorizer(
input='filename', dec