Re: [Scikit-learn-general] Advice on Extracting Rare Words

2015-05-14 Thread Adam Goodkind
Thanks! That makes a lot of sense. I hadn't thought to use binary counts with a CountVectorizer.

On Thu, May 14, 2015 at 4:15 PM, Andreas Mueller wrote:
> You just want df, right? So that is binary CountVectorizer counts.
> This will likely give you a lot of garbage [typos and odd spellings]
>


2015-05-14 Thread Andreas Mueller
You just want df (document frequency), right? That is just the column sums of binary CountVectorizer counts. This will likely give you a lot of garbage [typos and odd spellings] unless your text is very clean, your tokenizer is very good, or you have run it through a spell checker, etc.

On 05/14/2015 04:03 PM, Adam Goodkind wrote:
> Hi, Th