Re: [Scikit-learn-general] CountVectorizer token pattern

2014-09-19 Thread Andy
To detect this, you have to use word n-grams (or character n-grams over word boundaries, which would not result in your problem). If A is a stop word, that would also not be caught, right? So how would using stop words instead of a minimum length fix your issue? Because you would have rather looked
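
A minimal sketch of the two options mentioned above, assuming a current scikit-learn install. The sample documents and variable names are hypothetical, since the original poster's data is not shown in this snippet. It also assumes "character n-grams over word boundaries" means n-grams that may span spaces (`analyzer="char"`), which is only one reading of the message.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical documents; the thread's actual data is not shown.
docs = ["vitamin A deficiency", "a vitamin"]

# The default token_pattern r"(?u)\b\w\w+\b" drops one-character tokens,
# so "a" never reaches the vocabulary (input is lowercased by default).
default_vec = CountVectorizer()
default_vec.fit(docs)
default_vocab = set(default_vec.vocabulary_)

# Keep one-character tokens and add word bigrams so that "vitamin a"
# becomes a feature of its own.
bigram_vec = CountVectorizer(token_pattern=r"(?u)\b\w+\b", ngram_range=(1, 2))
bigram_vec.fit(docs)
bigram_vocab = set(bigram_vec.vocabulary_)

# Character trigrams over the raw text; these can span spaces between words.
char_vec = CountVectorizer(analyzer="char", ngram_range=(3, 3))
char_vec.fit(docs)
char_vocab = set(char_vec.vocabulary_)

print(sorted(default_vocab))
print(sorted(bigram_vocab))
```

Note that neither variant uses a stop-word list; whether a one-character token survives is decided by `token_pattern` alone in this sketch.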

Re: [Scikit-learn-general] train_test_split return values

2014-09-19 Thread Andy
On 09/18/2014 10:34 PM, Joel Nothman wrote: > A copy If you use a list as input it is not a copy.
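
One way to check the copy behaviour for yourself is sketched below. It assumes a current scikit-learn, where `train_test_split` lives in `sklearn.model_selection` (in 2014 it was in `sklearn.cross_validation`), and it reads "not a copy" as "the list elements are not copied", which is an interpretation rather than something the snippet states. The sample data is hypothetical.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical list input: each element is a distinct Python object.
items = [{"id": i} for i in range(10)]
train, test = train_test_split(items, test_size=0.3, random_state=0)

# Every returned element is the very same object as one in the input list.
same_objects = all(
    any(elem is original for original in items)
    for elem in list(train) + list(test)
)

# NumPy array input: the splits are produced by index-based selection,
# so they do not share memory with the original array.
X = np.arange(20).reshape(10, 2)
X_train, X_test = train_test_split(X, test_size=0.3, random_state=0)
shares_memory = np.shares_memory(X_train, X)

print(same_objects, shares_memory)
```

With list input, mutating an element of `train` would therefore also change the corresponding element of `items`.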