Re: [Scikit-learn-general] Text processing using nltk, sklearn and pandas

Lars Buitinck Fri, 12 Jul 2013 09:49:45 -0700

2013/7/11 Tom Fawcett <tom.fawc...@gmail.com>:
>> On Sun, Jul 7, 2013 at 6:58 AM, Joel Nothman <jnoth...@student.usyd.edu.au> 
>> wrote:
>> (But I'm also not convinced that NLTK is the right tool for a lot of 
>> large-scale feature extraction jobs.)
>
> I’m curious – why?


I guess because it's terribly slow. I recently tried to cluster a
sample of Wikipedia text at the word level and found that only about
75% of the running time was spent in MiniBatchKMeans.fit, while the
remaining 25% or so was spent inside nltk.word_tokenize (!)
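
For anyone who wants to check this kind of split on their own data, here
is a minimal timing sketch. It is not the script I ran: time_stages and
regex_tokenize are hypothetical helpers, and the regex tokenizer and the
token-counting stage are stdlib stand-ins so the snippet runs without
NLTK or scikit-learn. In a real comparison you would presumably swap in
nltk.word_tokenize and a stage that vectorizes and calls
MiniBatchKMeans.fit.

```python
import re
import time


def regex_tokenize(text):
    # Stand-in tokenizer (an assumption, not NLTK's algorithm):
    # runs of word characters, or single punctuation marks.
    return re.findall(r"\w+|[^\w\s]", text)


def time_stages(stages, data):
    """Run (name, func) stages in order, feeding each output into the next.

    Returns the final output and a dict mapping each stage name to its
    fraction of the total wall-clock time.
    """
    elapsed = {}
    for name, func in stages:
        start = time.perf_counter()
        data = func(data)
        elapsed[name] = time.perf_counter() - start
    total = sum(elapsed.values()) or 1.0  # guard against a zero total
    return data, {name: t / total for name, t in elapsed.items()}


# Toy corpus; a real test would use a sample of actual documents.
docs = [
    "I guess because it's terribly slow.",
    "Clustering text at the word level.",
] * 1000

stages = [
    ("tokenize", lambda ds: [regex_tokenize(d) for d in ds]),
    # Placeholder for the downstream step (e.g. vectorize + cluster).
    ("count", lambda toks: sum(len(t) for t in toks)),
]

result, shares = time_stages(stages, docs)
for name, share in shares.items():
    print(f"{name}: {share:.0%}")
```

The printed percentages depend on the machine and the data, so treat
them as a rough guide to where the time goes, not as a benchmark.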

-- 
Lars Buitinck
Scientific programmer, ILPS
University of Amsterdam

_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
