2013/7/11 Tom Fawcett <tom.fawc...@gmail.com>:
>> On Sun, Jul 7, 2013 at 6:58 AM, Joel Nothman <jnoth...@student.usyd.edu.au>
>> wrote:
>> (But I'm also not convinced that NLTK is the right tool for a lot of
>> large-scale feature extraction jobs.)
>
> I’m curious – why?

I guess because it's terribly slow. I recently tried to cluster a sample
of Wikipedia text at the word level. I found that about 75% of the time
was spent in MiniBatchKMeans.fit, while the rest of it was spent inside
nltk.word_tokenize (!)

-- 
Lars Buitinck
Scientific programmer, ILPS
University of Amsterdam

_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
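
For anyone who wants to check a split like the one described above on their own data, here is a minimal sketch of per-stage timing. It is not the code used for the measurement in this message. The names regex_tokenize, count_tokens and time_stages are hypothetical helpers made up for illustration, the regex tokenizer is only a stand-in for nltk.word_tokenize, and the counting stage is a placeholder where a call such as MiniBatchKMeans.fit would go.

```python
import re
import time

TOKEN_RE = re.compile(r"\w+")


def regex_tokenize(text):
    """Hypothetical stand-in for nltk.word_tokenize: lowercase word tokens."""
    return TOKEN_RE.findall(text.lower())


def count_tokens(tokenized_docs):
    """Placeholder for the downstream stage (e.g. clustering): count tokens."""
    counts = {}
    for tokens in tokenized_docs:
        for token in tokens:
            counts[token] = counts.get(token, 0) + 1
    return counts


def time_stages(stages, data):
    """Run (name, func) stages in order, piping each output into the next.

    Returns the final output and a dict mapping each stage name to its
    fraction of the total wall-clock time.
    """
    elapsed = {}
    for name, func in stages:
        start = time.perf_counter()
        data = func(data)
        elapsed[name] = time.perf_counter() - start
    total = sum(elapsed.values()) or 1.0  # avoid division by zero
    shares = {name: t / total for name, t in elapsed.items()}
    return data, shares


if __name__ == "__main__":
    docs = ["I recently tried to cluster a sample of Wikipedia text."] * 1000
    stages = [
        ("tokenize", lambda ds: [regex_tokenize(d) for d in ds]),
        ("count", count_tokens),
    ]
    _, shares = time_stages(stages, docs)
    for name, share in shares.items():
        print(f"{name}: {share:.0%}")
```

Swapping the two stages for the real tokenizer and the real estimator should show how the time divides between them; the exact percentages will depend on the corpus and the machine.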