On 03/07/2013 09:40 AM, Roman Sinayev wrote:
> I tried but TfIDF is slow after the vectorization. The other thing
> was since it is stateless, wouldn't transformation of a test corpus
> followed by tfidf result in a totally different matrix? You won't
> know which words are responsible for what.
>
Yes, it does give different results. But it is way more scalable.

I think there have been several attempts at speeding up the DictVectorizer
using multi-processing, iirc without much success.
That doesn't mean you shouldn't try, though ;)
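
To illustrate the "stateless" point, here is a minimal pure-Python sketch. It assumes the stateless vectorizer under discussion is a hashing-based one, which the excerpt does not name. The helper names (column_for, hash_vectorize) and the tiny N_FEATURES value are hypothetical and are not scikit-learn API. It shows that a token always lands in the same column without any fitted vocabulary, and also why the columns cannot be mapped back to words.

```python
import hashlib

N_FEATURES = 16  # hypothetical size, kept tiny for readability


def column_for(token, n_features=N_FEATURES):
    """Map a token to a column index with a stable hash (no vocabulary is stored)."""
    digest = hashlib.md5(token.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "little") % n_features


def hash_vectorize(doc, n_features=N_FEATURES):
    """Return a dense count vector for one whitespace-tokenized document."""
    row = [0] * n_features
    for token in doc.lower().split():
        # Different tokens may collide in one column; nothing records which
        # token produced a count, so columns cannot be traced back to words.
        row[column_for(token, n_features)] += 1
    return row


# Train and test documents are vectorized independently, with no fit step.
train_row = hash_vectorize("the cat sat on the mat")
test_row = hash_vectorize("a cat appeared")

# "cat" occupies the same column in both rows.
cat_col = column_for("cat")
print(cat_col, train_row[cat_col], test_row[cat_col])
```

Any reweighting that is fitted afterwards (for example, IDF weights) is a separate, stateful step: under this assumption, refitting it on the test corpus rather than reusing the weights learned on the training corpus would change the resulting values.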
