Re: [Scikit-learn-general] CountVectorizer in feature extraction is still slow

Andreas Mueller Thu, 07 Mar 2013 00:56:51 -0800

On 03/07/2013 09:40 AM, Roman Sinayev wrote:
> I tried but TfIDF is slow after the vectorization.  The other thing
> was since it is stateless, wouldn't transformation of a test corpus
> followed by tfidf result in a totally different matrix?  You won't
> know which words are responsible for what.
>
Yes, it does give different results. But it is way more scalable.
I think there have been several attempts at speeding up the DictVectorizer
using multiprocessing, IIRC without much success.
That doesn't mean you shouldn't try, though ;)
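
To make the "stateless" point concrete, here is a minimal sketch of the hashing trick in plain Python. It is not scikit-learn's implementation: the helper names (`hashed_column`, `hash_vectorize`), the use of MD5, and the feature-space size are assumptions chosen only for illustration. It shows that a token maps to the same column in any corpus without a fitting step, and that no vocabulary is kept, which is why you cannot look up which word a column stands for.

```python
import hashlib
from collections import Counter

N_FEATURES = 2 ** 10  # assumed size of the hashed feature space


def hashed_column(token, n_features=N_FEATURES):
    """Map a token to a column index with a deterministic hash (illustrative)."""
    digest = hashlib.md5(token.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % n_features


def hash_vectorize(doc, n_features=N_FEATURES):
    """Return a sparse {column: count} row; no vocabulary is stored anywhere."""
    return Counter(hashed_column(tok, n_features) for tok in doc.lower().split())


train_row = hash_vectorize("the quick brown fox")
test_row = hash_vectorize("a slow brown dog")

# "brown" lands in the same column for both corpora, with no fit() in between.
brown = hashed_column("brown")
print(brown in train_row, brown in test_row)
```

The column layout differs from what a vocabulary-based vectorizer would produce, but it is consistent between training and test data. The idf weights are a separate matter: they are learned state, so they would still be fit on the training matrix and only applied to the test matrix.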


