On 03/07/2013 09:13 AM, Roman Sinayev wrote:
This module is a crucial bottleneck in NLP problems. I am trying to refactor it and also make it parallel across documents with the Python multiprocessing module. Is anyone else working on this?
If this is your bottleneck, you should consider using HashingVectorizer:
http://scikit-learn.org/dev/modules/feature_extraction.html#vectorizing-a-large-text-corpus-with-the-hashing-trick
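A minimal sketch of how the two ideas combine, assuming a recent scikit-learn (the `alternate_sign` parameter did not exist in 2013-era releases). HashingVectorizer learns no vocabulary, so any worker can hash any chunk of documents on its own and the partial matrices can simply be stacked. The helper names (`transform_chunk`, `chunked`, `parallel_hash`), the chunk size and the worker count are illustrative choices, not part of scikit-learn:

```python
from multiprocessing import Pool

import scipy.sparse as sp
from sklearn.feature_extraction.text import HashingVectorizer

# Stateless vectorizer: nothing is fitted, so every process produces the
# same column index for the same token. alternate_sign=False keeps counts
# non-negative (assumes scikit-learn >= 0.19).
VECTORIZER = HashingVectorizer(n_features=2 ** 20, alternate_sign=False)


def transform_chunk(docs):
    """Hash one chunk of raw documents into a sparse CSR matrix."""
    return VECTORIZER.transform(docs)


def chunked(seq, size):
    """Split a list into consecutive slices of at most `size` items."""
    return [seq[i:i + size] for i in range(0, len(seq), size)]


def parallel_hash(docs, n_jobs=2, chunk_size=1000):
    """Hash `docs` in worker processes and stack the results row-wise.

    Pool.map returns results in input order, so row i still
    corresponds to docs[i].
    """
    with Pool(processes=n_jobs) as pool:
        parts = pool.map(transform_chunk, chunked(docs, chunk_size))
    return sp.vstack(parts).tocsr()


if __name__ == "__main__":
    docs = ["the quick brown fox", "jumps over the lazy dog"] * 500
    X_parallel = parallel_hash(docs, n_jobs=2, chunk_size=100)
    X_serial = VECTORIZER.transform(docs)
    # Row-wise hashing and per-row normalisation do not depend on chunking.
    print(X_parallel.shape, abs(X_parallel - X_serial).max())
```

The trade-off is that hashed features cannot be mapped back to the original terms, and distinct tokens may collide in the same column; a larger `n_features` makes collisions rarer at the cost of a wider (but still sparse) matrix.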
_______________________________________________
Scikit-learn-general mailing list
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general