Re: [Scikit-learn-general] TF-Idf

Ark Mon, 24 Sep 2012 11:04:15 -0700

Olivier Grisel <olivier.grisel@...> writes:

> You can use the Pipeline class to build a compound classifier that
> binds a text feature extractor with a classifier to get a text
> document classifier in the end.
> 
Done!
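For readers of the archive, here is a minimal sketch of the Pipeline approach Olivier describes. It assumes TfidfVectorizer as the text feature extractor and MultinomialNB as the classifier; both choices, and the toy documents and labels, are illustrative assumptions rather than the setup used in this thread.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Toy training corpus (hypothetical data, for illustration only)
train_docs = [
    "the cat sat on the mat",
    "dogs and cats are pets",
    "stock prices fell sharply today",
    "the market rallied after earnings",
]
train_labels = ["pets", "pets", "finance", "finance"]

# Bind the TF-IDF feature extractor and the classifier into one estimator
text_clf = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("clf", MultinomialNB()),
])

# fit() calls TfidfVectorizer.fit_transform, then MultinomialNB.fit on the result
text_clf.fit(train_docs, train_labels)

# predict() accepts raw strings; the fitted vectorizer is applied automatically
predictions = text_clf.predict(["cats make good pets", "earnings drove the market"])
print(predictions)
```

Because the whole chain is a single estimator, it can be pickled or passed to grid search as one object, so the vectorizer vocabulary and the classifier always stay in sync.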


> 
> 7s is very long. How long is your text document in bytes?
The text documents are around 50kB.


> Maybe you
> could only consider the first kilobytes of the documents and ignore
> the remaining text at testing time (while using the complete documents
> at training time).
> 

Er, I think I am missing something here. If I consider only the first few
kilobytes, wouldn't that mean that I lose the features in the rest of the
document, which in turn might lead to a false match?
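A minimal sketch of the test-time truncation suggested above, assuming UTF-8 text and an arbitrary 2 kB cutoff (the helper name `truncate_document` and the `max_bytes` default are hypothetical). As the question points out, anything past the cutoff is not seen by the classifier; the bet is that the opening of a document usually carries enough signal, which is worth checking on held-out data.

```python
def truncate_document(text: str, max_bytes: int = 2048) -> str:
    """Keep roughly the first max_bytes bytes of a document (hypothetical helper)."""
    encoded = text.encode("utf-8")
    if len(encoded) <= max_bytes:
        return text
    # Cut at the byte limit; errors="ignore" drops a trailing partial multibyte character
    return encoded[:max_bytes].decode("utf-8", errors="ignore")


# Training would still use the full documents; only prediction inputs are truncated
long_doc = "a" * 50_000            # stand-in for a ~50 kB document
short = truncate_document(long_doc)
print(len(short))
```

The truncated strings would then be passed to the fitted pipeline's `predict()` in place of the full documents.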

> You should also probably profile your script to understand what's
> taking so long. For instance you can use:
> 
>   http://www.vrplumber.com/programming/runsnakerun/
> 

Excellent, thanks...
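For profiling, a minimal sketch using the standard-library `cProfile` and `pstats` modules. RunSnakeRun visualizes profile dumps of this kind (e.g. `runsnake predict.prof`); the `slow_predict` function below is a hypothetical stand-in for the slow prediction step, not code from this thread.

```python
import cProfile
import io
import os
import pstats
import tempfile


def slow_predict(docs):
    # Hypothetical stand-in for the expensive vectorize + predict step
    return [sum(ord(c) for c in doc) for doc in docs]


docs = ["x" * 50_000 for _ in range(20)]

profiler = cProfile.Profile()
profiler.enable()
result = slow_predict(docs)
profiler.disable()

# Save raw stats to a file that RunSnakeRun (or pstats) can load later
prof_path = os.path.join(tempfile.gettempdir(), "predict.prof")
profiler.dump_stats(prof_path)

# Print the 10 most expensive entries sorted by cumulative time
stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(10)
report = stream.getvalue()
print(report)
```

Sorting by cumulative time makes it easy to see whether the time goes into feature extraction or into the classifier itself.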






_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
