Multithreaded parsing

Santiago Pérez Mon, 28 Dec 2009 03:10:00 -0800

Hi,

I am developing a modification to Nutch that accepts only outlinks to
Spanish-language URLs. I have implemented downloading and parsing the content
of each outlink (in ParseOutFormat) with Jericho and detecting its language
with Lingpipe.
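
A minimal sequential sketch of the per-outlink check described above. It
assumes the download plus Jericho text extraction and the Lingpipe
classification are wrapped behind two hypothetical functions
(`fetchAndExtract`, `isSpanish`); it does not call the real Jericho or
Lingpipe APIs, and `SpanishOutlinkFilter` is an illustrative name only.
Every outlink costs one blocking download and one parse, all on one thread:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;
import java.util.function.Predicate;

public class SpanishOutlinkFilter {

    /**
     * Keeps only the outlinks whose extracted text is classified as Spanish.
     *
     * @param outlinks        candidate outlink URLs
     * @param fetchAndExtract hypothetical: downloads a URL and returns its plain text (null on failure)
     * @param isSpanish       hypothetical: language check on the extracted text
     */
    public static List<String> filter(List<String> outlinks,
                                      Function<String, String> fetchAndExtract,
                                      Predicate<String> isSpanish) {
        List<String> accepted = new ArrayList<>();
        for (String url : outlinks) {
            // One blocking download + parse per outlink, done one after another.
            String text = fetchAndExtract.apply(url);
            if (text != null && isSpanish.test(text)) {
                accepted.add(url);
            }
        }
        return accepted;
    }

    public static void main(String[] args) {
        // Stub fetcher and detector, for illustration only.
        Function<String, String> fakeFetch = url -> url.contains(".es") ? "hola mundo" : "hello world";
        Predicate<String> fakeDetector = text -> text.contains("hola");
        System.out.println(filter(List.of("http://a.es/", "http://b.com/"), fakeFetch, fakeDetector));
    }
}
```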

This process seems too heavy, especially because it runs in a single
thread, so I would appreciate any ideas on:

Is there an easier way to detect the language of an outlink?
Is there a way to perform multithreaded outlink extraction, as the fetcher does?
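
One possible shape for the second question, sketched in plain Java rather
than with any Nutch API: hand each outlink to a fixed-size thread pool
(java.util.concurrent.ExecutorService) so that the downloads and language
checks overlap instead of running one after another. `ParallelOutlinkFilter`,
`fetchAndExtract` and `isSpanish` are hypothetical names standing in for the
Jericho/Lingpipe code, and the thread count is an assumption to be tuned:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;
import java.util.function.Predicate;

public class ParallelOutlinkFilter {

    /**
     * Checks outlinks concurrently and returns the accepted ones in their original order.
     * fetchAndExtract and isSpanish are hypothetical and must be thread-safe.
     */
    public static List<String> filter(List<String> outlinks,
                                      Function<String, String> fetchAndExtract,
                                      Predicate<String> isSpanish,
                                      int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            // Submit one task per outlink; each task answers "keep this outlink?".
            List<Future<Boolean>> results = new ArrayList<>();
            for (String url : outlinks) {
                results.add(pool.submit(() -> {
                    String text = fetchAndExtract.apply(url);
                    return text != null && isSpanish.test(text);
                }));
            }
            // Collect in submission order so the output keeps the outlink order.
            List<String> accepted = new ArrayList<>();
            for (int i = 0; i < outlinks.size(); i++) {
                try {
                    if (results.get(i).get()) {
                        accepted.add(outlinks.get(i));
                    }
                } catch (ExecutionException e) {
                    // The download or parse threw: drop that outlink.
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // keep interrupt status, stop collecting
                    break;
                }
            }
            return accepted;
        } finally {
            pool.shutdownNow(); // cancels any tasks still pending
        }
    }
}
```

Since most of the per-outlink time is likely spent waiting on the network,
even a modest pool should overlap much of that waiting; the detector and
fetcher must, however, be safe to call from several threads at once.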

Thanks in advance
-- 
View this message in context: 
http://old.nabble.com/Mutithreaded-parsing-tp26941947p26941947.html
Sent from the Nutch - Dev mailing list archive at Nabble.com.