Hi, I am developing a modification to Nutch that accepts only outlinks pointing to Spanish-language pages. So far I have implemented it by downloading and parsing the content of each outlink (in ParseOutputFormat) with Jericho and detecting its language with LingPipe.
This process seems too heavy, especially because it is all done by a single thread, so I would appreciate any ideas on the following: Is there an easier way to detect the language of an outlink? Is there a way to perform the outlink extraction with multiple threads, as the fetcher does?

Thanks in advance
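To make the second question concrete, here is a minimal sketch of the kind of multithreaded check being asked about. It is not Nutch code and not how the fetcher is implemented: the class name ParallelOutlinkFilter, the filter method and the isSpanish predicate are all hypothetical, and the predicate only stands in for the existing "download, parse with Jericho, detect with LingPipe" step. It also assumes Java 8 or later for the lambda syntax.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Predicate;

// Hypothetical helper: runs a per-outlink language check on a thread pool.
public class ParallelOutlinkFilter {

    /**
     * Returns the URLs accepted by isSpanish, in their original order.
     * isSpanish is a placeholder for the expensive download-and-detect step
     * and must be safe to call from several threads at once.
     */
    public static List<String> filter(List<String> outlinkUrls,
                                      Predicate<String> isSpanish,
                                      int threads) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            // One task per outlink; each task only reports accept/reject.
            List<Callable<Boolean>> tasks = new ArrayList<>();
            for (String url : outlinkUrls) {
                tasks.add(() -> isSpanish.test(url));
            }

            // invokeAll blocks until every task has finished and returns
            // the futures in the same order as the submitted tasks.
            List<Future<Boolean>> results = pool.invokeAll(tasks);

            List<String> accepted = new ArrayList<>();
            for (int i = 0; i < results.size(); i++) {
                try {
                    if (results.get(i).get()) {
                        accepted.add(outlinkUrls.get(i));
                    }
                } catch (ExecutionException e) {
                    // Assumption: an outlink whose check failed is dropped.
                }
            }
            return accepted;
        } finally {
            pool.shutdown();
        }
    }
}
```

A call such as ParallelOutlinkFilter.filter(urls, myDetector, 10) would then replace the sequential loop over the outlinks. Creating a pool per call keeps the sketch short; a shared, long-lived pool would probably be the better choice inside a real job.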