Hi Renato,

Regarding where to look in the Nutch code:

You can look at HtmlParser.getParse() (it resides in plugin/parse-html in the
Nutch source distribution).

ParserJob$ParserMapper.map() invokes ParseUtil.process(), which calls
ParseUtil.parse(), which in turn calls Parser.getParse() (that is
HtmlParser.getParse() in this case).
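
If it helps to see the chain in one place, here is a rough sketch of it. These
are simplified stand-in classes I made up for illustration, not the real Nutch
code: the String-based signatures, the trace list, and the tag-stripping body
are all assumptions. Only the order of the calls follows what I described
above. The timing around getParse() is just one possible way to check whether
the parser itself is the slow part.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-ins only: names mirror the Nutch classes, signatures are hypothetical.
class ParseCallChain {
    // Records the order in which the methods are entered.
    static final List<String> trace = new ArrayList<>();

    // Stand-in for the Parser plugin interface.
    interface Parser {
        String getParse(String url, String content);
    }

    // Stand-in for HtmlParser from the parse-html plugin.
    static class HtmlParser implements Parser {
        @Override
        public String getParse(String url, String content) {
            trace.add("HtmlParser.getParse");
            // Placeholder "parsing": strip tags and surrounding whitespace.
            return content.replaceAll("<[^>]*>", "").trim();
        }
    }

    // Stand-in for ParseUtil: process() delegates to parse().
    static class ParseUtil {
        private final Parser parser;

        ParseUtil(Parser parser) {
            this.parser = parser;
        }

        String process(String url, String content) {
            trace.add("ParseUtil.process");
            return parse(url, content);
        }

        String parse(String url, String content) {
            trace.add("ParseUtil.parse");
            long start = System.nanoTime();
            String text = parser.getParse(url, content);
            long elapsedMs = (System.nanoTime() - start) / 1_000_000;
            System.out.println("getParse took " + elapsedMs + " ms for " + url);
            return text;
        }
    }

    // Stand-in for ParserJob$ParserMapper: map() is the entry point.
    static class ParserMapper {
        private final ParseUtil parseUtil;

        ParserMapper(ParseUtil parseUtil) {
            this.parseUtil = parseUtil;
        }

        String map(String url, String content) {
            trace.add("ParserMapper.map");
            return parseUtil.process(url, content);
        }
    }

    public static void main(String[] args) {
        ParserMapper mapper = new ParserMapper(new ParseUtil(new HtmlParser()));
        System.out.println(mapper.map("http://example.com/", "<p>hello</p>"));
        System.out.println(trace);
    }
}
```

In the real code you would put similar timing (or a breakpoint) inside
HtmlParser.getParse() and work upwards from there.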

Regards,
Alexey



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Slow-parse-on-hadoop-tp4040215p4041039.html
Sent from the Nutch - User mailing list archive at Nabble.com.
