Jérôme Charron wrote:
I can reproduce this with nutch-0.8 using the neko HTML parser (it seems that
script tags are not removed).
You can switch the HTML parser implementation to tagsoup (property
parser.html.impl). In my tests, everything works with tagsoup.
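
For reference, a minimal sketch of the override being suggested. It assumes the property is set in conf/nutch-site.xml, the usual place for local overrides of nutch-default.xml; the file location and the description text are assumptions, and only the property name and the two parser names come from this thread.

```xml
<!-- Assumed to live inside the <configuration> element of conf/nutch-site.xml -->
<property>
  <name>parser.html.impl</name>
  <!-- "neko" is the current default discussed here; "tagsoup" is the alternative -->
  <value>tagsoup</value>
  <description>HTML parser implementation used by the HTML parse plugin.</description>
</property>
```

Pages that were already parsed with neko would presumably need to be parsed again before the change shows up in the index.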

Should we switch the default from neko to tagsoup? Are there cases where neko is better?

Doug


_______________________________________________
Nutch-general mailing list
https://lists.sourceforge.net/lists/listinfo/nutch-general
