Jérôme Charron wrote:
I can reproduce this with nutch-0.8 using the neko HTML parser (it seems that script tags are not removed). You can switch the HTML parser implementation to tagsoup (property parser.html.impl); in my tests, everything works.
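A minimal sketch of the switch Jérôme describes. It assumes that Nutch 0.8 reads site-specific overrides from `conf/nutch-site.xml` and that this file uses a Hadoop-style `<configuration>` root. The helper `build_site_config` is hypothetical and only prints the override, so nothing on disk is changed. Compare the property description in your distribution's `nutch-default.xml` before relying on this.

```python
import xml.etree.ElementTree as ET


def build_site_config(props):
    """Hypothetical helper: return nutch-site.xml text that overrides the given properties.

    Assumes a Hadoop-style <configuration><property><name/><value/></property></configuration> layout.
    """
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    # tostring(encoding="unicode") returns a str without an XML declaration, so add one.
    return '<?xml version="1.0"?>\n' + ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Switch the HTML parser from the default (neko) to TagSoup.
    print(build_site_config({"parser.html.impl": "tagsoup"}))
```

If you already have a `nutch-site.xml`, copy the generated `<property>` element into it by hand rather than overwriting the file, because your other overrides would otherwise be lost.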
Should we switch the default from neko to tagsoup? Are there cases where neko is better?
Doug

_______________________________________________
Nutch-general mailing list
https://lists.sourceforge.net/lists/listinfo/nutch-general
