Hi d_k,

On Mon, Jan 20, 2014 at 11:39 AM, <user-digest-h...@nutch.apache.org> wrote:
> Posting back as promised. :-)

Great

> I just encountered the error "java.lang.NoClassDefFoundError:
> org/cyberneko/html/parsers/DOMFragmentParser" and applied the patch
> NUTCH-1253-2.x-v2.patch from NUTCH-1253 and executed 'ant runtime' and upon
> running './nutch parse -all' (after injecting/generating/fetching) the
> error did not go away and I still got the exception.

OK, so a few things here please.

I see that the patch introduces trace logging in HtmlParser.class. Are you able to change this to debug, and then also change

log4j.logger.org.apache.nutch.parse.ParserJob=INFO,cmdstdout

to

log4j.logger.org.apache.nutch.parse.ParserJob=DEBUG,cmdstdout

in log4j.properties? This should hopefully rule out the trace logging as what is setting this off.

Can you confirm whether DOMFragmentParser actually exists within the new nekohtml 1.9.17 artifact, and that the old nekohtml version is not also present and being loaded instead? If the old version is there, you will need to remove it manually, or alternatively force its removal by running the ant clean target before invoking the runtime target.

Finally, it may be worth taking a look into hadoop.log to determine which URL(s) this error stems from. Can you post the relevant section of your log?

Thank you
Lewis
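For the nekohtml classpath check mentioned above, one quick approach is a small standalone Java class run with the same classpath as the failing job. This is a minimal sketch and not part of Nutch: the class name WhichJar is hypothetical, and the runtime/local/lib path in the usage line below is an assumption about where 'ant runtime' places the job's jars. It prints the jar DOMFragmentParser is actually loaded from, plus every classpath entry that contains a copy of it, so a stale pre-1.9.17 nekohtml jar would show up as an extra entry.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.security.CodeSource;
import java.util.Collections;
import java.util.List;

/** Hypothetical diagnostic (not part of Nutch): shows which jar(s) provide a class on the classpath. */
public class WhichJar {

    static final String DEFAULT_CLASS = "org.cyberneko.html.parsers.DOMFragmentParser";

    /** Where the class is actually loaded from, "<bootstrap>" for JDK classes, or null if it cannot be loaded. */
    static String locate(String className) {
        try {
            // initialize=false: load the class without running its static initializers
            Class<?> cls = Class.forName(className, false, WhichJar.class.getClassLoader());
            CodeSource src = cls.getProtectionDomain().getCodeSource();
            // Classes from the bootstrap loader have no CodeSource
            return src == null ? "<bootstrap>" : String.valueOf(src.getLocation());
        } catch (ClassNotFoundException | LinkageError e) {
            // LinkageError also covers NoClassDefFoundError (e.g. a missing dependency)
            return null;
        }
    }

    /** Every classpath entry holding a .class file for the class; more than one suggests duplicate jars. */
    static List<URL> allCopies(String className) {
        String resource = className.replace('.', '/') + ".class";
        try {
            return Collections.list(WhichJar.class.getClassLoader().getResources(resource));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String name = args.length > 0 ? args[0] : DEFAULT_CLASS;
        String where = locate(name);
        System.out.println(where == null
                ? name + " could NOT be loaded"
                : name + " is loaded from " + where);
        for (URL copy : allCopies(name)) {
            System.out.println("  copy found at " + copy);
        }
    }
}
```

Usage (assuming a Unix shell and the runtime/local/lib layout): javac WhichJar.java && java -cp "runtime/local/lib/*:." WhichJar. If more than one copy is listed, the extra jar is the one to remove.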