Hi d_k,

On Mon, Jan 20, 2014 at 11:39 AM, <user-digest-h...@nutch.apache.org> wrote:

>
> Posting back as promised. :-)
>

Great


>
> I just encountered the error "java.lang.NoClassDefFoundError:
> org/cyberneko/html/parsers/DOMFragmentParser" and applied the patch
> NUTCH-1253-2.x-v2.patch from NUTCH-1253 and executed 'ant runtime' and upon
> running './nutch parse -all' (after injecting/generating/fetching) the
> error did not go away and I still got the exception.
>
OK, so a few things here please.

I see that the patch introduces trace logging in HtmlParser.class. Are you
able to change this to debug, and then also set

log4j.logger.org.apache.nutch.parse.ParserJob=INFO,cmdstdout

to

log4j.logger.org.apache.nutch.parse.ParserJob=DEBUG,cmdstdout

in log4j.properties? This should hopefully rule out the trace logging as
what is setting this off.

Can you confirm that DOMFragmentParser actually exists within the new
nekohtml 1.9.17 artifact, and that the old version is not still present and
being loaded instead? If the old version is still there, you will need to
remove it manually, or alternatively force its removal by running the ant
clean target before invoking the runtime target.
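
If it helps, here is a minimal sketch for checking which jar (if any) a class
resolves from. It is not part of Nutch: the class name WhichJar and the method
locate are made up for illustration, and it only reports on whatever classpath
you start it with. Nutch loads plugins through its own class loaders, so treat
the result as a hint rather than proof of what the parse job sees.

```java
import java.security.CodeSource;

// Hypothetical helper, not part of Nutch: reports where a class is loaded from.
public class WhichJar {

    // Returns the jar/directory URL the class came from, or a short explanation.
    static String locate(String className) {
        try {
            // 'false' avoids running static initializers; we only want to resolve it.
            Class<?> cls = Class.forName(className, false, WhichJar.class.getClassLoader());
            CodeSource src = cls.getProtectionDomain().getCodeSource();
            return src == null
                    ? "(no code source, probably a JDK class)"
                    : src.getLocation().toString();
        } catch (ClassNotFoundException | LinkageError e) {
            // LinkageError covers NoClassDefFoundError, e.g. a missing dependency.
            return "NOT FOUND: " + e;
        }
    }

    public static void main(String[] args) {
        String name = args.length > 0
                ? args[0]
                : "org.cyberneko.html.parsers.DOMFragmentParser";
        System.out.println(name + " -> " + locate(name));
    }
}
```

Compile it with javac and run it with the directories holding the jars you
want to inspect on the classpath. If it prints a path to an old nekohtml jar,
or NOT FOUND, that tells us which of the two cases above we are in.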

Finally, it may be worth taking a look in hadoop.log to determine which
URL(s) this error stems from. Can you post the relevant section of your log?
Thank you
Lewis
