This error is caused by a webpage with extremely deep tag nesting: for example, something like <b><i><b><i>.....</i></b></i></b>, but thousands of levels deep. It is a form of spider trap.
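To make the failure mode concrete, here is a minimal Java sketch. It is not the NUTCH-497 patch, whose contents are not shown here. It builds a page like the one above and measures its nesting depth with a loop instead of recursion. If the error comes from a recursive walk over the parsed DOM, that walk needs roughly one stack frame per nesting level, so a page thousands of levels deep can exhaust the stack. An iterative depth count does not have that problem. The class `NestingDepth`, its methods, and the 1000-level cutoff are hypothetical. The regex-based tag matching is a deliberate simplification, not a real HTML parser.

```java
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical helper (not part of Nutch or NUTCH-497): measures tag nesting
 * depth without recursion so a pathologically nested page can be skipped.
 */
final class NestingDepth {

    // Opening or closing tag such as <b>, </i>, <div class="x">, <br/>.
    // Group 1 is "/" for closing tags (empty otherwise); group 2 is the tag name.
    private static final Pattern TAG =
        Pattern.compile("<(/?)([a-zA-Z][a-zA-Z0-9]*)[^>]*>");

    // A few HTML void elements that never get a closing tag (simplified list).
    private static final Set<String> VOID =
        Set.of("br", "img", "hr", "meta", "link", "input");

    // Builds <b><i><b>...</b></i></b> with the given number of levels.
    static String nested(int levels) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < levels; i++) {
            sb.append(i % 2 == 0 ? "<b>" : "<i>");
        }
        for (int i = levels - 1; i >= 0; i--) {
            sb.append(i % 2 == 0 ? "</b>" : "</i>");
        }
        return sb.toString();
    }

    // Returns the deepest nesting level seen, using a counter instead of a call stack.
    static int maxDepth(String html) {
        Matcher m = TAG.matcher(html);
        int depth = 0;
        int max = 0;
        while (m.find()) {
            String name = m.group(2).toLowerCase(Locale.ROOT);
            boolean closing = !m.group(1).isEmpty();
            boolean selfClosing = m.group().endsWith("/>") || VOID.contains(name);
            if (closing) {
                depth = Math.max(0, depth - 1); // tolerate stray closing tags
            } else if (!selfClosing) {
                depth++;
                max = Math.max(max, depth);
            }
        }
        return max;
    }

    // Flags a page whose nesting exceeds the (hypothetical) limit.
    static boolean isSpiderTrap(String html, int limit) {
        return maxDepth(html) > limit;
    }

    public static void main(String[] args) {
        String trap = nested(5000);
        System.out.println(maxDepth("<p><b>hi</b><br></p>")); // 2
        System.out.println(maxDepth(trap));                    // 5000
        System.out.println(isSpiderTrap(trap, 1000));          // true
    }
}
```

A check like this could run before parsing and skip the page. Where the limit should sit, and whether to skip the page or truncate the DOM instead, are judgment calls that this sketch does not settle.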

I just created NUTCH-497 for this issue and attached a very rudimentary patch as a workaround. The patch fixes the problem, but it is not very robust and has no unit tests yet. I have run it successfully myself. I will provide a more robust patch when time allows, but this should help you for now.

Dennis Kubes

djames wrote:
Thanks a lot for your help
I'll give you feedback
