I have finished and attached a solution for NUTCH-497. It uses an
explicit stack instead of recursion in DOMContentUtils to avoid stack
overflows on pages with extremely deep tag nesting. It also adds a
nested-tags test page to the fetcher tests.
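
For anyone curious about the general technique, here is a minimal sketch of walking a DOM tree with an explicit stack rather than recursive calls. This is not the code from the patch: the class and method names (IterativeTextExtractor, getText, buildNested) are made up for illustration, and it only uses the standard org.w3c.dom API from the JDK.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

// Hypothetical illustration only; not the DOMContentUtils code from NUTCH-497.
class IterativeTextExtractor {

    // Collects the text of all text nodes under root, in document order.
    // Pending nodes live on a heap-allocated stack, so nesting depth does
    // not consume call-stack frames.
    static String getText(Node root) {
        StringBuilder sb = new StringBuilder();
        Deque<Node> stack = new ArrayDeque<>();
        stack.push(root);
        while (!stack.isEmpty()) {
            Node node = stack.pop();
            if (node.getNodeType() == Node.TEXT_NODE) {
                sb.append(node.getNodeValue());
            }
            // Push children last-to-first so the first child is popped first.
            for (Node child = node.getLastChild(); child != null;
                    child = child.getPreviousSibling()) {
                stack.push(child);
            }
        }
        return sb.toString();
    }

    // Builds alternating <b>/<i> elements nested `depth` levels deep around
    // a single text node, working from the innermost node outwards.
    static Node buildNested(int depth, String text) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();
        Node current = doc.createTextNode(text);
        for (int i = 0; i < depth; i++) {
            Element wrapper = doc.createElement(i % 2 == 0 ? "i" : "b");
            wrapper.appendChild(current);
            current = wrapper;
        }
        return current;
    }

    public static void main(String[] args) throws Exception {
        Node root = buildNested(50_000, "deep");
        System.out.println(getText(root));
    }
}
```

The trade-off is that the traversal state moves from the call stack to the heap, so the depth limit becomes available memory rather than the thread stack size.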
Please take a look. If there are no issues with this patch, I will
commit it in a day or two.
Dennis Kubes
Dennis Kubes wrote:
This error is due to a web page with extremely deep nesting of tags,
for example something like <b><i><b><i>.....</i></b></i></b> but
thousands of levels deep. It is a form of spider trap.
I just created NUTCH-497 for this issue and attached a very
rudimentary patch as a workaround. The patch fixes the problem, and I
have run it successfully myself, but it is not very robust and has no
unit tests yet. I will provide a more robust patch when time allows,
but this should help you for now.
Dennis Kubes
djames wrote:
Thanks a lot for your help.
I'll give you feedback.