I have finished and attached a solution for NUTCH-497.  This uses a 
stack instead of recursion in DOMContentUtils to avoid stack 
overflows on extremely deeply nested tags.  It also adds a nested tags 
test page to the fetcher tests.
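
For anyone curious about the general technique, here is a minimal sketch 
of walking a DOM tree with an explicit stack rather than recursion.  This 
is not the code from the attached patch; the class name IterativeDomText 
and the helper buildNested are made up for illustration, and it only 
uses the standard org.w3c.dom and javax.xml.parsers APIs.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

// Illustrative only: a hypothetical class, not part of Nutch.
public class IterativeDomText {

    /** Collects text nodes in document order using an explicit stack. */
    public static String getText(Node root) {
        StringBuilder sb = new StringBuilder();
        Deque<Node> stack = new ArrayDeque<>();
        stack.push(root);
        while (!stack.isEmpty()) {
            Node node = stack.pop();
            if (node.getNodeType() == Node.TEXT_NODE) {
                sb.append(node.getNodeValue());
            }
            // Push children last-to-first so the first child is popped first.
            for (Node child = node.getLastChild(); child != null;
                    child = child.getPreviousSibling()) {
                stack.push(child);
            }
        }
        return sb.toString();
    }

    /** Hypothetical helper: wraps a text node in alternating i/b elements, depth levels deep. */
    public static Node buildNested(int depth, String text) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
            // Build from the inside out, wrapping the current node each time.
            Node current = doc.createTextNode(text);
            for (int i = 0; i < depth; i++) {
                Element wrapper = doc.createElement(i % 2 == 0 ? "i" : "b");
                wrapper.appendChild(current);
                current = wrapper;
            }
            return current;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Node root = buildNested(100000, "deep text");
        System.out.println(getText(root));
    }
}
```

The pending nodes live on the heap in the ArrayDeque, so the depth of the 
page no longer translates into depth of the Java call stack.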

Please take a look, and if there are no issues with this patch I will 
commit it in a day or two.

Dennis Kubes

Dennis Kubes wrote:
> This error is due to a webpage with an extreme nesting of tags.  For 
> example, something like <b><i><b><i>.....</i></b></i></b> but thousands 
> of levels deep.  It is a form of spider trap.
> 
> I just created NUTCH-497 for this issue and attached a very
> rudimentary patch as a workaround.  The patch successfully fixes the 
> problem but it is not very robust and has no unit tests as of yet.  I 
> have run this successfully myself.  I will provide a more robust patch 
> when time allows but this should help you for now.
> 
> Dennis Kubes
> 
> djames wrote:
>> Thanks a lot for your help.
>> I'll give you feedback.
