Extreme Nested Tags causes StackOverflowException in DomContentUtils...Spider Trap
----------------------------------------------------------------------------------

                 Key: NUTCH-497
                 URL: https://issues.apache.org/jira/browse/NUTCH-497
             Project: Nutch
          Issue Type: Bug
          Components: fetcher
    Affects Versions: 0.9.0, 0.8.1, 1.0.0
         Environment: all
            Reporter: Dennis Kubes
            Assignee: Dennis Kubes
             Fix For: 1.0.0

Some webpages contain a form of spider trap that causes a StackOverflowException in DomContentUtils: tags nested thousands of layers deep. When extracting outlinks, DomContentUtils parses the HTML with a recursive method, so nesting this deep overflows the call stack and the parse fails.

_______________________________________________
Nutch-developers mailing list
Nutch-developers@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nutch-developers
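
For illustration only, here is a minimal sketch of one common way to avoid this failure mode: walking the DOM with an explicit stack on the heap instead of recursing once per nesting level. It is not taken from the Nutch source or from the fix attached to this issue. The class and method names (OutlinkWalker, collectHrefs, nestedDocument) are hypothetical, and the sketch assumes a plain org.w3c.dom tree. Note that in Java the overflow surfaces as java.lang.StackOverflowError, which is an Error rather than an Exception.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

// Hypothetical helper, not part of Nutch.
public class OutlinkWalker {

    // Collects the href of every <a> element under root, in document order.
    // The pending nodes live in a heap-allocated deque, so nesting depth
    // does not consume call-stack frames.
    public static List<String> collectHrefs(Node root) {
        List<String> hrefs = new ArrayList<>();
        Deque<Node> pending = new ArrayDeque<>();
        pending.push(root);
        while (!pending.isEmpty()) {
            Node node = pending.pop();
            if (node.getNodeType() == Node.ELEMENT_NODE
                    && "a".equalsIgnoreCase(node.getNodeName())) {
                String href = ((Element) node).getAttribute("href");
                if (!href.isEmpty()) {
                    hrefs.add(href);
                }
            }
            // Push children last-to-first so they are popped first-to-last.
            for (Node child = node.getLastChild(); child != null;
                    child = child.getPreviousSibling()) {
                pending.push(child);
            }
        }
        return hrefs;
    }

    // Builds a synthetic "spider trap": depth nested <div> elements
    // with a single link at the innermost level.
    public static Document nestedDocument(int depth) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
            Element current = doc.createElement("div");
            doc.appendChild(current);
            for (int i = 1; i < depth; i++) {
                Element child = doc.createElement("div");
                current.appendChild(child);
                current = child;
            }
            Element link = doc.createElement("a");
            link.setAttribute("href", "http://example.com/");
            current.appendChild(link);
            return doc;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Document doc = nestedDocument(10000);
        System.out.println(collectHrefs(doc));
    }
}
```

The trade-off in this sketch is that memory use grows with the number of pending nodes rather than with stack depth. A crawler could also cap the depth it is willing to descend, which bounds the work done on a trap page; that cap is a separate design choice not shown here.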