Neko HTML parser goes on default settings.
------------------------------------------

                 Key: NUTCH-487
                 URL: https://issues.apache.org/jira/browse/NUTCH-487
             Project: Nutch
          Issue Type: Bug
          Components: fetcher
    Affects Versions: 0.9.0
         Environment: Linux, Java 1.5.0.
            Reporter: Marcin Okraszewski
         Attachments: neko_setup.patch

The Neko HTML parser setup is done inside a silent try/catch statement (Nutch 0.9: HtmlParser.java:248-259). The problem is that the first feature being set throws an exception, so the whole setup block is skipped and the parser runs on its default settings. The catch statement does nothing, so probably nobody noticed this.

I attach a patch which fixes this. It was made against Nutch 0.9, but SVN trunk contains the same code. The patch:
1. Fixes the augmentations feature.
2. Removes the include-comments feature, because I couldn't find anything similar at http://people.apache.org/~andyc/neko/doc/html/settings.html
3. Prints a warning message when an exception is caught.

Please note that a lot of messages now go to the console (not the log4j log), because the "report-errors" feature is being set. Shouldn't it be removed?
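
For illustration only, here is a minimal, self-contained sketch of the failure mode described in this report. It does not use Nutch or NekoHTML code: FeatureTarget, ToyParser, and the feature names are hypothetical stand-ins. Guarding each feature separately is just one possible approach and is not necessarily what neko_setup.patch does.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class FeatureSetupSketch {

    /** Hypothetical stand-in for a parser that accepts boolean features. */
    interface FeatureTarget {
        void setFeature(String name, boolean value) throws Exception;
    }

    /** Toy target that rejects feature names it does not recognize. */
    static class ToyParser implements FeatureTarget {
        final Map<String, Boolean> features = new LinkedHashMap<>();

        @Override
        public void setFeature(String name, boolean value) throws Exception {
            if (name.startsWith("bad:")) {
                throw new Exception("unrecognized feature: " + name);
            }
            features.put(name, value);
        }
    }

    /** Reported pattern: one try block with an empty catch.
     *  The first failure silently skips every later feature. */
    static void setupSilently(FeatureTarget target, Map<String, Boolean> wanted) {
        try {
            for (Map.Entry<String, Boolean> e : wanted.entrySet()) {
                target.setFeature(e.getKey(), e.getValue());
            }
        } catch (Exception ignored) {
            // Nothing is logged, so the skipped setup goes unnoticed.
        }
    }

    /** Alternative sketch: guard each feature on its own and collect
     *  a warning for every failure so it can be logged by the caller. */
    static List<String> setupWithWarnings(FeatureTarget target, Map<String, Boolean> wanted) {
        List<String> warnings = new ArrayList<>();
        for (Map.Entry<String, Boolean> e : wanted.entrySet()) {
            try {
                target.setFeature(e.getKey(), e.getValue());
            } catch (Exception ex) {
                warnings.add("Could not set " + e.getKey() + ": " + ex.getMessage());
            }
        }
        return warnings;
    }

    public static void main(String[] args) {
        // The first requested feature is invalid, as in the report.
        Map<String, Boolean> wanted = new LinkedHashMap<>();
        wanted.put("bad:first", true);
        wanted.put("ok:second", true);

        ToyParser silent = new ToyParser();
        setupSilently(silent, wanted);
        System.out.println("silent setup applied: " + silent.features);

        ToyParser guarded = new ToyParser();
        List<String> warnings = setupWithWarnings(guarded, wanted);
        System.out.println("guarded setup applied: " + guarded.features);
        System.out.println("warnings: " + warnings);
    }
}
```

In this sketch the silent variant leaves the toy parser with no features applied, while the guarded variant applies the valid feature and reports the invalid one. In real code the warnings would go to the project's logger rather than to the console.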