[ https://issues.apache.org/jira/browse/NUTCH-487?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Marcin Okraszewski updated NUTCH-487:
-------------------------------------

    Attachment: neko_setup.patch

Patch for Nutch 0.9, which fixes the problem.

> Neko HTML parser goes on default settings.
> ------------------------------------------
>
>                 Key: NUTCH-487
>                 URL: https://issues.apache.org/jira/browse/NUTCH-487
>             Project: Nutch
>          Issue Type: Bug
>          Components: fetcher
>    Affects Versions: 0.9.0
>         Environment: Linux, Java 1.5.0.
>            Reporter: Marcin Okraszewski
>         Attachments: neko_setup.patch
>
>
> The Neko HTML parser setup is done in a silent try / catch statement (Nutch
> 0.9: HtmlParser.java:248-259). The problem is that the first feature being
> set throws an exception, so the whole setup block is skipped. The catch
> statement does nothing, so probably nobody noticed this.
> I attach a patch which fixes this. It was done on Nutch 0.9, but SVN trunk
> contains the same code.
> The patch does the following:
> 1. Fixes the augmentations feature.
> 2. Removes the include-comments feature, because I couldn't find anything
> similar at http://people.apache.org/~andyc/neko/doc/html/settings.html
> 3. Prints a warn message when an exception is caught.
> Please note that a lot of messages now go to the console (not the log4j
> log), because the "report-errors" feature is being set. Shouldn't it be
> removed?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

_______________________________________________
Nutch-developers mailing list
Nutch-developers@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nutch-developers
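
For readers who don't have HtmlParser.java open, here is a minimal sketch of the failure mode the issue describes. It is not the Nutch code and it does not use the Neko API: StubParser, the example.org feature ids, and both setup methods are hypothetical stand-ins. The second method catches per feature, which is my own variation; the patch as described only fixes the feature id and adds a warn message.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class NekoSetupSketch {

    /** Hypothetical stand-in for a parser that rejects feature ids it does not know. */
    static class StubParser {
        final Map<String, Boolean> applied = new LinkedHashMap<>();

        void setFeature(String id, boolean value) throws Exception {
            if (id.endsWith("/bogus-feature")) {
                throw new Exception("Feature not recognized: " + id);
            }
            applied.put(id, value);
        }
    }

    /** Made-up feature ids; the first one is rejected by the stub. */
    static final String[] FEATURES = {
        "http://example.org/features/bogus-feature",
        "http://example.org/features/second",
        "http://example.org/features/third",
    };

    /** The pattern from the issue: one try block around everything, empty catch. */
    static void setupSilently(StubParser parser) {
        try {
            for (String id : FEATURES) {
                parser.setFeature(id, true);
            }
        } catch (Exception e) {
            // Swallowed: the first failure aborts the loop and nobody is told.
        }
    }

    /** Assumed variation: catch per feature and keep a warning for each failure. */
    static List<String> setupWithWarnings(StubParser parser) {
        List<String> warnings = new ArrayList<>();
        for (String id : FEATURES) {
            try {
                parser.setFeature(id, true);
            } catch (Exception e) {
                warnings.add("Could not set " + id + ": " + e.getMessage());
            }
        }
        return warnings;
    }

    public static void main(String[] args) {
        StubParser silent = new StubParser();
        setupSilently(silent);
        System.out.println("silent setup applied:  " + silent.applied.keySet());

        StubParser warned = new StubParser();
        List<String> warnings = setupWithWarnings(warned);
        System.out.println("warning setup applied: " + warned.applied.keySet());
        for (String w : warnings) {
            System.out.println("WARN " + w);
        }
    }
}
```

In the silent variant the stub ends up with no features applied at all, which corresponds to the parser running on default settings. In real code the warning would go to the project's logger rather than a list.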