Neko HTML parser goes on default settings.
------------------------------------------

                 Key: NUTCH-487
                 URL: https://issues.apache.org/jira/browse/NUTCH-487
             Project: Nutch
          Issue Type: Bug
          Components: fetcher
    Affects Versions: 0.9.0
         Environment: Linux, Java 1.5.0.
            Reporter: Marcin Okraszewski
         Attachments: neko_setup.patch

The Neko HTML parser setup is done inside a silent try/catch block (Nutch 0.9:
HtmlParser.java:248-259). The problem is that setting the first feature throws
an exception, so the rest of the setup block is skipped. Because the catch block
does nothing, this has probably gone unnoticed.

I am attaching a patch that fixes this. It was made against Nutch 0.9, but SVN
trunk contains the same code.

The patch:
1. Fixes the augmentations feature.
2. Removes the include-comments feature, because I couldn't find anything similar
at http://people.apache.org/~andyc/neko/doc/html/settings.html
3. Prints a warning message when an exception is caught.
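The attached patch is not reproduced here. As a minimal sketch of the general
idea behind point 3, and of why one rejected feature should not disable the
rest, each feature can be set in its own try/catch with failures logged. The
names FeatureTarget, applyFeatures and the fake parser are hypothetical. A real
org.xml.sax.XMLReader (whose setFeature(String, boolean) throws subclasses of
SAXException) fits FeatureTarget as a method reference, and Neko's
DOMFragmentParser should too, assuming its setFeature has the same shape. The
http://example.org/... feature ID is made up purely to trigger a rejection, and
java.util.logging stands in for Nutch's log4j. The sketch uses Java 8+ syntax.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.logging.Logger;

import org.xml.sax.SAXException;
import org.xml.sax.SAXNotRecognizedException;

public class NekoFeatureSetup {
    private static final Logger LOG = Logger.getLogger(NekoFeatureSetup.class.getName());

    /** Hypothetical abstraction over any parser with a SAX-style setFeature(String, boolean). */
    @FunctionalInterface
    interface FeatureTarget {
        void setFeature(String featureId, boolean state) throws SAXException;
    }

    /**
     * Sets each feature in its own try/catch, so a rejected feature is logged
     * and skipped instead of aborting the remaining setup.
     * Returns the IDs that the target rejected.
     */
    static List<String> applyFeatures(FeatureTarget target, Map<String, Boolean> features) {
        List<String> rejected = new ArrayList<>();
        for (Map.Entry<String, Boolean> feature : features.entrySet()) {
            try {
                target.setFeature(feature.getKey(), feature.getValue());
            } catch (SAXException e) {
                // Unlike an empty catch block, this leaves a trace in the log.
                LOG.warning("Could not set parser feature " + feature.getKey() + ": " + e.getMessage());
                rejected.add(feature.getKey());
            }
        }
        return rejected;
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps insertion order, so the bad feature really comes first.
        Map<String, Boolean> features = new LinkedHashMap<>();
        features.put("http://example.org/features/unknown", true); // hypothetical, rejected below
        features.put("http://cyberneko.org/html/features/augmentations", true);
        features.put("http://cyberneko.org/html/features/balance-tags/document-fragment", true);

        // Stand-in parser that only recognizes cyberneko.org feature IDs.
        List<String> applied = new ArrayList<>();
        FeatureTarget fakeParser = (id, state) -> {
            if (!id.startsWith("http://cyberneko.org/")) {
                throw new SAXNotRecognizedException(id);
            }
            applied.add(id);
        };

        List<String> rejected = applyFeatures(fakeParser, features);
        // Prints: applied=2 rejected=[http://example.org/features/unknown]
        System.out.println("applied=" + applied.size() + " rejected=" + rejected);
    }
}
```

With a single try block around all the setFeature calls, the rejected first
feature would have prevented the two valid ones from being set; here both are
still applied and the failure shows up as a WARNING log record.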

Please note that many messages are now printed to the console (not to the log4j
log), because the "report-errors" feature is set. Shouldn't it be removed?

-- 
This message is automatically generated by JIRA.


_______________________________________________
Nutch-developers mailing list
Nutch-developers@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nutch-developers
