I'm trying to run the main function in HtmlParser (just to test how Nutch's parser works compared to others), and I can't seem to figure out how to get it to run.

http://svn.apache.org/viewvc/nutch/branches/branch-1.5.1/src/plugin/parse-html/src/java/org/apache/nutch/parse/html/HtmlParser.java?revision=1356339&view=markup

When I run it naively, I get an error:

Exception in thread "main" java.lang.RuntimeException: org.apache.nutch.parse.HtmlParseFilter not found.
    at org.apache.nutch.parse.HtmlParseFilters.<init>(HtmlParseFilters.java:55)

Looking at HtmlParseFilters, I see that it throws the runtime exception if it can't find any HtmlParseFilter classes. However, I can't figure out how to make it find them. I see the jars in the plugins dir, but do they have to be registered? Could the main() in HtmlParser ever work as is?

Any pointers would be appreciated.

Thanks.
