I'm trying to run the main function in HtmlParser (just to test how
Nutch's parser works compared to others) and I can't seem to figure out
how to get it to run.
http://svn.apache.org/viewvc/nutch/branches/branch-1.5.1/src/plugin/parse-html/src/java/org/apache/nutch/parse/html/HtmlParser.java?revision=1356339&view=markup
When I run it naively, I get an error:
Exception in thread "main" java.lang.RuntimeException: org.apache.nutch.parse.HtmlParseFilter not found.
    at org.apache.nutch.parse.HtmlParseFilters.<init>(HtmlParseFilters.java:55)
Looking at HtmlParseFilters, I see that it throws the RuntimeException
if it can't find any HtmlParseFilter classes. However, I can't figure
out how to make it find them. I see the jars in the plugins dir, but do
they have to be registered? Could the main() in HtmlParser ever work
as is?
Any pointers would be appreciated.
Thanks.