If you just want to check the parser, use this command:

$ nutch parsechecker -dumpText <url>

This should work out of the box without any modification.

On Sun, Aug 26, 2012 at 8:48 AM, Shaya Potter <[email protected]> wrote:
> I'm trying to run the main function in HtmlParser (just to test how
> Nutch's parser works compared to others) and I can't seem to figure out how
> to get it to run.
>
> http://svn.apache.org/viewvc/nutch/branches/branch-1.5.1/src/plugin/parse-html/src/java/org/apache/nutch/parse/html/HtmlParser.java?revision=1356339&view=markup
>
> When I run it naively, I get an error:
>
> Exception in thread "main" java.lang.RuntimeException:
> org.apache.nutch.parse.HtmlParseFilter not found.
>     at org.apache.nutch.parse.HtmlParseFilters.<init>(HtmlParseFilters.java:55)
>
> Looking at HtmlParseFilters, I see that it throws the runtime exception
> if it can't find any HtmlParseFilter classes. However, I can't seem to
> figure out how to make it find them. I see the jars in the plugins
> dir, but do they have to be registered? Could the main() in HtmlParser
> ever work as is?
>
> Any pointers would be appreciated.
>
> Thanks.
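The error in the quoted message is a fail-fast check: as described there, HtmlParseFilters throws a RuntimeException when its plugin lookup for HtmlParseFilter comes back empty, which is what happens when HtmlParser.main runs without the plugins being discovered. The sketch below is hypothetical and only illustrates that pattern; the class FailFastLookup, its map-based registry, and the filter name example.HypotheticalFilter are illustrative assumptions, not Nutch's actual code or API.

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch of a fail-fast plugin lookup, modeled on the behavior
 * described in the quoted message. Not Nutch's implementation.
 */
public class FailFastLookup {

    // Hypothetical stand-in for a plugin registry:
    // extension point id -> names of implementing classes.
    private final Map<String, List<String>> registry;

    public FailFastLookup(Map<String, List<String>> registry) {
        this.registry = registry;
    }

    /** Returns the registered implementations, or throws if there are none. */
    public List<String> lookup(String pointId) {
        List<String> impls = registry.getOrDefault(pointId, Collections.emptyList());
        if (impls.isEmpty()) {
            // Same message shape as the stack trace in the quoted message.
            throw new RuntimeException(pointId + " not found.");
        }
        return impls;
    }

    public static void main(String[] args) {
        String point = "org.apache.nutch.parse.HtmlParseFilter";

        // Empty registry: plays the role of running main() with no plugins discovered.
        FailFastLookup empty = new FailFastLookup(Collections.emptyMap());
        try {
            empty.lookup(point);
        } catch (RuntimeException e) {
            System.out.println("Exception: " + e.getMessage());
        }

        // Populated registry: plays the role of a properly configured plugin setup.
        FailFastLookup configured =
                new FailFastLookup(Map.of(point, List.of("example.HypotheticalFilter")));
        System.out.println("Found: " + configured.lookup(point));
    }
}
```

Failing at construction time, rather than silently parsing with no filters, makes a misconfigured plugin setup visible immediately. That is also why the parsechecker command, which runs inside Nutch's normal configuration, avoids the problem.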

