Replacing the current xercesimpl.jar with the one from Nutch 1.0 seems to fix the problem.
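
In case it helps anyone else hitting this: the trace suggests that the Xerces on the classpath calls getCharacterOffset() on the locator, and the NekoHTML build in use does not implement it. A quick way to see which jars are actually being picked up at runtime is a small check like the one below. This is only a rough sketch; JarLocator and locate() are names I made up for illustration and are not part of Nutch. It needs to be run with the same classpath Nutch uses, otherwise it will just report that the classes could not be loaded.

```java
import java.security.CodeSource;
import java.security.ProtectionDomain;

// Hypothetical diagnostic helper, not part of Nutch:
// reports which jar or directory a class is loaded from.
class JarLocator {

    // Returns a one-line description of where the named class comes from.
    static String locate(String className) {
        try {
            // initialize=false: load the class without running static initializers
            Class<?> c = Class.forName(className, false,
                    JarLocator.class.getClassLoader());
            ProtectionDomain pd = c.getProtectionDomain();
            CodeSource src = (pd == null) ? null : pd.getCodeSource();
            if (src == null || src.getLocation() == null) {
                // Classes from the JDK itself usually have no code source
                return className + " -> no code source (probably a JDK class)";
            }
            return className + " -> " + src.getLocation();
        } catch (ClassNotFoundException | LinkageError e) {
            return className + " -> could not be loaded (" + e + ")";
        }
    }

    public static void main(String[] args) {
        // Defaults are the two classes named in the stack trace;
        // pass other class names as arguments to check them instead.
        String[] names = args.length > 0 ? args : new String[] {
            "org.apache.xerces.xni.parser.XMLParseException",
            "org.cyberneko.html.HTMLScanner"
        };
        for (String name : names) {
            System.out.println(locate(name));
        }
    }
}
```

If the two classes turn out to come from jars of mismatched versions, that would be consistent with the AbstractMethodError, though I have not verified which version combinations are affected.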

On Wed, Apr 21, 2010 at 3:14 PM, Harry Nutch <harrynu...@gmail.com> wrote:

> Hi,
>
> I am running the latest version of Nutch. While crawling one particular
> site I get an AbstractMethodError in the cyberneko plugin for all of its
> pages when doing a fetch.
> As I understand it, this is caused by a difference between the runtime
> and compile versions. However, I am running it afresh after an ant clean.
>
> Any suggestions would be helpful. Btw, I am using java version "1.6.0_18"
> on a Windows environment.
>
> java.lang.AbstractMethodError: org.cyberneko.html.HTMLScanner.getCharacterOffset()I
>         at org.apache.xerces.xni.parser.XMLParseException.<init>(Unknown Source)
>         at org.cyberneko.html.HTMLConfiguration$ErrorReporter.createException(HTMLConfiguration.java:673)
>         at org.cyberneko.html.HTMLConfiguration$ErrorReporter.reportError(HTMLConfiguration.java:662)
>         at org.cyberneko.html.HTMLScanner$ContentScanner.scanAttribute(HTMLScanner.java:2404)
>         at org.cyberneko.html.HTMLScanner$ContentScanner.scanAttribute(HTMLScanner.java:2360)
>         at org.cyberneko.html.HTMLScanner$ContentScanner.scanStartElement(HTMLScanner.java:2267)
>         at org.cyberneko.html.HTMLScanner$ContentScanner.scan(HTMLScanner.java:1820)
>         at org.cyberneko.html.HTMLScanner.scanDocument(HTMLScanner.java:789)
>         at org.cyberneko.html.HTMLConfiguration.parse(HTMLConfiguration.java:478)
>         at org.cyberneko.html.HTMLConfiguration.parse(HTMLConfiguration.java:431)
>         at org.cyberneko.html.parsers.DOMFragmentParser.parse(DOMFragmentParser.java:164)
>         at org.apache.nutch.parse.html.HtmlParser.parseNeko(HtmlParser.java:249)
>         at org.apache.nutch.parse.html.HtmlParser.parse(HtmlParser.java:212)
>         at org.apache.nutch.parse.html.HtmlParser.getParse(HtmlParser.java:145)
>         at org.apache.nutch.parse.ParseUtil.parse(ParseUtil.java:82)
>         at org.apache.nutch.fetcher.Fetcher$FetcherThread.output(Fetcher.java:879)
>         at org.apache.nutch.fetcher.Fetcher$FetcherThread.run(Fetcher.java:646)