Replacing the current xercesImpl.jar with the one from Nutch 1.0 seems to
fix the problem.
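
In case it helps anyone hitting the same error: the trace below shows Xerces'
XMLParseException calling getCharacterOffset() on NekoHTML's HTMLScanner, which
that NekoHTML build apparently does not implement, so the two jars look
mismatched. A rough way to check which jar each class really comes from is
sketched here. JarLocator is just a name I made up for illustration, and it
assumes you run it with the same jars on the classpath that the parse-html
plugin uses (Nutch loads plugins through their own class loaders, so the result
may differ from what the fetcher sees).

```java
import java.security.CodeSource;

public class JarLocator {
    // Describes where the named class is loaded from, without initializing it.
    static String locate(String className) {
        try {
            Class<?> c = Class.forName(className, false,
                    JarLocator.class.getClassLoader());
            CodeSource src = c.getProtectionDomain().getCodeSource();
            if (src == null || src.getLocation() == null) {
                // Classes from the JDK itself have no code source.
                return "JDK bootstrap classes (no jar)";
            }
            return src.getLocation().toString();
        } catch (ClassNotFoundException e) {
            return "not found on classpath";
        } catch (LinkageError e) {
            return "not found on classpath";
        }
    }

    public static void main(String[] args) {
        // The two classes involved in the AbstractMethodError.
        String[] names = {
            "org.apache.xerces.xni.parser.XMLParseException",
            "org.cyberneko.html.HTMLScanner"
        };
        for (String name : names) {
            System.out.println(name + " -> " + locate(name));
        }
    }
}
```

If the Xerces class resolves to a jar other than the one you expect, that jar
is the one to swap out.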

On Wed, Apr 21, 2010 at 3:14 PM, Harry Nutch <harrynu...@gmail.com> wrote:

> Hi,
>
> I am running the latest version of Nutch. While crawling one particular
> site I get an AbstractMethodError in the cyberneko plugin for all of its
> pages during the fetch step.
> As I understand it, this is caused by a difference between the runtime
> and compile-time versions. However, I am running it afresh after an ant clean.
>
> Any suggestions would be helpful. Btw, I am using java version "1.6.0_18"
> on a Windows environment.
>
>
> java.lang.AbstractMethodError: org.cyberneko.html.HTMLScanner.getCharacterOffset()I
>         at org.apache.xerces.xni.parser.XMLParseException.<init>(Unknown Source)
>         at org.cyberneko.html.HTMLConfiguration$ErrorReporter.createException(HTMLConfiguration.java:673)
>         at org.cyberneko.html.HTMLConfiguration$ErrorReporter.reportError(HTMLConfiguration.java:662)
>         at org.cyberneko.html.HTMLScanner$ContentScanner.scanAttribute(HTMLScanner.java:2404)
>         at org.cyberneko.html.HTMLScanner$ContentScanner.scanAttribute(HTMLScanner.java:2360)
>         at org.cyberneko.html.HTMLScanner$ContentScanner.scanStartElement(HTMLScanner.java:2267)
>         at org.cyberneko.html.HTMLScanner$ContentScanner.scan(HTMLScanner.java:1820)
>         at org.cyberneko.html.HTMLScanner.scanDocument(HTMLScanner.java:789)
>         at org.cyberneko.html.HTMLConfiguration.parse(HTMLConfiguration.java:478)
>         at org.cyberneko.html.HTMLConfiguration.parse(HTMLConfiguration.java:431)
>         at org.cyberneko.html.parsers.DOMFragmentParser.parse(DOMFragmentParser.java:164)
>         at org.apache.nutch.parse.html.HtmlParser.parseNeko(HtmlParser.java:249)
>         at org.apache.nutch.parse.html.HtmlParser.parse(HtmlParser.java:212)
>         at org.apache.nutch.parse.html.HtmlParser.getParse(HtmlParser.java:145)
>         at org.apache.nutch.parse.ParseUtil.parse(ParseUtil.java:82)
>         at org.apache.nutch.fetcher.Fetcher$FetcherThread.output(Fetcher.java:879)
>         at org.apache.nutch.fetcher.Fetcher$FetcherThread.run(Fetcher.java:646)
>
>
>
