Hi Steve,

I tried with Nutch 1.9 RC1 and am not getting this exception.
=>  ./nutch parsechecker -D http.agent.name=tralala http://www.my-ebenefits.com/PenguinRandomHouse/

This is probably something we have fixed since 1.5.1, which is rather
outdated. Why don't you give 1.9 a try instead?

Julien



On 12 August 2014 20:34, Steve Cohen <[email protected]> wrote:

> Hello,
>
> I have been running Nutch 1.5.1 without a problem, but I have run across a
> couple of web pages that give me a null pointer exception when I try to
> crawl them.
>
> 2014-08-12 14:01:21,844 ERROR org.apache.nutch.parse.html: Error:
> java.lang.NullPointerException
>         at org.apache.xerces.parsers.AbstractDOMParser.characters(Unknown Source)
>         at org.cyberneko.html.filters.DefaultFilter.characters(DefaultFilter.java:195)
>         at org.cyberneko.html.HTMLScanner$ContentScanner.scanCharacters(HTMLScanner.java:2033)
>         at org.cyberneko.html.HTMLScanner$ContentScanner.scan(HTMLScanner.java:1836)
>         at org.cyberneko.html.HTMLScanner.scanDocument(HTMLScanner.java:809)
>         at org.cyberneko.html.HTMLConfiguration.parse(HTMLConfiguration.java:478)
>         at org.cyberneko.html.HTMLConfiguration.parse(HTMLConfiguration.java:431)
>         at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
>         at org.apache.xerces.parsers.DOMParser.parse(Unknown Source)
>         at org.apache.nutch.parse.html.HtmlParser.parseNeko(HtmlParser.java:347)
>         at org.apache.nutch.parse.html.HtmlParser.parse(HtmlParser.java:244)
>         at org.apache.nutch.parse.html.HtmlParser.getParse(HtmlParser.java:160)
>         at org.apache.nutch.parse.ParseCallable.call(ParseCallable.java:35)
>         at org.apache.nutch.parse.ParseCallable.call(ParseCallable.java:24)
>         at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>         at java.lang.Thread.run(Thread.java:662)
> 2014-08-12 14:01:21,844 WARN org.apache.nutch.parse.ParseSegment: Error parsing: http://www.my-ebenefits.com/PenguinRandomHouse/: failed(2,200): java.lang.NullPointerException
>
>
> What information do I need to provide for you to help me debug the issue?
>
> Thanks,
> Steve
>



-- 

Open Source Solutions for Text Engineering

http://digitalpebble.blogspot.com/
http://www.digitalpebble.com
http://twitter.com/digitalpebble
