[ 
https://issues.apache.org/jira/browse/NUTCH-1884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sebastian Nagel resolved NUTCH-1884.
------------------------------------
    Resolution: Fixed

Committed to trunk/1.x, r1637237. Nutch 2.x is not affected because there is no 
ParseResult :) Thanks!

> NullPointerException in parsechecker and indexchecker with symlinks in file 
> URL
> -------------------------------------------------------------------------------
>
>                 Key: NUTCH-1884
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1884
>             Project: Nutch
>          Issue Type: Bug
>          Components: indexer, parser
>    Affects Versions: 1.9
>         Environment: Mac OS X 10.9.2
> Apache Maven 2.2.1
> Java version: 1.7.0_51
>            Reporter: Mengying Wang
>            Priority: Minor
>             Fix For: 1.10
>
>         Attachments: NUTCH-1884-trunk-v1.patch
>
>
> I downloaded the Nutch source code from GitHub 
> (https://github.com/apache/nutch), applied the patches for NUTCH-1879 and 
> NUTCH-1880, and then reinstalled Nutch. The good news is that all URLs now 
> contain only one slash. Unfortunately, both the parsechecker and 
> indexchecker commands still report a java.lang.NullPointerException.
> The logs are below:
> (1) $ ./nutch parsechecker 
> "file:/Users/AngelaWang/Documents/programs/oodt/cas-curator/staging/products/xml/"
> fetching: 
> file:/Users/AngelaWang/Documents/programs/oodt/cas-curator/staging/products/xml/
> parsing: 
> file:/Users/AngelaWang/Documents/programs/oodt/cas-curator/staging/products/xml/
> contentType: text/html
> signature: 17bdb44990391c96bb8d48d1802ff11c
> Couldn't pass score, url 
> file:/Users/AngelaWang/Documents/programs/oodt/cas-curator/staging/products/xml/
>  (java.lang.NullPointerException)
> ---------
> Url
> ---------------
> file:/Users/AngelaWang/Documents/programs/oodt/cas-curator-0.6/staging/products/xml/
> ---------
> ParseData
> ---------
> Version: 5
> Status: success(1,0)
> Title: Index of 
> /Users/AngelaWang/Documents/programs/oodt/cas-curator-0.6/staging/products/xml
> Outlinks: 2
>   outlink: toUrl: 
> file:/Users/AngelaWang/Documents/programs/oodt/cas-curator-0.6/staging/products/
>  anchor: ../
>   outlink: toUrl: 
> file:/Users/AngelaWang/Documents/programs/oodt/cas-curator-0.6/staging/products/xml/monitor.xml
>  anchor: monitor.xml
> Content Metadata: Content-Length=352 nutch.crawl.score=0.0 Last-Modified=Tue, 
> 14 Oct 2014 20:05:50 GMT Content-Type=text/html 
> Parse Metadata: CharEncodingForConversion=windows-1252 
> OriginalCharEncoding=windows-1252 
> (2) $ ./nutch indexchecker 
> "file:/Users/AngelaWang/Documents/programs/oodt/cas-curator/staging/products/xml/"
> fetching: 
> file:/Users/AngelaWang/Documents/programs/oodt/cas-curator/staging/products/xml/
> parsing: 
> file:/Users/AngelaWang/Documents/programs/oodt/cas-curator/staging/products/xml/
> contentType: text/html
> Exception in thread "main" java.lang.NullPointerException
>       at 
> org.apache.nutch.indexer.IndexingFiltersChecker.run(IndexingFiltersChecker.java:139)
>       at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>       at 
> org.apache.nutch.indexer.IndexingFiltersChecker.main(IndexingFiltersChecker.java:177)
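
In the log above, the requested URL contains "cas-curator" while the reported Url and outlinks contain "cas-curator-0.6", which is consistent with a symlink being resolved. One plausible reading, not confirmed by this message, is that the parse result is keyed by the resolved URL, so a lookup by the requested URL returns null. The sketch below illustrates that pattern with a plain java.util.Map. It does not use Nutch's classes, it is not the committed patch, and the class name, method names, and paths are all hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only: a plain Map stands in for a parse result
// keyed by URL. None of these names come from Nutch.
class SymlinkLookupSketch {

    // Builds a one-entry "parse result" keyed by the URL the parser reports
    // (assumed here to be the symlink-resolved URL).
    static Map<String, String> parse(String resolvedUrl) {
        Map<String, String> result = new HashMap<>();
        result.put(resolvedUrl, "parsed");
        return result;
    }

    // Unguarded lookup: returns null when the requested URL is not a key,
    // so any later dereference would throw a NullPointerException.
    static String lookupByRequestedUrl(Map<String, String> result, String requestedUrl) {
        return result.get(requestedUrl);
    }

    // Guarded lookup: if the requested URL is missing and the result holds
    // exactly one entry, fall back to that entry instead of returning null.
    static String lookupWithFallback(Map<String, String> result, String requestedUrl) {
        String parse = result.get(requestedUrl);
        if (parse == null && result.size() == 1) {
            parse = result.values().iterator().next();
        }
        return parse;
    }

    public static void main(String[] args) {
        String requested = "file:/data/link/xml/";       // hypothetical symlinked path
        String resolved = "file:/data/target-0.6/xml/";  // hypothetical resolved path

        Map<String, String> result = parse(resolved);
        System.out.println("unguarded: " + lookupByRequestedUrl(result, requested));
        System.out.println("guarded:   " + lookupWithFallback(result, requested));
    }
}
```

The fallback to the single entry is only one possible guard. Code that can receive several entries per parse would need a different rule for choosing among them.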



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
