NullPointerException
--------------------

                 Key: NUTCH-428
                 URL: https://issues.apache.org/jira/browse/NUTCH-428
             Project: Nutch
          Issue Type: Bug
          Components: fetcher
    Affects Versions: 0.8.1
         Environment: Windows XP
            Reporter: Piyush


I am using the NUTCH.Bat script posted in one of the threads (I am not using 
Cygwin). Whenever I try to fetch the page, the fetch fails with a 
"NullPointerException". 
I have a URL directory containing a urls.txt file. The file has only one entry, 
http://www.winzip.com/land_about.htm. 
I have updated crawl-urlfilter.txt with +^http://www.winzip.com/. 
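
To rule out the filter rule itself, here is a minimal stand-alone sketch that checks the seed URL against it. It assumes a rule line is a leading "+" (include) or "-" (exclude) followed by a Java regular expression that only needs to be found somewhere in the URL. The class UrlFilterCheck and the method accepts are hypothetical names for illustration, not part of Nutch.

```java
import java.util.regex.Pattern;

public class UrlFilterCheck {
    // Hypothetical helper, not a Nutch API: interprets one filter line of the
    // assumed form "+regex" (include) or "-regex" (exclude).
    // Returns true only for an include rule whose regex is found in the URL.
    public static boolean accepts(String rule, String url) {
        boolean include = rule.charAt(0) == '+';
        Pattern pattern = Pattern.compile(rule.substring(1));
        return include && pattern.matcher(url).find();
    }

    public static void main(String[] args) {
        String rule = "+^http://www.winzip.com/";
        String url = "http://www.winzip.com/land_about.htm";
        // Prints whether the seed URL passes the include rule.
        System.out.println(accepts(rule, url));
    }
}
```

Under these assumptions the seed URL passes the rule, which is consistent with the log below, where the URL does reach the fetcher before the exception is reported.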

Are there any other settings I am missing? Any help is greatly appreciated. 

The command I used to start the crawl is:
nutch crawl urls -dir crawlResults -depth 1

Here is my log:

crawl started in: crawlResult
rootUrlDir = urls
threads = 10
depth = 1
Injector: starting
Injector: crawlDb: crawlResult/crawldb
Injector: urlDir: urls
Injector: Converting injected urls to crawl db entries.
Injector: Merging injected urls into crawl db.
Injector: done
Generator: starting
Generator: segment: crawlResult/segments/20070110085314
Generator: Selecting best-scoring urls due for fetch.
Generator: Partitioning selected urls by host, for politeness.
Generator: done.
Fetcher: starting
Fetcher: segment: crawlResult/segments/20070110085314
Fetcher: threads: 10
fetching http://www.winzip.com/land_about.htm
fetch of http://www.winzip.com/land_about.htm failed with: 
java.lang.NullPointerException
Fetcher: done
CrawlDb update: starting
CrawlDb update: db: crawlResult/crawldb
CrawlDb update: segment: crawlResult/segments/20070110085314
CrawlDb update: Merging segment data into db.
CrawlDb update: done
LinkDb: starting
LinkDb: linkdb: crawlResult/linkdb
LinkDb: adding segment: crawlResult/segments/20070110085314
LinkDb: done
Indexer: starting
Indexer: linkdb: crawlResult/linkdb
Indexer: adding segment: crawlResult/segments/20070110085314
Optimizing index.
Indexer: done
Dedup: starting
Dedup: adding indexes in: crawlResult/indexes
Dedup: done
Adding crawlResult/indexes/part-00000
crawl finished: crawlResult
 

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: 
https://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
