NullPointerException
--------------------

Key: NUTCH-428
URL: https://issues.apache.org/jira/browse/NUTCH-428
Project: Nutch
Issue Type: Bug
Components: fetcher
Affects Versions: 0.8.1
Environment: Windows XP
Reporter: Piyush

I am using the NUTCH.Bat provided in one of the threads (I am not using Cygwin). Whenever I try to fetch the item, the fetch fails with a java.lang.NullPointerException.

I have a URL directory, which contains a urls.txt file. The file has only one entry, http://www.winzip.com/land_about.htm. I have updated crawl-urlfilter.txt with +^http://www.winzip.com/. Are there any other settings I am missing? Any help is greatly appreciated.

The command I used to start the crawl is:

nutch crawl urls -dir crawlResults -depth 1

Here is my log:

crawl started in: crawlResult
rootUrlDir = urls
threads = 10
depth = 1
Injector: starting
Injector: crawlDb: crawlResult/crawldb
Injector: urlDir: urls
Injector: Converting injected urls to crawl db entries.
Injector: Merging injected urls into crawl db.
Injector: done
Generator: starting
Generator: segment: crawlResult/segments/20070110085314
Generator: Selecting best-scoring urls due for fetch.
Generator: Partitioning selected urls by host, for politeness.
Generator: done.
Fetcher: starting
Fetcher: segment: crawlResult/segments/20070110085314
Fetcher: threads: 10
fetching http://www.winzip.com/land_about.htm
fetch of http://www.winzip.com/land_about.htm failed with: java.lang.NullPointerException
Fetcher: done
CrawlDb update: starting
CrawlDb update: db: crawlResult/crawldb
CrawlDb update: segment: crawlResult/segments/20070110085314
CrawlDb update: Merging segment data into db.
CrawlDb update: done
LinkDb: starting
LinkDb: linkdb: crawlResult/linkdb
LinkDb: adding segment: crawlResult/segments/20070110085314
LinkDb: done
Indexer: starting
Indexer: linkdb: crawlResult/linkdb
Indexer: adding segment: crawlResult/segments/20070110085314
Optimizing index.
Indexer: done
Dedup: starting
Dedup: adding indexes in: crawlResult/indexes
Dedup: done
Adding crawlResult/indexes/part-00000
crawl finished: crawlResult

--
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: https://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

_______________________________________________
Nutch-developers mailing list
Nutch-developers@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nutch-developers