[ https://issues.apache.org/jira/browse/NUTCH-428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sami Siren resolved NUTCH-428.
------------------------------

    Resolution: Fixed
    Fix Version/s: 0.9.0

Most probably you don't have the agent name configured in nutch-site.xml. I changed this situation to throw a RuntimeException in trunk instead, so it's easier to diagnose.

> NullPointerException
> --------------------
>
>                 Key: NUTCH-428
>                 URL: https://issues.apache.org/jira/browse/NUTCH-428
>             Project: Nutch
>          Issue Type: Bug
>          Components: fetcher
>    Affects Versions: 0.8.1
>         Environment: Windows XP
>            Reporter: Piyush
>             Fix For: 0.9.0
>
>
> I am using the NUTCH.Bat provided in one of the threads (I am not using
> CYGWIN). Whenever I try to fetch the item, the fetch fails with
> "nullpointerexception".
> I have a URL directory that contains a urls.txt file. There is only one entry in
> the file, which is http://www.winzip.com/land_about.htm.
> I have updated crawl-urlfilter.txt with +^http://www.winzip.com/.
> Are there any other settings I am missing? Any help is greatly appreciated.
> The command I used to start the crawl is
> nutch crawl urls -dir crawlResults -depth 1
> Here is my log:
> crawl started in: crawlResult
> rootUrlDir = urls
> threads = 10
> depth = 1
> Injector: starting
> Injector: crawlDb: crawlResult/crawldb
> Injector: urlDir: urls
> Injector: Converting injected urls to crawl db entries.
> Injector: Merging injected urls into crawl db.
> Injector: done
> Generator: starting
> Generator: segment: crawlResult/segments/20070110085314
> Generator: Selecting best-scoring urls due for fetch.
> Generator: Partitioning selected urls by host, for politeness.
> Generator: done.
> Fetcher: starting
> Fetcher: segment: crawlResult/segments/20070110085314
> Fetcher: threads: 10
> fetching http://www.winzip.com/land_about.htm
> fetch of http://www.winzip.com/land_about.htm failed with:
> java.lang.NullPointerException
> Fetcher: done
> CrawlDb update: starting
> CrawlDb update: db: crawlResult/crawldb
> CrawlDb update: segment: crawlResult/segments/20070110085314
> CrawlDb update: Merging segment data into db.
> CrawlDb update: done
> LinkDb: starting
> LinkDb: linkdb: crawlResult/linkdb
> LinkDb: adding segment: crawlResult/segments/20070110085314
> LinkDb: done
> Indexer: starting
> Indexer: linkdb: crawlResult/linkdb
> Indexer: adding segment: crawlResult/segments/20070110085314
> Optimizing index.
> Indexer: done
> Dedup: starting
> Dedup: adding indexes in: crawlResult/indexes
> Dedup: done
> Adding crawlResult/indexes/part-00000
> crawl finished: crawlResult
>

--
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: https://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

_______________________________________________
Nutch-developers mailing list
Nutch-developers@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nutch-developers
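
For reference, the agent name mentioned in the resolution is normally set through the http.agent.name property. The fragment below is a minimal sketch of such an entry in conf/nutch-site.xml, assuming the Hadoop-style <configuration> layout used by Nutch 0.8.x. The value MyTestCrawler is a placeholder chosen for illustration and does not come from the report.

```xml
<?xml version="1.0"?>
<configuration>
  <!-- Assumed fix for the fetch failure above: give the crawler an agent name.
       "MyTestCrawler" is a hypothetical value; replace it with your own. -->
  <property>
    <name>http.agent.name</name>
    <value>MyTestCrawler</value>
  </property>
</configuration>
```

With a non-empty value in place, re-running the same crawl command should let the fetcher get past the point where it previously failed, provided nothing else is misconfigured.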