Hi, I would suggest checking the 'logs/hadoop.log' file for more information on the error.
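For what it's worth, the top frames of the stack trace quoted below (Hashtable.put called from Properties.setProperty called from Configuration.set) are what Java produces when a null key or value is stored in a java.util.Properties object. So it looks as if some setting reached the indexer step as null. Here is a minimal sketch of that behaviour in plain Java. It is not Nutch code, and the class name, method name and property name "example.server.url" are made up for illustration:

```java
import java.util.Properties;

public class NullPropertyDemo {

    // Returns true if storing the given key/value pair in a
    // java.util.Properties object throws a NullPointerException.
    static boolean setThrowsNpe(String key, String value) {
        Properties props = new Properties();
        try {
            props.setProperty(key, value);
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // "example.server.url" is a hypothetical property name.
        // A non-null value is stored without any problem.
        System.out.println(setThrowsNpe("example.server.url", "http://localhost:8983/solr/"));
        // A null value is rejected with a NullPointerException.
        System.out.println(setThrowsNpe("example.server.url", null));
    }
}
```

This prints false and then true. It does not say which setting was null in your run, only what kind of mistake produces this trace, so the log file is still the place to look.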
I am not sure what versions of HBase are compatible with Nutch.

HTH

Regards,
Kiran

On Thu, Nov 1, 2012 at 11:17 PM, cocofan <[email protected]> wrote:

> I'm new to Nutch. I've been trying to get through the tutorials
> (Nutch2tutorial and the older ones) but I'm getting an error when I try to
> do a crawl:
>
> ==========================================
>
> ^Ccocofan@cocofan-notebook-PC:~/Dropbox/project/apache-nutch-2.1/runtime/local$ bin/nutch crawl urls olr http://localhost:8983/solr/ -depth 3 -topN 5
> FetcherJob: threads: 10
> FetcherJob: parsing: false
> FetcherJob: resuming: false
> FetcherJob : timelimit set for : -1
> Using queue mode : byHost
> Fetcher: threads: 10
> QueueFeeder finished: total 1 records. Hit by time limit :0
> fetching http://nutch.apache.org/
> -finishing thread FetcherThread2, activeThreads=5
> -finishing thread FetcherThread1, activeThreads=4
> -finishing thread FetcherThread4, activeThreads=3
> -finishing thread FetcherThread6, activeThreads=1
> -finishing thread FetcherThread5, activeThreads=2
> -finishing thread FetcherThread8, activeThreads=1
> Fetcher: throughput threshold: -1
> Fetcher: throughput threshold sequence: 5
> -finishing thread FetcherThread3, activeThreads=1
> -finishing thread FetcherThread7, activeThreads=1
> -finishing thread FetcherThread9, activeThreads=1
> -finishing thread FetcherThread0, activeThreads=0
> 0/0 spinwaiting/active, 1 pages, 0 errors, 0.2 0.2 pages/s, 51 51 kb/s, 0 URLs in 0 queues
> -activeThreads=0
> ParserJob: resuming: false
> ParserJob: forced reparse: false
> ParserJob: parsing all
> Parsing http://nutch.apache.org/
> FetcherJob: threads: 10
> FetcherJob: parsing: false
> FetcherJob: resuming: false
> FetcherJob : timelimit set for : -1
> Using queue mode : byHost
> Fetcher: threads: 10
> Fetcher: throughput threshold: -1
> Fetcher: throughput threshold sequence: 5
> fetching http://nutch.apache.org/about.html
> QueueFeeder finished: total 5 records. Hit by time limit :0
> 10/10 spinwaiting/active, 1 pages, 0 errors, 0.2 0.2 pages/s, 16 16 kb/s, 4 URLs in 1 queues
> * queue: http://nutch.apache.org
>   maxThreads = 1
>   inProgress = 0
>   crawlDelay = 5000
>   minCrawlDelay = 0
>   nextFetchTime = 1351824493524
>   now = 1351824493242
>   0. http://nutch.apache.org/credits.html
>   1. http://nutch.apache.org/apidocs-2.1/index.html
>   2. http://nutch.apache.org/bot.html
>   3. http://nutch.apache.org/apidocs-1.5/index.html
> fetching http://nutch.apache.org/credits.html
> 10/10 spinwaiting/active, 2 pages, 0 errors, 0.2 0.2 pages/s, 18 19 kb/s, 3 URLs in 1 queues
> * queue: http://nutch.apache.org
>   maxThreads = 1
>   inProgress = 0
>   crawlDelay = 5000
>   minCrawlDelay = 0
>   nextFetchTime = 1351824498764
>   now = 1351824498244
>   0. http://nutch.apache.org/apidocs-2.1/index.html
>   1. http://nutch.apache.org/bot.html
>   2. http://nutch.apache.org/apidocs-1.5/index.html
> fetching http://nutch.apache.org/apidocs-2.1/index.html
> 10/10 spinwaiting/active, 3 pages, 0 errors, 0.2 0.2 pages/s, 12 2 kb/s, 2 URLs in 1 queues
> * queue: http://nutch.apache.org
>   maxThreads = 1
>   inProgress = 0
>   crawlDelay = 5000
>   minCrawlDelay = 0
>   nextFetchTime = 1351824503930
>   now = 1351824503246
>   0. http://nutch.apache.org/bot.html
>   1. http://nutch.apache.org/apidocs-1.5/index.html
> fetching http://nutch.apache.org/bot.html
> 10/10 spinwaiting/active, 4 pages, 0 errors, 0.2 0.2 pages/s, 14 18 kb/s, 1 URLs in 1 queues
> * queue: http://nutch.apache.org
>   maxThreads = 1
>   inProgress = 0
>   crawlDelay = 5000
>   minCrawlDelay = 0
>   nextFetchTime = 1351824509093
>   now = 1351824508247
>   0. http://nutch.apache.org/apidocs-1.5/index.html
> fetching http://nutch.apache.org/apidocs-1.5/index.html
> -finishing thread FetcherThread5, activeThreads=9
> -finishing thread FetcherThread3, activeThreads=8
> -finishing thread FetcherThread2, activeThreads=7
> -finishing thread FetcherThread8, activeThreads=3
> -finishing thread FetcherThread7, activeThreads=6
> -finishing thread FetcherThread0, activeThreads=1
> -finishing thread FetcherThread4, activeThreads=2
> -finishing thread FetcherThread6, activeThreads=4
> -finishing thread FetcherThread1, activeThreads=5
> -finishing thread FetcherThread9, activeThreads=0
> 0/0 spinwaiting/active, 5 pages, 0 errors, 0.2 0.2 pages/s, 11 2 kb/s, 0 URLs in 0 queues
> -activeThreads=0
> ParserJob: resuming: false
> ParserJob: forced reparse: false
> ParserJob: parsing all
> Skipping http://nutch.apache.org/; different batch id (null)
> Parsing http://nutch.apache.org/about.html
> Parsing http://nutch.apache.org/apidocs-1.5/index.html
> Parsing http://nutch.apache.org/apidocs-2.1/index.html
> Parsing http://nutch.apache.org/bot.html
> Parsing http://nutch.apache.org/credits.html
> Skipping http://nutch.apache.org/faq.html; different batch id (null)
> Skipping http://nutch.apache.org/index.html; different batch id (null)
> Skipping http://nutch.apache.org/index.pdf; different batch id (null)
> Skipping http://nutch.apache.org/issue_tracking.html; different batch id (null)
> Skipping http://nutch.apache.org/mailing_lists.html; different batch id (null)
> Skipping http://nutch.apache.org/nightly.html; different batch id (null)
> Skipping http://nutch.apache.org/old_downloads.html; different batch id (null)
> Skipping http://nutch.apache.org/sonar.html; different batch id (null)
> Skipping http://nutch.apache.org/tutorial.html; different batch id (null)
> Skipping http://nutch.apache.org/version_control.html; different batch id (null)
> Skipping http://nutch.apache.org/wiki.html; different batch id (null)
> FetcherJob: threads: 10
> FetcherJob: parsing: false
> FetcherJob: resuming: false
> FetcherJob : timelimit set for : -1
> Using queue mode : byHost
> Fetcher: threads: 10
> Fetcher: throughput threshold: -1
> Fetcher: throughput threshold sequence: 5
> fetching http://nutch.apache.org/nightly.html
> QueueFeeder finished: total 5 records. Hit by time limit :0
> 10/10 spinwaiting/active, 1 pages, 0 errors, 0.2 0.2 pages/s, 15 15 kb/s, 4 URLs in 1 queues
> * queue: http://nutch.apache.org
>   maxThreads = 1
>   inProgress = 0
>   crawlDelay = 5000
>   minCrawlDelay = 0
>   nextFetchTime = 1351824544459
>   now = 1351824543944
>   0. http://nutch.apache.org/mailing_lists.html
>   1. http://nutch.apache.org/faq.html
>   2. http://nutch.apache.org/index.html
>   3. http://nutch.apache.org/issue_tracking.html
> fetching http://nutch.apache.org/mailing_lists.html
> 10/10 spinwaiting/active, 2 pages, 0 errors, 0.2 0.2 pages/s, 18 21 kb/s, 3 URLs in 1 queues
> * queue: http://nutch.apache.org
>   maxThreads = 1
>   inProgress = 0
>   crawlDelay = 5000
>   minCrawlDelay = 0
>   nextFetchTime = 1351824550073
>   now = 1351824548946
>   0. http://nutch.apache.org/faq.html
>   1. http://nutch.apache.org/index.html
>   2. http://nutch.apache.org/issue_tracking.html
> fetching http://nutch.apache.org/faq.html
> 10/10 spinwaiting/active, 3 pages, 0 errors, 0.2 0.2 pages/s, 17 15 kb/s, 2 URLs in 1 queues
> * queue: http://nutch.apache.org
>   maxThreads = 1
>   inProgress = 0
>   crawlDelay = 5000
>   minCrawlDelay = 0
>   nextFetchTime = 1351824555568
>   now = 1351824553948
>   0. http://nutch.apache.org/index.html
>   1. http://nutch.apache.org/issue_tracking.html
> fetching http://nutch.apache.org/index.html
> 10/10 spinwaiting/active, 4 pages, 0 errors, 0.2 0.2 pages/s, 25 51 kb/s, 1 URLs in 1 queues
> * queue: http://nutch.apache.org
>   maxThreads = 1
>   inProgress = 0
>   crawlDelay = 5000
>   minCrawlDelay = 0
>   nextFetchTime = 1351824561359
>   now = 1351824558949
>   0. http://nutch.apache.org/issue_tracking.html
> fetching http://nutch.apache.org/issue_tracking.html
> -finishing thread FetcherThread4, activeThreads=8
> -finishing thread FetcherThread8, activeThreads=8
> -finishing thread FetcherThread3, activeThreads=7
> -finishing thread FetcherThread5, activeThreads=6
> -finishing thread FetcherThread0, activeThreads=5
> -finishing thread FetcherThread7, activeThreads=4
> -finishing thread FetcherThread2, activeThreads=3
> -finishing thread FetcherThread6, activeThreads=2
> -finishing thread FetcherThread1, activeThreads=1
> -finishing thread FetcherThread9, activeThreads=0
> 0/0 spinwaiting/active, 5 pages, 0 errors, 0.2 0.2 pages/s, 23 15 kb/s, 0 URLs in 0 queues
> -activeThreads=0
> ParserJob: resuming: false
> ParserJob: forced reparse: false
> ParserJob: parsing all
> Skipping http://nutch.apache.org/; different batch id (null)
> Skipping http://nutch.apache.org/about.html; different batch id (null)
> Skipping http://nutch.apache.org/about.pdf; different batch id (null)
> Skipping http://nutch.apache.org/apidocs-1.5/allclasses-frame.html; different batch id (null)
> Skipping http://nutch.apache.org/apidocs-1.5/index.html; different batch id (null)
> Skipping http://nutch.apache.org/apidocs-1.5/overview-frame.html; different batch id (null)
> Skipping http://nutch.apache.org/apidocs-1.5/overview-summary.html; different batch id (null)
> Skipping http://nutch.apache.org/apidocs-2.1/allclasses-frame.html; different batch id (null)
> Skipping http://nutch.apache.org/apidocs-2.1/index.html; different batch id (null)
> Skipping http://nutch.apache.org/apidocs-2.1/overview-frame.html; different batch id (null)
> Skipping http://nutch.apache.org/apidocs-2.1/overview-summary.html; different batch id (null)
> Skipping http://nutch.apache.org/bot.html; different batch id (null)
> Skipping http://nutch.apache.org/bot.pdf; different batch id (null)
> Skipping http://nutch.apache.org/credits.html; different batch id (null)
> Skipping http://nutch.apache.org/credits.pdf; different batch id (null)
> Parsing http://nutch.apache.org/faq.html
> Parsing http://nutch.apache.org/index.html
> Skipping http://nutch.apache.org/index.pdf; different batch id (null)
> Parsing http://nutch.apache.org/issue_tracking.html
> Parsing http://nutch.apache.org/mailing_lists.html
> Parsing http://nutch.apache.org/nightly.html
> Skipping http://nutch.apache.org/old_downloads.html; different batch id (null)
> Skipping http://nutch.apache.org/sonar.html; different batch id (null)
> Skipping http://nutch.apache.org/tutorial.html; different batch id (null)
> Skipping http://nutch.apache.org/version_control.html; different batch id (null)
> Skipping http://nutch.apache.org/wiki.html; different batch id (null)
> Exception in thread "main" java.lang.NullPointerException
>     at java.util.Hashtable.put(Hashtable.java:411)
>     at java.util.Properties.setProperty(Properties.java:160)
>     at org.apache.hadoop.conf.Configuration.set(Configuration.java:438)
>     at org.apache.nutch.indexer.IndexerJob.createIndexJob(IndexerJob.java:128)
>     at org.apache.nutch.indexer.solr.SolrIndexerJob.run(SolrIndexerJob.java:44)
>     at org.apache.nutch.crawl.Crawler.runTool(Crawler.java:68)
>     at org.apache.nutch.crawl.Crawler.run(Crawler.java:192)
>     at org.apache.nutch.crawl.Crawler.run(Crawler.java:250)
>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>     at org.apache.nutch.crawl.Crawler.main(Crawler.java:257)
> ====================================
>
> I'm using HBase 90.6 because the latest didn't work for me. Also, I'm
> using solr 3.6.1 instead of solr 4.0 for the same problem.
>
> I was wondering what versions of Nutch, HBase and Solr other users who
> have gotten Nutch to work. are using? I'm getting the feeling that only
> the right version combinations of all parts works .
>
> cocofan
>

-- 
Kiran Chitturi

