I'm new to Nutch. I've been working through the tutorials (Nutch2Tutorial and the older ones), but I get an error when I try to run a crawl:

==========================================

cocofan@cocofan-notebook-PC:~/Dropbox/project/apache-nutch-2.1/runtime/local$ bin/nutch crawl urls olr http://localhost:8983/solr/ -depth 3 -topN 5
FetcherJob: threads: 10
FetcherJob: parsing: false
FetcherJob: resuming: false
FetcherJob : timelimit set for : -1
Using queue mode : byHost
Fetcher: threads: 10
QueueFeeder finished: total 1 records. Hit by time limit :0
fetching http://nutch.apache.org/
-finishing thread FetcherThread2, activeThreads=5
-finishing thread FetcherThread1, activeThreads=4
-finishing thread FetcherThread4, activeThreads=3
-finishing thread FetcherThread6, activeThreads=1
-finishing thread FetcherThread5, activeThreads=2
-finishing thread FetcherThread8, activeThreads=1
Fetcher: throughput threshold: -1
Fetcher: throughput threshold sequence: 5
-finishing thread FetcherThread3, activeThreads=1
-finishing thread FetcherThread7, activeThreads=1
-finishing thread FetcherThread9, activeThreads=1
-finishing thread FetcherThread0, activeThreads=0
0/0 spinwaiting/active, 1 pages, 0 errors, 0.2 0.2 pages/s, 51 51 kb/s, 0 URLs in 0 queues
-activeThreads=0
ParserJob: resuming:    false
ParserJob: forced reparse:    false
ParserJob: parsing all
Parsing http://nutch.apache.org/
FetcherJob: threads: 10
FetcherJob: parsing: false
FetcherJob: resuming: false
FetcherJob : timelimit set for : -1
Using queue mode : byHost
Fetcher: threads: 10
Fetcher: throughput threshold: -1
Fetcher: throughput threshold sequence: 5
fetching http://nutch.apache.org/about.html
QueueFeeder finished: total 5 records. Hit by time limit :0
10/10 spinwaiting/active, 1 pages, 0 errors, 0.2 0.2 pages/s, 16 16 kb/s, 4 URLs in 1 queues
* queue: http://nutch.apache.org
  maxThreads    = 1
  inProgress    = 0
  crawlDelay    = 5000
  minCrawlDelay = 0
  nextFetchTime = 1351824493524
  now           = 1351824493242
  0. http://nutch.apache.org/credits.html
  1. http://nutch.apache.org/apidocs-2.1/index.html
  2. http://nutch.apache.org/bot.html
  3. http://nutch.apache.org/apidocs-1.5/index.html
fetching http://nutch.apache.org/credits.html
10/10 spinwaiting/active, 2 pages, 0 errors, 0.2 0.2 pages/s, 18 19 kb/s, 3 URLs in 1 queues
* queue: http://nutch.apache.org
  maxThreads    = 1
  inProgress    = 0
  crawlDelay    = 5000
  minCrawlDelay = 0
  nextFetchTime = 1351824498764
  now           = 1351824498244
  0. http://nutch.apache.org/apidocs-2.1/index.html
  1. http://nutch.apache.org/bot.html
  2. http://nutch.apache.org/apidocs-1.5/index.html
fetching http://nutch.apache.org/apidocs-2.1/index.html
10/10 spinwaiting/active, 3 pages, 0 errors, 0.2 0.2 pages/s, 12 2 kb/s, 2 URLs in 1 queues
* queue: http://nutch.apache.org
  maxThreads    = 1
  inProgress    = 0
  crawlDelay    = 5000
  minCrawlDelay = 0
  nextFetchTime = 1351824503930
  now           = 1351824503246
  0. http://nutch.apache.org/bot.html
  1. http://nutch.apache.org/apidocs-1.5/index.html
fetching http://nutch.apache.org/bot.html
10/10 spinwaiting/active, 4 pages, 0 errors, 0.2 0.2 pages/s, 14 18 kb/s, 1 URLs in 1 queues
* queue: http://nutch.apache.org
  maxThreads    = 1
  inProgress    = 0
  crawlDelay    = 5000
  minCrawlDelay = 0
  nextFetchTime = 1351824509093
  now           = 1351824508247
  0. http://nutch.apache.org/apidocs-1.5/index.html
fetching http://nutch.apache.org/apidocs-1.5/index.html
-finishing thread FetcherThread5, activeThreads=9
-finishing thread FetcherThread3, activeThreads=8
-finishing thread FetcherThread2, activeThreads=7
-finishing thread FetcherThread8, activeThreads=3
-finishing thread FetcherThread7, activeThreads=6
-finishing thread FetcherThread0, activeThreads=1
-finishing thread FetcherThread4, activeThreads=2
-finishing thread FetcherThread6, activeThreads=4
-finishing thread FetcherThread1, activeThreads=5
-finishing thread FetcherThread9, activeThreads=0
0/0 spinwaiting/active, 5 pages, 0 errors, 0.2 0.2 pages/s, 11 2 kb/s, 0 URLs in 0 queues
-activeThreads=0
ParserJob: resuming:    false
ParserJob: forced reparse:    false
ParserJob: parsing all
Skipping http://nutch.apache.org/; different batch id (null)
Parsing http://nutch.apache.org/about.html
Parsing http://nutch.apache.org/apidocs-1.5/index.html
Parsing http://nutch.apache.org/apidocs-2.1/index.html
Parsing http://nutch.apache.org/bot.html
Parsing http://nutch.apache.org/credits.html
Skipping http://nutch.apache.org/faq.html; different batch id (null)
Skipping http://nutch.apache.org/index.html; different batch id (null)
Skipping http://nutch.apache.org/index.pdf; different batch id (null)
Skipping http://nutch.apache.org/issue_tracking.html; different batch id (null)
Skipping http://nutch.apache.org/mailing_lists.html; different batch id (null)
Skipping http://nutch.apache.org/nightly.html; different batch id (null)
Skipping http://nutch.apache.org/old_downloads.html; different batch id (null)
Skipping http://nutch.apache.org/sonar.html; different batch id (null)
Skipping http://nutch.apache.org/tutorial.html; different batch id (null)
Skipping http://nutch.apache.org/version_control.html; different batch id (null)
Skipping http://nutch.apache.org/wiki.html; different batch id (null)
FetcherJob: threads: 10
FetcherJob: parsing: false
FetcherJob: resuming: false
FetcherJob : timelimit set for : -1
Using queue mode : byHost
Fetcher: threads: 10
Fetcher: throughput threshold: -1
Fetcher: throughput threshold sequence: 5
fetching http://nutch.apache.org/nightly.html
QueueFeeder finished: total 5 records. Hit by time limit :0
10/10 spinwaiting/active, 1 pages, 0 errors, 0.2 0.2 pages/s, 15 15 kb/s, 4 URLs in 1 queues
* queue: http://nutch.apache.org
  maxThreads    = 1
  inProgress    = 0
  crawlDelay    = 5000
  minCrawlDelay = 0
  nextFetchTime = 1351824544459
  now           = 1351824543944
  0. http://nutch.apache.org/mailing_lists.html
  1. http://nutch.apache.org/faq.html
  2. http://nutch.apache.org/index.html
  3. http://nutch.apache.org/issue_tracking.html
fetching http://nutch.apache.org/mailing_lists.html
10/10 spinwaiting/active, 2 pages, 0 errors, 0.2 0.2 pages/s, 18 21 kb/s, 3 URLs in 1 queues
* queue: http://nutch.apache.org
  maxThreads    = 1
  inProgress    = 0
  crawlDelay    = 5000
  minCrawlDelay = 0
  nextFetchTime = 1351824550073
  now           = 1351824548946
  0. http://nutch.apache.org/faq.html
  1. http://nutch.apache.org/index.html
  2. http://nutch.apache.org/issue_tracking.html
fetching http://nutch.apache.org/faq.html
10/10 spinwaiting/active, 3 pages, 0 errors, 0.2 0.2 pages/s, 17 15 kb/s, 2 URLs in 1 queues
* queue: http://nutch.apache.org
  maxThreads    = 1
  inProgress    = 0
  crawlDelay    = 5000
  minCrawlDelay = 0
  nextFetchTime = 1351824555568
  now           = 1351824553948
  0. http://nutch.apache.org/index.html
  1. http://nutch.apache.org/issue_tracking.html
fetching http://nutch.apache.org/index.html
10/10 spinwaiting/active, 4 pages, 0 errors, 0.2 0.2 pages/s, 25 51 kb/s, 1 URLs in 1 queues
* queue: http://nutch.apache.org
  maxThreads    = 1
  inProgress    = 0
  crawlDelay    = 5000
  minCrawlDelay = 0
  nextFetchTime = 1351824561359
  now           = 1351824558949
  0. http://nutch.apache.org/issue_tracking.html
fetching http://nutch.apache.org/issue_tracking.html
-finishing thread FetcherThread4, activeThreads=8
-finishing thread FetcherThread8, activeThreads=8
-finishing thread FetcherThread3, activeThreads=7
-finishing thread FetcherThread5, activeThreads=6
-finishing thread FetcherThread0, activeThreads=5
-finishing thread FetcherThread7, activeThreads=4
-finishing thread FetcherThread2, activeThreads=3
-finishing thread FetcherThread6, activeThreads=2
-finishing thread FetcherThread1, activeThreads=1
-finishing thread FetcherThread9, activeThreads=0
0/0 spinwaiting/active, 5 pages, 0 errors, 0.2 0.2 pages/s, 23 15 kb/s, 0 URLs in 0 queues
-activeThreads=0
ParserJob: resuming:    false
ParserJob: forced reparse:    false
ParserJob: parsing all
Skipping http://nutch.apache.org/; different batch id (null)
Skipping http://nutch.apache.org/about.html; different batch id (null)
Skipping http://nutch.apache.org/about.pdf; different batch id (null)
Skipping http://nutch.apache.org/apidocs-1.5/allclasses-frame.html; different batch id (null)
Skipping http://nutch.apache.org/apidocs-1.5/index.html; different batch id (null)
Skipping http://nutch.apache.org/apidocs-1.5/overview-frame.html; different batch id (null)
Skipping http://nutch.apache.org/apidocs-1.5/overview-summary.html; different batch id (null)
Skipping http://nutch.apache.org/apidocs-2.1/allclasses-frame.html; different batch id (null)
Skipping http://nutch.apache.org/apidocs-2.1/index.html; different batch id (null)
Skipping http://nutch.apache.org/apidocs-2.1/overview-frame.html; different batch id (null)
Skipping http://nutch.apache.org/apidocs-2.1/overview-summary.html; different batch id (null)
Skipping http://nutch.apache.org/bot.html; different batch id (null)
Skipping http://nutch.apache.org/bot.pdf; different batch id (null)
Skipping http://nutch.apache.org/credits.html; different batch id (null)
Skipping http://nutch.apache.org/credits.pdf; different batch id (null)
Parsing http://nutch.apache.org/faq.html
Parsing http://nutch.apache.org/index.html
Skipping http://nutch.apache.org/index.pdf; different batch id (null)
Parsing http://nutch.apache.org/issue_tracking.html
Parsing http://nutch.apache.org/mailing_lists.html
Parsing http://nutch.apache.org/nightly.html
Skipping http://nutch.apache.org/old_downloads.html; different batch id (null)
Skipping http://nutch.apache.org/sonar.html; different batch id (null)
Skipping http://nutch.apache.org/tutorial.html; different batch id (null)
Skipping http://nutch.apache.org/version_control.html; different batch id (null)
Skipping http://nutch.apache.org/wiki.html; different batch id (null)
Exception in thread "main" java.lang.NullPointerException
    at java.util.Hashtable.put(Hashtable.java:411)
    at java.util.Properties.setProperty(Properties.java:160)
    at org.apache.hadoop.conf.Configuration.set(Configuration.java:438)
    at org.apache.nutch.indexer.IndexerJob.createIndexJob(IndexerJob.java:128)
    at org.apache.nutch.indexer.solr.SolrIndexerJob.run(SolrIndexerJob.java:44)
    at org.apache.nutch.crawl.Crawler.runTool(Crawler.java:68)
    at org.apache.nutch.crawl.Crawler.run(Crawler.java:192)
    at org.apache.nutch.crawl.Crawler.run(Crawler.java:250)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.nutch.crawl.Crawler.main(Crawler.java:257)
====================================
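For what it's worth, the top three frames of the trace show Configuration.set handing its arguments to Properties.setProperty, which stores them in a Hashtable, and Hashtable rejects null keys and values with a NullPointerException. That suggests (but does not prove) that IndexerJob.createIndexJob passed a null to Configuration.set at line 128. The standalone sketch below reproduces only that low-level failure with plain java.util.Properties; it involves no Nutch or Hadoop code, and the key name is a hypothetical placeholder.

```java
import java.util.Properties;

public class NullPropertyDemo {
    // Returns true if Properties.setProperty rejects a null value with an NPE,
    // the same exception type seen at the top of the stack trace.
    static boolean setNullThrows() {
        Properties props = new Properties();
        try {
            // "example.key" is a hypothetical key; the null value stands in
            // for a setting that was never filled in.
            props.setProperty("example.key", null);
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("setProperty(key, null) throws NPE: " + setNullThrows());
    }
}
```

On newer JDKs Properties is backed by a ConcurrentHashMap rather than Hashtable, so the frames differ, but null values are still rejected with the same exception.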

I'm using HBase 0.90.6 because the latest version didn't work for me. I'm also using Solr 3.6.1 instead of Solr 4.0 for the same reason.

Which versions of Nutch, HBase and Solr are those of you who have gotten Nutch working using? I have a feeling that only certain combinations of versions work together.

                           cocofan
