Hi,

I would suggest checking the 'logs/hadoop.log' file for more information on
the error.
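
If the log is long, a small program can pull out just the exception
reports. This is a minimal sketch in Java, not part of Nutch. It assumes
log4j-style output where stack-trace lines start with "at ", "Caused by:"
or "...", and it assumes it is run from runtime/local as in your session
below. The class name HadoopLogScanner is hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not part of Nutch): prints exception blocks from a log file. */
public class HadoopLogScanner {

    /** True for stack-trace continuation lines such as "at ...", "Caused by: ..." or "... 5 more". */
    static boolean isTraceLine(String line) {
        String t = line.trim();
        return t.startsWith("at ") || t.startsWith("Caused by:") || t.startsWith("...");
    }

    /** Keeps each line mentioning "Exception" plus the trace lines directly after it. */
    static List<String> exceptionBlocks(List<String> lines) {
        List<String> out = new ArrayList<>();
        boolean inTrace = false;
        for (String line : lines) {
            if (inTrace && isTraceLine(line)) {
                out.add(line);          // still inside the current stack trace
                continue;
            }
            inTrace = line.contains("Exception");
            if (inTrace) {
                out.add(line);          // start of a new exception report
            }
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        // Assumed default: logs/hadoop.log relative to runtime/local; override with the first argument.
        Path log = Paths.get(args.length > 0 ? args[0] : "logs/hadoop.log");
        List<String> lines = Files.readAllLines(log, StandardCharsets.UTF_8);
        for (String line : exceptionBlocks(lines)) {
            System.out.println(line);
        }
    }
}
```

Compile it with javac HadoopLogScanner.java and run it from runtime/local,
or pass the log path as the first argument.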

I am not sure which versions of HBase are compatible with Nutch.

HTH

Regards,
Kiran


On Thu, Nov 1, 2012 at 11:17 PM, cocofan <[email protected]> wrote:

>    I'm new to Nutch.  I've been trying to get through the tutorials
> (Nutch2tutorial and the older ones) but I'm getting an error when I try to
> do a crawl:
>
> ==========================================
>
> cocofan@cocofan-notebook-PC:~/Dropbox/project/apache-nutch-2.1/runtime/local$
> bin/nutch crawl urls olr http://localhost:8983/solr/ -depth 3 -topN 5
> FetcherJob: threads: 10
> FetcherJob: parsing: false
> FetcherJob: resuming: false
> FetcherJob : timelimit set for : -1
> Using queue mode : byHost
> Fetcher: threads: 10
> QueueFeeder finished: total 1 records. Hit by time limit :0
> fetching http://nutch.apache.org/
> -finishing thread FetcherThread2, activeThreads=5
> -finishing thread FetcherThread1, activeThreads=4
> -finishing thread FetcherThread4, activeThreads=3
> -finishing thread FetcherThread6, activeThreads=1
> -finishing thread FetcherThread5, activeThreads=2
> -finishing thread FetcherThread8, activeThreads=1
> Fetcher: throughput threshold: -1
> Fetcher: throughput threshold sequence: 5
> -finishing thread FetcherThread3, activeThreads=1
> -finishing thread FetcherThread7, activeThreads=1
> -finishing thread FetcherThread9, activeThreads=1
> -finishing thread FetcherThread0, activeThreads=0
> 0/0 spinwaiting/active, 1 pages, 0 errors, 0.2 0.2 pages/s, 51 51 kb/s, 0 URLs in 0 queues
> -activeThreads=0
> ParserJob: resuming:    false
> ParserJob: forced reparse:    false
> ParserJob: parsing all
> Parsing http://nutch.apache.org/
> FetcherJob: threads: 10
> FetcherJob: parsing: false
> FetcherJob: resuming: false
> FetcherJob : timelimit set for : -1
> Using queue mode : byHost
> Fetcher: threads: 10
> Fetcher: throughput threshold: -1
> Fetcher: throughput threshold sequence: 5
> fetching http://nutch.apache.org/about.html
> QueueFeeder finished: total 5 records. Hit by time limit :0
> 10/10 spinwaiting/active, 1 pages, 0 errors, 0.2 0.2 pages/s, 16 16 kb/s, 4 URLs in 1 queues
> * queue: http://nutch.apache.org
>   maxThreads    = 1
>   inProgress    = 0
>   crawlDelay    = 5000
>   minCrawlDelay = 0
>   nextFetchTime = 1351824493524
>   now           = 1351824493242
>   0. http://nutch.apache.org/credits.html
>   1. http://nutch.apache.org/apidocs-2.1/index.html
>   2. http://nutch.apache.org/bot.html
>   3. http://nutch.apache.org/apidocs-1.5/index.html
> fetching http://nutch.apache.org/credits.html
> 10/10 spinwaiting/active, 2 pages, 0 errors, 0.2 0.2 pages/s, 18 19 kb/s, 3 URLs in 1 queues
> * queue: http://nutch.apache.org
>   maxThreads    = 1
>   inProgress    = 0
>   crawlDelay    = 5000
>   minCrawlDelay = 0
>   nextFetchTime = 1351824498764
>   now           = 1351824498244
>   0. http://nutch.apache.org/apidocs-2.1/index.html
>   1. http://nutch.apache.org/bot.html
>   2. http://nutch.apache.org/apidocs-1.5/index.html
> fetching http://nutch.apache.org/apidocs-2.1/index.html
> 10/10 spinwaiting/active, 3 pages, 0 errors, 0.2 0.2 pages/s, 12 2 kb/s, 2 URLs in 1 queues
> * queue: http://nutch.apache.org
>   maxThreads    = 1
>   inProgress    = 0
>   crawlDelay    = 5000
>   minCrawlDelay = 0
>   nextFetchTime = 1351824503930
>   now           = 1351824503246
>   0. http://nutch.apache.org/bot.html
>   1. http://nutch.apache.org/apidocs-1.5/index.html
> fetching http://nutch.apache.org/bot.html
> 10/10 spinwaiting/active, 4 pages, 0 errors, 0.2 0.2 pages/s, 14 18 kb/s, 1 URLs in 1 queues
> * queue: http://nutch.apache.org
>   maxThreads    = 1
>   inProgress    = 0
>   crawlDelay    = 5000
>   minCrawlDelay = 0
>   nextFetchTime = 1351824509093
>   now           = 1351824508247
>   0. http://nutch.apache.org/apidocs-1.5/index.html
> fetching http://nutch.apache.org/apidocs-1.5/index.html
> -finishing thread FetcherThread5, activeThreads=9
> -finishing thread FetcherThread3, activeThreads=8
> -finishing thread FetcherThread2, activeThreads=7
> -finishing thread FetcherThread8, activeThreads=3
> -finishing thread FetcherThread7, activeThreads=6
> -finishing thread FetcherThread0, activeThreads=1
> -finishing thread FetcherThread4, activeThreads=2
> -finishing thread FetcherThread6, activeThreads=4
> -finishing thread FetcherThread1, activeThreads=5
> -finishing thread FetcherThread9, activeThreads=0
> 0/0 spinwaiting/active, 5 pages, 0 errors, 0.2 0.2 pages/s, 11 2 kb/s, 0 URLs in 0 queues
> -activeThreads=0
> ParserJob: resuming:    false
> ParserJob: forced reparse:    false
> ParserJob: parsing all
> Skipping http://nutch.apache.org/; different batch id (null)
> Parsing http://nutch.apache.org/about.html
> Parsing http://nutch.apache.org/apidocs-1.5/index.html
> Parsing http://nutch.apache.org/apidocs-2.1/index.html
> Parsing http://nutch.apache.org/bot.html
> Parsing http://nutch.apache.org/credits.html
> Skipping http://nutch.apache.org/faq.html; different batch id (null)
> Skipping http://nutch.apache.org/index.html; different batch id (null)
> Skipping http://nutch.apache.org/index.pdf; different batch id (null)
> Skipping http://nutch.apache.org/issue_tracking.html; different batch id (null)
> Skipping http://nutch.apache.org/mailing_lists.html; different batch id (null)
> Skipping http://nutch.apache.org/nightly.html; different batch id (null)
> Skipping http://nutch.apache.org/old_downloads.html; different batch id (null)
> Skipping http://nutch.apache.org/sonar.html; different batch id (null)
> Skipping http://nutch.apache.org/tutorial.html; different batch id (null)
> Skipping http://nutch.apache.org/version_control.html; different batch id (null)
> Skipping http://nutch.apache.org/wiki.html; different batch id (null)
> FetcherJob: threads: 10
> FetcherJob: parsing: false
> FetcherJob: resuming: false
> FetcherJob : timelimit set for : -1
> Using queue mode : byHost
> Fetcher: threads: 10
> Fetcher: throughput threshold: -1
> Fetcher: throughput threshold sequence: 5
> fetching http://nutch.apache.org/nightly.html
> QueueFeeder finished: total 5 records. Hit by time limit :0
> 10/10 spinwaiting/active, 1 pages, 0 errors, 0.2 0.2 pages/s, 15 15 kb/s, 4 URLs in 1 queues
> * queue: http://nutch.apache.org
>   maxThreads    = 1
>   inProgress    = 0
>   crawlDelay    = 5000
>   minCrawlDelay = 0
>   nextFetchTime = 1351824544459
>   now           = 1351824543944
>   0. http://nutch.apache.org/mailing_lists.html
>   1. http://nutch.apache.org/faq.html
>   2. http://nutch.apache.org/index.html
>   3. http://nutch.apache.org/issue_tracking.html
> fetching http://nutch.apache.org/mailing_lists.html
> 10/10 spinwaiting/active, 2 pages, 0 errors, 0.2 0.2 pages/s, 18 21 kb/s, 3 URLs in 1 queues
> * queue: http://nutch.apache.org
>   maxThreads    = 1
>   inProgress    = 0
>   crawlDelay    = 5000
>   minCrawlDelay = 0
>   nextFetchTime = 1351824550073
>   now           = 1351824548946
>   0. http://nutch.apache.org/faq.html
>   1. http://nutch.apache.org/index.html
>   2. http://nutch.apache.org/issue_tracking.html
> fetching http://nutch.apache.org/faq.html
> 10/10 spinwaiting/active, 3 pages, 0 errors, 0.2 0.2 pages/s, 17 15 kb/s, 2 URLs in 1 queues
> * queue: http://nutch.apache.org
>   maxThreads    = 1
>   inProgress    = 0
>   crawlDelay    = 5000
>   minCrawlDelay = 0
>   nextFetchTime = 1351824555568
>   now           = 1351824553948
>   0. http://nutch.apache.org/index.html
>   1. http://nutch.apache.org/issue_tracking.html
> fetching http://nutch.apache.org/index.html
> 10/10 spinwaiting/active, 4 pages, 0 errors, 0.2 0.2 pages/s, 25 51 kb/s, 1 URLs in 1 queues
> * queue: http://nutch.apache.org
>   maxThreads    = 1
>   inProgress    = 0
>   crawlDelay    = 5000
>   minCrawlDelay = 0
>   nextFetchTime = 1351824561359
>   now           = 1351824558949
>   0. http://nutch.apache.org/issue_tracking.html
> fetching http://nutch.apache.org/issue_tracking.html
> -finishing thread FetcherThread4, activeThreads=8
> -finishing thread FetcherThread8, activeThreads=8
> -finishing thread FetcherThread3, activeThreads=7
> -finishing thread FetcherThread5, activeThreads=6
> -finishing thread FetcherThread0, activeThreads=5
> -finishing thread FetcherThread7, activeThreads=4
> -finishing thread FetcherThread2, activeThreads=3
> -finishing thread FetcherThread6, activeThreads=2
> -finishing thread FetcherThread1, activeThreads=1
> -finishing thread FetcherThread9, activeThreads=0
> 0/0 spinwaiting/active, 5 pages, 0 errors, 0.2 0.2 pages/s, 23 15 kb/s, 0 URLs in 0 queues
> -activeThreads=0
> ParserJob: resuming:    false
> ParserJob: forced reparse:    false
> ParserJob: parsing all
> Skipping http://nutch.apache.org/; different batch id (null)
> Skipping http://nutch.apache.org/about.html; different batch id (null)
> Skipping http://nutch.apache.org/about.pdf; different batch id (null)
> Skipping http://nutch.apache.org/apidocs-1.5/allclasses-frame.html; different batch id (null)
> Skipping http://nutch.apache.org/apidocs-1.5/index.html; different batch id (null)
> Skipping http://nutch.apache.org/apidocs-1.5/overview-frame.html; different batch id (null)
> Skipping http://nutch.apache.org/apidocs-1.5/overview-summary.html; different batch id (null)
> Skipping http://nutch.apache.org/apidocs-2.1/allclasses-frame.html; different batch id (null)
> Skipping http://nutch.apache.org/apidocs-2.1/index.html; different batch id (null)
> Skipping http://nutch.apache.org/apidocs-2.1/overview-frame.html; different batch id (null)
> Skipping http://nutch.apache.org/apidocs-2.1/overview-summary.html; different batch id (null)
> Skipping http://nutch.apache.org/bot.html; different batch id (null)
> Skipping http://nutch.apache.org/bot.pdf; different batch id (null)
> Skipping http://nutch.apache.org/credits.html; different batch id (null)
> Skipping http://nutch.apache.org/credits.pdf; different batch id (null)
> Parsing http://nutch.apache.org/faq.html
> Parsing http://nutch.apache.org/index.html
> Skipping http://nutch.apache.org/index.pdf; different batch id (null)
> Parsing http://nutch.apache.org/issue_tracking.html
> Parsing http://nutch.apache.org/mailing_lists.html
> Parsing http://nutch.apache.org/nightly.html
> Skipping http://nutch.apache.org/old_downloads.html; different batch id (null)
> Skipping http://nutch.apache.org/sonar.html; different batch id (null)
> Skipping http://nutch.apache.org/tutorial.html; different batch id (null)
> Skipping http://nutch.apache.org/version_control.html; different batch id (null)
> Skipping http://nutch.apache.org/wiki.html; different batch id (null)
> Exception in thread "main" java.lang.NullPointerException
>     at java.util.Hashtable.put(Hashtable.java:411)
>     at java.util.Properties.setProperty(Properties.java:160)
>     at org.apache.hadoop.conf.Configuration.set(Configuration.java:438)
>     at org.apache.nutch.indexer.IndexerJob.createIndexJob(IndexerJob.java:128)
>     at org.apache.nutch.indexer.solr.SolrIndexerJob.run(SolrIndexerJob.java:44)
>     at org.apache.nutch.crawl.Crawler.runTool(Crawler.java:68)
>     at org.apache.nutch.crawl.Crawler.run(Crawler.java:192)
>     at org.apache.nutch.crawl.Crawler.run(Crawler.java:250)
>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>     at org.apache.nutch.crawl.Crawler.main(Crawler.java:257)
> ==========================================
>
>    I'm using HBase 0.90.6 because the latest version didn't work for me.
> For the same reason, I'm using Solr 3.6.1 instead of Solr 4.0.
>
>   I was wondering which versions of Nutch, HBase and Solr are used by
> others who have gotten Nutch to work. I'm getting the feeling that only
> the right combination of versions works.
>
>                            cocofan
>
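
On the trace itself: its first three frames (Hashtable.put,
Properties.setProperty, Configuration.set) suggest that
IndexerJob.createIndexJob passed a null name or value to Hadoop's
Configuration.set, which stores it in a java.util.Properties object, and
Properties rejects nulls. The sketch below reproduces only that JDK
behaviour using a hypothetical class, NullPropertyDemo; it does not show
which Nutch setting was null, which is what hadoop.log may help narrow down.

```java
import java.util.Properties;

/** Hypothetical standalone illustration of the top frames of the quoted stack trace. */
public class NullPropertyDemo {

    /** Returns true if Properties.setProperty throws a NullPointerException for this pair. */
    static boolean throwsNpe(String key, String value) {
        Properties props = new Properties();
        try {
            props.setProperty(key, value);
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(throwsNpe("some.key", "value")); // false: both non-null
        System.out.println(throwsNpe("some.key", null));    // true: null value rejected
        System.out.println(throwsNpe(null, "value"));       // true: null key rejected
    }
}
```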



-- 
Kiran Chitturi
