I looked in the hadoop.log file and found this error:

=====================

2012-11-01 14:46:48,153 INFO  crawl.InjectorJob - InjectorJob: starting
2012-11-01 14:46:48,167 INFO  crawl.InjectorJob - InjectorJob: urlDir: urls
2012-11-01 14:46:51,977 WARN util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2012-11-01 14:46:52,027 ERROR security.UserGroupInformation - PriviledgedActionException as:cocofan cause:org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: file:/home/cocofan/Dropbox/project/apache-nutch-2.1/runtime/local/bin/urls
2012-11-01 14:46:52,027 ERROR crawl.InjectorJob - InjectorJob: org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: file:/home/cocofan/Dropbox/project/apache-nutch-2.1/runtime/local/bin/urls
    at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:235)
    at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:252)


====================

It looks like it's not finding my urls directory. It's there and world readable/writable/executable, so I'm not sure why it can't find it. Any ideas about what's wrong?
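The path in the error ends in `runtime/local/bin/urls`, which suggests the relative argument `urls` was resolved against the directory the command was started from rather than against `runtime/local`. Below is a minimal Java sketch of that resolution. It assumes the local job treats a relative input path as relative to the JVM's working directory (consistent with the error message, but not confirmed in this thread); the class and method names are hypothetical.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper: shows how a relative seed directory such as "urls"
// resolves against different working directories.
public class SeedDirResolution {

    // Resolve seedDir against workingDir; an absolute seedDir is returned unchanged.
    static Path resolveSeedDir(String workingDir, String seedDir) {
        return Paths.get(workingDir).resolve(seedDir).normalize();
    }

    public static void main(String[] args) {
        String local = "/home/cocofan/Dropbox/project/apache-nutch-2.1/runtime/local";

        // Started from runtime/local/bin: matches the path in the error message.
        System.out.println(resolveSeedDir(local + "/bin", "urls"));
        // Started from runtime/local: points at runtime/local/urls instead.
        System.out.println(resolveSeedDir(local, "urls"));
        // An absolute seed path does not depend on the working directory.
        System.out.println(resolveSeedDir(local + "/bin", local + "/urls"));
        // The JVM's actual working directory, i.e. the base for relative paths.
        System.out.println(System.getProperty("user.dir"));
    }
}
```

Under that assumption, either starting the command from the directory that contains `urls` or passing an absolute path would point the job at the intended directory.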

Also, where are the configuration files for Hadoop if you're using the built-in libraries that come with Nutch? I don't see a hadoop.conf file in the conf directory.

                               cocofan

On 12-11-02 07:43 AM, kiran chitturi wrote:
Hi,

I would suggest checking the 'logs/hadoop.log' file for more
information on the error.
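A small Java sketch for pulling just the ERROR entries (and the stack-trace lines that follow them) out of `logs/hadoop.log`. It assumes each entry starts with a timestamp and level, as in the hadoop.log lines quoted at the top of this thread; the class name is hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: prints ERROR entries from a Nutch/Hadoop log file.
public class HadoopLogErrors {

    // Entry header "yyyy-MM-dd HH:mm:ss,SSS LEVEL ..." (layout assumed from the quoted log).
    private static final Pattern ENTRY = Pattern.compile(
            "^\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}:\\d{2},\\d{3}\\s+([A-Z]+)\\s.*");

    // Keep ERROR entries plus any continuation lines (e.g. "at ..." frames) after them.
    static List<String> errorEntries(List<String> lines) {
        List<String> out = new ArrayList<>();
        boolean inError = false;
        for (String line : lines) {
            Matcher m = ENTRY.matcher(line);
            if (m.matches()) {
                // A new entry starts: only keep it if its level is ERROR.
                inError = m.group(1).equals("ERROR");
            }
            if (inError) {
                out.add(line);
            }
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        // Default path as suggested above, relative to runtime/local.
        String path = args.length > 0 ? args[0] : "logs/hadoop.log";
        for (String line : errorEntries(Files.readAllLines(Paths.get(path), StandardCharsets.UTF_8))) {
            System.out.println(line);
        }
    }
}
```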

I am not sure which versions of HBase are compatible with Nutch.

HTH

Regards,
Kiran


On Thu, Nov 1, 2012 at 11:17 PM, cocofan wrote:

    I'm new to Nutch. I've been trying to get through the tutorials
(Nutch2tutorial and the older ones), but I'm getting an error when I try to
do a crawl:

==========================================

cocofan@cocofan-notebook-PC:~/Dropbox/project/apache-nutch-2.1/runtime/local$ bin/nutch crawl urls -solr http://localhost:8983/solr/ -depth 3 -topN 5
FetcherJob: threads: 10
FetcherJob: parsing: false
FetcherJob: resuming: false
FetcherJob : timelimit set for : -1
Using queue mode : byHost
Fetcher: threads: 10
QueueFeeder finished: total 1 records. Hit by time limit :0
fetching http://nutch.apache.org/
-finishing thread FetcherThread2, activeThreads=5
-finishing thread FetcherThread1, activeThreads=4
-finishing thread FetcherThread4, activeThreads=3
-finishing thread FetcherThread6, activeThreads=1
-finishing thread FetcherThread5, activeThreads=2
-finishing thread FetcherThread8, activeThreads=1
Fetcher: throughput threshold: -1
Fetcher: throughput threshold sequence: 5
-finishing thread FetcherThread3, activeThreads=1
-finishing thread FetcherThread7, activeThreads=1
-finishing thread FetcherThread9, activeThreads=1
-finishing thread FetcherThread0, activeThreads=0
0/0 spinwaiting/active, 1 pages, 0 errors, 0.2 0.2 pages/s, 51 51 kb/s, 0 URLs in 0 queues
-activeThreads=0
ParserJob: resuming:    false
ParserJob: forced reparse:    false
ParserJob: parsing all
Parsing http://nutch.apache.org/
FetcherJob: threads: 10
FetcherJob: parsing: false
FetcherJob: resuming: false
FetcherJob : timelimit set for : -1
Using queue mode : byHost
Fetcher: threads: 10
Fetcher: throughput threshold: -1
Fetcher: throughput threshold sequence: 5
fetching http://nutch.apache.org/about.html
QueueFeeder finished: total 5 records. Hit by time limit :0
10/10 spinwaiting/active, 1 pages, 0 errors, 0.2 0.2 pages/s, 16 16 kb/s, 4 URLs in 1 queues
* queue: http://nutch.apache.org
   maxThreads    = 1
   inProgress    = 0
   crawlDelay    = 5000
   minCrawlDelay = 0
   nextFetchTime = 1351824493524
   now           = 1351824493242
   0. http://nutch.apache.org/credits.html
   1. http://nutch.apache.org/apidocs-2.1/index.html
   2. http://nutch.apache.org/bot.html
   3. http://nutch.apache.org/apidocs-1.5/index.html
fetching http://nutch.apache.org/credits.html
10/10 spinwaiting/active, 2 pages, 0 errors, 0.2 0.2 pages/s, 18 19 kb/s, 3 URLs in 1 queues
* queue: http://nutch.apache.org
   maxThreads    = 1
   inProgress    = 0
   crawlDelay    = 5000
   minCrawlDelay = 0
   nextFetchTime = 1351824498764
   now           = 1351824498244
   0. http://nutch.apache.org/apidocs-2.1/index.html
   1. http://nutch.apache.org/bot.html
   2. http://nutch.apache.org/apidocs-1.5/index.html
fetching http://nutch.apache.org/apidocs-2.1/index.html
10/10 spinwaiting/active, 3 pages, 0 errors, 0.2 0.2 pages/s, 12 2 kb/s, 2 URLs in 1 queues
* queue: http://nutch.apache.org
   maxThreads    = 1
   inProgress    = 0
   crawlDelay    = 5000
   minCrawlDelay = 0
   nextFetchTime = 1351824503930
   now           = 1351824503246
   0. http://nutch.apache.org/bot.html
   1. http://nutch.apache.org/apidocs-1.5/index.html
fetching http://nutch.apache.org/bot.html
10/10 spinwaiting/active, 4 pages, 0 errors, 0.2 0.2 pages/s, 14 18 kb/s, 1 URLs in 1 queues
* queue: http://nutch.apache.org
   maxThreads    = 1
   inProgress    = 0
   crawlDelay    = 5000
   minCrawlDelay = 0
   nextFetchTime = 1351824509093
   now           = 1351824508247
   0. http://nutch.apache.org/apidocs-1.5/index.html
fetching http://nutch.apache.org/apidocs-1.5/index.html
-finishing thread FetcherThread5, activeThreads=9
-finishing thread FetcherThread3, activeThreads=8
-finishing thread FetcherThread2, activeThreads=7
-finishing thread FetcherThread8, activeThreads=3
-finishing thread FetcherThread7, activeThreads=6
-finishing thread FetcherThread0, activeThreads=1
-finishing thread FetcherThread4, activeThreads=2
-finishing thread FetcherThread6, activeThreads=4
-finishing thread FetcherThread1, activeThreads=5
-finishing thread FetcherThread9, activeThreads=0
0/0 spinwaiting/active, 5 pages, 0 errors, 0.2 0.2 pages/s, 11 2 kb/s, 0 URLs in 0 queues
-activeThreads=0
ParserJob: resuming:    false
ParserJob: forced reparse:    false
ParserJob: parsing all
Skipping http://nutch.apache.org/; different batch id (null)
Parsing http://nutch.apache.org/about.html
Parsing http://nutch.apache.org/apidocs-1.5/index.html
Parsing http://nutch.apache.org/apidocs-2.1/index.html
Parsing http://nutch.apache.org/bot.html
Parsing http://nutch.apache.org/credits.html
Skipping http://nutch.apache.org/faq.html; different batch id (null)
Skipping http://nutch.apache.org/index.html; different batch id (null)
Skipping http://nutch.apache.org/index.pdf; different batch id (null)
Skipping http://nutch.apache.org/issue_tracking.html; different batch id (null)
Skipping http://nutch.apache.org/mailing_lists.html; different batch id (null)
Skipping http://nutch.apache.org/nightly.html; different batch id (null)
Skipping http://nutch.apache.org/old_downloads.html; different batch id (null)
Skipping http://nutch.apache.org/sonar.html; different batch id (null)
Skipping http://nutch.apache.org/tutorial.html; different batch id (null)
Skipping http://nutch.apache.org/version_control.html; different batch id (null)
Skipping http://nutch.apache.org/wiki.html; different batch id (null)
FetcherJob: threads: 10
FetcherJob: parsing: false
FetcherJob: resuming: false
FetcherJob : timelimit set for : -1
Using queue mode : byHost
Fetcher: threads: 10
Fetcher: throughput threshold: -1
Fetcher: throughput threshold sequence: 5
fetching http://nutch.apache.org/nightly.html
QueueFeeder finished: total 5 records. Hit by time limit :0
10/10 spinwaiting/active, 1 pages, 0 errors, 0.2 0.2 pages/s, 15 15 kb/s, 4 URLs in 1 queues
* queue: http://nutch.apache.org
   maxThreads    = 1
   inProgress    = 0
   crawlDelay    = 5000
   minCrawlDelay = 0
   nextFetchTime = 1351824544459
   now           = 1351824543944
   0. http://nutch.apache.org/mailing_lists.html
   1. http://nutch.apache.org/faq.html
   2. http://nutch.apache.org/index.html
   3. http://nutch.apache.org/issue_tracking.html
fetching http://nutch.apache.org/mailing_lists.html
10/10 spinwaiting/active, 2 pages, 0 errors, 0.2 0.2 pages/s, 18 21 kb/s, 3 URLs in 1 queues
* queue: http://nutch.apache.org
   maxThreads    = 1
   inProgress    = 0
   crawlDelay    = 5000
   minCrawlDelay = 0
   nextFetchTime = 1351824550073
   now           = 1351824548946
   0. http://nutch.apache.org/faq.html
   1. http://nutch.apache.org/index.html
   2. http://nutch.apache.org/issue_tracking.html
fetching http://nutch.apache.org/faq.html
10/10 spinwaiting/active, 3 pages, 0 errors, 0.2 0.2 pages/s, 17 15 kb/s, 2 URLs in 1 queues
* queue: http://nutch.apache.org
   maxThreads    = 1
   inProgress    = 0
   crawlDelay    = 5000
   minCrawlDelay = 0
   nextFetchTime = 1351824555568
   now           = 1351824553948
   0. http://nutch.apache.org/index.html
   1. http://nutch.apache.org/issue_tracking.html
fetching http://nutch.apache.org/index.html
10/10 spinwaiting/active, 4 pages, 0 errors, 0.2 0.2 pages/s, 25 51 kb/s, 1 URLs in 1 queues
* queue: http://nutch.apache.org
   maxThreads    = 1
   inProgress    = 0
   crawlDelay    = 5000
   minCrawlDelay = 0
   nextFetchTime = 1351824561359
   now           = 1351824558949
   0. http://nutch.apache.org/issue_tracking.html
fetching http://nutch.apache.org/issue_tracking.html
-finishing thread FetcherThread4, activeThreads=8
-finishing thread FetcherThread8, activeThreads=8
-finishing thread FetcherThread3, activeThreads=7
-finishing thread FetcherThread5, activeThreads=6
-finishing thread FetcherThread0, activeThreads=5
-finishing thread FetcherThread7, activeThreads=4
-finishing thread FetcherThread2, activeThreads=3
-finishing thread FetcherThread6, activeThreads=2
-finishing thread FetcherThread1, activeThreads=1
-finishing thread FetcherThread9, activeThreads=0
0/0 spinwaiting/active, 5 pages, 0 errors, 0.2 0.2 pages/s, 23 15 kb/s, 0 URLs in 0 queues
-activeThreads=0
ParserJob: resuming:    false
ParserJob: forced reparse:    false
ParserJob: parsing all
Skipping http://nutch.apache.org/; different batch id (null)
Skipping http://nutch.apache.org/about.html; different batch id (null)
Skipping http://nutch.apache.org/about.pdf; different batch id (null)
Skipping http://nutch.apache.org/apidocs-1.5/allclasses-frame.html; different batch id (null)
Skipping http://nutch.apache.org/apidocs-1.5/index.html; different batch id (null)
Skipping http://nutch.apache.org/apidocs-1.5/overview-frame.html; different batch id (null)
Skipping http://nutch.apache.org/apidocs-1.5/overview-summary.html; different batch id (null)
Skipping http://nutch.apache.org/apidocs-2.1/allclasses-frame.html; different batch id (null)
Skipping http://nutch.apache.org/apidocs-2.1/index.html; different batch id (null)
Skipping http://nutch.apache.org/apidocs-2.1/overview-frame.html; different batch id (null)
Skipping http://nutch.apache.org/apidocs-2.1/overview-summary.html; different batch id (null)
Skipping http://nutch.apache.org/bot.html; different batch id (null)
Skipping http://nutch.apache.org/bot.pdf; different batch id (null)
Skipping http://nutch.apache.org/credits.html; different batch id (null)
Skipping http://nutch.apache.org/credits.pdf; different batch id (null)
Parsing http://nutch.apache.org/faq.html
Parsing http://nutch.apache.org/index.html
Skipping http://nutch.apache.org/index.pdf; different batch id (null)
Parsing http://nutch.apache.org/issue_tracking.html
Parsing http://nutch.apache.org/mailing_lists.html
Parsing http://nutch.apache.org/nightly.html
Skipping http://nutch.apache.org/old_downloads.html; different batch id (null)
Skipping http://nutch.apache.org/sonar.html; different batch id (null)
Skipping http://nutch.apache.org/tutorial.html; different batch id (null)
Skipping http://nutch.apache.org/version_control.html; different batch id (null)
Skipping http://nutch.apache.org/wiki.html; different batch id (null)
Exception in thread "main" java.lang.NullPointerException
     at java.util.Hashtable.put(Hashtable.java:411)
     at java.util.Properties.setProperty(Properties.java:160)
     at org.apache.hadoop.conf.Configuration.set(Configuration.java:438)
     at org.apache.nutch.indexer.IndexerJob.createIndexJob(IndexerJob.java:128)
     at org.apache.nutch.indexer.solr.SolrIndexerJob.run(SolrIndexerJob.java:44)
     at org.apache.nutch.crawl.Crawler.runTool(Crawler.java:68)
     at org.apache.nutch.crawl.Crawler.run(Crawler.java:192)
     at org.apache.nutch.crawl.Crawler.run(Crawler.java:250)
     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
     at org.apache.nutch.crawl.Crawler.main(Crawler.java:257)
====================================
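The first frames of that trace are JDK code: `Configuration.set` hands the value to `Properties.setProperty`, which ends in `Hashtable.put`, and `Hashtable` rejects a null key or value with a `NullPointerException`. That suggests one of the arguments `IndexerJob.createIndexJob` passed to `Configuration.set` was null; the log does not show which one. A minimal Java sketch reproducing only the JDK part (the class name and the key `example.key` are made up, not real Nutch settings):

```java
import java.util.Properties;

// Reproduces the first two frames of the trace: Properties.setProperty
// delegating to a map that does not accept null values.
public class NullPropertyValue {

    // Returns true if storing (key, value) in a Properties object throws an NPE.
    static boolean setThrowsNpe(String key, String value) {
        Properties props = new Properties();
        try {
            props.setProperty(key, value);
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // "example.key" is a made-up property name, not a real Nutch setting.
        System.out.println(setThrowsNpe("example.key", "some value")); // prints false
        System.out.println(setThrowsNpe("example.key", null));         // prints true
    }
}
```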

    I'm using HBase 0.90.6 because the latest version didn't work for me. For
the same reason, I'm using Solr 3.6.1 instead of Solr 4.0.

    I was wondering which versions of Nutch, HBase and Solr other users who
have gotten Nutch to work are using. I'm getting the feeling that only the
right combination of versions of all the parts works.

                            cocofan



