and my hadoop.log reads:

2012-11-28 16:28:21,735 WARN mapred.FileOutputCommitter - Output path is null in cleanup
2012-11-28 16:28:22,804 INFO mapreduce.GoraRecordReader - gora.buffer.read.limit = 10000
2012-11-28 16:28:25,804 INFO mapreduce.GoraRecordWriter - gora.buffer.write.limit = 10000
2012-11-28 16:28:25,805 INFO crawl.FetchScheduleFactory - Using FetchSchedule impl: org.apache.nutch.crawl.DefaultFetchSchedule
2012-11-28 16:28:25,805 INFO crawl.AbstractFetchSchedule - defaultInterval=2592000
2012-11-28 16:28:25,805 INFO crawl.AbstractFetchSchedule - maxInterval=7776000
2012-11-28 16:28:28,789 WARN mapred.FileOutputCommitter - Output path is null in cleanup
On Wed, Nov 28, 2012 at 4:35 PM, Nicholas Roberts <[email protected]> wrote:

> Correction, my mistake, I am getting a different NullPointerException:
>
> Exception in thread "main" java.lang.NullPointerException
>         at java.util.Hashtable.put(Hashtable.java:411)
>         at java.util.Properties.setProperty(Properties.java:160)
>         at org.apache.hadoop.conf.Configuration.set(Configuration.java:438)
>         at org.apache.nutch.indexer.IndexerJob.createIndexJob(IndexerJob.java:128)
>         at org.apache.nutch.indexer.solr.SolrIndexerJob.run(SolrIndexerJob.java:44)
>         at org.apache.nutch.crawl.Crawler.runTool(Crawler.java:68)
>         at org.apache.nutch.crawl.Crawler.run(Crawler.java:192)
>         at org.apache.nutch.crawl.Crawler.run(Crawler.java:250)
>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>         at org.apache.nutch.crawl.Crawler.main(Crawler.java:257)
>
> On Wed, Nov 28, 2012 at 4:18 PM, Nicholas Roberts <[email protected]> wrote:
>
>> I am working from this tutorial and get a similar error:
>> http://nlp.solutions.asia/?p=180
>>
>> On Fri, Nov 2, 2012 at 1:13 PM, cocofan <[email protected]> wrote:
>>
>>> On 12-11-02 12:45 PM, Lewis John Mcgibbney wrote:
>>>
>>>> Hi,
>>>>
>>>> On Fri, Nov 2, 2012 at 5:36 PM, cocofan <[email protected]> wrote:
>>>>
>>>>> 2012-11-01 14:46:52,027 ERROR security.UserGroupInformation -
>>>>> PriviledgedActionException as:cocofan
>>>>
>>>> I've never seen this Exception before... honestly.
>>>>
>>>>> cause:org.apache.hadoop.mapreduce.lib.input.InvalidInputException:
>>>>> Input path does not exist:
>>>>> file:/home/cocofan/Dropbox/project/apache-nutch-2.1/runtime/local/bin/urls
>>>>> 2012-11-01 14:46:52,027 ERROR crawl.InjectorJob - InjectorJob:
>>>>> org.apache.hadoop.mapreduce.lib.input.InvalidInputException:
>>>>> Input path does not exist:
>>>>
>>>> The rest seems to be pretty straightforward.
>>>> You appear to be running nutch from $NUTCH_HOME/runtime/local/bin
>>>> with the following command: ./nutch XYZ
>>>
>>> I am running nutch from /runtime/local, and I do have the urls
>>> directory in both /runtime/local/bin and /runtime/local (with the
>>> seed.txt file in both).
>>>
>>> The command I'm using is (from /runtime/local):
>>> ./bin/nutch crawl urls -solr http://localhost:8983/solr/ -depth 3 -topN 5
>>>
>>> Actually it seems to be a problem with hadoop, so I was wondering if
>>> I need to set a directory in a config file there?
>>>
>>>> Unless your urls directory is located in the ./bin directory (which
>>>> I doubt it is), you should come up one directory and run the command
>>>> from $NUTCH_HOME/runtime/local, e.g. ./bin/nutch XYZ
>>>>
>>>> Does this make sense? Please read the tutorial carefully and
>>>> thoroughly and it will work perfectly.
>>>>
>>>> hth
>>>>
>>>> Lewis

--
Nicholas Roberts
US 510-684-8264
http://Permaculture.TV
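[Editor's note] A minimal sketch of what the stack trace above shows, not a diagnosis of this particular setup: Hadoop's Configuration stores settings in a java.util.Properties, which is backed by Hashtable, and Hashtable rejects null keys and values. So if IndexerJob.createIndexJob is handed a null value (for instance, if the Solr URL never reaches the indexer, which would be worth checking given the -solr argument in the command above), the failure surfaces exactly as a NullPointerException inside Hashtable.put:

```java
import java.util.Properties;

public class NullPropertyDemo {
    public static void main(String[] args) {
        Properties props = new Properties();

        // A non-null value is stored without complaint.
        props.setProperty("solr.server.url", "http://localhost:8983/solr/");

        try {
            // A null value is rejected by the backing Hashtable,
            // producing the same NullPointerException seen in the
            // Configuration.set frame of the stack trace.
            props.setProperty("solr.server.url", null);
        } catch (NullPointerException e) {
            System.out.println("NullPointerException from null property value");
        }
    }
}
```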
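[Editor's note] The "Input path does not exist: ...runtime/local/bin/urls" error is consistent with Lewis's explanation: a relative path like urls resolves against the working directory of the process, not against the location of the nutch script. A small sketch of that resolution rule (illustrative only; Nutch's actual input handling goes through Hadoop's input formats):

```java
import java.nio.file.Path;
import java.nio.file.Paths;

public class RelativePathDemo {
    public static void main(String[] args) {
        // The current working directory of the JVM.
        Path cwd = Paths.get("").toAbsolutePath();

        // How a relative argument like "urls" is resolved.
        Path urls = Paths.get("urls").toAbsolutePath();

        // The relative path always resolves under the working
        // directory, which is why running from runtime/local/bin
        // makes the job look for runtime/local/bin/urls.
        System.out.println(urls.equals(cwd.resolve("urls"))); // prints true
    }
}
```

Hence running ./bin/nutch from runtime/local (rather than ./nutch from runtime/local/bin) makes urls resolve to runtime/local/urls.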

