Correction, my mistake, I am getting a different NullPointerException:

Exception in thread "main" java.lang.NullPointerException
        at java.util.Hashtable.put(Hashtable.java:411)
        at java.util.Properties.setProperty(Properties.java:160)
        at org.apache.hadoop.conf.Configuration.set(Configuration.java:438)
        at org.apache.nutch.indexer.IndexerJob.createIndexJob(IndexerJob.java:128)
        at org.apache.nutch.indexer.solr.SolrIndexerJob.run(SolrIndexerJob.java:44)
        at org.apache.nutch.crawl.Crawler.runTool(Crawler.java:68)
        at org.apache.nutch.crawl.Crawler.run(Crawler.java:192)
        at org.apache.nutch.crawl.Crawler.run(Crawler.java:250)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at org.apache.nutch.crawl.Crawler.main(Crawler.java:257)
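For what it's worth, the shape of this trace tells you something: `Properties.setProperty` delegates to `Hashtable.put`, which rejects null keys and values, so `Configuration.set` was handed a null. Given the `SolrIndexerJob` frame, one plausible (unverified) cause is that the Solr URL never reached the indexing job. A minimal sketch with the plain JDK, no Nutch or Hadoop required, reproducing the bottom of the trace — the property name `solr.server.url` and the null URL are assumptions for illustration, not taken from the actual Nutch code path:

```java
import java.util.Properties;

public class NullPropertyDemo {
    public static void main(String[] args) {
        Properties conf = new Properties();
        // Hypothetical: stands in for a Solr URL that was never set, so a
        // null value ends up being passed through to setProperty().
        String solrUrl = null;
        try {
            conf.setProperty("solr.server.url", solrUrl);
        } catch (NullPointerException e) {
            // Same failure point as the trace above:
            // Properties.setProperty -> Hashtable.put, which rejects nulls.
            System.out.println("NullPointerException from setProperty(null)");
        }
    }
}
```

If that guess is right, the fix is on the invocation side (making sure the -solr argument actually reaches the indexer), not in Hadoop itself.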
On Wed, Nov 28, 2012 at 4:18 PM, Nicholas Roberts <[email protected]> wrote:

> I am working from this tutorial and get a similar error:
> http://nlp.solutions.asia/?p=180
>
> On Fri, Nov 2, 2012 at 1:13 PM, cocofan <[email protected]> wrote:
>
>> On 12-11-02 12:45 PM, Lewis John Mcgibbney wrote:
>>
>>> Hi,
>>>
>>> On Fri, Nov 2, 2012 at 5:36 PM, cocofan <[email protected]> wrote:
>>>
>>>> 2012-11-01 14:46:52,027 ERROR security.UserGroupInformation -
>>>> PriviledgedActionException as:cocofan
>>>
>>> I've never seen this Exception before...honestly.
>>>
>>>> cause:org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input
>>>> path does not exist:
>>>> file:/home/cocofan/Dropbox/project/apache-nutch-2.1/runtime/local/bin/urls
>>>> 2012-11-01 14:46:52,027 ERROR crawl.InjectorJob - InjectorJob:
>>>> org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path
>>>> does not exist:
>>>
>>> The rest seems to be pretty straightforward. You appear to be running
>>> nutch from $NUTCH_HOME/runtime/local/bin with the following command:
>>> ./nutch XYZ
>>
>> I am running nutch from /runtime/local and I do have the urls directory
>> in both /runtime/local/bin and /runtime/local (with the seed.txt file in
>> both).
>>
>> The command I'm using is (from /runtime/local):
>> ./bin/nutch crawl urls -solr http://localhost:8983/solr/ -depth 3 -topN 5
>>
>> Actually it seems to be a problem with hadoop, so I was wondering if I
>> need to set a directory in a config file there?
>>
>>> Unless your urls directory is located in the ./bin directory (which I
>>> doubt it is), you should come up one directory and run the command
>>> from $NUTCH_HOME/runtime/local, e.g. ./bin/nutch XYZ
>>>
>>> Does this make sense? Please read the tutorial carefully and
>>> thoroughly and it will work perfectly.
>>>
>>> hth
>>>
>>> Lewis

--
Nicholas Roberts
US 510-684-8264
http://Permaculture.TV

