Cocofan,

This is probably not the issue, but to be conservative I would recommend that you not build or run this inside a Dropbox folder. Dropbox synchronization takes time and may cause strange results in rare situations. Again, probably not the issue, but ...

________________________________________
From: cocofan [[email protected]]
Sent: Saturday, November 03, 2012 5:13 AM
To: [email protected]
Subject: Re: Getting a NullPointerException in Nutch 2.1

On 12-11-02 12:45 PM, Lewis John Mcgibbney wrote:
> Hi,
>
> On Fri, Nov 2, 2012 at 5:36 PM, cocofan <[email protected]> wrote:
>
>> 2012-11-01 14:46:52,027 ERROR security.UserGroupInformation -
>> PriviledgedActionException as:cocofan
> I've never seen this Exception before...honestly.
>
>> cause:org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input
>> path does not exist:
>> file:/home/cocofan/Dropbox/project/apache-nutch-2.1/runtime/local/bin/urls
>> 2012-11-01 14:46:52,027 ERROR crawl.InjectorJob - InjectorJob:
>> org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does
>> not exist:
> The rest seems to be pretty straight forward. You appear to be running
> nutch from $NUTCH_HOME/runtime/local/bin with the following command
> ./nutch XYZ

I am running nutch from /runtime/local and I do have the urls directory in both /runtime/local/bin and /runtime/local (with the seed.txt file in both). The command I'm using is (from /runtime/local):

./bin/nutch crawl urls -solr http://localhost:8983/solr/ -depth 3 -topN 5

Actually it seems to be a problem with hadoop so I was wondering if I need to set a directory in a config file there?

> Unless you urls directory is located in the ./bin directory (which I
> doubt it is) then you should come up one directory and run the command
> from $NUTCH_HOME/runtime/local e.g. ./bin/nutch XYZ
>
> Does this make sense? Please read the tutorial carefully and
> thoroughly and it will work perfectly.
>
> hth
>
> Lewis
>
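
One reading of the log in this thread is that the relative seed directory "urls" was resolved against the directory the command was started from, since the missing path ends in runtime/local/bin/urls. The following is a minimal shell sketch of that behaviour, assuming plain working-directory resolution. resolve_seed_dir is a hypothetical helper written for illustration and is not part of Nutch or Hadoop; the temporary directory layout only mimics the one described in the thread.

```shell
#!/bin/sh
# Sketch: show which absolute path a relative seed directory would map to,
# assuming it is simply joined to the current working directory.
# resolve_seed_dir is a hypothetical helper, not a Nutch or Hadoop command.
resolve_seed_dir() {
    # $1: seed directory as it would be passed to "nutch crawl" (e.g. "urls")
    case "$1" in
        /*) printf '%s\n' "$1" ;;               # already absolute: unchanged
        *)  printf '%s/%s\n' "$(pwd)" "$1" ;;   # relative: joined to the cwd
    esac
}

# Throwaway layout that mimics runtime/local and runtime/local/bin.
demo_root=$(mktemp -d)
mkdir -p "$demo_root/runtime/local/bin" "$demo_root/runtime/local/urls"

# Resolve the same argument from the two candidate working directories.
from_local=$(cd "$demo_root/runtime/local" && resolve_seed_dir urls)
from_bin=$(cd "$demo_root/runtime/local/bin" && resolve_seed_dir urls)

echo "from runtime/local:     $from_local"
echo "from runtime/local/bin: $from_bin"

# Remove the throwaway layout.
rm -rf "$demo_root"
```

Under that assumption, only a start directory of runtime/local/bin yields a path ending in bin/urls. Running pwd immediately before the crawl command, or passing an absolute path to the seed directory, would be a quick way to check which case applies.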

