Logging is also different in 0.8: by default it logs to the file $NUTCH_HOME/logs/hadoop.log, so you no longer need to capture stdout/stderr to a log file.
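A minimal sketch of following that log while a crawl runs; the log path is from the message above, while the fallback to the current directory and the line count are my assumptions:

```shell
# Nutch 0.8 writes its log to $NUTCH_HOME/logs/hadoop.log by default.
NUTCH_HOME=${NUTCH_HOME:-.}   # assumption: fall back to the current directory if unset
LOG="$NUTCH_HOME/logs/hadoop.log"
# Show the tail of the log; the file only appears once the crawl starts logging.
if [ -f "$LOG" ]; then
  tail -n 20 "$LOG"
else
  echo "no log yet at $LOG"
fi
```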
--
Sami Siren

BDalton wrote:

Thank you, that seemed to fix the problem. Unfortunately, another problem
followed.

With command: bin/nutch crawl urls1 -dir newcrawled -depth 2 >& crawl.log

I now get a directory called “newcrawled”; however, crawl.log is created empty, with no information in it. The index created also contains no data, and there are no error messages. I’m using the July 18 nightly and have no problems with 0.7.2.


Sami Siren-2 wrote:
In 0.8 you submit a _directory_ containing urls.txt, not the file itself.

So remove the /urls.txt part from your command line and it should work fine.
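A sketch of the fix, assuming the seed file is named urls.txt; the directory name and the example URL are mine, not from the thread:

```shell
# Nutch 0.8 expects a directory of seed files, not a single file.
mkdir -p urls
echo "http://example.com/" > urls/urls.txt   # hypothetical seed URL
# Then pass the directory, not the file, to the crawl command:
# bin/nutch crawl urls -dir newcrawled -depth 2
```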

--
Sami Siren

BDalton wrote:

I get this error,

bin/nutch crawl url.txt -dir newcrawled -depth 2 >& crawl.log

Exception in thread "main" java.io.IOException: Input directory
d:/nutch3/urls/urls.txt in local is invalid.
        at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:274)
        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:327)
        at org.apache.nutch.crawl.Injector.inject(Injector.java:138)
        at org.apache.nutch.crawl.Crawl.main(Crawl.java:105)
