Re: nutch on amazon emr
I suggest running it with the stock bin/crawl script from the command line first, and then trying the JAR you mentioned.

On Jan 1, 2015 12:04 PM, Adil Ishaque Abbasi aiabb...@gmail.com wrote:
> I tried to run it as a custom JAR step using the script runner JAR, i.e. s3://elasticmapreduce/libs/script-runner/script-runner.jar
>
> Regards
> Adil I. Abbasi
>
> On Thu, Jan 1, 2015 at 8:51 PM, Meraj A. Khan mera...@gmail.com wrote:
>> Can you give us the command that you use to start the crawl?
>>
>> On Jan 1, 2015 10:28 AM, Adil Ishaque Abbasi aiabb...@gmail.com wrote:
>>> When I try to run the Nutch crawl script on Amazon EMR, it gives me this error:
>>>
>>> /mnt/var/lib/hadoop/steps/s-3VT1QRVSURPSH/./crawl: line 81: hdfs:///nutch/bin/nutch: No such file or directory
>>> Command exiting with ret '0'
>>>
>>> Although the nutch script is located at hdfs:///nutch/bin/, it still gives this error. Any idea what I'm doing wrong?
>>>
>>> Regards
>>> Adil
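A plausible explanation of the error in the original question, offered as an assumption rather than something the thread confirms: a shell can only execute files on the local filesystem, so it treats `hdfs:///nutch/bin/nutch` as the relative path `hdfs:/nutch/bin/nutch` under the current directory and finds nothing there. The Python sketch below shows that interpretation and assembles the commands to copy a Nutch runtime out of HDFS before running its scripts. The helper names and the directories `hdfs:///nutch` and `/home/hadoop/nutch` are hypothetical.

```python
import posixpath
from urllib.parse import urlparse


def is_hdfs_uri(path: str) -> bool:
    """True if `path` is an hdfs:// URI rather than a local filesystem path."""
    return urlparse(path).scheme == "hdfs"


def local_view(path: str) -> str:
    """The local path the kernel effectively resolves when asked to run `path`.

    exec() knows nothing about URI schemes: "hdfs:" is just an ordinary
    directory name relative to the current working directory, and repeated
    slashes collapse into one.
    """
    return posixpath.normpath(path)


def staging_commands(hdfs_dir: str, local_dir: str) -> list[list[str]]:
    """Commands that copy a Nutch runtime from HDFS so its scripts can be run.

    `local_dir` should not exist yet, so that `hdfs dfs -get` creates it as
    the copy of `hdfs_dir`. The chmod restores the execute bit on the two
    launcher scripts in case the copy did not keep it.
    """
    bin_dir = posixpath.join(local_dir, "bin")
    return [
        ["hdfs", "dfs", "-get", hdfs_dir, local_dir],
        ["chmod", "+x", posixpath.join(bin_dir, "nutch"), posixpath.join(bin_dir, "crawl")],
    ]


if __name__ == "__main__":
    target = "hdfs:///nutch/bin/nutch"
    if is_hdfs_uri(target):
        print(f"exec() would look for {local_view(target)!r} under the current directory")
        # Hypothetical HDFS and local locations of the Nutch runtime.
        for cmd in staging_commands("hdfs:///nutch", "/home/hadoop/nutch"):
            print(" ".join(cmd))
```

Once staged, the crawl script would be invoked from the local copy rather than through an hdfs:// path.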
Re: nutch on amazon emr
I tried to run it as a custom JAR step using the script runner JAR, i.e. s3://elasticmapreduce/libs/script-runner/script-runner.jar

Regards
Adil I. Abbasi
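For readers trying the same route: a script-runner step takes the location of the script as its first argument, followed by the script's own arguments. The sketch below builds such a step as a plain Python dict in the shape accepted by boto3's EMR `add_job_flow_steps`; the step name, the S3 location of the crawl script and its arguments are hypothetical, since the thread never shows the actual command. As far as we understand script-runner, it downloads only the script itself into the step directory, so anything the script calls, such as bin/nutch, must already exist on the local filesystem of the node.

```python
# Location of the script runner JAR, as used in the question above.
SCRIPT_RUNNER_JAR = "s3://elasticmapreduce/libs/script-runner/script-runner.jar"


def script_runner_step(name: str, script_uri: str, script_args: list[str],
                       action_on_failure: str = "CONTINUE") -> dict:
    """Describe one EMR step that runs a shell script through script-runner.

    The dict has the shape expected by boto3's EMR client in
    add_job_flow_steps(Steps=[...]); nothing is sent to AWS here.
    """
    return {
        "Name": name,
        "ActionOnFailure": action_on_failure,
        "HadoopJarStep": {
            "Jar": SCRIPT_RUNNER_JAR,
            # script-runner expects the script location first,
            # followed by the arguments handed to the script itself.
            "Args": [script_uri, *script_args],
        },
    }


if __name__ == "__main__":
    # Hypothetical bucket, script path and arguments.
    step = script_runner_step("nutch-crawl", "s3://my-bucket/nutch/crawl", ["arg1", "arg2"])
    print(step)
```

The resulting dict can be passed as `Steps=[step]` to `boto3.client("emr").add_job_flow_steps(JobFlowId=...)`.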
Re: nutch on amazon emr
Can you give us the command that you use to start the crawl?
Re: nutch on amazon emr
Hi Adil,

Why don't you simply SSH to the master node, install Nutch there and run the crawl script in runtime/deploy? You can then monitor your crawl in the usual way using the MapReduce UI.

HTH
Julien

--
Open Source Solutions for Text Engineering

http://digitalpebble.blogspot.com/
http://www.digitalpebble.com
http://twitter.com/digitalpebble
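Julien's suggestion can be sketched as a short sequence of shell commands run on the master node. The Python helper below only assembles those commands as strings. It assumes a Nutch source checkout built with `ant runtime` (which produces `runtime/local` and `runtime/deploy`), pushes the seed directory to HDFS on the assumption that deploy-mode jobs read their input from the cluster, and passes the `bin/crawl` arguments through untouched because their exact form depends on the Nutch version. All paths and the helper name are hypothetical.

```python
import posixpath
import shlex
import sys


def deploy_crawl_commands(nutch_src: str, seed_dir: str, crawl_args: list[str]) -> list[str]:
    """Shell commands for running a crawl from runtime/deploy on the EMR master.

    `nutch_src` is a Nutch source checkout on the master node and `seed_dir`
    a local directory of seed URL files (both hypothetical locations).
    """
    deploy_dir = posixpath.join(nutch_src, "runtime", "deploy")
    seed_name = posixpath.basename(posixpath.normpath(seed_dir))
    crawl_cmd = " ".join(["bin/crawl", *(shlex.quote(a) for a in crawl_args)])
    return [
        # Build the local and deploy runtimes from source.
        f"cd {shlex.quote(nutch_src)} && ant runtime",
        # Copy the seed list into the user's HDFS home directory.
        f"hdfs dfs -put {shlex.quote(seed_dir)} {shlex.quote(seed_name)}",
        # Run the stock crawl script from the deploy runtime.
        f"cd {shlex.quote(deploy_dir)} && {crawl_cmd}",
    ]


if __name__ == "__main__":
    # Hypothetical paths; crawl arguments are taken from this script's command line.
    for cmd in deploy_crawl_commands("/home/hadoop/nutch-src", "/home/hadoop/urls", sys.argv[1:]):
        print(cmd)
```

Running the crawl this way keeps bin/nutch on the master's local filesystem, and the resulting jobs show up in the MapReduce UI as Julien describes.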