Thanks, that was very helpful!
Another newbie question: when I run Nutch standalone, I can see what it's
trying to fetch (in my terminal) as it goes along. How can I watch what it's
doing when it runs under Hadoop? I have clicked around a little bit in the
Hadoop monitoring web app, but haven't found it yet.


From: Julien Nioche <[email protected]>
To: "[email protected]" <[email protected]>; Michael Coffey <[email protected]>
Sent: Wednesday, November 2, 2016 9:51 AM
Subject: Re: Nutch 1.x on hadoop
   
Michael,

See
http://digitalpebble.blogspot.co.uk/2015/09/index-web-with-aws-cloudsearch.html
for a relatively recent step-by-step tutorial for Nutch 1.x

Julien



On 2 November 2016 at 16:10, Michael Coffey <[email protected]>
wrote:

> I'm having trouble trying to get Nutch 1.12 to run on Hadoop 2.7.3.
> I get a class not found exception for org.apache.nutch.crawl.Crawl, as in
> the following attempt.
> $HADOOP_HOME/bin/hadoop jar "/home/mjc/apache-nutch-1.12/runtime/deploy/apache-nutch-1.12.job" org.apache.nutch.crawl.Crawl seed -dir seed -depth 1 -topN 5
> Exception in thread "main" java.lang.ClassNotFoundException: org.apache.nutch.crawl.Crawl
>         at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
>
> Searching the web, I see that things seem to have changed in recent
> versions of Nutch. However, I have not been able to find a good tutorial or
> step-by-step guide for how to get this to work. I would appreciate any
> advice you could give. Is there documentation somewhere? Should I be using
> an older version?
>
>


-- 

*Open Source Solutions for Text Engineering*

http://www.digitalpebble.com
http://digitalpebble.blogspot.com/
#digitalpebble <http://twitter.com/digitalpebble>


   
