Hi Michael,

In the Hadoop monitoring web app, you can click through to the logs for the fetch tasks to see the URLs being fetched.
J.

On 3 November 2016 at 02:05, Michael Coffey <[email protected]> wrote:

> Thanks, that was very helpful!
>
> Another newbie question: when I run nutch standalone, I can see what it's
> trying to fetch (in my terminal) as it goes along. How can I watch what
> it's doing when it runs under hadoop? I have clicked around a little bit in
> the hadoop monitoring web app, but haven't found it yet.
>
>
> From: Julien Nioche <[email protected]>
> To: "[email protected]" <[email protected]>; Michael Coffey <[email protected]>
> Sent: Wednesday, November 2, 2016 9:51 AM
> Subject: Re: Nutch 1.x on hadoop
>
> Michael,
>
> See
> http://digitalpebble.blogspot.co.uk/2015/09/index-web-with-aws-cloudsearch.html
> for a relatively recent step-by-step tutorial for Nutch 1.x
>
> Julien
>
>
> On 2 November 2016 at 16:10, Michael Coffey <[email protected]>
> wrote:
>
> > I'm having trouble trying to get Nutch 1.12 to run on hadoop 2.7.3.
> > I get a class not found exception for org.apache.nutch.crawl.Crawl, as in
> > the following attempt.
> >
> > $HADOOP_HOME/bin/hadoop jar "/home/mjc/apache-nutch-1.12/runtime/deploy/apache-nutch-1.12.job" org.apache.nutch.crawl.Crawl seed -dir seed -depth 1 -topN 5
> > Exception in thread "main" java.lang.ClassNotFoundException: org.apache.nutch.crawl.Crawl
> >     at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
> >
> > Searching the web, I see that things seem to have changed in recent
> > versions of Nutch. However, I have not been able to find a good tutorial or
> > step-by-step guide for how to get this to work. I would appreciate any
> > advice you could give. Is there documentation somewhere? Should I be using
> > an older version??

-- 
*Open Source Solutions for Text Engineering*

http://www.digitalpebble.com
http://digitalpebble.blogspot.com/
#digitalpebble <http://twitter.com/digitalpebble>

