Re: Nutch 1.x on hadoop

Divjot Singh Wed, 02 Nov 2016 09:23:49 -0700

Hi

I have used Nutch 2.3, so I don't know whether this will help with 1.x. In the
deploy folder there is a crawl script in the bin folder:


    runtime/deploy/bin/crawl /tmp/seed.txt group_a 1000

The seed.txt file should be copied to HDFS first.
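
Roughly, the two steps (copy the seed list to HDFS, then start the crawl
script) could look like the sketch below. The local file name seed.txt in the
current directory and the DRY_RUN switch are assumptions made for
illustration; the HDFS path, crawl id and number of rounds are the ones from
the command above. This is the argument order I used with 2.3. The crawl
script in 1.x may expect different arguments, so check its usage message
before relying on this.

```shell
#!/bin/sh
# Sketch only: stage a seed file on HDFS, then launch the deploy-mode crawl
# script. Run from the Nutch home directory.

SEED_LOCAL="seed.txt"        # assumed name of the local seed list
SEED_HDFS="/tmp/seed.txt"    # HDFS location used in the crawl command
CRAWL_ID="group_a"           # crawl id from the crawl command
ROUNDS="1000"                # number of rounds from the crawl command
DRY_RUN="1"                  # hypothetical switch: 1 = print only, 0 = execute

run() {
    # Always print the command; execute it only when DRY_RUN is 0.
    echo "$*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

stage_and_crawl() {
    # Copy the local seed list into HDFS.
    run hadoop fs -put "$SEED_LOCAL" "$SEED_HDFS"
    # Start the crawl against the HDFS seed path.
    run runtime/deploy/bin/crawl "$SEED_HDFS" "$CRAWL_ID" "$ROUNDS"
}

stage_and_crawl
```

With DRY_RUN left at 1 the script only prints the two commands, so they can be
checked before anything is submitted to the cluster.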

Thanks
Divjot

On Wed, Nov 2, 2016 at 9:40 PM, Michael Coffey <[email protected]>
wrote:

> I'm having trouble trying to get Nutch 1.12 to run on hadoop 2.7.3.
> I get a class not found exception for org.apache.nutch.crawl.Crawl, as in
> the following attempt.
> $HADOOP_HOME/bin/hadoop jar "/home/mjc/apache-nutch-1.12/runtime/deploy/apache-nutch-1.12.job" org.apache.nutch.crawl.Crawl seed -dir seed -depth 1 -topN 5
> Exception in thread "main" java.lang.ClassNotFoundException: org.apache.nutch.crawl.Crawl
>         at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
>
> Searching the web, I see that things seem to have changed in recent
> versions of Nutch. However, I have not been able to find a good tutorial or
> step-by-step guide for how to get this to work. I would appreciate any
> advice you could give. Is there documentation somewhere? Should I be using
> an older version??
>
>
