Re: Nutch - Hadoop Help

Manikandan Saravanan Tue, 04 Feb 2014 06:21:44 -0800

Okay, the crawl runs well for the most part:

I’m running the crawl script as:
bin/crawl urls/seed.txt TestCrawl http://xxx.xxx.xxx.xxx:8983/solr/ 2


And it’s giving me this:
Exception in thread "main" java.lang.IllegalArgumentException: usage: (-crawlId <id>)
        at org.apache.nutch.crawl.DbUpdaterJob.run(DbUpdaterJob.java:117)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at org.apache.nutch.crawl.DbUpdaterJob.main(DbUpdaterJob.java:123)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:160)

This happens after the parse job. What is wrong?
-- 
Manikandan Saravanan
Architect - Technology
TheSocialPeople

On 4 February 2014 at 3:11:36 pm, Lewis John Mcgibbney 
([email protected]) wrote:

https://wiki.apache.org/nutch/NutchTutorial#A3.3._Using_the_crawl_script  


On Tue, Feb 4, 2014 at 7:04 AM, Manikandan Saravanan <  
[email protected]> wrote:  

> How do I run the crawl script on hadoop?  
> --  
> Manikandan Saravanan  
> Architect - Technology  
> TheSocialPeople <http://thesocialpeople.net>  
>  
> On 4 February 2014 at 1:28:39 am, Lewis John Mcgibbney (  
> [email protected]) wrote:  
>  
> Hi Manikandan,  
>  
> On Mon, Feb 3, 2014 at 3:45 PM, <[email protected]>  
> wrote:  
>  
> > And then, I'm running this:  
> > $HADOOP_HOME/bin/hadoop jar /usr/local/nutch/nutch.job  
> > org.apache.nutch.crawl.Crawler dmoz -dir /user/hduser/crawl -depth 3 -topN 5000  
> >  
>  
> You're using the Crawler class. This is not advised at all and is now  
> deprecated. There is no point in downloading the crawl script if you are  
> going to use the Crawler class. I would suggest using the crawl script.  
>  
>  
> >  
> > org.apache.gora.memory.store.MemStore as the Gora storage class.  
> >  
>  
> Please don't use MemStore; its implementation in Gora 0.3 is not thread-safe  
> and is only used for trivial tests. Please see the 2.x tutorial on the  
> Nutch wiki for details of how to configure the supported Gora persistent  
> data stores.  
>  
>  
> Once you've used the crawl script and configured your Nutch deployment job  
> file, please get back to us with your results.  
> Remember, you will always need to regenerate your Nutch job file if you make  
> configuration changes to your Nutch deployment.  
> hth  
> Thanks  
>  
>  


--  
*Lewis*
