Re: AW: nutch-0.8 crawl problem

Nutch Wed, 22 Feb 2006 04:01:26 -0800

Sorry, this is what I typed:
./nutch crawl url.txt -dir mydir -depth 4

> Hi Dima
> What did you write on the command line? sh  ./nutch crawl myurls .....


> You need to put your URLs in an input directory (e.g. myurls).
> In it, put text files containing your URLs (e.g. myurls/myurllist.txt).
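
A minimal sketch of that layout, assuming a hypothetical seed URL (http://example.com/) and the example directory and file names mentioned above. The crawl options in the final comment are the ones from the command quoted at the top of this thread, with the directory passed in place of url.txt:

```shell
# Hypothetical seed URL (an assumption); replace it with the site you want to crawl.
SEED_URL="http://example.com/"

# Create the input directory and write the URL list into a text file inside it,
# one URL per line.
mkdir -p myurls
printf '%s\n' "$SEED_URL" > myurls/myurllist.txt

# Then pass the directory, not the text file, to the crawl command, e.g.:
#   ./nutch crawl myurls -dir mydir -depth 4
```

The crawl command is left as a comment so the snippet only prepares the input directory and can be run without a Nutch installation.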

> Kind regards
> Matthias

> -----Original Message-----
> From: Dima Mazmanov [mailto:[EMAIL PROTECTED] 
> Sent: Wednesday, 22 February 2006 09:31
> To: [email protected]
> Subject: nutch-0.8 crawl problem

> Hi!
> I have problems crawling. Mainly, I cannot even start a crawl.
> I've downloaded the latest Nutch source, and after 3 hours of
> struggling with the config files, I gave up.
> I have some questions:
> 1) What is Hadoop and how can I use it?
> I searched for information about Hadoop and found that it's no longer
> integrated into Nutch. It's a separate project.
> But in the lib folder I found the corresponding hadoop-0.1-dev.jar file.
> What does it do?
> 2) How can I crawl? :)
> When I type the command, I get the following exception:

> No input directories specified in: Configuration: defaults:
> hadoop-default.xml , mapred-default.xml ,
> /tmp/hadoop/mapred/local/localRunner/job_vpit8j.xmlfinal:
> hadoop-site.xml
>         at
> org.apache.hadoop.mapred.InputFormatBase.listFiles(InputFormatBase.java:84)
>         at
> org.apache.hadoop.mapred.InputFormatBase.getSplits(InputFormatBase.java:94)
>         at
> org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:70)
> 060222 131857  map 0%  reduce 0%
> Exception in thread "main" java.io.IOException: Job failed!
>         at
> org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:310)
>         at org.apache.nutch.crawl.Injector.inject(Injector.java:114)
>         at org.apache.nutch.crawl.Crawl.main(Crawl.java:104)

> I wrote the following in hadoop-site.xml:

>   <property>
>     <name>fs.default.name</name>
>     <value>localhost:9000</value>
>   </property>

>   <property>
>     <name>mapred.job.tracker</name>
>     <value>localhost:9001</value>
>   </property>

>   <property>
>     <name>dfs.replication</name>
>     <value>1</value>
>   </property>

> But I don't know what it means (I just copied it from the Hadoop website).
> So, how can I crawl using nutch-0.8?
> 3) Where is ./nutch ndfs?
> When I execute this command I get
> Exception in thread "main" java.lang.NoClassDefFoundError: ndfs
> I had no problems with version 0.7.
> I decided to move to 0.8 because of the parse-swf plugin, since I couldn't
> compile it.
> Could you please describe how to use the new Nutch? Or what do I need
> in order to compile the parse-swf plugin?





-- 
Best regards,
 Nutch                          mailto:[EMAIL PROTECTED]
