[Nutch-general] AW: nutch-0.8 crawl problem

Guenter, Matthias Wed, 22 Feb 2006 00:58:04 -0800

Hi Dima
What did you type on the command line? Something like sh ./nutch crawl myurls ...?

You need to put your URLs in an input directory (e.g. myurls). In that
directory, place one or more text files listing your URLs
(e.g. myurls/myurllist.txt).
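
A minimal sketch of that layout, assuming you run it from the directory where
you start nutch; http://example.com/ is only a placeholder URL, and the names
myurls and myurllist.txt are the examples from above:

```shell
# Create the input directory that will be passed to the crawl command.
mkdir -p myurls
# Write the seed list: a plain text file with one URL per line.
# http://example.com/ is a placeholder, replace it with your own URLs.
printf '%s\n' 'http://example.com/' > myurls/myurllist.txt
# Check that the directory now contains the URL file.
ls myurls
```

The directory name (not the file name) is then the first argument to the crawl
command, e.g. sh ./nutch crawl myurls -dir crawl -depth 3, where the -dir and
-depth values are only illustrative.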

Kind regards
Matthias

-----Original Message-----
From: Dima Mazmanov [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, 22 February 2006 09:31
To: [email protected]
Subject: nutch-0.8 crawl problem

Hi!
I have problems with crawling. Mainly, I cannot even start a crawl.
I downloaded the latest Nutch source, and after 3 hours of struggling with 
the config files, I gave up.
I have some questions:
1) What is Hadoop, and how can I use it?
I searched for information about Hadoop and found that it is no longer 
integrated in Nutch; it is a separate project.
But in the lib folder I found a corresponding hadoop-0.1-dev.jar file. What 
does it do?
2) How can I crawl? :)
When I type the command, I get the following exception:

No input directories specified in: Configuration: defaults: hadoop-default.xml , mapred-default.xml , /tmp/hadoop/mapred/local/localRunner/job_vpit8j.xmlfinal: hadoop-site.xml
        at org.apache.hadoop.mapred.InputFormatBase.listFiles(InputFormatBase.java:84)
        at org.apache.hadoop.mapred.InputFormatBase.getSplits(InputFormatBase.java:94)
        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:70)
060222 131857  map 0%  reduce 0%
Exception in thread "main" java.io.IOException: Job failed!
        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:310)
        at org.apache.nutch.crawl.Injector.inject(Injector.java:114)
        at org.apache.nutch.crawl.Crawl.main(Crawl.java:104)

I put the following in hadoop-site.xml:

  <property>
    <name>fs.default.name</name>
    <value>localhost:9000</value>
  </property>

  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>

  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>

But I don't know what it means (I just copied it from the Hadoop website).
So, how can I crawl using nutch-0.8?
3) Where is ./nutch ndfs?
When I execute this command I get:
Exception in thread "main" java.lang.NoClassDefFoundError: ndfs
I had no problems with version 0.7.
I decided to move to 0.8 because of the parse-swf plugin, since I couldn't 
compile it.
Please describe how to use the new Nutch. Or what do I need in order to 
compile the parse-swf plugin?



