Sorry,

./nutch crawl url.txt -dir mydir -depth 4

> Hi Dima
>
> What did you write on the command line?
> sh ./nutch crawl myurls .....
> You need to put your URLs in an input directory (e.g. myurls).
> There you put text files with your URLs (e.g. myurls/myurllist.txt).
>
> Kind regards
> Matthias
>
> -----Original Message-----
> From: Dima Mazmanov [mailto:[EMAIL PROTECTED]
> Sent: Wednesday, 22 February 2006 09:31
> To: [email protected]
> Subject: nutch-0.8 crawl problem
>
> Hi!
>
> I have problems with crawling. Mainly, I cannot even start a crawl.
> I've downloaded the latest Nutch source, and after 3 hours of
> struggling with config files, I gave up.
> I have some questions I want to ask.
>
> 1) What is Hadoop and how can I use it?
> I searched for information about Hadoop and found that it's no longer
> integrated in Nutch. It's another project.
> But in the lib folder I found the corresponding hadoop-0.1-dev.jar file.
> But what does it do?
>
> 2) How can I crawl? :)
> When I type the command I get the following exception:
>
> No input directories specified in: Configuration: defaults:
> hadoop-default.xml , mapred-default.xml ,
> /tmp/hadoop/mapred/local/localRunner/job_vpit8j.xmlfinal:
> hadoop-site.xml
>         at org.apache.hadoop.mapred.InputFormatBase.listFiles(InputFormatBase.java:84)
>         at org.apache.hadoop.mapred.InputFormatBase.getSplits(InputFormatBase.java:94)
>         at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:70)
> 060222 131857 map 0% reduce 0%
> Exception in thread "main" java.io.IOException: Job failed!
>         at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:310)
>         at org.apache.nutch.crawl.Injector.inject(Injector.java:114)
>         at org.apache.nutch.crawl.Crawl.main(Crawl.java:104)
>
> I wrote the following in hadoop-site.xml:
>
> <property>
>   <name>fs.default.name</name>
>   <value>localhost:9000</value>
> </property>
> <property>
>   <name>mapred.job.tracker</name>
>   <value>localhost:9001</value>
> </property>
> <property>
>   <name>dfs.replication</name>
>   <value>1</value>
> </property>
>
> But I don't know what it means (I just copied it from the Hadoop website).
> So, how can I crawl using nutch-0.8?
>
> 3) Where is ./nutch ndfs?
> When I execute this command I get
> Exception in thread "main" java.lang.NoClassDefFoundError: ndfs
>
> I had no problems with version 0.7.
> I decided to move to 0.8 because of the parse-swf plugin, since I couldn't
> compile it.
> Please describe how to use the new Nutch. Or what do I need in order to
> compile the parse-swf plugin?

-- 
Best regards,
 Nutch                          mailto:[EMAIL PROTECTED]
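
For reference, a minimal sketch of the seed layout Matthias describes, assuming a POSIX shell. The names myurls and myurllist.txt come from his reply; the URL http://www.example.com/ is only a placeholder. The crawl command is left as a comment, and it is the invocation quoted at the top of this thread with the directory substituted for url.txt.

```shell
# Create the input directory that will hold the seed list(s).
# "myurls" and "myurllist.txt" are the example names from the reply above.
mkdir -p myurls

# One URL per line; http://www.example.com/ is a placeholder seed.
printf '%s\n' 'http://www.example.com/' > myurls/myurllist.txt

# Then pass the directory, not a single file, as the first argument:
#   ./nutch crawl myurls -dir mydir -depth 4
```

Whether this alone clears the "No input directories specified" error depends on the rest of the configuration, so treat it as a first thing to check.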
