Additional info: I tried this with "-threads 1" and I still got the same error.
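For completeness, this is exactly what I remove between attempts (I am assuming the /tmp/hadoop tree is the "tmp directory" I mention below, since that is where the error path points; whether a failed run leaves anything else behind, I don't know):

    $ rm -rf /tmp/hadoop            # local mapred temp tree named in the stack trace below
    $ rm -rf test/thoreau-index     # the crawl output directory
    $ ./bin/nutch crawl test/urls -dir test/thoreau-index -depth 2

Note also that the stack trace shows the job going through LocalJobRunner, i.e. Hadoop's local mode, so no job tracker or DFS should be involved; the leftover map output is sitting on the plain local filesystem.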
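And for anyone who finds this thread with the original syntax problem Thomas fixed for me below: the first argument to crawl must be a directory containing the seed file, not the seed file itself. What I did was roughly this (reconstructed from memory; the directory name test/urls is just what I picked):

    $ mkdir test/urls
    $ mv test/thoreau-url.txt test/urls/
    $ ./bin/nutch crawl test/urls -dir test/thoreau-index -depth 2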
> -----Original Message-----
> From: Teruhiko Kurosaka
> Sent: 2006-6-05 10:45
> To: '[email protected]'; 'TDLN'
> Subject: "Target /tmp/.../map_ynynnj.out already exists" error [RE: help running 5/31 version of nightly build]
>
> Thank you, Thomas. That's a small change in 0.8 that I overlooked.
> Nutch crawl now progresses to a further step,
> but it still stalls with an IOException, as shown below.
> Any further insight?
> (I re-ran the same command after removing the tmp directory
> and the index directory, but I hit the same exception.)
>
> -kuro
>
> $ ./bin/nutch crawl test/urls -dir test/thoreau-index -depth 2 2>&1 | tee crawl-thoreau-060605-log.txt
> 060605 103451 Running job: job_yaocyb
> 060605 103451 C:/opt/nutch-060531/test/thoreau-index/crawldb/current/part-00000/data:0+125
> 060605 103451 C:/opt/nutch-060531/test/thoreau-index/segments/20060605103443/crawl_fetch/part-00000/data:0+141
> 060605 103451 C:/opt/nutch-060531/test/thoreau-index/segments/20060605103443/crawl_parse/part-00000:0+748
> 060605 103451 job_yaocyb
> java.io.IOException: Target /tmp/hadoop/mapred/local/reduce_yv2ar3/map_ynynnj.out already exists
>     at org.apache.hadoop.fs.FileUtil.checkDest(FileUtil.java:162)
>     at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:62)
>     at org.apache.hadoop.fs.LocalFileSystem.renameRaw(LocalFileSystem.java:191)
>     at org.apache.hadoop.fs.FileSystem.rename(FileSystem.java:306)
>     at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:101)
> java.io.IOException: Job failed!
>     at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:341)
>     at org.apache.nutch.crawl.CrawlDb.update(CrawlDb.java:54)
>     at org.apache.nutch.crawl.Crawl.main(Crawl.java:114)
> Exception in thread "main"
>
> > -----Original Message-----
> > From: TDLN [mailto:[EMAIL PROTECTED]]
> > Sent: 2006-6-03 1:30
> > To: [email protected]
> > Subject: Re: help running 5/31 version of nightly build
> >
> > The syntax for the crawl command is
> >
> >     Crawl <urlDir> [-dir d] [-threads n] [-depth i] [-topN N]
> >
> > So your first parameter should point to the *directory* containing
> > the file with the seed URLs, not the file itself.
> >
> > Please fix your syntax and try again.
> >
> > Rgrds, Thomas
> >
> > On 6/3/06, Teruhiko Kurosaka <[EMAIL PROTECTED]> wrote:
> > > I tried to run the May 31 version of the nightly build, but it failed.
> > > It has something to do with the "job", which I thought would not be
> > > needed if I just need to run on a regular file system. Why does
> > > Nutch try to use Hadoop in the default configuration? Is it necessary?
> > >
> > > -kuro
> > >
> > > $ ./bin/nutch crawl test/thoreau-url.txt -dir test/thoreau-index -depth 2
> > > 060602 170942 parsing

_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
