Additional info: I tried this with "-threads 1" and I still got the same
error.


> -----Original Message-----
> From: Teruhiko Kurosaka 
> Sent: 2006-6-05 10:45
> To: '[email protected]'; 'TDLN'
> Subject: "Target /tmp/.../map_ynynnj.out already exists" error [RE: help running 5/31 version of nightly build]
> 
> Thank you, Thomas.  That's a small change in 0.8 that I overlooked.
> Nutch crawl now progresses to a further step.
> But it still stalls with an IOException, as shown below.
> Any further insight?
> (I re-ran the same command after removing the tmp directory
> and the index directory, but I hit the same exception.)
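[Editorial illustration, not part of the original message: removing the output directory alone leaves the failed job's map output under Hadoop's local temp area, which is what the "already exists" check trips over. A cleanup along these lines might help; the path is taken from the error message, and the `/tmp/hadoop` default for `hadoop.tmp.dir` is an assumption about this setup.]

```shell
# Sketch: clear the stale local job files before re-running the crawl.
# The /tmp/hadoop location is assumed (the hadoop.tmp.dir default here).
HADOOP_TMP=${HADOOP_TMP:-/tmp/hadoop}
rm -rf "$HADOOP_TMP/mapred/local"   # stale map/reduce outputs of the failed job
rm -rf test/thoreau-index           # partial crawl output, so the crawl starts clean
```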
> 
> -kuro
> 
> $ ./bin/nutch crawl test/urls -dir test/thoreau-index -depth 2 2>&1 | tee crawl-thoreau-060605-log.txt
> 060605 103451 Running job: job_yaocyb
> 060605 103451 C:/opt/nutch-060531/test/thoreau-index/crawldb/current/part-00000/data:0+125
> 060605 103451 C:/opt/nutch-060531/test/thoreau-index/segments/20060605103443/crawl_fetch/part-00000/data:0+141
> 060605 103451 C:/opt/nutch-060531/test/thoreau-index/segments/20060605103443/crawl_parse/part-00000:0+748
> 060605 103451 job_yaocyb
> java.io.IOException: Target /tmp/hadoop/mapred/local/reduce_yv2ar3/map_ynynnj.out already exists
>         at org.apache.hadoop.fs.FileUtil.checkDest(FileUtil.java:162)
>         at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:62)
>         at org.apache.hadoop.fs.LocalFileSystem.renameRaw(LocalFileSystem.java:191)
>         at org.apache.hadoop.fs.FileSystem.rename(FileSystem.java:306)
>         at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:101)
> java.io.IOException: Job failed!
>         at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:341)
>         at org.apache.nutch.crawl.CrawlDb.update(CrawlDb.java:54)
>         at org.apache.nutch.crawl.Crawl.main(Crawl.java:114)
> Exception in thread "main"
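[Editorial illustration, not part of the original message: since the failure happens while `LocalJobRunner` shuffles files under the shared local temp directory, one workaround sketch is to point Hadoop's temp area at a private, writable location in `conf/hadoop-site.xml`. The property name is standard Hadoop of this era; the path value is hypothetical.]

```xml
<!-- conf/hadoop-site.xml: a sketch, assuming you want local job files
     kept somewhere other than the shared /tmp/hadoop default -->
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <!-- hypothetical path; use any directory the crawl user owns -->
    <value>/home/kuro/hadoop-tmp</value>
  </property>
</configuration>
```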
> 
> 
> > -----Original Message-----
> > From: TDLN [mailto:[EMAIL PROTECTED] 
> > Sent: 2006-6-03 1:30
> > To: [email protected]
> > Subject: Re: help running 5/31 version of nightly build
> > 
> > The syntax for the crawl command is
> > 
> > Crawl <urlDir> [-dir d] [-threads n] [-depth i] [-topN N]
> > 
> > So your first parameter should point to the *directory* containing
> > the file with seed urls, not the file itself.
> > 
> > Please fix your syntax and try again.
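[Editorial illustration, not part of the original message: staging the seed file as Thomas describes might look like the following; `test/urls`, `seeds.txt`, and the example URL are assumed names, not from the thread.]

```shell
# Sketch: the crawl command expects a directory of seed-url files,
# not a single file, so stage the seeds accordingly.
mkdir -p test/urls
echo "http://www.example.com/" > test/urls/seeds.txt   # one seed URL per line
# then: ./bin/nutch crawl test/urls -dir test/thoreau-index -depth 2
```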
> > 
> > Rgrds, Thomas
> > 
> > On 6/3/06, Teruhiko Kurosaka <[EMAIL PROTECTED]> wrote:
> > > I tried to run the May 31 version of the nightly build but it failed.
> > > It has something to do with the "job", which I thought would not be
> > > needed if I just need to run on a regular file system.  Why does
> > > Nutch try to use Hadoop in the default configuration? Is it necessary?
> > >
> > > -kuro
> > >
> > > $ ./bin/nutch crawl test/thoreau-url.txt -dir test/thoreau-index -depth 2
> > > 060602 170942 parsing
> 


_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
