Thank you, Thomas. That's a small change in 0.8 that I overlooked.
The Nutch crawl now progresses one step further.
But it still fails with an IOException, as shown below. Any further
insight?
(I re-ran the same command after removing the tmp directory and the
index directory,
but I hit the same exception.)
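(Concretely, the cleanup before the re-run was roughly the following;
/tmp/hadoop is the local Hadoop directory from the path in the exception
below, and test/thoreau-index is the directory passed to -dir:)

$ rm -rf /tmp/hadoop            # Hadoop's local map/reduce working files left by the previous run
$ rm -rf test/thoreau-index     # the crawl output directory, so the crawl starts from scratch
$ ./bin/nutch crawl test/urls -dir test/thoreau-index -depth 2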
-kuro
$ ./bin/nutch crawl test/urls -dir test/thoreau-index -depth 2 2>&1 | tee crawl-thoreau-060605-log.txt
060605 103451 Running job: job_yaocyb
060605 103451 C:/opt/nutch-060531/test/thoreau-index/crawldb/current/part-00000/data:0+125
060605 103451 C:/opt/nutch-060531/test/thoreau-index/segments/20060605103443/crawl_fetch/part-00000/data:0+141
060605 103451 C:/opt/nutch-060531/test/thoreau-index/segments/20060605103443/crawl_parse/part-00000:0+748
060605 103451 job_yaocyb
java.io.IOException: Target /tmp/hadoop/mapred/local/reduce_yv2ar3/map_ynynnj.out already exists
        at org.apache.hadoop.fs.FileUtil.checkDest(FileUtil.java:162)
        at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:62)
        at org.apache.hadoop.fs.LocalFileSystem.renameRaw(LocalFileSystem.java:191)
        at org.apache.hadoop.fs.FileSystem.rename(FileSystem.java:306)
        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:101)
java.io.IOException: Job failed!
        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:341)
        at org.apache.nutch.crawl.CrawlDb.update(CrawlDb.java:54)
        at org.apache.nutch.crawl.Crawl.main(Crawl.java:114)
Exception in thread "main"
> -----Original Message-----
> From: TDLN [mailto:[EMAIL PROTECTED]
> Sent: 2006-6-03 1:30
> To: [email protected]
> Subject: Re: help running 5/31 version of nightly build
>
> The syntax for the crawl command is
>
> Crawl <urlDir> [-dir d] [-threads n] [-depth i] [-topN N]
>
> So your first parameter should point to the *directory* containing the
> file with seed urls, not the file itself.
>
> Please fix your syntax and try again.
>
> Rgrds, Thomas
>
> On 6/3/06, Teruhiko Kurosaka <[EMAIL PROTECTED]> wrote:
> > I tried to run the May 31 version of the nightly build but it failed.
> > It has something to do with the "job", which I thought would not be
> > needed if I just need to run on a regular file system. Why does Nutch
> > try to use Hadoop in the default configuration? Is it necessary?
> >
> > -kuro
> >
> > $ ./bin/nutch crawl test/thoreau-url.txt -dir test/thoreau-index -depth 2
> > 060602 170942 parsing