Hello,
I am still unable to run "nutch crawl"; it terminates with a "Job
failed!" IOException.
In an attempt to get more information, I increased the logging level and
ran "nutch crawl" again. Now it is clear that Nutch is failing to rename
a file.
2006-06-16 17:04:05,932 INFO mapred.LocalJobRunner (LocalJobRunner.java:progress(140)) - C:/opt/nutch-060614/test/index/segments/20060616170358/crawl_parse/part-00000:0+62
2006-06-16 17:04:05,948 WARN mapred.LocalJobRunner (LocalJobRunner.java:run(119)) - job_4wsxze
java.io.IOException: Couldn't rename /tmp/hadoop/mapred/local/map_5n5aid/part-0.out
        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:102)
Exception in thread "main" java.io.IOException: Job failed!
        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:342)
        at org.apache.nutch.crawl.CrawlDb.update(CrawlDb.java:55)
        at org.apache.nutch.crawl.Crawl.main(Crawl.java:114)
I am wondering what LocalJobRunner is trying to accomplish. Anybody?
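A guess on my part, not verified against the Hadoop source: LocalJobRunner is
presumably just moving a finished map output into place, and the rename may be
failing because java.io.File.renameTo() is platform-dependent. On Windows it
returns false when the destination already exists, while on Unix the rename
overwrites the destination. A tiny stand-alone test of the behavior I mean
(the file names here are made up):

import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;

public class RenameTest {
    public static void main(String[] args) throws IOException {
        // Create two empty files; the names are hypothetical.
        File src = new File("part-0.out");
        File dest = new File("map_xxxxx.out");
        new FileOutputStream(src).close();
        new FileOutputStream(dest).close();
        // On Unix this prints "true" (dest is overwritten); on Windows it
        // prints "false" because the destination already exists.
        System.out.println("renameTo returned: " + src.renameTo(dest));
    }
}

If that is what is happening, it would also explain why my earlier run failed
with "Target ... already exists" at the same spot.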
In addition to this fatal exception, I've seen many occurrences of this
exception:
2006-06-16 17:04:05,854 INFO conf.Configuration (Configuration.java:loadResource(397)) - parsing file:/C:/opt/nutch-060614/conf/hadoop-site.xml
2006-06-16 17:04:05,870 DEBUG conf.Configuration (Configuration.java:<init>(67)) - java.io.IOException: config()
        at org.apache.hadoop.conf.Configuration.<init>(Configuration.java:67)
        at org.apache.hadoop.mapred.JobConf.<init>(JobConf.java:115)
        at org.apache.hadoop.mapred.LocalJobRunner$Job.<init>(LocalJobRunner.java:61)
        at org.apache.hadoop.mapred.LocalJobRunner.submitJob(LocalJobRunner.java:181)
        at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:277)
        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:312)
        at org.apache.nutch.crawl.CrawlDb.update(CrawlDb.java:55)
        at org.apache.nutch.crawl.Crawl.main(Crawl.java:114)
Is this the cause of the fatal exception?
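My guess, judging from the DEBUG level and the <init>(67) frame (again, not
verified against the Hadoop source), is that this one is harmless:
Configuration appears to construct a throwaway IOException purely so it can
log where each instance was created. Roughly, as in this hypothetical sketch:

import java.io.IOException;

// Hypothetical reconstruction, not the actual Hadoop code: the exception
// exists only to capture and print the creation stack trace.
class ConfigDebugSketch {
    ConfigDebugSketch() {
        new IOException("config()").printStackTrace(System.out);
    }

    public static void main(String[] args) {
        new ConfigDebugSketch(); // prints a trace, yet nothing has failed
    }
}

If that reading is right, these DEBUG entries are noise and the rename failure
above is the real problem.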
I do not intend to run Hadoop at all, so this hadoop-site.xml is empty.
It just has:
<configuration>
</configuration>
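If the real problem turns out to be scratch files colliding under /tmp, one
thing I may try (an assumption on my part that this property does what I
think) is pointing Hadoop's local scratch space at a fresh directory in
hadoop-site.xml:

<configuration>
  <!-- Assumption: mapred.local.dir controls the /tmp/hadoop/mapred/local
       path that shows up in the failing rename. -->
  <property>
    <name>mapred.local.dir</name>
    <value>C:/opt/nutch-060614/scratch/mapred/local</value>
  </property>
</configuration>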
Somebody told me about the following binary package, and that one crawls
fine:
http://68.178.249.66/nutch-admin/nutch-0.8-dev_guiBundle_05_02_06.tar.gz
So, some code change introduced within the last few weeks must be
causing this problem.
-kuro
> -----Original Message-----
> From: Teruhiko Kurosaka [mailto:[EMAIL PROTECTED]
> Sent: 2006-6-05 14:25
> To: [email protected]
> Subject: RE: "Target /tmp/.../map_ynynnj.out already exists"
> error [RE: help running 5/31 version of nightly build]
>
> Additional info: I tried this with "-threads 1" and I still got the
> same error.
>
>
> > -----Original Message-----
> > From: Teruhiko Kurosaka
> > Sent: 2006-6-05 10:45
> > To: '[email protected]'; 'TDLN'
> > Subject: "Target /tmp/.../map_ynynnj.out already exists" error
> > [RE: help running 5/31 version of nightly build]
> >
> > Thank you, Thomas. That's a small change in 0.8 that I overlooked.
> > Nutch crawl now progresses to a further step.
> > But it still stalls with an IOException, as shown below.
> > Any further insight?
> > (I re-ran the same command after removing the tmp directory and the
> > index directory, but I hit the same exception.)
> >
> > -kuro
> >
> > $ ./bin/nutch crawl test/urls -dir test/thoreau-index -depth 2 2>&1 | tee crawl-thoreau-060605-log.txt
> > 060605 103451 Running job: job_yaocyb
> > 060605 103451 C:/opt/nutch-060531/test/thoreau-index/crawldb/current/part-00000/data:0+125
> > 060605 103451 C:/opt/nutch-060531/test/thoreau-index/segments/20060605103443/crawl_fetch/part-00000/data:0+141
> > 060605 103451 C:/opt/nutch-060531/test/thoreau-index/segments/20060605103443/crawl_parse/part-00000:0+748
> > 060605 103451 job_yaocyb
> > java.io.IOException: Target /tmp/hadoop/mapred/local/reduce_yv2ar3/map_ynynnj.out already exists
> >         at org.apache.hadoop.fs.FileUtil.checkDest(FileUtil.java:162)
> >         at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:62)
> >         at org.apache.hadoop.fs.LocalFileSystem.renameRaw(LocalFileSystem.java:191)
> >         at org.apache.hadoop.fs.FileSystem.rename(FileSystem.java:306)
> >         at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:101)
> > Exception in thread "main" java.io.IOException: Job failed!
> >         at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:341)
> >         at org.apache.nutch.crawl.CrawlDb.update(CrawlDb.java:54)
> >         at org.apache.nutch.crawl.Crawl.main(Crawl.java:114)
> >
> >
> > > -----Original Message-----
> > > From: TDLN [mailto:[EMAIL PROTECTED]
> > > Sent: 2006-6-03 1:30
> > > To: [email protected]
> > > Subject: Re: help running 5/31 version of nightly build
> > >
> > > The syntax for the crawl command is
> > >
> > > Crawl <urlDir> [-dir d] [-threads n] [-depth i] [-topN N]
> > >
> > > So your first parameter should point to the *directory* containing
> > > the file with seed urls, not the file itself. See the example below.
> > >
> > > Please fix your syntax and try again.
> > >
> > > Rgrds, Thomas
> > >
> > > On 6/3/06, Teruhiko Kurosaka <[EMAIL PROTECTED]> wrote:
> > > > I tried to run the May 31 version of the nightly build, but it
> > > > failed. It has something to do with the "job", which I thought
> > > > would not be needed if I just need to run on a regular file
> > > > system. Why does Nutch try to use Hadoop in the default
> > > > configuration? Is it necessary?
> > > >
> > > > -kuro
> > > >
> > > > $ ./bin/nutch crawl test/thoreau-url.txt -dir test/thoreau-index -depth 2
> > > > 060602 170942 parsing
> >
>