Hello,
I am still unable to run "nutch crawl"; it terminates with a "Job
failed!" IOException.
In an attempt to get more information, I increased the logging level and
ran "nutch crawl" again. Now it is clear that Nutch is failing to rename
a file.
2006-06-16 17:04:05,932 INFO mapred.LocalJobRunner (LocalJobRunner.java:progress(140)) - C:/opt/nutch-060614/test/index/segments/20060616170358/crawl_parse/part-00000:0+62
2006-06-16 17:04:05,948 WARN mapred.LocalJobRunner (LocalJobRunner.java:run(119)) - job_4wsxze
java.io.IOException: Couldn't rename /tmp/hadoop/mapred/local/map_5n5aid/part-0.out
        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:102)
Exception in thread "main" java.io.IOException: Job failed!
        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:342)
        at org.apache.nutch.crawl.CrawlDb.update(CrawlDb.java:55)
        at org.apache.nutch.crawl.Crawl.main(Crawl.java:114)
I am wondering what LocalJobRunner is trying to accomplish. Anybody?
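A guess on my part, not verified against the Hadoop source: LocalJobRunner is
presumably just moving a finished map output into place, and the rename may be
failing because java.io.File.renameTo() is platform-dependent. On Windows it
returns false when the destination already exists, while on Unix the rename
overwrites the destination. A tiny stand-alone test of the behavior I mean
(the file names here are made up):

import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;

public class RenameTest {
    public static void main(String[] args) throws IOException {
        // Create two empty files; the names are hypothetical.
        File src = new File("part-0.out");
        File dest = new File("map_xxxxx.out");
        new FileOutputStream(src).close();
        new FileOutputStream(dest).close();
        // On Unix this prints "true" (dest is overwritten); on Windows it
        // prints "false" because the destination already exists.
        System.out.println("renameTo returned: " + src.renameTo(dest));
    }
}

If that is what is happening, it would also explain why my earlier run failed
with "Target ... already exists" at the same spot.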
In addition to this fatal exception, I've seen many occurrences of this
exception:
2006-06-16 17:04:05,854 INFO conf.Configuration (Configuration.java:loadResource(397)) - parsing file:/C:/opt/nutch-060614/conf/hadoop-site.xml
2006-06-16 17:04:05,870 DEBUG conf.Configuration (Configuration.java:<init>(67)) - java.io.IOException: config()
        at org.apache.hadoop.conf.Configuration.<init>(Configuration.java:67)
        at org.apache.hadoop.mapred.JobConf.<init>(JobConf.java:115)
        at org.apache.hadoop.mapred.LocalJobRunner$Job.<init>(LocalJobRunner.java:61)
        at org.apache.hadoop.mapred.LocalJobRunner.submitJob(LocalJobRunner.java:181)
        at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:277)
        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:312)
        at org.apache.nutch.crawl.CrawlDb.update(CrawlDb.java:55)
        at org.apache.nutch.crawl.Crawl.main(Crawl.java:114)
Is this the cause of the fatal exception?
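My guess, judging from the DEBUG level and the <init>(67) frame (again, not
verified against the Hadoop source), is that this one is harmless:
Configuration appears to construct a throwaway IOException purely so it can
log where each instance was created. Roughly, as in this hypothetical sketch:

import java.io.IOException;

// Hypothetical reconstruction, not the actual Hadoop code: the exception
// exists only to capture and print the creation stack trace.
class ConfigDebugSketch {
    ConfigDebugSketch() {
        new IOException("config()").printStackTrace(System.out);
    }

    public static void main(String[] args) {
        new ConfigDebugSketch(); // prints a trace, yet nothing has failed
    }
}

If that reading is right, these DEBUG entries are noise and the rename failure
above is the real problem.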
I do not intend to run Hadoop at all, so this hadoop-site.xml is empty.
It just has:
<configuration>
</configuration>
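If the real problem turns out to be scratch files colliding under /tmp, one
thing I may try (an assumption on my part that this property does what I
think) is pointing Hadoop's local scratch space at a fresh directory in
hadoop-site.xml:

<configuration>
  <!-- Assumption: mapred.local.dir controls the /tmp/hadoop/mapred/local
       path that shows up in the failing rename. -->
  <property>
    <name>mapred.local.dir</name>
    <value>C:/opt/nutch-060614/scratch/mapred/local</value>
  </property>
</configuration>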
Somebody told me about the following binary package, and that one crawls
fine:
http://68.178.249.66/nutch-admin/nutch-0.8-dev_guiBundle_05_02_06.tar.gz
So, some code change introduced within the last few weeks must be
causing this problem.
-kuro
> -----Original Message-----
> From: Teruhiko Kurosaka [mailto:[EMAIL PROTECTED]
> Sent: 2006-6-05 14:25
> To: [email protected]
> Subject: RE: "Target /tmp/.../map_ynynnj.out already exists"
> error [RE: help running 5/31 version of nightly build]
>
> Additional info: I tried this with "-threads 1" and I still got the
> same error.
>
>
> > -----Original Message-----
> > From: Teruhiko Kurosaka
> > Sent: 2006-6-05 10:45
> > To: '[email protected]'; 'TDLN'
> > Subject: "Target /tmp/.../map_ynynnj.out already exists" error
> > [RE: help running 5/31 version of nightly build]
> >
> > Thank you, Thomas. That's a small change in 0.8 that I overlooked.
> > Nutch crawl now progresses to a further step.
> > But it still stalls with an IOException, as shown below.
> > Any further insight?
> > (I re-ran the same command after removing the tmp directory and the
> > index directory, but I hit the same exception.)
> >
> > -kuro
> >
> > $ ./bin/nutch crawl test/urls -dir test/thoreau-index -depth 2 2>&1 | tee crawl-thoreau-060605-log.txt
> > 060605 103451 Running job: job_yaocyb
> > 060605 103451 C:/opt/nutch-060531/test/thoreau-index/crawldb/current/part-00000/data:0+125
> > 060605 103451 C:/opt/nutch-060531/test/thoreau-index/segments/20060605103443/crawl_fetch/part-00000/data:0+141
> > 060605 103451 C:/opt/nutch-060531/test/thoreau-index/segments/20060605103443/crawl_parse/part-00000:0+748
> > 060605 103451 job_yaocyb
> > java.io.IOException: Target /tmp/hadoop/mapred/local/reduce_yv2ar3/map_ynynnj.out already exists
> >         at org.apache.hadoop.fs.FileUtil.checkDest(FileUtil.java:162)
> >         at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:62)
> >         at org.apache.hadoop.fs.LocalFileSystem.renameRaw(LocalFileSystem.java:191)
> >         at org.apache.hadoop.fs.FileSystem.rename(FileSystem.java:306)
> >         at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:101)
> > Exception in thread "main" java.io.IOException: Job failed!
> >         at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:341)
> >         at org.apache.nutch.crawl.CrawlDb.update(CrawlDb.java:54)
> >         at org.apache.nutch.crawl.Crawl.main(Crawl.java:114)
> >
> >
> > > -----Original Message-----
> > > From: TDLN [mailto:[EMAIL PROTECTED]
> > > Sent: 2006-6-03 1:30
> > > To: [email protected]
> > > Subject: Re: help running 5/31 version of nightly build
> > >
> > > The syntax for the crawl command is
> > >
> > > Crawl <urlDir> [-dir d] [-threads n] [-depth i] [-topN N]
> > >
> > > So your first parameter should point to the *directory* containing
> > > the file with seed urls, not the file itself. See the example below.
> > >
> > > Please fix your syntax and try again.
> > >
> > > Rgrds, Thomas
> > >
> > > On 6/3/06, Teruhiko Kurosaka <[EMAIL PROTECTED]> wrote:
> > > > I tried to run the May 31 version of the nightly build, but it
> > > > failed. It has something to do with the "job", which I thought
> > > > would not be needed if I just need to run on a regular file
> > > > system. Why does Nutch try to use Hadoop in the default
> > > > configuration? Is it necessary?
> > > >
> > > > -kuro
> > > >
> > > > $ ./bin/nutch crawl test/thoreau-url.txt -dir test/thoreau-index -depth 2
> > > > 060602 170942 parsing
> >
>