Hi Susam,

I get this error for both cases 1 and 2.

I think it's due to running Hadoop in local mode (i.e. on a single machine). It
seems it always assigns a job ID of 1. I've been using only a single thread,
so I'm not sure why the file collision happens; then again, I don't really
understand how the whole Nutch/Hadoop system works ...

The weird thing is, sometimes the script (both yours and bin/nutch) runs all
the way through, sometimes it gets through only one or two "depths" of the
crawl, and sometimes only the injecting of URLs. It's seemingly random.

I've found nothing online to help out.
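
For what it's worth, the only workaround I have so far is clearing out the
stale LocalJobRunner files before each run, and even that only helps
sometimes. A rough sketch of what I do (the path matches my /tmp/hadoop-me
setup from the trace below, and the crawl arguments are just placeholders):

    # Remove the leftover local job files so job_local_1.xml can be recreated,
    # then kick off the crawl again.
    rm -rf /tmp/hadoop-me/mapred/local/localRunner
    bin/nutch crawl urls -dir crawl -depth 3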

Thanks, Susam!

On 1/9/08, Susam Pal <[EMAIL PROTECTED]> wrote:
>
> I haven't really worked with the latest trunk. But I am wondering if ...
>
> 1. you get this error when you kill a crawl while it is running, i.e.
> the unfinished crawl is killed and then a new crawl is started?
>
> 2. you get this error when you crawl using the 'bin/nutch crawl' command
> as well as with the crawl script?
>
> Regards,
> Susam Pal
>
> > Hi there,
> >
> > I'm having problems running the latest release of nutch. I get the
> > following error when I try to crawl:
> >
> > Fetcher: segment: crawl/segments/20080109183955
> > Fetcher: java.io.IOException: Target /tmp/hadoop-me/mapred/local/localRunner/job_local_1.xml already exists
> >         at org.apache.hadoop.fs.FileUtil.checkDest(FileUtil.java:246)
> >         at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:125)
> >         at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:116)
> >         at org.apache.hadoop.fs.LocalFileSystem.copyToLocalFile(LocalFileSystem.java:55)
> >         at org.apache.hadoop.fs.FileSystem.copyToLocalFile(FileSystem.java:834)
> >         at org.apache.hadoop.mapred.LocalJobRunner$Job.<init>(LocalJobRunner.java:86)
> >         at org.apache.hadoop.mapred.LocalJobRunner.submitJob(LocalJobRunner.java:281)
> >         at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:558)
> >         at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:753)
> >         at org.apache.nutch.fetcher.Fetcher.fetch(Fetcher.java:526)
> >         at org.apache.nutch.fetcher.Fetcher.run(Fetcher.java:561)
> >         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
> >         at org.apache.hadoop.util.ToolBase.doMain(ToolBase.java:54)
> >         at org.apache.nutch.fetcher.Fetcher.main(Fetcher.java:533)
> >
> > If I manually remove the offending directory it works... sometimes.
> >
> > Any help is appreciated.
> >
> > Regards,
> > IWan
> >
>
