OK, I now have a related problem: I don't think conf/hadoop-site.xml is being read at all! I've altered the hadoop.tmp.dir property in the file below, but the tmp files are still going to the default location. I suspect the other property isn't being set either, which would explain why it's still running with speculative execution on.
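To see whether the file is being picked up at all, something along these lines should do it. The ConfCheck class name is just made up for the example, and I'm assuming a plain new Configuration() loads hadoop-default.xml and then hadoop-site.xml from the classpath, which is how I understand it's supposed to work:

import java.net.URL;

import org.apache.hadoop.conf.Configuration;

public class ConfCheck {
    public static void main(String[] args) {
        // Is hadoop-site.xml on the classpath at all, and which copy wins?
        URL site = ConfCheck.class.getClassLoader().getResource("hadoop-site.xml");
        System.out.println("hadoop-site.xml resolves to: " + site);

        // Assumption: a fresh Configuration loads hadoop-default.xml and then
        // hadoop-site.xml, so these should be the values the local jobs see.
        Configuration conf = new Configuration();
        System.out.println("hadoop.tmp.dir = " + conf.get("hadoop.tmp.dir"));
        System.out.println("mapred.speculative.execution = "
                + conf.get("mapred.speculative.execution"));
    }
}

Run with the same classpath bin/nutch uses (i.e. with conf/ on it): if the URL comes back null, or the printed values are still the defaults, the file really isn't being read. Here's the file itself: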
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>

<property>
  <name>hadoop.tmp.dir</name>
  <value>/home/username/tmp/hadoop-username/</value>
  <description></description>
</property>

<property>
  <name>mapred.speculative.execution</name>
  <value>false</value>
  <description>If true, then multiple instances of some map tasks
  may be executed in parallel.</description>
</property>

</configuration>

On 1/9/08, Dennis Kubes <[EMAIL PROTECTED]> wrote:
>
> Are you running with speculative execution on?
>
> Dennis
>
> Iwan Cornelius wrote:
> > Hi Susam,
> >
> > I get this error for both cases 1 and 2.
> >
> > I think it's due to running hadoop in local mode (i.e. a single machine). It
> > seems it's always giving a job id of 1. I've been using only a single thread,
> > so I'm not sure why this is; then again, I don't really understand how the
> > whole nutch/hadoop system works ...
> >
> > The weird thing is, sometimes the script (both yours and bin/nutch) will run
> > all the way through, sometimes for 1 or 2 "depths" of a crawl, sometimes
> > only for the injecting of urls. It's seemingly random.
> >
> > I've found nothing online to help out.
> >
> > Thanks Susam!
> >
> > On 1/9/08, Susam Pal <[EMAIL PROTECTED]> wrote:
> >> I haven't really worked with the latest trunk. But I am wondering if ...
> >>
> >> 1. you get this error when you kill a crawl while it is running, i.e.
> >> the unfinished crawl is killed and then you start a new crawl
> >>
> >> 2. you get this error when you crawl using the 'bin/nutch crawl' command
> >> as well as the crawl script?
> >>
> >> Regards,
> >> Susam Pal
> >>
> >>> Hi there,
> >>>
> >>> I'm having problems running the latest release of nutch. I get the following
> >>> error when I try to crawl:
> >>>
> >>> Fetcher: segment: crawl/segments/20080109183955
> >>> Fetcher: java.io.IOException: Target
> >>> /tmp/hadoop-me/mapred/local/localRunner/job_local_1.xml already exists
> >>>         at org.apache.hadoop.fs.FileUtil.checkDest(FileUtil.java:246)
> >>>         at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:125)
> >>>         at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:116)
> >>>         at org.apache.hadoop.fs.LocalFileSystem.copyToLocalFile(LocalFileSystem.java:55)
> >>>         at org.apache.hadoop.fs.FileSystem.copyToLocalFile(FileSystem.java:834)
> >>>         at org.apache.hadoop.mapred.LocalJobRunner$Job.<init>(LocalJobRunner.java:86)
> >>>         at org.apache.hadoop.mapred.LocalJobRunner.submitJob(LocalJobRunner.java:281)
> >>>         at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:558)
> >>>         at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:753)
> >>>         at org.apache.nutch.fetcher.Fetcher.fetch(Fetcher.java:526)
> >>>         at org.apache.nutch.fetcher.Fetcher.run(Fetcher.java:561)
> >>>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
> >>>         at org.apache.hadoop.util.ToolBase.doMain(ToolBase.java:54)
> >>>         at org.apache.nutch.fetcher.Fetcher.main(Fetcher.java:533)
> >>>
> >>> If I manually remove the offending directory it works... sometimes.
> >>>
> >>> Any help is appreciated.
> >>>
> >>> Regards,
> >>> Iwan
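P.S. Until the configuration issue is sorted out, the manual workaround from my original mail ("if I manually remove the offending directory it works") could be scripted as something like the sketch below. The CleanLocalRunner name is made up and the path is just copied from the stack trace, so it would need to point at wherever hadoop.tmp.dir actually ends up:

import java.io.File;

public class CleanLocalRunner {
    public static void main(String[] args) {
        // Path copied from the stack trace above; adjust to match hadoop.tmp.dir.
        deleteRecursively(new File("/tmp/hadoop-me/mapred/local/localRunner"));
    }

    // Recursively delete a directory, the same effect as a plain rm -rf.
    private static void deleteRecursively(File f) {
        File[] children = f.listFiles();
        if (children != null) {
            for (File child : children) {
                deleteRecursively(child);
            }
        }
        f.delete();
    }
}

Running something like that (or the equivalent rm -rf) before each crawl is only a band-aid, of course, and even then it only helps sometimes, so the real question is still why the config file isn't being read.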
