OK, I now have a related problem: I don't think conf/hadoop-site.xml is being read at all! I've altered the hadoop.tmp.dir property in the file below, but the tmp files are still going to the default location. I suspect the other property isn't being set either, which would explain why it's still running with speculative execution on.
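To see whether the file is being picked up at all, something along these lines should do it. The ConfCheck class name is just made up for the example, and I'm assuming a plain new Configuration() loads hadoop-default.xml and then hadoop-site.xml from the classpath, which is how I understand it's supposed to work:

import java.net.URL;

import org.apache.hadoop.conf.Configuration;

public class ConfCheck {
    public static void main(String[] args) {
        // Is hadoop-site.xml on the classpath at all, and which copy wins?
        URL site = ConfCheck.class.getClassLoader().getResource("hadoop-site.xml");
        System.out.println("hadoop-site.xml resolves to: " + site);

        // Assumption: a fresh Configuration loads hadoop-default.xml and then
        // hadoop-site.xml, so these should be the values the local jobs see.
        Configuration conf = new Configuration();
        System.out.println("hadoop.tmp.dir = " + conf.get("hadoop.tmp.dir"));
        System.out.println("mapred.speculative.execution = "
                + conf.get("mapred.speculative.execution"));
    }
}

Run with the same classpath bin/nutch uses (i.e. with conf/ on it): if the URL comes back null, or the printed values are still the defaults, the file really isn't being read. Here's the file itself: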
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>

<property>
  <name>hadoop.tmp.dir</name>
  <value>/home/username/tmp/hadoop-username/</value>
  <description></description>
</property>

<property>
  <name>mapred.speculative.execution</name>
  <value>false</value>
  <description>If true, then multiple instances of some map tasks
  may be executed in parallel.</description>
</property>

</configuration>

On 1/9/08, Dennis Kubes <[EMAIL PROTECTED]> wrote:
>
> Are you running with speculative execution on?
>
> Dennis
>
> Iwan Cornelius wrote:
> > Hi Susam,
> >
> > I get this error for both cases 1 and 2.
> >
> > I think it's due to running hadoop in local mode (i.e. a single machine). It
> > seems it's always giving a job id of 1. I've been using only a single thread,
> > so I'm not sure why this is; then again, I don't really understand how the
> > whole nutch/hadoop system works ...
> >
> > The weird thing is, sometimes the script (both yours and bin/nutch) will run
> > all the way through, sometimes for 1 or 2 "depths" of a crawl, sometimes
> > only for the injecting of urls. It's seemingly random.
> >
> > I've found nothing online to help out.
> >
> > Thanks Susam!
> >
> > On 1/9/08, Susam Pal <[EMAIL PROTECTED]> wrote:
> >> I haven't really worked with the latest trunk. But I am wondering if ...
> >>
> >> 1. you get this error when you kill a crawl while it is running, i.e.
> >> the unfinished crawl is killed and then you start a new crawl
> >>
> >> 2. you get this error when you crawl using the 'bin/nutch crawl' command
> >> as well as the crawl script?
> >>
> >> Regards,
> >> Susam Pal
> >>
> >>> Hi there,
> >>>
> >>> I'm having problems running the latest release of nutch. I get the following
> >>> error when I try to crawl:
> >>>
> >>> Fetcher: segment: crawl/segments/20080109183955
> >>> Fetcher: java.io.IOException: Target
> >>> /tmp/hadoop-me/mapred/local/localRunner/job_local_1.xml already exists
> >>>         at org.apache.hadoop.fs.FileUtil.checkDest(FileUtil.java:246)
> >>>         at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:125)
> >>>         at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:116)
> >>>         at org.apache.hadoop.fs.LocalFileSystem.copyToLocalFile(LocalFileSystem.java:55)
> >>>         at org.apache.hadoop.fs.FileSystem.copyToLocalFile(FileSystem.java:834)
> >>>         at org.apache.hadoop.mapred.LocalJobRunner$Job.<init>(LocalJobRunner.java:86)
> >>>         at org.apache.hadoop.mapred.LocalJobRunner.submitJob(LocalJobRunner.java:281)
> >>>         at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:558)
> >>>         at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:753)
> >>>         at org.apache.nutch.fetcher.Fetcher.fetch(Fetcher.java:526)
> >>>         at org.apache.nutch.fetcher.Fetcher.run(Fetcher.java:561)
> >>>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
> >>>         at org.apache.hadoop.util.ToolBase.doMain(ToolBase.java:54)
> >>>         at org.apache.nutch.fetcher.Fetcher.main(Fetcher.java:533)
> >>>
> >>> If I manually remove the offending directory it works... sometimes.
> >>>
> >>> Any help is appreciated.
> >>>
> >>> Regards,
> >>> Iwan
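P.S. Until the configuration issue is sorted out, the manual workaround from my original mail ("if I manually remove the offending directory it works") could be scripted as something like the sketch below. The CleanLocalRunner name is made up and the path is just copied from the stack trace, so it would need to point at wherever hadoop.tmp.dir actually ends up:

import java.io.File;

public class CleanLocalRunner {
    public static void main(String[] args) {
        // Path copied from the stack trace above; adjust to match hadoop.tmp.dir.
        deleteRecursively(new File("/tmp/hadoop-me/mapred/local/localRunner"));
    }

    // Recursively delete a directory, the same effect as a plain rm -rf.
    private static void deleteRecursively(File f) {
        File[] children = f.listFiles();
        if (children != null) {
            for (File child : children) {
                deleteRecursively(child);
            }
        }
        f.delete();
    }
}

Running something like that (or the equivalent rm -rf) before each crawl is only a band-aid, of course, and even then it only helps sometimes, so the real question is still why the config file isn't being read.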
