Ok, thanks for the answer. Looks like I will have to queue them up. Cheers!
Chris

On 13 July 2011 15:50, Markus Jelsma <[email protected]> wrote:

> You're running locally? You cannot run multiple Nutch' locally with each
> sharing the same /tmp/ directory: change /tmp/ per crawl or run on Hadoop or
> run in sequence if you can live with it.
>
> On Wednesday 13 July 2011 16:38:04 Chris Alexander wrote:
> > Hi again,
> >
> > Continuing my investigations into nutch, I attempted running two nutch
> > whole-web crawls against two different target URL sets simultaneously and
> > with different crawl directories. All seemed to be going very well until
> > the exception below appeared in one of the threads. It looks like
> > something under the hood is using some lock files that seem to be
> > overlapping. Is it possible to run two nutch instances side by side, or
> > would it be a better architecture to prefer to have a single instance of
> > the script running and have it pick up updates to the URLs it has to crawl
> > (e.g. the user specifying new top-level URLs to crawl).
> >
> > Cheers
> >
> > Chris
> >
> >
> > Exception in thread "main" java.io.FileNotFoundException: File file:/tmp/hadoop-root/mapred/system/job_local_0001/job.xml does not exist.
> >     at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:361)
> >     at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:245)
> >     at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:192)
> >     at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:142)
> >     at org.apache.hadoop.fs.LocalFileSystem.copyToLocalFile(LocalFileSystem.java:61)
> >     at org.apache.hadoop.fs.FileSystem.copyToLocalFile(FileSystem.java:1197)
> >     at org.apache.hadoop.mapred.LocalJobRunner$Job.<init>(LocalJobRunner.java:92)
> >     at org.apache.hadoop.mapred.LocalJobRunner.submitJob(LocalJobRunner.java:373)
> >     at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:800)
> >     at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:730)
> >     at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1249)
> >     at org.apache.nutch.parse.ParseSegment.parse(ParseSegment.java:156)
> >     at org.apache.nutch.parse.ParseSegment.run(ParseSegment.java:177)
> >     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
> >     at org.apache.nutch.parse.ParseSegment.main(ParseSegment.java:163)
>
> --
> Markus Jelsma - CTO - Openindex
> http://www.linkedin.com/in/markus17
> 050-8536620 / 06-50258350
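For anyone who finds this thread later: the "run in sequence" option could be sketched roughly as below. This is only a minimal illustration, not anything taken from Nutch itself. The function name `run_in_sequence` and its `runner` parameter are made up for this example, and the commands you pass in would be whatever scripts you already use to drive each crawl. Each command runs to completion before the next starts, so two local jobs never share the /tmp/ working area at the same time.

```python
import subprocess
from typing import Callable, List, Sequence


def run_in_sequence(
    commands: Sequence[Sequence[str]],
    runner: Callable[[Sequence[str]], int] = subprocess.call,
) -> List[int]:
    """Run each command to completion before starting the next one.

    `commands` is a list of argument lists (hypothetical crawl scripts here).
    `runner` defaults to subprocess.call, which blocks until the process
    exits and returns its exit code; it is a parameter only so the function
    can be exercised without launching real processes.
    """
    exit_codes: List[int] = []
    for command in commands:
        code = runner(command)
        exit_codes.append(code)
        if code != 0:
            # Stop at the first failure rather than starting the next crawl.
            break
    return exit_codes
```

Calling it with something like `run_in_sequence([["./crawl-site-a.sh"], ["./crawl-site-b.sh"]])` (both script names are placeholders) returns the exit codes of the crawls that actually ran. Stopping on the first non-zero exit code is a design choice for this sketch; dropping the `break` would let later crawls run regardless.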

