Doug Cutting wrote:
> Jérôme Charron wrote:
>
> > In my environment, the crawl command terminates with the following
> > error:
> > 2006-07-06 17:41:49,735 ERROR mapred.JobClient
> > (JobClient.java:submitJob(273)) - Input directory
> > /localpath/crawl/crawldb/current in local is invalid.
> > Exception in thread "main" java.io.IOException: Input directory
> > /localpathcrawl/crawldb/current in local is invalid.
> >   at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:274)
> >   at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:327)
> >   at org.apache.nutch.crawl.Injector.inject(Injector.java:146)
> >   at org.apache.nutch.crawl.Crawl.main(Crawl.java:105)
> >
> Hadoop 0.4.0 by default requires all input directories to exist,
> where previous releases did not. So we need to either create an
> empty "current" directory or change the InputFormat used in
> CrawlDb.createJob() to be one that overrides
> InputFormat.areValidInputDirectories(). The former is probably
> easier. I've attached a patch. Does this fix things for folks?
>
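For readers who hit the same error, here is a rough sketch of the first workaround Doug describes: make sure crawldb/current exists before the job is submitted. It is a minimal illustration built on a few assumptions. It works on the local filesystem through java.nio.file rather than Hadoop's FileSystem API. The class EnsureCrawlDbCurrent and its method ensureCurrentDir are hypothetical names. Neither is part of Nutch or of the patch attached to this thread.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/**
 * Hypothetical helper (not Nutch code, not the attached patch) showing the
 * "create an empty current directory" workaround on the local filesystem.
 * Uses java.nio.file, NOT Hadoop's FileSystem API.
 */
public class EnsureCrawlDbCurrent {

    /**
     * Makes sure <crawlDb>/current exists, so a job that lists it as an
     * input directory does not fail the "input directory must exist" check.
     * Returns true if the directory had to be created, false if it was
     * already present.
     */
    public static boolean ensureCurrentDir(Path crawlDb) throws IOException {
        Path current = crawlDb.resolve("current");
        if (Files.isDirectory(current)) {
            return false; // already there: nothing to do
        }
        // Also creates crawlDb itself if it is missing. Throws an
        // IOException if "current" exists but is a regular file.
        Files.createDirectories(current);
        return true;
    }

    public static void main(String[] args) throws IOException {
        // Assumed default path for illustration only.
        Path crawlDb = Paths.get(args.length > 0 ? args[0] : "crawl/crawldb");
        boolean created = ensureCurrentDir(crawlDb);
        System.out.println((created ? "created " : "already present: ")
                + crawlDb.resolve("current"));
    }
}
```

Run this with the crawldb path before the inject step and it creates the missing (empty) current directory. Doug's alternative is to override InputFormat.areValidInputDirectories(). That approach leaves the filesystem alone, but it depends on the Hadoop 0.4.0 InputFormat interface, so it is not sketched here.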
Patch works for me.

--
Sami Siren

_______________________________________________
Nutch-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-developers
