Re: Error with Hadoop-0.4.0

2006-07-12 Thread Doug Cutting
Sami Siren wrote: Patch works for me. OK. I just committed it. Thanks! Doug

Re: Error with Hadoop-0.4.0

2006-07-12 Thread Sami Siren
Doug Cutting wrote: Jérôme Charron wrote: In my environment, the crawl command terminates with the following error: 2006-07-06 17:41:49,735 ERROR mapred.JobClient (JobClient.java:submitJob(273)) - Input directory /localpath/crawl/crawldb/current in local is invalid. Exception in thread …

Re: Error with Hadoop-0.4.0

2006-07-11 Thread Sami Siren
Gal Nitzan wrote: To get the same behavior, just try to inject into a new crawldb that doesn't exist. The reason many don't see it is that the crawldb already exists in their environment. True, I was injecting into an existing crawldb. -- Sami Siren
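
[Editorial note: Gal's reproduction hint is that the failure only appears when the crawldb has never been created. Below is a minimal sketch of one way to avoid that failure mode, assuming the old org.apache.hadoop.mapred API of that era; the class and helper names are made up for illustration, and this is not the patch that was actually committed.]

    import java.io.IOException;

    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapred.JobConf;

    // Sketch only: register the existing crawldb as a merge input only when it
    // actually exists, so injecting into a brand-new crawldb does not fail at
    // submit time with "Input directory ... is invalid".
    public class CrawlDbInputGuard {
      public static void addCrawlDbIfPresent(JobConf job, Path crawlDb) throws IOException {
        Path current = new Path(crawlDb, "current");   // CrawlDatum.DB_DIR_NAME in Nutch
        FileSystem fs = FileSystem.get(job);
        if (fs.exists(current)) {
          job.addInputPath(current);                   // merge with the existing entries
        }
        // otherwise only the injector's temporary directory serves as input
      }
    }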

Re: Error with Hadoop-0.4.0

2006-07-10 Thread Doug Cutting
Jérôme Charron wrote: In my environment, the crawl command terminates with the following error: 2006-07-06 17:41:49,735 ERROR mapred.JobClient (JobClient.java:submitJob(273)) - Input directory /localpath/crawl/crawldb/current in local is invalid. Exception in thread "main" java.io.IOException: I…

RE: Error with Hadoop-0.4.0

2006-07-10 Thread Gal Nitzan
Jérôme Charron wrote: Hi, I encountered some problems with the Nutch trunk version. In fact it seems to be related to the changes introduced in Hadoop-0.4.0 and JDK 1.5 (more precisely since HADOOP-129 and the replacement of File by Path). …

Re: Error with Hadoop-0.4.0

2006-07-10 Thread Andrzej Bialecki
Stefan Groschupf wrote: We tried your suggested fix in the Injector, using mergeJob.setInputPath(tempDir) (instead of mergeJob.addInputPath(tempDir)). I suspect that this is not the right solution - have you actually tested that the resulting db contains all entries from the input dirs? -- Best regards
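
[Editorial note: Andrzej's concern is about what the two JobConf calls do to the merge job's input list. A rough illustration, assuming the old org.apache.hadoop.mapred.JobConf API used around Hadoop 0.4.0; the paths are made up for the example.]

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapred.JobConf;

    public class InputPathSemantics {
      public static void main(String[] args) {
        JobConf mergeJob = new JobConf();

        Path existingDb = new Path("crawl/crawldb/current");  // hypothetical existing crawldb
        Path tempDir = new Path("crawl/inject-temp");          // hypothetical injector temp dir

        // addInputPath appends: the merge job reads both the existing db and the
        // freshly injected data, so the merged crawldb keeps all old entries.
        mergeJob.addInputPath(existingDb);
        mergeJob.addInputPath(tempDir);

        // setInputPath replaces the entire input list: only tempDir would be read,
        // silently dropping every entry already in the crawldb - which is exactly
        // what Andrzej is asking Stefan to verify.
        // mergeJob.setInputPath(tempDir);
      }
    }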

Re: Error with Hadoop-0.4.0

2006-07-10 Thread Andrzej Bialecki
Jérôme Charron wrote: What I suggest is simply to remove line 75 in the createJob method of CrawlDb: setInputPath(new Path(crawlDb, CrawlDatum.DB_DIR_NAME)); In fact, this method is only used by Injector.inject() and CrawlDb.update(), and the input path set in createJob is not needed neither …
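
[Editorial note: for orientation, a rough sketch of where that line sits. This is an approximation of CrawlDb.createJob pieced together from the thread, not the actual Nutch source; everything except the setInputPath call is guessed, and the real method also wires up the input format, mapper, reducer and output path.]

    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.nutch.crawl.CrawlDatum;   // assumed: DB_DIR_NAME == "current"
    import org.apache.nutch.util.NutchJob;      // Nutch's JobConf subclass

    // Approximate sketch only.
    public class CreateJobSketch {
      public static JobConf createJob(Configuration config, Path crawlDb) throws IOException {
        JobConf job = new NutchJob(config);
        job.setJobName("crawldb " + crawlDb);

        // "Line 75" as quoted in the thread: unconditionally registers
        // <crawlDb>/current as an input. On a fresh crawl that directory does not
        // exist yet, so JobClient.submitJob rejects the job with
        // "Input directory ... is invalid". Jérôme's suggestion is to drop this
        // line and let the callers (Injector.inject() and CrawlDb.update()) add
        // the inputs they actually have.
        job.setInputPath(new Path(crawlDb, CrawlDatum.DB_DIR_NAME));

        return job;
      }
    }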

Re: Error with Hadoop-0.4.0

2006-07-07 Thread Stefan Groschupf
We tried your suggested fix in the Injector, using mergeJob.setInputPath(tempDir) (instead of mergeJob.addInputPath(tempDir)), and this worked without any problem. Thanks for catching that, this saved us a lot of time. Stefan On 07.07.2006, at 16:08, Jérôme Charron wrote: I have the same problem in a distributed environment …

Re: Error with Hadoop-0.4.0

2006-07-07 Thread Jérôme Charron
I have the same problem in a distributed environment! :-( So I think I can confirm this is a bug. Thanks for this feedback Stefan. We should fix that. What I suggest is simply to remove line 75 in the createJob method of CrawlDb: setInputPath(new Path(crawlDb, CrawlDatum.DB_DIR_NAME)); In …

Re: Error with Hadoop-0.4.0

2006-07-07 Thread Stefan Groschupf
Hi Jérôme, I have the same problem in a distributed environment! :-( So I think I can confirm this is a bug. We should fix that. Stefan On 06.07.2006, at 08:54, Jérôme Charron wrote: Hi, I encountered some problems with the Nutch trunk version. In fact it seems to be related to the changes introduced in Hadoop-0.4.0 …

Re: Error with Hadoop-0.4.0

2006-07-06 Thread Jérôme Charron
> I encountered some problems with the Nutch trunk version. In fact it seems to be related to the changes introduced in Hadoop-0.4.0 and JDK 1.5 (more precisely since HADOOP-129 and the replacement of File by Path). Does somebody have the same error? I am not seeing this (just run inject on a single machine …

Re: Error with Hadoop-0.4.0

2006-07-06 Thread Sami Siren
Jérôme Charron wrote: Hi, I encountered some problems with the Nutch trunk version. In fact it seems to be related to the changes introduced in Hadoop-0.4.0 and JDK 1.5 (more precisely since HADOOP-129 and the replacement of File by Path). Does somebody have the same error? I am not seeing this (just run inject on a single machine …

Error with Hadoop-0.4.0

2006-07-06 Thread Jérôme Charron
Hi, I encountered some problems with the Nutch trunk version. In fact it seems to be related to the changes introduced in Hadoop-0.4.0 and JDK 1.5 (more precisely since HADOOP-129 and the replacement of File by Path). In my environment, the crawl command terminates with the following error: 2006-07-06 17:41:49,735 ERROR mapred.JobClient (JobClient.java:submitJob(273)) - Input directory /localpath/crawl/crawldb/current in local is invalid. …