Re: ERROR datanode.DataNode - DatanodeRegistration ... BlockAlreadyExistsException

2009-10-18 Thread Jesse Hires
I verified that on both of the datanodes the only Hadoop/Nutch processes
running are one instance each of:
org.apache.hadoop.hdfs.server.datanode.DataNode
and
org.apache.hadoop.mapred.TaskTracker
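
(For the record, the per-node check was along these lines -- a sketch, not
the exact commands; the `[D]` in the grep pattern just keeps grep from
matching its own command line:)

```shell
# Count DataNode JVMs running on this node; a healthy setup
# should report exactly one per machine.
count=$(ps aux | grep -c '[D]ataNode')
echo "DataNode JVMs on this node: $count"
```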



Jesse

int GetRandomNumber()
{
   return 4; // Chosen by fair roll of dice
// Guaranteed to be random
} // xkcd.com



On Sat, Oct 17, 2009 at 11:49 AM, Andrzej Bialecki  wrote:

> Jesse Hires wrote:
>
>> Does anyone have any insight into the following error I am seeing in the
>> hadoop logs? Is this something I should be concerned with, or is it
>> expected
>> that this shows up in the logs from time to time? If it is not expected,
>> where can I look for more information on what is going on?
>>
>> 2009-10-16 17:02:43,061 ERROR datanode.DataNode -
>> DatanodeRegistration(192.168.1.7:50010,
>> storageID=DS-1226842861-192.168.1.7-50010-1254609174303,
>> infoPort=50075, ipcPort=50020):DataXceiver
>> org.apache.hadoop.hdfs.server.datanode.BlockAlreadyExistsException:
>> Block blk_90983736382565_3277 is valid, and cannot be written to.
>>
>
> Are you sure you are running a single datanode process per machine?
>
>
> --
> Best regards,
> Andrzej Bialecki <><
>  ___. ___ ___ ___ _ _   __
> [__ || __|__/|__||\/|  Information Retrieval, Semantic Web
> ___|||__||  \|  ||  |  Embedded Unix, System Integration
> http://www.sigram.com  Contact: info at sigram dot com
>
>


Nutch indexer failing

2009-10-18 Thread Magnús Skúlason
Hi,
I am getting the following exception when indexing (right after adding
segments):
Exception in thread "main" org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory /home/user/nutch/crawl/indexes already exists
        at org.apache.hadoop.mapred.OutputFormatBase.checkOutputSpecs(OutputFormatBase.java:96)
        at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:329)
        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:543)
        at org.apache.nutch.indexer.Indexer.index(Indexer.java:273)
        at org.apache.nutch.crawl.Crawl.main(Crawl.java:134)

The strange thing is that it only happens sometimes (roughly every second
run), even though I delete the folder /home/user/nutch/crawl before
starting the crawler.

Does anyone know what might be happening here and how I can fix it?
I am running a roughly one-year-old Nutch 0.9 build, and the problem only
started recently.
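
(One workaround, assuming a leftover output directory from a partial run is
the cause, is to clear the indexer's output path before each run. A sketch
-- the /tmp path here is a stand-in for the real /home/user/nutch/crawl:)

```shell
# Clear a stale indexes dir so checkOutputSpecs does not abort the job.
CRAWL_DIR=/tmp/nutch-crawl-demo          # stand-in for the real crawl dir
mkdir -p "$CRAWL_DIR/indexes"            # simulate leftovers of a partial run
rm -rf "$CRAWL_DIR/indexes"              # remove before re-running the indexer
[ -d "$CRAWL_DIR/indexes" ] || echo "indexes dir cleared"
```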

best regards,
Magnus