Andrzej,
I just downloaded the most recent trunk from svn as per your
recommendations for fixing the generate bug. As soon as I have it all
rebuilt with my configs I will let you know how a crawl of ~1.6mln
pages goes. Hopefully no errors!
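For reference, each round here is basically the standard generate / fetch /
updatedb cycle on the cluster; the commands below are only a sketch (the
paths, -topN and thread counts are examples rather than my exact settings):

    # one generate / fetch / updatedb round with Nutch 1.0 on Hadoop (example paths)
    bin/nutch generate crawl/crawldb crawl/segments -topN 400000 -numFetchers 4
    # generate names the new segment with a timestamp, e.g.:
    bin/nutch fetch crawl/segments/20091013161641 -threads 10
    bin/nutch updatedb crawl/crawldb crawl/segments/20091013161641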
Thanks,
Eric
On Oct 20, 2009, at 2:13 PM, Andrzej Bialecki wrote:
Eric Osgood wrote:
This is the error I keep getting whenever I try to fetch more than
400K files at a time using a 4 node hadoop cluster running nutch 1.0.
org.apache.hadoop.ipc.RemoteException:
org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException:
failed to create file
/user/hadoop/crawl/segments/20091013161641/crawl_fetch/part-00015/index
for DFSClient_attempt_200910131302_0011_r_000015_2 on client 192.168.1.201
because current leaseholder is trying to recreate file.
Please see this issue:
https://issues.apache.org/jira/browse/NUTCH-692
Apply the patch that is attached there, rebuild Nutch, and tell me
if this fixes your problem.
(the patch will be applied to trunk anyway, since others confirmed
that it fixes this issue).
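Roughly, the steps would be something like the sketch below (the patch
filename is just a placeholder for the attachment on the issue, and the
paths assume a plain trunk checkout):

    cd nutch-trunk                  # wherever the trunk checkout lives
    patch -p0 < NUTCH-692.patch     # placeholder name for the patch attached to the issue
    ant clean job                   # rebuild Nutch and the Hadoop job jar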
Can anybody shed some light on this issue? I was under the
impression that 400K was small potatoes for a Nutch/Hadoop combo?
It is. This problem is rare - I think I have crawled cumulatively ~500mln
pages in various configs and it has never happened to me personally. It
requires a few things to go wrong (see the issue comments).
--
Best regards,
Andrzej Bialecki <><
 ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  || |   Embedded Unix, System Integration
http://www.sigram.com Contact: info at sigram dot com
Eric Osgood
---------------------------------------------
Cal Poly - Computer Engineering, Moon Valley Software
---------------------------------------------
eosg...@calpoly.edu, e...@lakemeadonline.com
---------------------------------------------
www.calpoly.edu/~eosgood, www.lakemeadonline.com