This is the error I keep getting whenever I try to fetch more than
400K files at a time on a 4-node Hadoop cluster running Nutch 1.0.
org.apache.hadoop.ipc.RemoteException:
org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException:
failed to create file
/user/hadoop/crawl/segments/20091013161641/crawl_fetch/part-00015/index
for DFSClient_attempt_200910131302_0011_r_000015_2 on client
192.168.1.201 because current leaseholder is trying to recreate file.
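For context, this is roughly how I'm running the crawl. The -topN and
-threads values below are illustrative rather than copied from my
scripts; the segment name matches the one in the error above:

  # generate a fetch list of ~400K URLs from the crawldb
  bin/nutch generate crawl/crawldb crawl/segments -topN 400000

  # fetch the generated segment
  bin/nutch fetch crawl/segments/20091013161641 -threads 10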
Can anybody shed some light on this issue? I was under the impression
that 400K files was small potatoes for a Nutch/Hadoop combo.
Thanks,
Eric Osgood
---------------------------------------------
Cal Poly - Computer Engineering, Moon Valley Software
---------------------------------------------
eosg...@calpoly.edu, e...@lakemeadonline.com
---------------------------------------------
www.calpoly.edu/~eosgood, www.lakemeadonline.com