Re: "Could not get block locations. Aborting..." exception

2008-10-04 Thread Raghu Angadi
https://issues.apache.org/jira/browse/HADOOP-4346 might explain this. Raghu. Bryan Duxbury wrote: Ok, so, what might I do next to try and diagnose this? Does it sound like it might be an HDFS/mapreduce bug, or should I pore over my own code first? Also, did any of the other exceptions look

Re: "Could not get block locations. Aborting..." exception

2008-09-29 Thread Raghu Angadi
Bryan Duxbury wrote: Ok, so, what might I do next to try and diagnose this? Does it sound like it might be an HDFS/mapreduce bug, or should I pore over my own code first? > Also, did any of the other exceptions look interesting? The exceptions closest to the failure time would be most important

Re: "Could not get block locations. Aborting..." exception

2008-09-29 Thread Bryan Duxbury
Ok, so, what might I do next to try and diagnose this? Does it sound like it might be an HDFS/mapreduce bug, or should I pore over my own code first? Also, did any of the other exceptions look interesting? -Bryan On Sep 29, 2008, at 10:40 AM, Raghu Angadi wrote: Raghu Angadi wrote: Doug

Re: "Could not get block locations. Aborting..." exception

2008-09-29 Thread Raghu Angadi
Raghu Angadi wrote: Doug Cutting wrote: Raghu Angadi wrote: For the current implementation, you need around 3x fds. 1024 is too low for Hadoop. The Hadoop requirement will come down, but 1024 would be too low anyway. 1024 is the default on many systems. Shouldn't we try to make the default
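As a rough illustration of why a 1024 ulimit is tight, here is a back-of-the-envelope sketch (not Hadoop code) based on Raghu's "around 3x fds per open HDFS file" estimate. The concurrent-stream and baseline numbers below are made-up assumptions for illustration only, not measurements from Bryan's job.

```java
// Back-of-the-envelope estimate of client-side fd pressure,
// using the ~3 descriptors per open HDFS file figure from the thread.
public class FdEstimate {
    public static void main(String[] args) {
        int fdsPerOpenHdfsFile = 3;   // Raghu's estimate for the current implementation
        int concurrentStreams = 300;  // hypothetical: streams a busy task might hold open
        int baseline = 100;           // hypothetical: jars, logs, sockets, spill files, etc.

        int needed = concurrentStreams * fdsPerOpenHdfsFile + baseline;
        int ulimit = 1024;            // common default discussed in the thread

        System.out.printf("estimated fds needed: %d, ulimit: %d%n", needed, ulimit);
        if (needed > ulimit) {
            System.out.println("likely to hit 'Too many open files' -- raise the fd limit");
        }
    }
}
```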

Re: "Could not get block locations. Aborting..." exception

2008-09-29 Thread Raghu Angadi
Raghu Angadi wrote: The most interesting one in my eyes is the too many open files one. My ulimit is 1024. How much should it be? I don't think that I have that many files open in my mappers. They should only be operating on a single file at a time. I can try to run the job again and get an

Re: "Could not get block locations. Aborting..." exception

2008-09-29 Thread Raghu Angadi
Doug Cutting wrote: Raghu Angadi wrote: For the current implementation, you need around 3x fds. 1024 is too low for Hadoop. The Hadoop requirement will come down, but 1024 would be too low anyway. 1024 is the default on many systems. Shouldn't we try to make the default configuration work well there?

Re: "Could not get block locations. Aborting..." exception

2008-09-29 Thread Doug Cutting
Raghu Angadi wrote: For the current implementation, you need around 3x fds. 1024 is too low for Hadoop. The Hadoop requirement will come down, but 1024 would be too low anyway. 1024 is the default on many systems. Shouldn't we try to make the default configuration work well there? If not,

Re: "Could not get block locations. Aborting..." exception

2008-09-29 Thread Raghu Angadi
The most interesting one in my eyes is the too many open files one. My ulimit is 1024. How much should it be? I don't think that I have that many files open in my mappers. They should only be operating on a single file at a time. I can try to run the job again and get an lsof if it would be
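Besides running lsof against the task process, a quick in-process check of descriptor usage is possible from the JVM itself. A minimal sketch follows; it relies on the Sun/Oracle-specific UnixOperatingSystemMXBean, so it only works on a HotSpot-style JVM on Unix-like systems, and it is offered as a diagnostic idea rather than anything from the thread.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

// Prints how many file descriptors this JVM currently has open
// versus its per-process limit, as a rough alternative to lsof.
public class FdUsage {
    public static void main(String[] args) {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof com.sun.management.UnixOperatingSystemMXBean) {
            com.sun.management.UnixOperatingSystemMXBean unixOs =
                (com.sun.management.UnixOperatingSystemMXBean) os;
            System.out.println("open fds: " + unixOs.getOpenFileDescriptorCount());
            System.out.println("max fds:  " + unixOs.getMaxFileDescriptorCount());
        } else {
            System.out.println("fd counts not available on this JVM/OS");
        }
    }
}
```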

Re: "Could not get block locations. Aborting..." exception

2008-09-26 Thread Bryan Duxbury
From: Bryan Duxbury [mailto:[EMAIL PROTECTED] Sent: Fri 9/26/2008 4:36 PM To: core-user@hadoop.apache.org Subject: "Could not get block locations. Aborting..." exception Hey all. We've been running into a very annoying problem pretty frequently lately. We'll be running some

RE: "Could not get block locations. Aborting..." exception

2008-09-26 Thread Hairong Kuang
From: Bryan Duxbury [mailto:[EMAIL PROTECTED] Sent: Fri 9/26/2008 4:36 PM To: core-user@hadoop.apache.org Subject: "Could not get block locations. Aborting..." exception Hey all. We've been running into a very annoying problem pretty frequently lately. We'll be running some job

"Could not get block locations. Aborting..." exception

2008-09-26 Thread Bryan Duxbury
Hey all. We've been running into a very annoying problem pretty frequently lately. We'll be running some job, for instance a distcp, and it'll be moving along quite nicely, until all of the sudden, it sort of freezes up. It takes a while, and then we'll get an error like this one: attempt