Hi Jimmy,
HBase is built from the latest sources of the 0.90 branch (0.90.7-SNAPSHOT);
I had the same problem with 0.90.4. Hadoop is 0.20.2 from Cloudera CDH3u1.
This failure happens during large M/R jobs. I have 10 servers, and usually
no more than one fails like this, sometimes none. One thing worth mentioning
is that the table it is trying to write to has over 5000 regions.

-eran

On Wed, Mar 28, 2012 at 16:17, Jimmy Xiang <[email protected]> wrote:
> Which version of HDFS and HBase are you using?
>
> When the problem happens, can you access HDFS, for example, from
> hadoop dfs?
>
> Thanks,
> Jimmy
>
> On Wed, Mar 28, 2012 at 4:28 AM, Eran Kutner <[email protected]> wrote:
> > Hi,
> >
> > We have region servers sporadically stopping under load, supposedly due
> > to errors writing to HDFS. Things like:
> >
> > 2012-03-28 00:37:11,210 WARN org.apache.hadoop.hdfs.DFSClient: Error
> > while syncing
> > java.io.IOException: All datanodes 10.1.104.10:50010 are bad. Aborting..
> >
> > It's happening with a different region server and data node every time,
> > so it's not a problem with one specific server, and there doesn't seem
> > to be anything really wrong with either of them. I've already increased
> > the file descriptor limit, the datanode xceiver count and the datanode
> > handler count. Any idea what could be causing these errors?
> >
> > A more complete log is here: http://pastebin.com/wC90xU2x
> >
> > Thanks.
> >
> > -eran
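
For the record, the kind of check Jimmy is suggesting can be done from the
hadoop dfs shell; a minimal smoke test from one of the affected region
servers would look something like this (the paths here are arbitrary
examples, not anything specific to our setup):

  # list the HBase root directory, then round-trip a small file
  hadoop dfs -ls /hbase
  hadoop dfs -put /etc/hosts /tmp/dfs-smoke-test
  hadoop dfs -cat /tmp/dfs-smoke-test
  hadoop dfs -rm /tmp/dfs-smoke-test

If those hang or fail while a region server is aborting, the problem is in
HDFS itself; if they work, the failure is specific to the write pipelines
the region server already has open.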
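P.S. To be specific about what I raised: the xceiver and handler settings
are the standard hdfs-site.xml knobs on the datanodes. The values below are
only illustrative of the kind of increase usually suggested for HBase, not
necessarily what we are running:

  <!-- hdfs-site.xml on each datanode -->
  <property>
    <!-- note the historical misspelling of "xceivers" in the property name -->
    <name>dfs.datanode.max.xcievers</name>
    <value>4096</value>
  </property>
  <property>
    <name>dfs.datanode.handler.count</name>
    <value>10</value>
  </property>

The file descriptor limit is raised outside Hadoop (typically in
/etc/security/limits.conf for the user running the datanode) and can be
verified with "ulimit -n" as that user.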
