Hi everyone,

We have been having this problem for a while and would really appreciate any suggestions.
We have a 5 node cluster, 4 of the nodes being region servers. I am running a custom workload with YCSB, and while the data is loading (heavy insert) at least one of the region servers dies after about 600000 operations. After the region server dies, the load terminates after a while. There were a couple of times when the load succeeded. The second YCSB test, which does both reads and updates, does not seem to cause any problems.

There are no abnormalities in the logs as far as I can see. The only common point is that in every trial (a different region server fails each time) the last log entry is a flush request, as shown below. The .out files are empty. I am looking in the /var/log/hbase folder for logs. Also, there are no apparent problems in the kernel logs.

I am running the latest version of Sun Java 6 and couldn't find any logs that indicate a problem with Java. I tried the tests with OpenJDK and had the same results. I have set ulimits (50000) and xceivers (20000) for multiple users and am certain that they are correct. I have tried this with different versions, but these logs come from 0.90.1 from Cloudera beta4. The namenode, HMaster and ZooKeeper run on one machine.

The configs of the servers are as follows (they are desktop servers):
Intel i5 2400 k
16 GB RAM
2 x drives

Thanks a lot!

logs:
---
2011-03-07 15:07:58,301 DEBUG org.apache.hadoop.hbase.regionserver.CompactSplitThread: Compaction requested for usertable,user1030079237,1299502934627.257739740f58da96d5c5ef51a7d3efc3. because regionserver60020.cacheFlusher; priority=3, compaction queue size=18
2011-03-07 15:07:58,301 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: NOT flushing memstore for region usertable,user1601881548,1299502135191.f8efb9aa0922fa8a6a53fc49b8155ebc., flushing=false, writesEnabled=false
2011-03-07 15:07:58,301 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Started memstore flush for usertable,user1662209069,1299502135191.9fa929e6fb439843cffb604dea3f88f6., current region memstore size 68.6m
2011-03-07 15:07:58,310 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Flush requested on usertable,user1601881548,1299502135191.f8efb9aa0922fa8a6a53fc49b8155ebc.
-end of log file-
---

Deniz
