Hi everyone,

We have been having this problem for a while and would really appreciate any
suggestions.

We have a 5-node cluster, 4 of the nodes being region servers. I am running a
custom workload with YCSB, and while the data is loading (heavy inserts) at
least one of the region servers dies after about 600,000 operations.
After the region server dies, the load terminates after a while. There
were a couple of times when the load succeeded. The second YCSB test, which
does both reads and updates, does not seem to cause any problems.

There are no abnormalities in the logs as far as I can see. The only common
point is that in every case (different region servers fail in different
trials) the last log entries are flush requests, shown below. The .out files
are empty. I am looking in the /var/log/hbase folder for logs. We are running
the latest version of Sun Java 6. I couldn't find any logs that indicate a
problem with Java. I tried the tests with OpenJDK and got the same results.

I have set ulimits (50000) and xceivers (20000) for multiple users and am
certain that they are correct. I have tried this with different versions, but
these logs come from 0.90.1 from Cloudera beta4. The namenode, HMaster and
ZooKeeper run on one machine.
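
One way to double-check that the file limit actually reaches the running
region server process, rather than only a login shell, is to read
/proc/<pid>/limits on Linux. This is a minimal sketch, not part of the
original setup: the helper names max_open_files and read_limits are
hypothetical, and the pid has to be looked up separately.

```python
def max_open_files(limits_text):
    """Return (soft, hard) for 'Max open files' from /proc/<pid>/limits text.

    Values are returned as strings because a limit may be 'unlimited'.
    Returns None if the row is not present.
    """
    label = "Max open files"
    for line in limits_text.splitlines():
        if line.startswith(label):
            # Remaining columns are: soft limit, hard limit, units
            fields = line[len(label):].split()
            return fields[0], fields[1]
    return None


def read_limits(pid):
    """Read the limits table of a running process (Linux only)."""
    with open("/proc/%d/limits" % pid) as f:
        return f.read()
```

For example, max_open_files(read_limits(pid)) with the region server's pid
shows the soft and hard limits that the process was really started with.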

Also in the kernel logs, there are no apparent problems.

The hardware configuration of the servers is as follows (they are desktop
machines used as servers):

Intel i5 2400 k
16 GB RAM
2 x drives

Thanks a lot!

logs:

---
2011-03-07 15:07:58,301 DEBUG
org.apache.hadoop.hbase.regionserver.CompactSplitThread: Compaction
requested for
usertable,user1030079237,1299502934627.257739740f58da96d5c5ef51a7d3efc3.
because regionserver60020.cacheFlusher; priority=3, compaction queue size=18
2011-03-07 15:07:58,301 DEBUG org.apache.hadoop.hbase.regionserver.HRegion:
NOT flushing memstore for region
usertable,user1601881548,1299502135191.f8efb9aa0922fa8a6a53fc49b8155ebc.,
flushing=false, writesEnabled=false
2011-03-07 15:07:58,301 DEBUG org.apache.hadoop.hbase.regionserver.HRegion:
Started memstore flush for
usertable,user1662209069,1299502135191.9fa929e6fb439843cffb604dea3f88f6.,
current region memstore size 68.6m
2011-03-07 15:07:58,310 DEBUG org.apache.hadoop.hbase.regionserver.HRegion:
Flush requested on
usertable,user1601881548,1299502135191.f8efb9aa0922fa8a6a53fc49b8155ebc.
-end of log file-
---
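
Log records in an excerpt like this one get wrapped across several lines by
the mail client. A minimal sketch for joining them back into one record per
entry, assuming every record starts with a log4j-style timestamp as in the
lines above (the name join_records is hypothetical):

```python
import re

# A record starts with a timestamp such as "2011-03-07 15:07:58,301 "
TIMESTAMP = re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3} ")


def join_records(lines):
    """Merge wrapped continuation lines into the record they belong to."""
    records = []
    for line in lines:
        line = line.rstrip("\n")
        if TIMESTAMP.match(line) or not records:
            # New record (or leading text before the first timestamp)
            records.append(line)
        else:
            # Continuation line: append to the previous record
            records[-1] += " " + line.strip()
    return records
```

Any line without a timestamp, including a trailing marker line, is appended
to the record before it.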

Deniz
