I'm stumped. I have nothing to go on when there are no death throes or complaints. Is this hardware for sure healthy? Does other stuff run on it w/o issue?
St.Ack
On Mon, Mar 7, 2011 at 8:48 AM, M.Deniz OKTAR <[email protected]> wrote:
> I don't know if it's normal, but I see a lot of '0's in the test results
> when it tends to fail, such as:
>
> 1196 sec: 7394901 operations; 0 current ops/sec;
>
> --
> deniz
>
> On Mon, Mar 7, 2011 at 6:46 PM, M.Deniz OKTAR <[email protected]> wrote:
>
>> Hi,
>>
>> Thanks for the effort, answers below:
>>
>> On Mon, Mar 7, 2011 at 6:08 PM, Stack <[email protected]> wrote:
>>
>>> On Mon, Mar 7, 2011 at 5:43 AM, M.Deniz OKTAR <[email protected]> wrote:
>>> > We have a 5 node cluster, 4 of them being region servers. I am running a
>>> > custom workload with YCSB, and when the data is loading (heavy insert)
>>> > at least one of the region servers dies after about 600000 operations.
>>>
>>> Tell us the character of your 'custom workload' please.
>>>
>> The workload is below; the part that fails is the loading part (-load),
>> which inserts all the records first:
>>
>> recordcount=10000000
>> operationcount=3000000
>> workload=com.yahoo.ycsb.workloads.CoreWorkload
>>
>> readallfields=true
>>
>> readproportion=0.5
>> updateproportion=0.1
>> scanproportion=0
>> insertproportion=0.35
>> readmodifywriteproportion=0.05
>>
>> requestdistribution=zipfian
>>
>>> > There are no abnormalities in the logs as far as I can see; the only
>>> > common point is that all of them (in different trials, different region
>>> > servers fail) request a flush as the last log entries, given below.
>>> > The .out files are empty. I am looking at the /var/log/hbase folder for
>>> > logs. Running Sun Java 6, latest version. I couldn't find any logs that
>>> > indicate a problem with Java. Tried the tests with OpenJDK and had the
>>> > same results.
>>>
>>> It's strange that flush is the last thing in your log. The process is
>>> dead? We are exiting w/o a note in logs? That's unusual. We usually
>>> scream loudly when dying.
>>>
>> Yes, that's the strange part. The last line is a flush, as if the process
>> never failed. Yes, the process is dead and HBase cannot see the node.
>>
>>> > I have set ulimits (50000) and xceivers (20000) for multiple users and
>>> > am certain that they are correct.
>>>
>>> The first line in an hbase log prints out the ulimit it sees. You
>>> might check that the hbase process for sure is picking up your ulimit
>>> setting.
>>>
>> That was a mistake I made a couple of days ago; I checked it with cat
>> /proc/<pid of regionserver>/limits and all related users like 'hbase' have
>> those limits. Checked the logs:
>>
>> Mon Mar 7 06:41:15 EET 2011 Starting regionserver on test-1
>> ulimit -n 52768
>>
>>> > Also in the kernel logs, there are no apparent problems.
>>>
>>> (The mystery compounds)
>>>
>>> > 2011-03-07 15:07:58,301 DEBUG org.apache.hadoop.hbase.regionserver.CompactSplitThread: Compaction requested for usertable,user1030079237,1299502934627.257739740f58da96d5c5ef51a7d3efc3. because regionserver60020.cacheFlusher; priority=3, compaction queue size=18
>>> > 2011-03-07 15:07:58,301 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: NOT flushing memstore for region usertable,user1601881548,1299502135191.f8efb9aa0922fa8a6a53fc49b8155ebc., flushing=false, writesEnabled=false
>>> > 2011-03-07 15:07:58,301 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Started memstore flush for usertable,user1662209069,1299502135191.9fa929e6fb439843cffb604dea3f88f6., current region memstore size 68.6m
>>> > 2011-03-07 15:07:58,310 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Flush requested on usertable,user1601881548,1299502135191.f8efb9aa0922fa8a6a53fc49b8155ebc.
>>> > -end of log file-
>>> > ---
>>>
>>> Nothing more?
>>>
>> No, nothing after that. But quite a lot of logs before that; I can send
>> them if you'd like.
>>
>>> Thanks,
>>> St.Ack
>>
>> Thanks a lot!
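
For anyone reproducing the failing step: it is the YCSB load phase named above (-load). A minimal sketch of how that phase is typically launched against an HBase cluster is below; the classpath, workload file name, column family, and thread count are illustrative assumptions, not values taken from this thread:

  # Load phase only: inserts the 'recordcount' rows defined in the workload file.
  # Classpath, conf dir, workload file, and column family are assumed values; adjust to your install.
  java -cp build/ycsb.jar:db/hbase/lib/*:/path/to/hbase/conf \
      com.yahoo.ycsb.Client -load \
      -db com.yahoo.ycsb.db.HBaseClient \
      -P workloads/custom-workload \
      -p columnfamily=family1 \
      -threads 10 -s

The -load flag drives only the insert phase (bounded by recordcount); the subsequent run phase (-t) uses operationcount and the read/update/insert proportions from the same workload file.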
