Viral: Did you use YCSB or LoadTestTool? Was the load spread relatively evenly across your servers?
Thanks

On Fri, Feb 15, 2013 at 9:19 PM, Viral Bajaria wrote:
> Yeah, I noticed very high latency around the time of the slow responses;
> basically my client timed out for those requests. I have pre-split the
> table into 128 regions. Unfortunately I didn't have Ganglia installed. I
> will install Ganglia on those boxes, run the perf test again, and post the
> results.
>
> Regarding the I/O wait, the timeout only happened on one box, or at least
> that's what I saw in the logs. When I run the test again with Ganglia on, I
> will verify whether it only happens on one node.
>
> Thanks,
> Viral
>
> On Fri, Feb 15, 2013 at 8:09 PM, Kevin O'dell wrote:
>
> > If you take a look at sar from 2013-02-16 on 10.149.10.10, do you see any
> > major I/O wait, swapping, or anything out of the norm? Is this occurring
> > on all three region servers? When the perf test is running, can you
> > verify you are writing to all three nodes?
> >
> > On Fri, Feb 15, 2013 at 11:03 PM, Ted Yu wrote:
> >
> > > The slow response took about 1.5 minutes. During this period, did you
> > > observe high latency?
> > >
> > > If you have Ganglia installed on the master / NN node, do you observe
> > > an abnormal spike?
> > >
> > > BTW, did you presplit your table?
> > >
> > > Thanks
> > >
> > > On Fri, Feb 15, 2013 at 7:14 PM, Viral Bajaria wrote:
> > >
> > > > Hi,
> > > >
> > > > (using HBase 0.94.4 and Hadoop 1.0.4)
> > > >
> > > > I have been seeing a lot of the following WARN in my logs:
> > > >
> > > > 2013-02-16 02:37:11,409 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: Stats: total=25.18 MB, free=2.97 GB, max=3 GB, blocks=1, accesses=52, hits=51, hitRatio=98.07%, , cachingAccesses=52, cachingHits=51, cachingHitsRatio=98.07%, , evictions=0, evicted=0, evictedPerRun=NaN
> > > > 2013-02-16 02:37:33,368 WARN org.apache.hadoop.ipc.HBaseServer: (responseTooSlow): {"processingtimems":97509,"call":"multi(org.apache.hadoop.hbase.client.MultiAction@1c3308bd), rpc version=1, client version=29, methodsFingerPrint=-1368823753","client":"10.149.10.10:41009","starttimems":1360982155855,"queuetimems":0,"class":"HRegionServer","responsesize":0,"method":"multi"}
> > > > 2013-02-16 02:38:37,377 WARN org.apache.hadoop.ipc.HBaseServer: (responseTooSlow): {"processingtimems":97191,"call":"multi(org.apache.hadoop.hbase.client.MultiAction@3eafc7ae), rpc version=1, client version=29, methodsFingerPrint=-1368823753","client":"10.149.10.10:41014","starttimems":1360982220183,"queuetimems":0,"class":"HRegionServer","responsesize":0,"method":"multi"}
> > > > 2013-02-16 02:39:29,842 WARN org.apache.hadoop.ipc.HBaseServer: (responseTooSlow): {"processingtimems":85300,"call":"multi(org.apache.hadoop.hbase.client.MultiAction@3d615428), rpc version=1, client version=29, methodsFingerPrint=-1368823753","client":"10.149.10.10:41017","starttimems":1360982284538,"queuetimems":1,"class":"HRegionServer","responsesize":0,"method":"multi"}
> > > >
> > > > It's strange because this is a new HBase setup with almost no traffic
> > > > on it. I am running a perf test and would not expect this to happen.
> > > > The region servers have 12 GB of heap and are only using 1 GB when the
> > > > error happens. I just pushed close to 33K rows via a batch and I see
> > > > the responseTooSlow.
> > > >
> > > > I enabled GC logging, but I don't see any GC lockups, and each GC run
> > > > takes only a few hundred ms.
> > > >
> > > > What else could be happening here? Any pointers on debugging? My setup
> > > > is 1 Master running with 1 NN (on the same server), plus 3 region
> > > > servers running alongside the DataNodes.
> > > >
> > > > Thanks,
> > > > Viral
> >
> > --
> > Kevin O'Dell
> > Customer Operations Engineer, Cloudera
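Each responseTooSlow payload above carries `starttimems` (epoch milliseconds) and `processingtimems`, so the start and end of every slow `multi` call can be reconstructed and lined up against the log timestamp. The sketch below is a hypothetical helper (`parse_slow_calls` is not part of HBase); it assumes the region server's log timestamps are in UTC and works on trimmed copies of the three WARN lines, keeping only the JSON fields it reads. Under that assumption, each call ends a few milliseconds before its line is logged, and each one starts while the previous slow call is still in flight.

```python
import json
import re
from datetime import datetime, timedelta, timezone

# Trimmed copies of the three responseTooSlow WARN lines quoted above; only the
# JSON fields this sketch reads are kept (the "call" string etc. are dropped).
SLOW_LINES = [
    '2013-02-16 02:37:33,368 WARN org.apache.hadoop.ipc.HBaseServer: (responseTooSlow): '
    '{"processingtimems":97509,"client":"10.149.10.10:41009","starttimems":1360982155855,"queuetimems":0}',
    '2013-02-16 02:38:37,377 WARN org.apache.hadoop.ipc.HBaseServer: (responseTooSlow): '
    '{"processingtimems":97191,"client":"10.149.10.10:41014","starttimems":1360982220183,"queuetimems":0}',
    '2013-02-16 02:39:29,842 WARN org.apache.hadoop.ipc.HBaseServer: (responseTooSlow): '
    '{"processingtimems":85300,"client":"10.149.10.10:41017","starttimems":1360982284538,"queuetimems":1}',
]

# Log timestamp at the start of the line, JSON payload after "(responseTooSlow): ".
LINE_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) .*\(responseTooSlow\): (\{.*\})$"
)
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def parse_slow_calls(lines):
    """Hypothetical helper: turn responseTooSlow lines into start/end times.

    Assumes log timestamps are UTC, matching the epoch-based starttimems.
    """
    calls = []
    for line in lines:
        m = LINE_RE.match(line)
        if m is None:
            continue  # not a responseTooSlow line
        logged = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S,%f").replace(
            tzinfo=timezone.utc
        )
        payload = json.loads(m.group(2))
        # Integer milliseconds -> exact timedelta, avoiding float rounding.
        start = EPOCH + timedelta(milliseconds=payload["starttimems"])
        end = start + timedelta(milliseconds=payload["processingtimems"])
        calls.append({
            "client": payload["client"],
            "start": start,
            "end": end,
            "queue_ms": payload["queuetimems"],
            "log_lag_ms": (logged - end) / timedelta(milliseconds=1),
        })
    return calls


calls = parse_slow_calls(SLOW_LINES)
# Pair each call with the one before it to check whether they overlap in time.
for prev, cur in zip([None] + calls, calls):
    overlaps = prev is not None and cur["start"] < prev["end"]
    print(f'{cur["client"]}: {cur["start"]:%H:%M:%S.%f} -> {cur["end"]:%H:%M:%S.%f}, '
          f'queued {cur["queue_ms"]} ms, logged {cur["log_lag_ms"]:.0f} ms after end, '
          f'overlaps previous: {overlaps}')
```

In all three lines `queuetimems` is 0 or 1 ms, so the roughly 85 to 98 seconds (Ted's "about 1.5 minutes") are reported as processing time rather than time spent waiting in the RPC queue.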

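Viral pre-split the table into 128 regions, and the replies ask whether the load reached all three region servers. As a rough, hypothetical illustration (not HBase's own split algorithm or balancer), the sketch below computes 127 split points over the first byte of the row key, routes 33,000 made-up row keys to regions, and tallies rows per server. It assumes each row key is prefixed with one MD5-derived salt byte (a scheme the thread does not describe) and that regions are placed on the three servers round-robin; in a real cluster the master decides region placement.

```python
import bisect
import hashlib
from collections import Counter

NUM_REGIONS = 128
SERVERS = ["rs1", "rs2", "rs3"]  # hypothetical server names


def uniform_split_keys(num_regions):
    """Split points dividing the one-byte prefix space 0x00..0xFF evenly.

    num_regions regions need num_regions - 1 split keys; the first and last
    regions are open-ended.
    """
    step = 256 // num_regions  # 2 for 128 regions
    return [bytes([i * step]) for i in range(1, num_regions)]


def salted_key(row_id):
    """Hypothetical row key: one MD5-derived salt byte + the logical id."""
    salt = hashlib.md5(row_id.encode()).digest()[:1]
    return salt + row_id.encode()


def region_for(key, splits):
    # The number of split keys <= key is the index of the region holding key.
    return bisect.bisect_right(splits, key)


splits = uniform_split_keys(NUM_REGIONS)
rows_per_region = Counter(
    region_for(salted_key(f"row{i}"), splits) for i in range(33_000)
)

# Assumed round-robin placement: region r lives on SERVERS[r % 3].
rows_per_server = Counter()
for region, count in rows_per_region.items():
    rows_per_server[SERVERS[region % len(SERVERS)]] += count

for server in SERVERS:
    print(server, rows_per_server[server])
```

With a salt like this each server receives roughly a third of the batch; a strongly skewed count (for example, unsalted, sequential keys landing in a few regions) is one way a single region server could end up absorbing most of a 33K-row batch. The thread leaves open whether that happened here.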