bq. there's almost no activity on either side

During this period, can you capture a stack trace of the region server and pastebin it?
Cheers

On Wed, Sep 17, 2014 at 3:21 PM, Josh Williams wrote:

> Hi, everyone. Here's a strange one, at least to me.
>
> I'm doing some performance profiling, and as a rudimentary test I've
> been using YCSB to drive HBase (originally 0.98.3, recently updated to
> 0.98.6). The problem happens on a few different instance sizes, but
> this is probably the closest comparison:
>
> On m3.2xlarge instances, it works as expected.
> On c3.2xlarge instances, HBase barely responds at all during workloads
> that involve read activity, falling silent for ~62-second intervals,
> with the YCSB throughput output resembling:
>
> 0 sec: 0 operations;
> 2 sec: 918 operations; 459 current ops/sec; [UPDATE AverageLatency(us)=1252778.39] [READ AverageLatency(us)=1034496.26]
> 4 sec: 918 operations; 0 current ops/sec;
> 6 sec: 918 operations; 0 current ops/sec;
> <snip>
> 62 sec: 918 operations; 0 current ops/sec;
> 64 sec: 5302 operations; 2192 current ops/sec; [UPDATE AverageLatency(us)=7715321.77] [READ AverageLatency(us)=7117905.56]
> 66 sec: 5302 operations; 0 current ops/sec;
> 68 sec: 5302 operations; 0 current ops/sec;
> (And so on...)
>
> While that happens there's almost no activity on either side: the CPUs
> and disks are idle, with no iowait at all.
>
> Nothing much jumps out at me when digging through the Hadoop and HBase
> logs, except that those 62-second intervals are often (but not always)
> associated with ClosedChannelExceptions in the regionserver logs. But I
> believe that's just HBase finding that a TCP connection it wants to
> reply on had been closed.
>
> As far as I've seen, this happens every time on this or any of the
> larger c3-class instances, surprisingly. All of the m3 instance sizes
> seem to work fine. These are built with a custom AMI that has HBase and
> everything else installed, and run via a script, so the instance type
> should be the only difference between them.
>
> Has anyone seen anything like this? Any pointers as to what I could look
> at to help diagnose this odd problem? Could there be something I'm
> overlooking in the logs?
>
> Thanks!
>
> -- Josh
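To pin down exactly when the silent windows start and end (for example, to line them up with ClosedChannelException timestamps in the regionserver logs, or to know when to grab a stack dump), a small script can scan the YCSB status output for stretches where the cumulative operation count stops increasing. This is a minimal sketch that assumes the status-line format shown in the quoted output above; the `find_stalls` helper and its 10-second `min_gap` threshold are hypothetical and not part of YCSB or HBase.

```python
import re
from typing import Iterable, List, Tuple

# Matches YCSB status lines such as "4 sec: 918 operations; 0 current ops/sec;"
# (format assumed from the sample output quoted in this thread).
STATUS_RE = re.compile(r"^\s*(\d+)\s+sec:\s+(\d+)\s+operations;")


def find_stalls(lines: Iterable[str], min_gap: int = 10) -> List[Tuple[int, int]]:
    """Return (start_sec, end_sec) pairs where the cumulative operation count
    did not increase for at least min_gap seconds.

    start_sec is the last status line that showed progress, end_sec is the
    first line where progress resumed. A stall still open at the end of the
    input is not reported, because its end time is unknown.
    """
    stalls: List[Tuple[int, int]] = []
    last_t = None    # time of the last status line that showed progress
    last_ops = None  # cumulative operation count at that time
    for line in lines:
        m = STATUS_RE.match(line)
        if not m:
            continue  # skip "<snip>", latency-only lines, etc.
        t, ops = int(m.group(1)), int(m.group(2))
        if last_ops is None:
            last_t, last_ops = t, ops
            continue
        if ops > last_ops:
            if t - last_t >= min_gap:
                stalls.append((last_t, t))
            last_t, last_ops = t, ops
    return stalls


# Abbreviated copy of the status lines quoted above.
SAMPLE = """0 sec: 0 operations;
2 sec: 918 operations; 459 current ops/sec;
4 sec: 918 operations; 0 current ops/sec;
<snip>
62 sec: 918 operations; 0 current ops/sec;
64 sec: 5302 operations; 2192 current ops/sec;
66 sec: 5302 operations; 0 current ops/sec;
"""

for start, end in find_stalls(SAMPLE.splitlines()):
    print(f"no progress from {start}s to {end}s ({end - start}s)")
# prints "no progress from 2s to 64s (62s)"
```

On the sample, the last progress is reported at 2 sec and the next at 64 sec, which matches the ~62-second intervals described in the report. Because YCSB only prints status every couple of seconds, the computed length is an upper bound accurate to the reporting interval.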
