bq. there's almost no activity on either side

During this period, can you capture a stack trace of the region server and
pastebin it?
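
In case it helps, here is a minimal sketch of one way to grab a few dumps while the client is stalled. It assumes the JDK's jps and jstack tools are on the PATH of the region server host and that the script runs as the same user as the region server JVM. The function names, the file naming scheme, and the default count and interval are made up for illustration.

```python
import subprocess
import time
from pathlib import Path
from typing import List, Optional


def find_regionserver_pid(jps_output: str) -> Optional[int]:
    """Return the PID from the first `jps` line whose class name ends in HRegionServer."""
    for line in jps_output.splitlines():
        parts = line.split()
        # jps prints "<pid> <main class>"; `jps -l` prints the fully qualified name.
        if len(parts) >= 2 and parts[1].endswith("HRegionServer"):
            return int(parts[0])
    return None


def capture_dumps(pid: int, count: int = 3, interval_s: float = 10.0,
                  out_dir: str = ".") -> List[Path]:
    """Run `jstack <pid>` `count` times, `interval_s` apart, writing one file per dump."""
    paths = []
    for i in range(count):
        result = subprocess.run(["jstack", str(pid)],
                                capture_output=True, text=True, check=True)
        # Hypothetical naming scheme: regionserver-<pid>-<index>.jstack
        path = Path(out_dir) / f"regionserver-{pid}-{i}.jstack"
        path.write_text(result.stdout)
        paths.append(path)
        if i < count - 1:
            time.sleep(interval_s)
    return paths


def main() -> None:
    """Locate the region server via jps and capture the dumps (call this on the host)."""
    jps = subprocess.run(["jps"], capture_output=True, text=True, check=True).stdout
    pid = find_regionserver_pid(jps)
    if pid is None:
        raise SystemExit("no HRegionServer process found in jps output")
    for path in capture_dumps(pid):
        print(path)
```

Calling main() in the middle of one of the ~62 second gaps should leave a handful of files to paste. Taking several dumps a few seconds apart makes it easier to see which threads stay parked in the same place.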

Cheers

On Wed, Sep 17, 2014 at 3:21 PM, Josh Williams <[email protected]>
wrote:

> Hi, everyone.  Here's a strange one, at least to me.
>
> I'm doing some performance profiling, and as a rudimentary test I've
> been using YCSB to drive HBase (originally 0.98.3, recently updated to
> 0.98.6.)  The problem happens on a few different instance sizes, but
> this is probably the closest comparison...
>
> On m3.2xlarge instances, works as expected.
> On c3.2xlarge instances, HBase barely responds at all during workloads
> that involve read activity, falling silent for ~62 second intervals,
> with the YCSB throughput output resembling:
>
>  0 sec: 0 operations;
>  2 sec: 918 operations; 459 current ops/sec; [UPDATE
> AverageLatency(us)=1252778.39] [READ AverageLatency(us)=1034496.26]
>  4 sec: 918 operations; 0 current ops/sec;
>  6 sec: 918 operations; 0 current ops/sec;
> <snip>
>  62 sec: 918 operations; 0 current ops/sec;
>  64 sec: 5302 operations; 2192 current ops/sec; [UPDATE
> AverageLatency(us)=7715321.77] [READ AverageLatency(us)=7117905.56]
>  66 sec: 5302 operations; 0 current ops/sec;
>  68 sec: 5302 operations; 0 current ops/sec;
> (And so on...)
>
> While that happens there's almost no activity on either side: the CPUs
> and disks are idle, with no iowait at all.
>
> There isn't much that jumps out at me when digging through the Hadoop
> and HBase logs, except that those 62-second intervals are often (but
> not always) associated with ClosedChannelExceptions in the regionserver
> logs.  But I believe that's just HBase finding that a TCP connection it
> wants to reply on had been closed.
>
> As far as I've seen this happens every time on this or any of the larger
> c3 class of instances, surprisingly.  The m3 instance class sizes all
> seem to work fine.  These are built with a custom AMI that has HBase and
> all installed, and run via a script, so the different instance type
> should be the only difference between them.
>
> Anyone seen anything like this?  Any pointers as to what I could look at
> to help diagnose this odd problem?  Could there be something I'm
> overlooking in the logs?
>
> Thanks!
>
> -- Josh
>
>
>
