Thanks for trying the new client out. Shame about that NPE -- I'll look
into it. A few config and diagnostic sketches follow below the quoted
thread, in case they're useful.
> On Sep 18, 2014, at 8:43 PM, Josh Williams <[email protected]> wrote:
>
> Hi Andrew,
>
> I'll definitely bump up the heap on subsequent tests -- thanks for the
> tip. It was increased to 8 GB, but that didn't make any difference for
> the older YCSB.
>
> Using your YCSB branch with the updated HBase client definitely makes a
> difference, however, showing consistent throughput for a little while.
> After a little bit of time, so far under about 5 minutes in the few
> times I ran it, it hits a NullPointerException [1] ... but that
> definitely seems to point more at a problem in the older YCSB.
>
> [1] https://gist.github.com/joshwilliams/0570a3095ad6417ca74f
>
> Thanks for your help,
>
> -- Josh
>
>
>> On Thu, 2014-09-18 at 15:02 -0700, Andrew Purtell wrote:
>> A 1 GB heap is nowhere near enough if you're trying to test something
>> real (or approximate it with YCSB). Try 4 or 8 GB, anything up to
>> 31 GB, depending on the use case. At >= 32 GB you give up compressed
>> OOPs and may run into GC issues.
>>
>> Also, I recently redid the HBase YCSB client in a modern way for >=
>> 0.98. See https://github.com/apurtell/YCSB/tree/new_hbase_client . It
>> performs, IMHO, in a more useful fashion than the previous one for
>> what YCSB is intended to do, but it might need some tuning (I haven't
>> tried it on a cluster of significant size). One difference you should
>> see is that it won't back up for 30-60 seconds after a bunch of
>> threads flush fat 12+ MB write buffers.
>>
>>> On Thu, Sep 18, 2014 at 2:31 PM, Josh Williams <[email protected]>
>>> wrote:
>>> Ted,
>>>
>>> A stack trace -- that's definitely a good idea. Here's one jstack
>>> snapshot from the region server while there's no apparent activity
>>> going on:
>>> https://gist.github.com/joshwilliams/4950c1d92382ea7f3160
>>>
>>> If it's helpful, this is the YCSB side of the equation right around
>>> the same time:
>>> https://gist.github.com/joshwilliams/6fa3623088af9d1446a3
>>>
>>>
>>> And Gary,
>>>
>>> As for the memory configuration, that's a good question. It looks
>>> like HBASE_HEAPSIZE isn't set, which I now see defaults to 1 GB.
>>> There isn't any swap configured, and 12 GB of the memory on the
>>> instance is going to the file cache, so there's definitely room to
>>> spare.
>>>
>>> Maybe it'd help if I gave it more room by setting HBASE_HEAPSIZE.
>>> Couldn't hurt to try that now...
>>>
>>> What's strange is that on m3.xlarge, which also has 15 GB of RAM but
>>> fewer CPU cores, it runs fine.
>>>
>>> Thanks to you both for the insight!
>>>
>>> -- Josh
>>>
>>>
>>>
>>>> On Thu, 2014-09-18 at 11:42 -0700, Gary Helmling wrote:
>>>> What do you have HBASE_HEAPSIZE set to in hbase-env.sh? Is it
>>>> possible that you're overcommitting memory and the instance is
>>>> swapping? Just a shot in the dark, but I see that the m3.2xlarge
>>>> instance has 30 GB of memory vs. 15 GB for c3.2xlarge.
>>>>
>>>>> On Wed, Sep 17, 2014 at 3:28 PM, Ted Yu <[email protected]> wrote:
>>>>> bq. there's almost no activity on either side
>>>>>
>>>>> During this period, can you capture a stack trace for the region
>>>>> server and pastebin it?
>>>>>
>>>>> Cheers
>>>>>
>>>>> On Wed, Sep 17, 2014 at 3:21 PM, Josh Williams <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> Hi, everyone. Here's a strange one, at least to me.
>>>>>>
>>>>>> I'm doing some performance profiling, and as a rudimentary test
>>>>>> I've been using YCSB to drive HBase (originally 0.98.3, recently
>>>>>> updated to 0.98.6). The problem happens on a few different
>>>>>> instance sizes, but this is probably the closest comparison...
>>>>>>
>>>>>> On m3.2xlarge instances, it works as expected.
>>>>>> On c3.2xlarge instances, HBase barely responds at all during
>>>>>> workloads that involve read activity, falling silent for
>>>>>> ~62-second intervals, with the YCSB throughput output resembling:
>>>>>>
>>>>>> 0 sec: 0 operations;
>>>>>> 2 sec: 918 operations; 459 current ops/sec; [UPDATE
>>>>>> AverageLatency(us)=1252778.39] [READ AverageLatency(us)=1034496.26]
>>>>>> 4 sec: 918 operations; 0 current ops/sec;
>>>>>> 6 sec: 918 operations; 0 current ops/sec;
>>>>>> <snip>
>>>>>> 62 sec: 918 operations; 0 current ops/sec;
>>>>>> 64 sec: 5302 operations; 2192 current ops/sec; [UPDATE
>>>>>> AverageLatency(us)=7715321.77] [READ AverageLatency(us)=7117905.56]
>>>>>> 66 sec: 5302 operations; 0 current ops/sec;
>>>>>> 68 sec: 5302 operations; 0 current ops/sec;
>>>>>> (And so on...)
>>>>>>
>>>>>> While that happens there's almost no activity on either side; the
>>>>>> CPUs and disks are idle, with no iowait at all.
>>>>>>
>>>>>> There isn't much that jumps out at me when digging through the
>>>>>> Hadoop and HBase logs, except that those 62-second intervals are
>>>>>> often (but not always) associated with ClosedChannelExceptions in
>>>>>> the region server logs. But I believe that's just HBase finding
>>>>>> that a TCP connection it wants to reply on has already been
>>>>>> closed.
>>>>>>
>>>>>> As far as I've seen, this happens every time on this or any of the
>>>>>> larger c3-class instances, surprisingly. The m3 instance sizes all
>>>>>> seem to work fine. These are built from a custom AMI that has
>>>>>> HBase and everything else installed, and run via a script, so the
>>>>>> instance type should be the only difference between them.
>>>>>>
>>>>>> Has anyone seen anything like this? Any pointers as to what I
>>>>>> could look at to help diagnose this odd problem? Could there be
>>>>>> something I'm overlooking in the logs?
>>>>>>
>>>>>> Thanks!
>>>>>>
>>>>>> -- Josh
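For reference, bumping the heap along the lines discussed above is just
a matter of editing conf/hbase-env.sh and restarting the region servers.
A minimal sketch, assuming an 8 GB target (the value Josh tried) and a
0.98-era install where HBASE_HEAPSIZE is taken as megabytes:

    # conf/hbase-env.sh -- example values only; anything <= 31 GB keeps
    # compressed OOPs
    export HBASE_HEAPSIZE=8000        # interpreted as MB on 0.98

    # Or pin the heap explicitly through the JVM options instead:
    # export HBASE_OPTS="$HBASE_OPTS -Xms8g -Xmx8g"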
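Likewise, if anyone else wants to grab the kind of region server stack
trace Ted asked for while the workload is stalled, something like the
following sketch works; the jps-based pid lookup is just one convenient
way to find the process, and it should run as the user that owns the
region server:

    # Dump the region server's stack during one of the silent intervals
    RS_PID=$(jps | awk '/HRegionServer/ {print $1}')
    jstack "$RS_PID" > rs-$(date +%s).jstack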
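And for completeness, a typical YCSB invocation against HBase looks
roughly like the sketch below. The 'hbase' binding name, the
columnfamily property, and the usertable/family names are the stock
YCSB conventions, assumed here rather than anything specific to the
branch above, so adjust as needed:

    # Pre-create the target table from the HBase shell first, e.g.:
    #   create 'usertable', 'family'

    # Load the data set, then run a read/update workload with status output
    bin/ycsb load hbase -P workloads/workloada -p columnfamily=family -threads 16 -s
    bin/ycsb run hbase -P workloads/workloada -p columnfamily=family -threads 16 -s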
