The performance looks great!
2016-11-19 18:03 GMT+08:00 Ted Yu <yuzhih...@gmail.com>:

Opening a JIRA would be fine. This makes it easier for people to obtain the patch(es).

Cheers

On Nov 18, 2016, at 11:35 PM, Anoop John <anoop.hb...@gmail.com> wrote:

Because of some compatibility issues, we decided that this will be done in 2.0 only. As Andy said, it would be great to share the 1.x backported patches. Is it one mega patch at your end, or issue-by-issue patches? The latter would be best. Please share the patches somewhere, along with a list of the issues backported. I can help verify the issues so that we don't miss any.

-Anoop-

On Sat, Nov 19, 2016 at 12:32 AM, Enis Söztutar <enis....@gmail.com> wrote:

Thanks for sharing this. Great work.

I don't see any reason why we cannot backport to branch-1.

Enis

On Fri, Nov 18, 2016 at 9:37 AM, Andrew Purtell <andrew.purt...@gmail.com> wrote:

Yes, please. The patches will be useful to the community even if we decide not to backport them into an official 1.x release.

On Nov 18, 2016, at 12:25 PM, Bryan Beaudreault <bbeaudrea...@hubspot.com> wrote:

Is the backported patch available anywhere? I'm not seeing it on the referenced JIRA. If it ends up not getting officially backported to branch-1 because 2.0 is around the corner, some of us who build our own deployments may want to integrate it into our builds. Thanks! These numbers look great.

On Fri, Nov 18, 2016 at 12:20 PM, Anoop John <anoop.hb...@gmail.com> wrote:

Hi Yu Li,

Good to see that the off-heap work helped you. The perf numbers look great. So this is a comparison of the on-heap L1 cache vs. the off-heap L2 cache (HBASE-11425 enabled). For 2.0, I believe we should turn the off-heap L2 cache ON by default. I will raise a JIRA for that so we can discuss it there. Off-heap L2 cache for data blocks plus L1 cache for index blocks seems the right choice.

Thanks for the backport and for the help in testing the feature. You were able to find some corner-case bugs and helped the community fix them. Thanks to your whole team.

-Anoop-

On Fri, Nov 18, 2016 at 10:14 PM, Yu Li <car...@gmail.com> wrote:

Sorry guys, let me retry the inline images:

Performance w/o offheap:
[inline image]

Performance w/ offheap:
[inline image]

Peak Get QPS of one single RS during Singles' Day (11/11):
[inline image]

And attaching the files in case inline is still not working:

Performance_without_offheap.png
<https://drive.google.com/file/d/0B017Q40_F5uwbWEzUGktYVIya3JkcXVjRkFvVGNtM0VxWC1n/view?usp=drive_web>

Performance_with_offheap.png
<https://drive.google.com/file/d/0B017Q40_F5uweGR2cnJEU0M1MWwtRFJ5YkxUeFVrcUdPc2ww/view?usp=drive_web>

Peak_Get_QPS_of_Single_RS.png
<https://drive.google.com/file/d/0B017Q40_F5uwQ2FkR2k0ZmEtRVNGSFp5RUxHM3F6bHpNYnJz/view?usp=drive_web>

Best Regards,
Yu

On 18 November 2016 at 19:29, Ted Yu <yuzhih...@gmail.com> wrote:

Yu:
With positive results, more HBase users will be asking for the backport of the offheap read path patches.

Do you think you or your coworkers have the bandwidth to publish the backport for branch-1?

Thanks

On Nov 18, 2016, at 12:11 AM, Yu Li <car...@gmail.com> wrote:

Dear all,

We have backported the read path offheap work (HBASE-11425) to our customized hbase-1.1.2 (thanks @Anoop for the help/support) and have run it online for more than a month, and would like to share our experience, for what it's worth (smile).

Generally speaking, we gained better and more stable throughput/performance with offheap. Below are some details:

1. QPS becomes more stable with offheap

Performance w/o offheap:
[inline image]

Performance w/ offheap:
[inline image]

These data come from our online A/B test cluster (450 physical machines, each with 256 GB memory and 64 cores) running real-world workloads. They show that with offheap we gain more stable throughput as well as better performance.

We are not showing the fully online data here because online we deployed offheap and NettyRpcServer together, so there is no standalone comparison for offheap alone.

2. Full GC frequency and cost

Average full GC stop-the-world time reduced from 11s to 7s with offheap.

3. Young GC frequency and cost

No performance degradation observed with offheap.

4. Peak throughput of one single RS

On Singles' Day (11/11), the peak throughput of a single RS reached 100K QPS, among which 90K were Gets. Combining this with network in/out data, we can infer that the average result size of a Get request is ~1KB.

[inline image]

Offheap is used on all online machines (more than 1,600 nodes) instead of LruCache, so the above QPS is served by the offheap BucketCache, along with NettyRpcServer (HBASE-15756).

Just let us know if you have any comments. Thanks.

Best Regards,
Yu
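[Editor's note] Anoop's suggestion about the off-heap L2 cache corresponds to the BucketCache settings in hbase-site.xml. A minimal sketch for readers who want to try a similar setup; the 4096 MB size is illustrative only, and proper sizing depends on your hardware and workload:

```xml
<!-- Minimal sketch: enable off-heap BucketCache as the L2 block cache.
     The size here (in MB) is an illustrative value, not a recommendation. -->
<property>
  <name>hbase.bucketcache.ioengine</name>
  <value>offheap</value>
</property>
<property>
  <name>hbase.bucketcache.size</name>
  <value>4096</value>
</property>
```

Note that the region server also needs enough direct memory reserved for the cache, typically via HBASE_OFFHEAPSIZE in conf/hbase-env.sh, set somewhat larger than the bucket cache size.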
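[Editor's note] The ~1KB average Get result size mentioned above is a simple bandwidth-over-QPS inference. A minimal sketch of that arithmetic, where the 90K Get QPS is from the thread but the egress figure is a hypothetical illustration, not a number reported in the mail:

```python
# Sketch of the inference in Yu's mail: given one region server's network
# egress and its Get QPS, estimate the average result size per Get.
# Assumes Get responses dominate egress; the egress value below is
# hypothetical, chosen only to illustrate the calculation.

def avg_get_result_size(egress_bytes_per_sec: float, get_qps: float) -> float:
    """Average bytes returned per Get, assuming Gets dominate egress."""
    return egress_bytes_per_sec / get_qps

get_qps = 90_000            # Gets at peak on one RS (from the mail)
egress = 90_000 * 1024      # hypothetical egress of ~88 MiB/s

print(avg_get_result_size(egress, get_qps))  # 1024.0, i.e. ~1KB per Get
```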