Your arguments do not reflect direct knowledge of the numbers.  (a) There
is no super-spikiness in the graphs in the data. (b) In any case, the data
is marked as 9/25/16, not 11/11/16.  (c) The number of internet users says
little about the number of *concurrent* users.

Overall it would be helpful for those who actually collected the data to
comment - not just speculation from someone who did not. As I mentioned
already, there *may* be a huge fanout from the number of user/application
queries to the backend: but it would seemingly need to be *huge* to
generate the numbers shown.
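For what it's worth, here is a back-of-the-envelope check of the implied fanout, using only the two figures quoted in this thread (the 20M figure being the disputed one):

```python
# Back-of-the-envelope: how large a fanout from user/application queries
# to backend queries would be needed to reconcile the figures quoted in
# this thread? Both inputs are quoted claims, not measurements of mine.
backend_qps = 20_000_000   # the disputed aggregate backend queries/sec
user_qps = 40_000          # rough figure quoted for Google's search QPS

fanout = backend_qps / user_qps
print(f"each user query would need ~{fanout:.0f} backend queries")
# each user query would need ~500 backend queries
```

A 500x amplification is not impossible for a storage backend, but it is the kind of number that should be stated explicitly rather than inferred.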

2016-11-19 22:39 GMT-08:00 张铎 <[email protected]>:

> 11.11 is something like Black Friday. Almost every item on Alibaba is
> discounted a lot on 11.11. Alibaba earned 1 billion in revenue within 1
> minute (52 seconds) and 10 billion within 7 minutes (6 minutes 58
> seconds) on 11.11. Chinese people paid more than 120 billion Chinese
> yuan to Alibaba on 11.11. And I remember that Jeff Dean once gave a
> slide deck showing that, for Google, the amplification from user
> queries to storage-system queries is also very large (I cannot remember
> the exact number; the slides were used to explain that hedged reads are
> very useful for reducing latency). So I think the peak throughput is
> true.
>
> There are more than 600 million people in China who use the internet.
> So if they decide to do something to your system at the same time, it
> looks like a DDoS to your system...
>
> Thanks.
>
> 2016-11-20 12:56 GMT+08:00 Stephen Boesch <[email protected]>:
>
> > Repeating my earlier question:  20 *Meg* queries per second??  Just
> > checked, and *Google* does 40 *K* queries per second. Now maybe the
> > "queries" are a decomposition of far fewer end-user queries that
> > cause a fanout of backend queries. *But still ..*
> >
> > So maybe please check your numbers again.
> >
> > 2016-11-19 17:05 GMT-08:00 Heng Chen <[email protected]>:
> >
> > > The performance looks great!
> > >
> > > 2016-11-19 18:03 GMT+08:00 Ted Yu <[email protected]>:
> > > > Opening a JIRA would be fine.
> > > > This makes it easier for people to obtain the patch(es).
> > > >
> > > > Cheers
> > > >
> > > >> On Nov 18, 2016, at 11:35 PM, Anoop John <[email protected]> wrote:
> > > >>
> > > >> Because of some compatibility issues, we decided that this will
> > > >> be done in 2.0 only..  Ya, as Andy said, it would be great to
> > > >> share the 1.x backported patches.  Is it a mega patch at your
> > > >> end?  Or issue-by-issue patches?  The latter would be best.
> > > >> Please share the patches in some place, along with a list of the
> > > >> issues backported. I can help with verifying the issues once, so
> > > >> as to make sure we don't miss any...
> > > >>
> > > >> -Anoop-
> > > >>
> > > >>> On Sat, Nov 19, 2016 at 12:32 AM, Enis Söztutar <[email protected]> wrote:
> > > >>> Thanks for sharing this. Great work.
> > > >>>
> > > >>> I don't see any reason why we cannot backport to branch-1.
> > > >>>
> > > >>> Enis
> > > >>>
> > > >>> On Fri, Nov 18, 2016 at 9:37 AM, Andrew Purtell <[email protected]> wrote:
> > > >>>
> > > >>>> Yes, please, the patches will be useful to the community even
> > > >>>> if we decide not to backport into an official 1.x release.
> > > >>>>
> > > >>>>
> > > >>>>>> On Nov 18, 2016, at 12:25 PM, Bryan Beaudreault <[email protected]> wrote:
> > > >>>>>
> > > >>>>> Is the backported patch available anywhere? Not seeing it on
> > > >>>>> the referenced JIRA. If it ends up not getting officially
> > > >>>>> backported to branch-1 due to 2.0 being around the corner,
> > > >>>>> some of us who build our own deploy may want to integrate it
> > > >>>>> into our builds. Thanks! These numbers look great.
> > > >>>>>
> > > >>>>>> On Fri, Nov 18, 2016 at 12:20 PM Anoop John <[email protected]> wrote:
> > > >>>>>>
> > > >>>>>> Hi Yu Li
> > > >>>>>>              Good to see that the off-heap work helped you..
> > > >>>>>> The perf numbers look great.  So this is a comparison of the
> > > >>>>>> on-heap L1 cache vs the off-heap L2 cache (HBASE-11425
> > > >>>>>> enabled).  So for 2.0 we should make the L2 off-heap cache
> > > >>>>>> ON by default, I believe.  Will raise a JIRA for that and we
> > > >>>>>> can discuss under it.  Seems like L2 off-heap cache for data
> > > >>>>>> blocks and L1 cache for index blocks is the right choice.
> > > >>>>>>
> > > >>>>>> Thanks for the backport and the help in testing the
> > > >>>>>> feature..  You were able to find some corner-case bugs and
> > > >>>>>> helped the community fix them..  Thanks go to your whole
> > > >>>>>> team.
> > > >>>>>>
> > > >>>>>> -Anoop-
> > > >>>>>>
> > > >>>>>>
> > > >>>>>>> On Fri, Nov 18, 2016 at 10:14 PM, Yu Li <[email protected]> wrote:
> > > >>>>>>>
> > > >>>>>>> Sorry guys, let me retry the inline images:
> > > >>>>>>>
> > > >>>>>>> Performance w/o offheap:
> > > >>>>>>>
> > > >>>>>>>
> > > >>>>>>> Performance w/ offheap:
> > > >>>>>>>
> > > >>>>>>>
> > > >>>>>>> Peak Get QPS of one single RS during Singles' Day (11/11):
> > > >>>>>>>
> > > >>>>>>>
> > > >>>>>>>
> > > >>>>>>> And attaching the files in case inline is still not working:
> > > >>>>>>>
> > > >>>>>>> Performance_without_offheap.png
> > > >>>>>>> <https://drive.google.com/file/d/0B017Q40_F5uwbWEzUGktYVIya3JkcXVjRkFvVGNtM0VxWC1n/view?usp=drive_web>
> > > >>>>>>>
> > > >>>>>>> Performance_with_offheap.png
> > > >>>>>>> <https://drive.google.com/file/d/0B017Q40_F5uweGR2cnJEU0M1MWwtRFJ5YkxUeFVrcUdPc2ww/view?usp=drive_web>
> > > >>>>>>>
> > > >>>>>>> Peak_Get_QPS_of_Single_RS.png
> > > >>>>>>> <https://drive.google.com/file/d/0B017Q40_F5uwQ2FkR2k0ZmEtRVNGSFp5RUxHM3F6bHpNYnJz/view?usp=drive_web>
> > > >>>>>>>
> > > >>>>>>>
> > > >>>>>>>
> > > >>>>>>> Best Regards,
> > > >>>>>>> Yu
> > > >>>>>>>
> > > >>>>>>>> On 18 November 2016 at 19:29, Ted Yu <[email protected]> wrote:
> > > >>>>>>>>
> > > >>>>>>>> Yu:
> > > >>>>>>>> With positive results, more hbase users would be asking
> > > >>>>>>>> for the backport of the offheap read path patches.
> > > >>>>>>>>
> > > >>>>>>>> Do you think you or your coworker has the bandwidth to
> > > >>>>>>>> publish a backport for branch-1?
> > > >>>>>>>>
> > > >>>>>>>> Thanks
> > > >>>>>>>>
> > > >>>>>>>>> On Nov 18, 2016, at 12:11 AM, Yu Li <[email protected]> wrote:
> > > >>>>>>>>>
> > > >>>>>>>>> Dear all,
> > > >>>>>>>>>
> > > >>>>>>>>> We have backported the read-path offheap work
> > > >>>>>>>>> (HBASE-11425) to our customized hbase-1.1.2 (thanks
> > > >>>>>>>>> @Anoop for the help/support) and have run it online for
> > > >>>>>>>>> more than a month, and would like to share our
> > > >>>>>>>>> experience, for what it's worth (smile).
> > > >>>>>>>>>
> > > >>>>>>>>> Generally speaking, we gained better and more stable
> > > >>>>>>>>> throughput/performance with offheap, and below are some
> > > >>>>>>>>> details:
> > > >>>>>>>>> 1. QPS becomes more stable with offheap
> > > >>>>>>>>>
> > > >>>>>>>>> Performance w/o offheap:
> > > >>>>>>>>>
> > > >>>>>>>>>
> > > >>>>>>>>>
> > > >>>>>>>>> Performance w/ offheap:
> > > >>>>>>>>>
> > > >>>>>>>>>
> > > >>>>>>>>>
> > > >>>>>>>>> These data come from our online A/B test cluster (450
> > > >>>>>>>>> physical machines, each with 256G memory + 64 cores)
> > > >>>>>>>>> running real-world workloads. They show that with offheap
> > > >>>>>>>>> we gain more stable throughput as well as better
> > > >>>>>>>>> performance.
> > > >>>>>>>>>
> > > >>>>>>>>> We are not showing the fully online data here because
> > > >>>>>>>>> online we published the version with both offheap and
> > > >>>>>>>>> NettyRpcServer together, so there is no standalone
> > > >>>>>>>>> comparison data for offheap.
> > > >>>>>>>>>
> > > >>>>>>>>> 2. Full GC frequency and cost
> > > >>>>>>>>>
> > > >>>>>>>>> Average Full GC STW time reduced from 11s to 7s with offheap.
> > > >>>>>>>>>
> > > >>>>>>>>> 3. Young GC frequency and cost
> > > >>>>>>>>>
> > > >>>>>>>>> No performance degradation observed with offheap.
> > > >>>>>>>>>
> > > >>>>>>>>> 4. Peak throughput of one single RS
> > > >>>>>>>>>
> > > >>>>>>>>> On Singles' Day (11/11), the peak throughput of one
> > > >>>>>>>>> single RS reached 100K QPS, among which 90K were from
> > > >>>>>>>>> Get. Combined with the network in/out data, we can tell
> > > >>>>>>>>> the average result size of a Get request is ~1KB.
> > > >>>>>>>>>
> > > >>>>>>>>>
> > > >>>>>>>>>
> > > >>>>>>>>> Offheap is used on all online machines (more than 1600
> > > >>>>>>>>> nodes) instead of LruCache, so the above QPS is gained
> > > >>>>>>>>> from the offheap bucket cache, along with NettyRpcServer
> > > >>>>>>>>> (HBASE-15756).
> > > >>>>>>>>>
> > > >>>>>>>>> Just let us know if you have any comments. Thanks.
> > > >>>>>>>>>
> > > >>>>>>>>> Best Regards,
> > > >>>>>>>>> Yu
> > > >>>>
> > >
> >
>
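(For anyone wanting to try the L2 off-heap bucket cache discussed above: a minimal sketch of the standard knobs follows. The sizes are illustrative placeholders of mine, not the values Alibaba used.)

```xml
<!-- hbase-site.xml: enable the off-heap bucket cache.
     Sizes are illustrative examples only. -->
<property>
  <name>hbase.bucketcache.ioengine</name>
  <value>offheap</value>
</property>
<property>
  <name>hbase.bucketcache.size</name>
  <value>8192</value> <!-- bucket cache size in MB -->
</property>
```

You would also need to reserve direct memory for the RegionServer JVM, e.g. `export HBASE_OFFHEAPSIZE=10G` in hbase-env.sh, sized comfortably above hbase.bucketcache.size.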
