Thanks Andrew, actually a blog is coming soon (smile). And I've opened HBASE-17138 <https://issues.apache.org/jira/browse/HBASE-17138> for the backport-to-branch-1 discussion, FWIW.
Best Regards,
Yu

On 22 November 2016 at 22:13, Andrew Purtell <andrew.purt...@gmail.com> wrote:

> I hope we could strengthen our faith in HBase capability

Us too. Would you be interested in taking the metrics and the discussion of them that came out in this thread into a post for the HBase project blog (https://blogs.apache.org/hbase)? As you can see from the other blog entries, details about the use case do not need to reveal proprietary information; readers would be most interested in the metrics you observed/achieved on 11/11, followed by a technical discussion of how (roughly) to replicate them. You have a good command of the English language so that won't be a problem, and anyway I offer my services as editor should you like to try. Think about it. This would be a great post. I am sure, very popular.

On Nov 22, 2016, at 12:51 AM, Yu Li <car...@gmail.com> wrote:

bq. If it were not "confidential" might you mention why there is such a large (several orders of magnitude) explosion of end user queries to backend ones?

For index building and the online machine learning system, more information is recorded after each visit/trade, such as user query/click history, item stock updates, etc., and multiple pieces of user-specific feature data are read/updated for better recommendations. The flow is pretty much like:

user visits some items
-> puts them into the shopping cart
-> checks out / removes items from the shopping cart
-> item stock updates / new items recommended to the user
-> user visits the new items

Not much more detail can be supplied, but I believe we can imagine how many queries/updates such loops generate at the backend, right? (smile)

Thanks again for the interest and questions, even if they derail the thread a little, and I hope we could strengthen our faith in HBase capability after these discussions. :-)

Best Regards,
Yu

On 21 November 2016 at 01:26, Stephen Boesch <java...@gmail.com> wrote:

Thanks Yu - given your apparent direct knowledge of the data that is helpful (my response earlier had been to 张铎). It is important so as to ensure we inform colleagues of numbers that are "real".

If it were not "confidential" might you mention why there is such a large (several orders of magnitude) explosion of end user queries to backend ones?

2016-11-20 7:51 GMT-08:00 Yu Li <car...@gmail.com>:

Thanks everyone for the feedback/comments, glad this data means something and has drawn your interest. Let me answer the questions (and sorry for the lag).

For the backport patches, ours are based on a customized 1.1.2 version and cannot be applied directly to any 1.x branch. It would be easy for us to upload the existing patches somewhere, but obviously not that useful... so maybe we should still get them into branch-1 and officially support read-path offheap in a future 1.x release? Let me create a JIRA about this and let's discuss in the JIRA system. And to be very clear, it's a big YES to sharing our patches with all rather than only the numbers; the question is just which way is better (smile).

And answers for @Stephen Boesch:

bq. In any case the data is marked as 9/25/16 not 11/11/16
It's specifically noted that the data from 9/25 are from our online A/B test cluster. We are not showing fully online data because we published offheap together with NettyRpcServer online, so there is no standalone comparison data for offheap. Please check my original email more carefully (smile).

bq. Repeating my earlier question: 20*Meg* queries per second?? Just checked and *google* does 40*K* queries per second.
As you already noticed, the 20M QPS is the number from the A/B testing cluster (450 nodes, i.e. roughly 44K QPS per node on average), and there was much more on the 11/11 online cluster (1600+ nodes). Please note that this is NOT a cluster that directly serves queries from end users; it serves the index building and online machine learning systems. Refer to our talk at HBaseCon 2016 (slides <http://www.slideshare.net/HBaseCon/improvements-to-apache-hbase-and-its-applications-in-alibaba-search> / recording <https://www.youtube.com/watch?v=UVGDd2JeIMg&list=PLe-h9HrA9qfDVOeNh1l_T5HvwvkO9raWy&index=10>) for more details if you're interested. And unlike Google, there's an obvious "hot spot" for us, so I don't think the QPS of these two different systems are comparable.

bq. So maybe please check your numbers again.
The numbers come from our online monitoring system and are all real, not fake, so there is no need to check again. Maybe they just need some more time to take in and understand? (smile)

Best Regards,
Yu

On 20 November 2016 at 23:03, Stephen Boesch <java...@gmail.com> wrote:

Your arguments do not reflect direct knowledge of the numbers. (a) There is no super-spikiness in the graphs in the data. (b) In any case the data is marked as 9/25/16, not 11/11/16. (c) The number of internet users says little about the number of *concurrent* users.

Overall it would be helpful for those who actually collected the data to comment - not just speculation from someone who did not. As I mentioned already, there *may* be a huge fanout from the number of user/application queries to backend queries: but *huge* it would seemingly need to be to generate the numbers shown.

2016-11-19 22:39 GMT-08:00 张铎 <palomino...@gmail.com>:

11.11 is something like Black Friday. Almost every item on Alibaba is discounted heavily on 11.11. Alibaba earned 1 billion in revenue within 1 minute (52 seconds) and 10 billion within 7 minutes (6 minutes 58 seconds) on 11.11. Chinese people paid more than 120 billion Chinese yuan to Alibaba on 11.11. And I remember Jeff Dean once gave a slide deck showing that for Google, the amplification from user queries to storage system queries is also very large (I cannot remember the exact number; the slides were used to explain that hedged reads are very useful for reducing latency). So I think the peak throughput is true.

There are more than 600 million people in China that use the internet. So if they decide to do something to your system at the same time, it looks like a DDoS to your system...
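(As an aside, since hedged reads came up: HBase can use HDFS-level hedged reads too. A minimal sketch of enabling them in hbase-site.xml - the values here are illustrative, not recommendations:

    <property>
      <!-- threads dedicated to serving hedged reads; 0 disables the feature -->
      <name>dfs.client.hedged.read.threadpool.size</name>
      <value>20</value>
    </property>
    <property>
      <!-- wait this many ms on the primary read before firing a hedged one -->
      <name>dfs.client.hedged.read.threshold.millis</name>
      <value>10</value>
    </property>

This is unrelated to the offheap numbers above, just the feature those slides were about.)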
Thanks.

2016-11-20 12:56 GMT+08:00 Stephen Boesch <java...@gmail.com>:

Repeating my earlier question: 20*Meg* queries per second?? Just checked and *google* does 40*K* queries per second. Now maybe the "queries" are a decomposition of far fewer end-user queries that cause a fanout of backend queries. *But still ..*

So maybe please check your numbers again.

2016-11-19 17:05 GMT-08:00 Heng Chen <heng.chen.1...@gmail.com>:

The performance looks great!

2016-11-19 18:03 GMT+08:00 Ted Yu <yuzhih...@gmail.com>:

Opening a JIRA would be fine. This makes it easier for people to obtain the patch(es).

Cheers

On Nov 18, 2016, at 11:35 PM, Anoop John <anoop.hb...@gmail.com> wrote:

Because of some compatibility issues, we decided that this will be done in 2.0 only.. Ya, as Andy said, it would be great to share the 1.x backported patches. Is it a mega patch at ur end? Or issue by issue patches? The latter would be best. Pls share the patches in some place along with a list of the issues backported. I can help with verifying the issues so as to make sure we don't miss any...

-Anoop-

On Sat, Nov 19, 2016 at 12:32 AM, Enis Söztutar <enis....@gmail.com> wrote:

Thanks for sharing this. Great work.

I don't see any reason why we cannot backport to branch-1.

Enis

On Fri, Nov 18, 2016 at 9:37 AM, Andrew Purtell <andrew.purt...@gmail.com> wrote:

Yes, please, the patches will be useful to the community even if we decide not to backport them into an official 1.x release.

On Nov 18, 2016, at 12:25 PM, Bryan Beaudreault <bbeaudrea...@hubspot.com> wrote:

Is the backported patch available anywhere? Not seeing it on the referenced JIRA. If it ends up not getting officially backported to branch-1 due to 2.0 being around the corner, some of us who build our own deploys may want to integrate it into our builds. Thanks! These numbers look great.

On Fri, Nov 18, 2016 at 12:20 PM Anoop John <anoop.hb...@gmail.com> wrote:

Hi Yu Li,
Good to see that the off heap work helped you.. The perf numbers look great. So this is a comparison of the on heap L1 cache vs the off heap L2 cache (HBASE-11425 enabled). So for 2.0 we should make the L2 off heap cache ON by default, I believe. Will raise a jira for that and we can discuss under it. Seems like L2 off heap cache for data blocks and L1 cache for index blocks is the right choice.
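For anyone who wants to try that combination, the basic wiring is roughly the following in hbase-env.sh and hbase-site.xml - a sketch only, with illustrative sizes rather than recommended values:

    # hbase-env.sh: reserve direct memory for the off heap bucket cache
    export HBASE_OFFHEAPSIZE=5G

    <!-- hbase-site.xml -->
    <property>
      <!-- back the L2 block cache with off heap (direct) memory -->
      <name>hbase.bucketcache.ioengine</name>
      <value>offheap</value>
    </property>
    <property>
      <!-- bucket cache capacity in MB; must fit inside HBASE_OFFHEAPSIZE -->
      <name>hbase.bucketcache.size</name>
      <value>4096</value>
    </property>

With the bucket cache enabled, the combined mode used by default keeps index/bloom blocks in the on heap L1 and data blocks in the off heap L2, which is exactly the split described above.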
Thanks for the backport and the help in testing the feature.. You were able to find some corner case bugs and helped the community fix them.. Thanks go to ur whole team.

-Anoop-

On Fri, Nov 18, 2016 at 10:14 PM, Yu Li <car...@gmail.com> wrote:

Sorry guys, let me retry the inline images:

Performance w/o offheap:

Performance w/ offheap:

Peak Get QPS of one single RS during Singles' Day (11/11):

And attaching the files in case inline is still not working:

Performance_without_offheap.png
<https://drive.google.com/file/d/0B017Q40_F5uwbWEzUGktYVIya3JkcXVjRkFvVGNtM0VxWC1n/view?usp=drive_web>

Performance_with_offheap.png
<https://drive.google.com/file/d/0B017Q40_F5uweGR2cnJEU0M1MWwtRFJ5YkxUeFVrcUdPc2ww/view?usp=drive_web>

Peak_Get_QPS_of_Single_RS.png
<https://drive.google.com/file/d/0B017Q40_F5uwQ2FkR2k0ZmEtRVNGSFp5RUxHM3F6bHpNYnJz/view?usp=drive_web>

Best Regards,
Yu

On 18 November 2016 at 19:29, Ted Yu <yuzhih...@gmail.com> wrote:

Yu:
With positive results, more hbase users will be asking for the backport of the offheap read path patches.

Do you think you or your coworkers have the bandwidth to publish a backport for branch-1?

Thanks

On Nov 18, 2016, at 12:11 AM, Yu Li <car...@gmail.com> wrote:

Dear all,

We have backported the read path offheap work (HBASE-11425) to our customized hbase-1.1.2 (thanks @Anoop for the help/support), have run it online for more than a month, and would like to share our experience, for what it's worth (smile).

Generally speaking, we gained better and more stable throughput/performance with offheap; below are some details:

1. QPS became more stable with offheap

Performance w/o offheap:

Performance w/ offheap:

These data come from our online A/B test cluster (450 physical machines, each with 256G memory + 64 cores) under real-world workloads. They show that with offheap we gain a more stable throughput as well as better performance.

We are not showing fully online data here because online we published the version with both offheap and NettyRpcServer together, so there is no standalone comparison data for offheap.

2. Full GC frequency and cost

Average Full GC STW time reduced from 11s to 7s with offheap.
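(If you want to collect comparable numbers, one straightforward way - a sketch, not necessarily how our monitoring system does it - is to enable GC logging for the RegionServer in hbase-env.sh and read the STW pauses from the log. With JDK8, for example:

    # hbase-env.sh: log GC pauses for the RegionServer process
    export HBASE_REGIONSERVER_OPTS="$HBASE_REGIONSERVER_OPTS \
      -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps \
      -XX:+PrintGCApplicationStoppedTime \
      -Xloggc:/var/log/hbase/gc-regionserver.log"

The log path is just an example.)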
3. Young GC frequency and cost

No performance degradation observed with offheap.

4. Peak throughput of one single RS

On Singles' Day (11/11), the peak throughput of a single RS reached 100K QPS, among which 90K were from Get. Combining this with the network in/out data, we can tell the average result size of a Get request is ~1KB (network-out bytes per second divided by Get QPS).

Offheap is used on all online machines (more than 1600 nodes) in place of LruCache, so the above QPS was gained from the offheap bucketcache, along with NettyRpcServer (HBASE-15756).

Just let us know if there are any comments. Thanks.

Best Regards,
Yu