Re: Use experience and performance data of offheap from Alibaba online cluster

2016-11-22 Thread Yu Li
Thanks Andrew, actually a blog is coming soon (smile).

And I've opened HBASE-17138 for the backport-to-branch-1 discussion, FWIW.

Best Regards,
Yu

On 22 November 2016 at 22:13, Andrew Purtell wrote:

> > I hope we could strengthen our faith in HBase capability
>
> Us too. Would you be interested in taking the metrics and discussion of
> them that came out in this thread into a post for the HBase project blog (
> https://blogs.apache.org/hbase)? As you can see from the other blog
> entries details about the use case does not need to reveal proprietary
> information, readers would be most interested in the metrics you
> observed/achieved on 11/11 followed by a technical discussion of how
> (roughly) to replicate them. You have good command of the English language
> so that won't be a problem and anyway I offer my services as editor should
> you like to try. Think about it. This would be a great post. I am sure,
> very popular.
>
>
> > On Nov 22, 2016, at 12:51 AM, Yu Li  wrote:
> >
> > bq. If it were not "confidential" might you mention why there is such a
> > large (several orders of magnitude) explosion of end user queries to
> > backend ones?
> > For index building and online machine learning system, there're more
> > information recorded after each visit/trade, such as user query/click
> > history, item stock updates, etc., and multiple user-specific feature
> data
> > will be read/updated for better recommendation. The flow is pretty much
> > like:
> > user visit some items
> > -> put them into shopping cart
> > -> checkout/removing item from shopping cart
> > -> item stock update/recommend new items to user
> > -> user visit new items
> > Not that much details could be supplied but I believe we could imagine
> how
> > many queries/updates there will be at backend for such loops, right?
> (smile)
> >
> > Thanks again for the interest and questions although a little bit derail
> of
> > the thread, and I hope we could strengthen our faith in HBase capability
> > after these discussions. :-)
> >
> > Best Regards,
> > Yu
> >
> >> On 21 November 2016 at 01:26, Stephen Boesch  wrote:
> >>
> >> Thanks Yu - given your apparent direct knowledge of the data that is
> >> helpful (my response earlier had been to  张铎) .   It is important so as
> to
> >> ensure informing colleagues of numbers that are "real".
> >>
> >> If it were not "confidential" might you mention why there is such a
> large
> >> (several orders of magnitude) explosion of end user queries to backend
> >> ones?
> >>
> >>
> >>
> >> 2016-11-20 7:51 GMT-08:00 Yu Li :
> >>
> >>> Thanks everyone for the feedback/comments, glad this data means
> something
> >>> and have drawn your interesting. Let me answer the questions (and sorry
> >> for
> >>> the lag)
> >>>
> >>> For the backport patches, ours are based on a customized 1.1.2 version
> >> and
> >>> cannot apply directly for any 1.x branches. It would be easy for us to
> >>> upload existing patches somewhere but obviously not that useful... so
> >> maybe
> >>> we still should get them in branch-1 and officially support read-path
> >>> offheap in future 1.x release? Let me create one JIRA about this and
> >> let's
> >>> discuss in the JIRA system. And to be very clear, it's a big YES to
> share
> >>> our patches with all rather than only numbers, just which way is better
> >>> (smile).
> >>>
> >>> And answers for @Stephen Boesch:
> >>>
> >>> bq. In any case the data is marked as 9/25/16 not 11/11/16
> >>> It's specially noted that the data on 9/25 are from our online A/B test
> >>> cluster, and not showing fully online data because we published offheap
> >>> together with NettyRpcServer for online thus no standalone comparison
> >> data
> >>> for offheap. Please check my original email more carefully (smile).
> >>>
> >>> bq. Repeating my earlier question:  20*Meg* queries per second??  Just
> >>> checked and *google* does 40*K* queries per second.
> >>> As you already noticed, the 20M QPS is number from A/B testing cluster
> >> (450
> >>> nodes), and there're much more on 11/11 online cluster (1600+ nodes).
> >>> Please note that this is NOT some cluster directly serves queries from
> >> end
> >>> user, but serving the index building and online machine learning
> system.
> >>> Refer to our talk on hbasecon2016 (slides
> >>>  >> apache-hbase-and-its-
> >>> applications-in-alibaba-search>
> >>> /recording
> >>>  h9HrA9qfDVOeNh1l_
> >>> T5HvwvkO9raWy=10>)
> >>> for more details, if you're interested. And different from google,
> >> there's
> >>> an obvious "hot spot" for us, so I don't think the QPS of these two
> >>> different systems are comparable.
> >>>
> >>> bq. So maybe please check your numbers again.
> >>> The numbers are got from online monitoring system and all real not fake,
> >>> so no need to check. Maybe just need some more time to take and understand?
> >>> (smile)

Re: Use experience and performance data of offheap from Alibaba online cluster

2016-11-22 Thread Andrew Purtell
> I hope we could strengthen our faith in HBase capability

Us too. Would you be interested in taking the metrics, and the discussion of 
them that came out in this thread, into a post for the HBase project blog 
(https://blogs.apache.org/hbase)? As you can see from the other blog entries, 
details about the use case do not need to reveal proprietary information; 
readers would be most interested in the metrics you observed/achieved on 11/11, 
followed by a technical discussion of how (roughly) to replicate them. You have 
a good command of the English language so that won't be a problem, and in any 
case I offer my services as editor should you like to try. Think about it. This 
would be a great post and, I am sure, a very popular one. 


> On Nov 22, 2016, at 12:51 AM, Yu Li  wrote:
> 
> bq. If it were not "confidential" might you mention why there is such a
> large (several orders of magnitude) explosion of end user queries to
> backend ones?
> For index building and online machine learning system, there're more
> information recorded after each visit/trade, such as user query/click
> history, item stock updates, etc., and multiple user-specific feature data
> will be read/updated for better recommendation. The flow is pretty much
> like:
> user visit some items
> -> put them into shopping cart
> -> checkout/removing item from shopping cart
> -> item stock update/recommend new items to user
> -> user visit new items
> Not that much details could be supplied but I believe we could imagine how
> many queries/updates there will be at backend for such loops, right? (smile)
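
For illustration, a back-of-the-envelope sketch of that kind of fan-out, with
purely hypothetical per-action counts (none of these constants come from the
thread; they only show how modest per-event work multiplies out):

```java
// Hypothetical fan-out estimate -- every constant here is made up for
// illustration; none of them are Alibaba's real figures.
public class FanoutEstimate {
    public static void main(String[] args) {
        long userActionsPerSecond = 100_000;  // visits, cart updates, checkouts, ...
        int featureReadsPerAction = 50;       // user/item feature lookups for ranking
        int indexUpdatesPerAction = 10;       // index-building writes per action
        int mlUpdatesPerAction    = 20;       // online-learning feature updates

        long backendQps = userActionsPerSecond
                * (featureReadsPerAction + indexUpdatesPerAction + mlUpdatesPerAction);

        // 100K user actions/s * 80 backend ops each ~= 8M backend ops/s, i.e.
        // roughly two orders of magnitude of amplification before any retries.
        System.out.printf("~%,d backend queries/updates per second%n", backendQps);
    }
}
```

With amplification factors in that range, a storage-tier figure of tens of
millions of QPS does not require anywhere near that many concurrent end users.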
> 
> Thanks again for the interest and questions although a little bit derail of
> the thread, and I hope we could strengthen our faith in HBase capability
> after these discussions. :-)
> 
> Best Regards,
> Yu
> 
>> On 21 November 2016 at 01:26, Stephen Boesch  wrote:
>> 
>> Thanks Yu - given your apparent direct knowledge of the data that is
>> helpful (my response earlier had been to  张铎) .   It is important so as to
>> ensure informing colleagues of numbers that are "real".
>> 
>> If it were not "confidential" might you mention why there is such a large
>> (several orders of magnitude) explosion of end user queries to backend
>> ones?
>> 
>> 
>> 
>> 2016-11-20 7:51 GMT-08:00 Yu Li :
>> 
>>> Thanks everyone for the feedback/comments, glad this data means something
>>> and have drawn your interesting. Let me answer the questions (and sorry
>> for
>>> the lag)
>>> 
>>> For the backport patches, ours are based on a customized 1.1.2 version
>> and
>>> cannot apply directly for any 1.x branches. It would be easy for us to
>>> upload existing patches somewhere but obviously not that useful... so
>> maybe
>>> we still should get them in branch-1 and officially support read-path
>>> offheap in future 1.x release? Let me create one JIRA about this and
>> let's
>>> discuss in the JIRA system. And to be very clear, it's a big YES to share
>>> our patches with all rather than only numbers, just which way is better
>>> (smile).
>>> 
>>> And answers for @Stephen Boesch:
>>> 
>>> bq. In any case the data is marked as 9/25/16 not 11/11/16
>>> It's specially noted that the data on 9/25 are from our online A/B test
>>> cluster, and not showing fully online data because we published offheap
>>> together with NettyRpcServer for online thus no standalone comparison
>> data
>>> for offheap. Please check my original email more carefully (smile).
>>> 
>>> bq. Repeating my earlier question:  20*Meg* queries per second??  Just
>>> checked and *google* does 40*K* queries per second.
>>> As you already noticed, the 20M QPS is number from A/B testing cluster
>> (450
>>> nodes), and there're much more on 11/11 online cluster (1600+ nodes).
>>> Please note that this is NOT some cluster directly serves queries from
>> end
>>> user, but serving the index building and online machine learning system.
>>> Refer to our talk on hbasecon2016 (slides
>>> > apache-hbase-and-its-
>>> applications-in-alibaba-search>
>>> /recording
>>> >> T5HvwvkO9raWy=10>)
>>> for more details, if you're interested. And different from google,
>> there's
>>> an obvious "hot spot" for us, so I don't think the QPS of these two
>>> different systems are comparable.
>>> 
>>> bq. So maybe please check your numbers again.
>>> The numbers are got from online monitoring system and all real not fake,
>> so
>>> no need to check. Maybe just need some more time to take and understand?
>>> (smile)
>>> 
>>> Best Regards,
>>> Yu
>>> 
 On 20 November 2016 at 23:03, Stephen Boesch  wrote:
 
 Your arguments do not reflect direct knowledge of the numbers.  (a)
>> There
 is no super-spikiness in the graphs in the data (b) In any case the
>> data
>>> is
 marked as 9/25/16 not 11/11/16.  (c) The number of internet users says
 little about the number of *concurrent* users.

Re: Use experience and performance data of offheap from Alibaba online cluster

2016-11-20 Thread Stephen Boesch
Thanks Yu - given your apparent direct knowledge of the data, that is
helpful (my response earlier had been to 张铎). It is important so that we
inform colleagues with numbers that are "real".

If it is not "confidential", might you mention why there is such a large
(several orders of magnitude) explosion from end-user queries to backend ones?



2016-11-20 7:51 GMT-08:00 Yu Li :

> Thanks everyone for the feedback/comments, glad this data means something
> and have drawn your interesting. Let me answer the questions (and sorry for
> the lag)
>
> For the backport patches, ours are based on a customized 1.1.2 version and
> cannot apply directly for any 1.x branches. It would be easy for us to
> upload existing patches somewhere but obviously not that useful... so maybe
> we still should get them in branch-1 and officially support read-path
> offheap in future 1.x release? Let me create one JIRA about this and let's
> discuss in the JIRA system. And to be very clear, it's a big YES to share
> our patches with all rather than only numbers, just which way is better
> (smile).
>
> And answers for @Stephen Boesch:
>
> bq. In any case the data is marked as 9/25/16 not 11/11/16
> It's specially noted that the data on 9/25 are from our online A/B test
> cluster, and not showing fully online data because we published offheap
> together with NettyRpcServer for online thus no standalone comparison data
> for offheap. Please check my original email more carefully (smile).
>
> bq. Repeating my earlier question:  20*Meg* queries per second??  Just
> checked and *google* does 40*K* queries per second.
> As you already noticed, the 20M QPS is number from A/B testing cluster (450
> nodes), and there're much more on 11/11 online cluster (1600+ nodes).
> Please note that this is NOT some cluster directly serves queries from end
> user, but serving the index building and online machine learning system.
> Refer to our talk on hbasecon2016 (slides
>  applications-in-alibaba-search>
> /recording
>  T5HvwvkO9raWy=10>)
> for more details, if you're interested. And different from google, there's
> an obvious "hot spot" for us, so I don't think the QPS of these two
> different systems are comparable.
>
> bq. So maybe please check your numbers again.
> The numbers are got from online monitoring system and all real not fake, so
> no need to check. Maybe just need some more time to take and understand?
> (smile)
>
> Best Regards,
> Yu
>
> On 20 November 2016 at 23:03, Stephen Boesch  wrote:
>
> > Your arguments do not reflect direct knowledge of the numbers.  (a) There
> > is no super-spikiness in the graphs in the data (b) In any case the data
> is
> > marked as 9/25/16 not 11/11/16.  (c) The number of internet users says
> > little about the number of *concurrent* users.
> >
> > Overall it would be helpful for those who actually collected the data to
> > comment - not just speculation from someone who does not. As I had
> > mentioned already there *may* be a huge fanout from number of
> > user/application queries to the backend: but *huge* it would seemingly
> need
> > to be to generate the numbers shown.
> >
> > 2016-11-19 22:39 GMT-08:00 张铎 :
> >
> > > 11.11 is something like the Black Friday. Almost every item on Alibaba
> > will
> > > discount a lot at 11.11. Alibaba earned a 1 billion revenue within 1
> > > minute(52 seconds) and 10 billion revenue within 7 minutes(6 minutes 58
> > > seconds) at 11.11. The Chinese people had payed more 120 billion
> Chinese
> > > yuan to alibaba at 11.11. And I remember that Jeff Dean used to give a
> > > slides that for google the amplification from user queries to the
> storage
> > > system queries is also very large(I can not remember the exact number.
> > The
> > > slides is used to explain that hedge read is very useful for reducing
> > > latency). So I think the peak throughput is true.
> > >
> > > There are more than 600 million people in China that use internet. So
> if
> > > they decide to do something to your system at the same time, it looks
> > like
> > > a DDOS to your system...
> > >
> > > Thanks.
> > >
> > > 2016-11-20 12:56 GMT+08:00 Stephen Boesch :
> > >
> > > > Repeating my earlier question:  20*Meg* queries per second??  Just
> > > checked
> > > > and *google* does 40*K* queries per second. Now maybe the "queries"
> > are a
> > > > decomposition of far fewer end-user queries that cause a fanout of
> > > backend
> > > > queries. *But still .. *
> > > >
> > > > So maybe please check your numbers again.
> > > >
> > > > 2016-11-19 17:05 GMT-08:00 Heng Chen :
> > > >
> > > > > The performance looks great!
> > > > >
> > > > > 2016-11-19 18:03 GMT+08:00 Ted Yu :
> > > > > > Opening a JIRA would be fine.
> > > > > 

Re: Use experience and performance data of offheap from Alibaba online cluster

2016-11-20 Thread Yu Li
Thanks everyone for the feedback/comments, glad this data means something
and has drawn your interest. Let me answer the questions (and sorry for
the lag).

For the backport patches, ours are based on a customized 1.1.2 version and
cannot be applied directly to any 1.x branch. It would be easy for us to
upload the existing patches somewhere, but obviously not that useful... so
maybe we should still get them into branch-1 and officially support read-path
offheap in a future 1.x release? Let me create a JIRA about this and let's
discuss there, in the JIRA system. And to be very clear, it's a big YES to
sharing our patches with everyone rather than only the numbers; the question
is just which way is better (smile).

And answers for @Stephen Boesch:

bq. In any case the data is marked as 9/25/16 not 11/11/16
As specifically noted, the 9/25 data are from our online A/B test cluster.
We are not showing fully online data because we rolled out offheap together
with NettyRpcServer in production, so there is no standalone comparison data
for offheap. Please check my original email more carefully (smile).

bq. Repeating my earlier question:  20*Meg* queries per second??  Just
checked and *google* does 40*K* queries per second.
As you already noticed, the 20M QPS is the number from the A/B testing
cluster (450 nodes), and there was much more on the 11/11 online cluster
(1600+ nodes). Please note that this is NOT a cluster that directly serves
queries from end users; it serves the index-building and online
machine-learning systems. Refer to our talk at HBaseCon 2016
(slides/recording) for more details, if you're interested. And unlike
Google, there's an obvious "hot spot" for us, so I don't think the QPS of
these two different systems are comparable.

bq. So maybe please check your numbers again.
The numbers come from our online monitoring system and are all real, not
fake, so there is no need to check. Maybe it just needs some more time to
take in and understand? (smile)

Best Regards,
Yu

On 20 November 2016 at 23:03, Stephen Boesch  wrote:

> Your arguments do not reflect direct knowledge of the numbers.  (a) There
> is no super-spikiness in the graphs in the data (b) In any case the data is
> marked as 9/25/16 not 11/11/16.  (c) The number of internet users says
> little about the number of *concurrent* users.
>
> Overall it would be helpful for those who actually collected the data to
> comment - not just speculation from someone who does not. As I had
> mentioned already there *may* be a huge fanout from number of
> user/application queries to the backend: but *huge* it would seemingly need
> to be to generate the numbers shown.
>
> 2016-11-19 22:39 GMT-08:00 张铎 :
>
> > 11.11 is something like the Black Friday. Almost every item on Alibaba
> will
> > discount a lot at 11.11. Alibaba earned a 1 billion revenue within 1
> > minute(52 seconds) and 10 billion revenue within 7 minutes(6 minutes 58
> > seconds) at 11.11. The Chinese people had payed more 120 billion Chinese
> > yuan to alibaba at 11.11. And I remember that Jeff Dean used to give a
> > slides that for google the amplification from user queries to the storage
> > system queries is also very large(I can not remember the exact number.
> The
> > slides is used to explain that hedge read is very useful for reducing
> > latency). So I think the peak throughput is true.
> >
> > There are more than 600 million people in China that use internet. So if
> > they decide to do something to your system at the same time, it looks
> like
> > a DDOS to your system...
> >
> > Thanks.
> >
> > 2016-11-20 12:56 GMT+08:00 Stephen Boesch :
> >
> > > Repeating my earlier question:  20*Meg* queries per second??  Just
> > checked
> > > and *google* does 40*K* queries per second. Now maybe the "queries"
> are a
> > > decomposition of far fewer end-user queries that cause a fanout of
> > backend
> > > queries. *But still .. *
> > >
> > > So maybe please check your numbers again.
> > >
> > > 2016-11-19 17:05 GMT-08:00 Heng Chen :
> > >
> > > > The performance looks great!
> > > >
> > > > 2016-11-19 18:03 GMT+08:00 Ted Yu :
> > > > > Opening a JIRA would be fine.
> > > > > This makes it easier for people to obtain the patch(es).
> > > > >
> > > > > Cheers
> > > > >
> > > > >> On Nov 18, 2016, at 11:35 PM, Anoop John 
> > > wrote:
> > > > >>
> > > > >> Because of some compatibility issues, we decide that this will be
> > done
> > > > >> in 2.0 only..  Ya as Andy said, it would be great to share the 1.x
> > > > >> backported patches.  Is it a mega patch at ur end?  Or issue by
> > issue
> > > > >> patches?  Latter would be best.  Pls share patches in some place
> > and a
> > > > >> list of issues backported. I can help with verifying the issues once
> > > > >> so as to make sure we dont miss any...

Re: Use experience and performance data of offheap from Alibaba online cluster

2016-11-19 Thread 张铎
11.11 is something like Black Friday. Almost every item on Alibaba is
discounted heavily on 11.11. Alibaba took 1 billion in revenue within 1
minute (52 seconds) and 10 billion within 7 minutes (6 minutes 58 seconds)
on 11.11, and Chinese consumers paid more than 120 billion Chinese yuan to
Alibaba that day. And I remember that Jeff Dean once gave a slide deck
showing that for Google the amplification from user queries to storage-system
queries is also very large (I can not remember the exact number; the slides
were used to explain why hedged reads are very useful for reducing latency).
So I think the peak throughput is true.

There are more than 600 million internet users in China. So if they decide
to do something to your system at the same time, it looks like a DDoS
against your system...

Thanks.

2016-11-20 12:56 GMT+08:00 Stephen Boesch :

> Repeating my earlier question:  20*Meg* queries per second??  Just checked
> and *google* does 40*K* queries per second. Now maybe the "queries" are a
> decomposition of far fewer end-user queries that cause a fanout of backend
> queries. *But still .. *
>
> So maybe please check your numbers again.
>
> 2016-11-19 17:05 GMT-08:00 Heng Chen :
>
> > The performance looks great!
> >
> > 2016-11-19 18:03 GMT+08:00 Ted Yu :
> > > Opening a JIRA would be fine.
> > > This makes it easier for people to obtain the patch(es).
> > >
> > > Cheers
> > >
> > >> On Nov 18, 2016, at 11:35 PM, Anoop John 
> wrote:
> > >>
> > >> Because of some compatibility issues, we decide that this will be done
> > >> in 2.0 only..  Ya as Andy said, it would be great to share the 1.x
> > >> backported patches.  Is it a mega patch at ur end?  Or issue by issue
> > >> patches?  Latter would be best.  Pls share patches in some place and a
> > >> list of issues backported. I can help with verifying the issues once
> > >> so as to make sure we dont miss any...
> > >>
> > >> -Anoop-
> > >>
> > >>> On Sat, Nov 19, 2016 at 12:32 AM, Enis Söztutar 
> > wrote:
> > >>> Thanks for sharing this. Great work.
> > >>>
> > >>> I don't see any reason why we cannot backport to branch-1.
> > >>>
> > >>> Enis
> > >>>
> > >>> On Fri, Nov 18, 2016 at 9:37 AM, Andrew Purtell <
> > andrew.purt...@gmail.com>
> > >>> wrote:
> > >>>
> >  Yes, please, the patches will be useful to the community even if we
> > decide
> >  not to backport into an official 1.x release.
> > 
> > 
> > >> On Nov 18, 2016, at 12:25 PM, Bryan Beaudreault <
> > > bbeaudrea...@hubspot.com> wrote:
> > >
> > > Is the backported patch available anywhere? Not seeing it on the
> >  referenced
> > > JIRA. If it ends up not getting officially backported to branch-1
> > due to
> > > 2.0 around the corner, some of us who build our own deploy may want
> > to
> > > integrate into our builds. Thanks! These numbers look great
> > >
> > >> On Fri, Nov 18, 2016 at 12:20 PM Anoop John <
> anoop.hb...@gmail.com>
> >  wrote:
> > >>
> > >> Hi Yu Li
> > >>  Good to see that the off heap work help you..  The
> perf
> > >> numbers looks great.  So this is a compare of on heap L1 cache vs
> > off
> >  heap
> > >> L2 cache(HBASE-11425 enabled).   So for 2.0 we should make L2 off
> > heap
> > >> cache ON by default I believe.  Will raise a jira for that we can
> >  discuss
> > >> under that.   Seems like L2 off heap cache for data blocks and L1
> > cache
> >  for
> > >> index blocks seems a right choice.
> > >>
> > >> Thanks for the backport and the help in testing the feature..  You
> > were
> > >> able to find some corner case bugs and helped community to fix
> > them..
> > >> Thanks goes to ur whole team.
> > >>
> > >> -Anoop-
> > >>
> > >>
> > >>> On Fri, Nov 18, 2016 at 10:14 PM, Yu Li 
> wrote:
> > >>>
> > >>> Sorry guys, let me retry the inline images:
> > >>>
> > >>> Performance w/o offheap:
> > >>>
> > >>>
> > >>> Performance w/ offheap:
> > >>>
> > >>>
> > >>> Peak Get QPS of one single RS during Singles' Day (11/11):
> > >>>
> > >>>
> > >>>
> > >>> And attach the files in case inline still not working:
> > >>>
> > >>> Performance_without_offheap.png
> > >>> <
> > >> https://drive.google.com/file/d/0B017Q40_
> > F5uwbWEzUGktYVIya3JkcXVjRkFvVG
> >  NtM0VxWC1n/view?usp=drive_web
> > >>>
> > >>>
> > >>> Performance_with_offheap.png
> > >>> <
> > >> https://drive.google.com/file/d/0B017Q40_
> > F5uweGR2cnJEU0M1MWwtRFJ5YkxUeF
> >  VrcUdPc2ww/view?usp=drive_web
> > >>>
> > >>>
> > >>> Peak_Get_QPS_of_Single_RS.png
> > >>> <
> > >> https://drive.google.com/file/d/0B017Q40_
> > F5uwQ2FkR2k0ZmEtRVNGSFp5RUxHM3
> >  F6bHpNYnJz/view?usp=drive_web
> > >>>
> > >>>
> > 

Re: Use experience and performance data of offheap from Alibaba online cluster

2016-11-19 Thread Stephen Boesch
Repeating my earlier question:  20*Meg* queries per second??  Just checked
and *google* does 40*K* queries per second. Now maybe the "queries" are a
decomposition of far fewer end-user queries that cause a fanout of backend
queries. *But still .. *

So maybe please check your numbers again.

2016-11-19 17:05 GMT-08:00 Heng Chen :

> The performance looks great!
>
> 2016-11-19 18:03 GMT+08:00 Ted Yu :
> > Opening a JIRA would be fine.
> > This makes it easier for people to obtain the patch(es).
> >
> > Cheers
> >
> >> On Nov 18, 2016, at 11:35 PM, Anoop John  wrote:
> >>
> >> Because of some compatibility issues, we decide that this will be done
> >> in 2.0 only..  Ya as Andy said, it would be great to share the 1.x
> >> backported patches.  Is it a mega patch at ur end?  Or issue by issue
> >> patches?  Latter would be best.  Pls share patches in some place and a
> >> list of issues backported. I can help with verifying the issues once
> >> so as to make sure we dont miss any...
> >>
> >> -Anoop-
> >>
> >>> On Sat, Nov 19, 2016 at 12:32 AM, Enis Söztutar 
> wrote:
> >>> Thanks for sharing this. Great work.
> >>>
> >>> I don't see any reason why we cannot backport to branch-1.
> >>>
> >>> Enis
> >>>
> >>> On Fri, Nov 18, 2016 at 9:37 AM, Andrew Purtell <
> andrew.purt...@gmail.com>
> >>> wrote:
> >>>
>  Yes, please, the patches will be useful to the community even if we
> decide
>  not to backport into an official 1.x release.
> 
> 
> >> On Nov 18, 2016, at 12:25 PM, Bryan Beaudreault <
> > bbeaudrea...@hubspot.com> wrote:
> >
> > Is the backported patch available anywhere? Not seeing it on the
>  referenced
> > JIRA. If it ends up not getting officially backported to branch-1
> due to
> > 2.0 around the corner, some of us who build our own deploy may want
> to
> > integrate into our builds. Thanks! These numbers look great
> >
> >> On Fri, Nov 18, 2016 at 12:20 PM Anoop John 
>  wrote:
> >>
> >> Hi Yu Li
> >>  Good to see that the off heap work help you..  The perf
> >> numbers looks great.  So this is a compare of on heap L1 cache vs
> off
>  heap
> >> L2 cache(HBASE-11425 enabled).   So for 2.0 we should make L2 off
> heap
> >> cache ON by default I believe.  Will raise a jira for that we can
>  discuss
> >> under that.   Seems like L2 off heap cache for data blocks and L1
> cache
>  for
> >> index blocks seems a right choice.
> >>
> >> Thanks for the backport and the help in testing the feature..  You
> were
> >> able to find some corner case bugs and helped community to fix
> them..
> >> Thanks goes to ur whole team.
> >>
> >> -Anoop-
> >>
> >>
> >>> On Fri, Nov 18, 2016 at 10:14 PM, Yu Li  wrote:
> >>>
> >>> Sorry guys, let me retry the inline images:
> >>>
> >>> Performance w/o offheap:
> >>>
> >>>
> >>> Performance w/ offheap:
> >>>
> >>>
> >>> Peak Get QPS of one single RS during Singles' Day (11/11):
> >>>
> >>>
> >>>
> >>> And attach the files in case inline still not working:
> >>>
> >>> Performance_without_offheap.png
> >>> <
> >> https://drive.google.com/file/d/0B017Q40_
> F5uwbWEzUGktYVIya3JkcXVjRkFvVG
>  NtM0VxWC1n/view?usp=drive_web
> >>>
> >>>
> >>> Performance_with_offheap.png
> >>> <
> >> https://drive.google.com/file/d/0B017Q40_
> F5uweGR2cnJEU0M1MWwtRFJ5YkxUeF
>  VrcUdPc2ww/view?usp=drive_web
> >>>
> >>>
> >>> Peak_Get_QPS_of_Single_RS.png
> >>> <
> >> https://drive.google.com/file/d/0B017Q40_
> F5uwQ2FkR2k0ZmEtRVNGSFp5RUxHM3
>  F6bHpNYnJz/view?usp=drive_web
> >>>
> >>>
> >>>
> >>>
> >>> Best Regards,
> >>> Yu
> >>>
>  On 18 November 2016 at 19:29, Ted Yu  wrote:
> 
>  Yu:
>  With positive results, more hbase users would be asking for the
>  backport
>  of offheap read path patches.
> 
>  Do you think you or your coworker has the bandwidth to publish
>  backport
>  for branch-1 ?
> 
>  Thanks
> 
> > On Nov 18, 2016, at 12:11 AM, Yu Li  wrote:
> >
> > Dear all,
> >
> > We have backported read path offheap (HBASE-11425) to our
> customized
>  hbase-1.1.2 (thanks @Anoop for the help/support) and run it
> online for
> >> more
>  than a month, and would like to share our experience, for what
> it's
> >> worth
>  (smile).
> >
> > Generally speaking, we gained a better and more stable
>  throughput/performance with offheap, and below are some details:
> > 1. QPS become more stable with offheap
> >
> > Performance w/o offheap:

Re: Use experience and performance data of offheap from Alibaba online cluster

2016-11-19 Thread Heng Chen
The performance looks great!

2016-11-19 18:03 GMT+08:00 Ted Yu :
> Opening a JIRA would be fine.
> This makes it easier for people to obtain the patch(es).
>
> Cheers
>
>> On Nov 18, 2016, at 11:35 PM, Anoop John  wrote:
>>
>> Because of some compatibility issues, we decide that this will be done
>> in 2.0 only..  Ya as Andy said, it would be great to share the 1.x
>> backported patches.  Is it a mega patch at ur end?  Or issue by issue
>> patches?  Latter would be best.  Pls share patches in some place and a
>> list of issues backported. I can help with verifying the issues once
>> so as to make sure we dont miss any...
>>
>> -Anoop-
>>
>>> On Sat, Nov 19, 2016 at 12:32 AM, Enis Söztutar  wrote:
>>> Thanks for sharing this. Great work.
>>>
>>> I don't see any reason why we cannot backport to branch-1.
>>>
>>> Enis
>>>
>>> On Fri, Nov 18, 2016 at 9:37 AM, Andrew Purtell 
>>> wrote:
>>>
 Yes, please, the patches will be useful to the community even if we decide
 not to backport into an official 1.x release.


>> On Nov 18, 2016, at 12:25 PM, Bryan Beaudreault <
> bbeaudrea...@hubspot.com> wrote:
>
> Is the backported patch available anywhere? Not seeing it on the
 referenced
> JIRA. If it ends up not getting officially backported to branch-1 due to
> 2.0 around the corner, some of us who build our own deploy may want to
> integrate into our builds. Thanks! These numbers look great
>
>> On Fri, Nov 18, 2016 at 12:20 PM Anoop John 
 wrote:
>>
>> Hi Yu Li
>>  Good to see that the off heap work help you..  The perf
>> numbers looks great.  So this is a compare of on heap L1 cache vs off
 heap
>> L2 cache(HBASE-11425 enabled).   So for 2.0 we should make L2 off heap
>> cache ON by default I believe.  Will raise a jira for that we can
 discuss
>> under that.   Seems like L2 off heap cache for data blocks and L1 cache
 for
>> index blocks seems a right choice.
>>
>> Thanks for the backport and the help in testing the feature..  You were
>> able to find some corner case bugs and helped community to fix them..
>> Thanks goes to ur whole team.
>>
>> -Anoop-
>>
>>
>>> On Fri, Nov 18, 2016 at 10:14 PM, Yu Li  wrote:
>>>
>>> Sorry guys, let me retry the inline images:
>>>
>>> Performance w/o offheap:
>>>
>>>
>>> Performance w/ offheap:
>>>
>>>
>>> Peak Get QPS of one single RS during Singles' Day (11/11):
>>>
>>>
>>>
>>> And attach the files in case inline still not working:
>>>
>>> Performance_without_offheap.png
>>> <
>> https://drive.google.com/file/d/0B017Q40_F5uwbWEzUGktYVIya3JkcXVjRkFvVG
 NtM0VxWC1n/view?usp=drive_web
>>>
>>>
>>> Performance_with_offheap.png
>>> <
>> https://drive.google.com/file/d/0B017Q40_F5uweGR2cnJEU0M1MWwtRFJ5YkxUeF
 VrcUdPc2ww/view?usp=drive_web
>>>
>>>
>>> Peak_Get_QPS_of_Single_RS.png
>>> <
>> https://drive.google.com/file/d/0B017Q40_F5uwQ2FkR2k0ZmEtRVNGSFp5RUxHM3
 F6bHpNYnJz/view?usp=drive_web
>>>
>>>
>>>
>>>
>>> Best Regards,
>>> Yu
>>>
 On 18 November 2016 at 19:29, Ted Yu  wrote:

 Yu:
 With positive results, more hbase users would be asking for the
 backport
 of offheap read path patches.

 Do you think you or your coworker has the bandwidth to publish
 backport
 for branch-1 ?

 Thanks

> On Nov 18, 2016, at 12:11 AM, Yu Li  wrote:
>
> Dear all,
>
> We have backported read path offheap (HBASE-11425) to our customized
 hbase-1.1.2 (thanks @Anoop for the help/support) and run it online for
>> more
 than a month, and would like to share our experience, for what it's
>> worth
 (smile).
>
> Generally speaking, we gained a better and more stable
 throughput/performance with offheap, and below are some details:
> 1. QPS become more stable with offheap
>
> Performance w/o offheap:
>
>
>
> Performance w/ offheap:
>
>
>
> These data come from our online A/B test cluster (with 450 physical
 machines, and each with 256G memory + 64 core) with real world
>> workloads,
 it shows using offheap we could gain a more stable throughput as well
 as
 better performance
>
> Not showing fully online data here because for online we published
 the
 version with both offheap and NettyRpcServer together, so no
 standalone
 comparison data for offheap
>
 2. Full GC frequency and cost

Re: Use experience and performance data of offheap from Alibaba online cluster

2016-11-19 Thread Ted Yu
Opening a JIRA would be fine. 
This makes it easier for people to obtain the patch(es). 

Cheers

> On Nov 18, 2016, at 11:35 PM, Anoop John  wrote:
> 
> Because of some compatibility issues, we decide that this will be done
> in 2.0 only..  Ya as Andy said, it would be great to share the 1.x
> backported patches.  Is it a mega patch at ur end?  Or issue by issue
> patches?  Latter would be best.  Pls share patches in some place and a
> list of issues backported. I can help with verifying the issues once
> so as to make sure we dont miss any...
> 
> -Anoop-
> 
>> On Sat, Nov 19, 2016 at 12:32 AM, Enis Söztutar  wrote:
>> Thanks for sharing this. Great work.
>> 
>> I don't see any reason why we cannot backport to branch-1.
>> 
>> Enis
>> 
>> On Fri, Nov 18, 2016 at 9:37 AM, Andrew Purtell 
>> wrote:
>> 
>>> Yes, please, the patches will be useful to the community even if we decide
>>> not to backport into an official 1.x release.
>>> 
>>> 
> On Nov 18, 2016, at 12:25 PM, Bryan Beaudreault <
 bbeaudrea...@hubspot.com> wrote:
 
 Is the backported patch available anywhere? Not seeing it on the
>>> referenced
 JIRA. If it ends up not getting officially backported to branch-1 due to
 2.0 around the corner, some of us who build our own deploy may want to
 integrate into our builds. Thanks! These numbers look great
 
> On Fri, Nov 18, 2016 at 12:20 PM Anoop John 
>>> wrote:
> 
> Hi Yu Li
>  Good to see that the off heap work help you..  The perf
> numbers looks great.  So this is a compare of on heap L1 cache vs off
>>> heap
> L2 cache(HBASE-11425 enabled).   So for 2.0 we should make L2 off heap
> cache ON by default I believe.  Will raise a jira for that we can
>>> discuss
> under that.   Seems like L2 off heap cache for data blocks and L1 cache
>>> for
> index blocks seems a right choice.
> 
> Thanks for the backport and the help in testing the feature..  You were
> able to find some corner case bugs and helped community to fix them..
> Thanks goes to ur whole team.
> 
> -Anoop-
> 
> 
>> On Fri, Nov 18, 2016 at 10:14 PM, Yu Li  wrote:
>> 
>> Sorry guys, let me retry the inline images:
>> 
>> Performance w/o offheap:
>> 
>> 
>> Performance w/ offheap:
>> 
>> 
>> Peak Get QPS of one single RS during Singles' Day (11/11):
>> 
>> 
>> 
>> And attach the files in case inline still not working:
>> 
>> Performance_without_offheap.png
>> <
> https://drive.google.com/file/d/0B017Q40_F5uwbWEzUGktYVIya3JkcXVjRkFvVG
>>> NtM0VxWC1n/view?usp=drive_web
>> 
>> 
>> Performance_with_offheap.png
>> <
> https://drive.google.com/file/d/0B017Q40_F5uweGR2cnJEU0M1MWwtRFJ5YkxUeF
>>> VrcUdPc2ww/view?usp=drive_web
>> 
>> 
>> Peak_Get_QPS_of_Single_RS.png
>> <
> https://drive.google.com/file/d/0B017Q40_F5uwQ2FkR2k0ZmEtRVNGSFp5RUxHM3
>>> F6bHpNYnJz/view?usp=drive_web
>> 
>> 
>> 
>> 
>> Best Regards,
>> Yu
>> 
>>> On 18 November 2016 at 19:29, Ted Yu  wrote:
>>> 
>>> Yu:
>>> With positive results, more hbase users would be asking for the
>>> backport
>>> of offheap read path patches.
>>> 
>>> Do you think you or your coworker has the bandwidth to publish
>>> backport
>>> for branch-1 ?
>>> 
>>> Thanks
>>> 
 On Nov 18, 2016, at 12:11 AM, Yu Li  wrote:
 
 Dear all,
 
 We have backported read path offheap (HBASE-11425) to our customized
>>> hbase-1.1.2 (thanks @Anoop for the help/support) and run it online for
> more
>>> than a month, and would like to share our experience, for what it's
> worth
>>> (smile).
 
 Generally speaking, we gained a better and more stable
>>> throughput/performance with offheap, and below are some details:
 1. QPS become more stable with offheap
 
 Performance w/o offheap:
 
 
 
 Performance w/ offheap:
 
 
 
 These data come from our online A/B test cluster (with 450 physical
>>> machines, and each with 256G memory + 64 core) with real world
> workloads,
>>> it shows using offheap we could gain a more stable throughput as well
>>> as
>>> better performance
 
 Not showing fully online data here because for online we published
>>> the
>>> version with both offheap and NettyRpcServer together, so no
>>> standalone
>>> comparison data for offheap
 
 2. Full GC frequency and cost
 
 Average Full GC STW time reduce from 11s to 7s with offheap.
 
 3. Young GC frequency and cost
 
 No performance degradation observed with offheap.

Re: Use experience and performance data of offheap from Alibaba online cluster

2016-11-18 Thread Anoop John
Because of some compatibility issues, we decided that this will be done
in 2.0 only.  As Andy said, it would be great to share the 1.x
backported patches.  Is it a mega patch at your end, or issue-by-issue
patches?  The latter would be best.  Please share the patches somewhere,
along with a list of the issues backported; I can help with verifying the
issues so as to make sure we don't miss any...

-Anoop-

On Sat, Nov 19, 2016 at 12:32 AM, Enis Söztutar  wrote:
> Thanks for sharing this. Great work.
>
> I don't see any reason why we cannot backport to branch-1.
>
> Enis
>
> On Fri, Nov 18, 2016 at 9:37 AM, Andrew Purtell 
> wrote:
>
>> Yes, please, the patches will be useful to the community even if we decide
>> not to backport into an official 1.x release.
>>
>>
>> > On Nov 18, 2016, at 12:25 PM, Bryan Beaudreault <
>> bbeaudrea...@hubspot.com> wrote:
>> >
>> > Is the backported patch available anywhere? Not seeing it on the
>> referenced
>> > JIRA. If it ends up not getting officially backported to branch-1 due to
>> > 2.0 around the corner, some of us who build our own deploy may want to
>> > integrate into our builds. Thanks! These numbers look great
>> >
>> >> On Fri, Nov 18, 2016 at 12:20 PM Anoop John 
>> wrote:
>> >>
>> >> Hi Yu Li
>> >>   Good to see that the off heap work help you..  The perf
>> >> numbers looks great.  So this is a compare of on heap L1 cache vs off
>> heap
>> >> L2 cache(HBASE-11425 enabled).   So for 2.0 we should make L2 off heap
>> >> cache ON by default I believe.  Will raise a jira for that we can
>> discuss
>> >> under that.   Seems like L2 off heap cache for data blocks and L1 cache
>> for
>> >> index blocks seems a right choice.
>> >>
>> >> Thanks for the backport and the help in testing the feature..  You were
>> >> able to find some corner case bugs and helped community to fix them..
>> >> Thanks goes to ur whole team.
>> >>
>> >> -Anoop-
>> >>
>> >>
>> >>> On Fri, Nov 18, 2016 at 10:14 PM, Yu Li  wrote:
>> >>>
>> >>> Sorry guys, let me retry the inline images:
>> >>>
>> >>> Performance w/o offheap:
>> >>>
>> >>>
>> >>> Performance w/ offheap:
>> >>>
>> >>>
>> >>> Peak Get QPS of one single RS during Singles' Day (11/11):
>> >>>
>> >>>
>> >>>
>> >>> And attach the files in case inline still not working:
>> >>>
>> >>> Performance_without_offheap.png
>> >>> <
>> >> https://drive.google.com/file/d/0B017Q40_F5uwbWEzUGktYVIya3JkcXVjRkFvVG
>> NtM0VxWC1n/view?usp=drive_web
>> >>>
>> >>>
>> >>> Performance_with_offheap.png
>> >>> <
>> >> https://drive.google.com/file/d/0B017Q40_F5uweGR2cnJEU0M1MWwtRFJ5YkxUeF
>> VrcUdPc2ww/view?usp=drive_web
>> >>>
>> >>>
>> >>> Peak_Get_QPS_of_Single_RS.png
>> >>> <
>> >> https://drive.google.com/file/d/0B017Q40_F5uwQ2FkR2k0ZmEtRVNGSFp5RUxHM3
>> F6bHpNYnJz/view?usp=drive_web
>> >>>
>> >>>
>> >>>
>> >>>
>> >>> Best Regards,
>> >>> Yu
>> >>>
>>  On 18 November 2016 at 19:29, Ted Yu  wrote:
>> 
>>  Yu:
>>  With positive results, more hbase users would be asking for the
>> backport
>>  of offheap read path patches.
>> 
>>  Do you think you or your coworker has the bandwidth to publish
>> backport
>>  for branch-1 ?
>> 
>>  Thanks
>> 
>> > On Nov 18, 2016, at 12:11 AM, Yu Li  wrote:
>> >
>> > Dear all,
>> >
>> > We have backported read path offheap (HBASE-11425) to our customized
>>  hbase-1.1.2 (thanks @Anoop for the help/support) and run it online for
>> >> more
>>  than a month, and would like to share our experience, for what it's
>> >> worth
>>  (smile).
>> >
>> > Generally speaking, we gained a better and more stable
>>  throughput/performance with offheap, and below are some details:
>> > 1. QPS become more stable with offheap
>> >
>> > Performance w/o offheap:
>> >
>> >
>> >
>> > Performance w/ offheap:
>> >
>> >
>> >
>> > These data come from our online A/B test cluster (with 450 physical
>>  machines, and each with 256G memory + 64 core) with real world
>> >> workloads,
>>  it shows using offheap we could gain a more stable throughput as well
>> as
>>  better performance
>> >
>> > Not showing fully online data here because for online we published
>> the
>>  version with both offheap and NettyRpcServer together, so no
>> standalone
>>  comparison data for offheap
>> >
>> > 2. Full GC frequency and cost
>> >
>> > Average Full GC STW time reduce from 11s to 7s with offheap.
>> >
>> > 3. Young GC frequency and cost
>> >
>> > No performance degradation observed with offheap.
>> >
>> > 4. Peak throughput of one single RS
>> >
>> > On Singles Day (11/11), peak throughput of one single RS reached
>> 100K,
>>  among which 90K from Get. Plus internet in/out data we could know the
 average result size of get request is ~1KB

Re: Use experience and performance data of offheap from Alibaba online cluster

2016-11-18 Thread 曾伟展
+ 1

 Original message
From: 张铎<palomino...@gmail.com>
To: d...@hbase.apache.org<d...@hbase.apache.org>; 
user@hbase.apache.org<user@hbase.apache.org>
Sent: Friday, 18 November 2016, 17:19
Subject: Re: Use experience and performance data of offheap from Alibaba online 
cluster



Re: Use experience and performance data of offheap from Alibaba online cluster

2016-11-18 Thread Andrew Purtell
Yes, please, the patches will be useful to the community even if we decide not 
to backport into an official 1.x release.


> On Nov 18, 2016, at 12:25 PM, Bryan Beaudreault  
> wrote:
> 
> Is the backported patch available anywhere? Not seeing it on the referenced
> JIRA. If it ends up not getting officially backported to branch-1 due to
> 2.0 around the corner, some of us who build our own deploy may want to
> integrate into our builds. Thanks! These numbers look great
> 
>> On Fri, Nov 18, 2016 at 12:20 PM Anoop John  wrote:
>> 
>> Hi Yu Li
>>   Good to see that the off heap work help you..  The perf
>> numbers looks great.  So this is a compare of on heap L1 cache vs off heap
>> L2 cache(HBASE-11425 enabled).   So for 2.0 we should make L2 off heap
>> cache ON by default I believe.  Will raise a jira for that we can discuss
>> under that.   Seems like L2 off heap cache for data blocks and L1 cache for
>> index blocks seems a right choice.
>> 
>> Thanks for the backport and the help in testing the feature..  You were
>> able to find some corner case bugs and helped community to fix them..
>> Thanks goes to ur whole team.
>> 
>> -Anoop-
>> 
>> 
>>> On Fri, Nov 18, 2016 at 10:14 PM, Yu Li  wrote:
>>> 
>>> Sorry guys, let me retry the inline images:
>>> 
>>> Performance w/o offheap:
>>> 
>>> ​
>>> Performance w/ offheap:
>>> 
>>> ​
>>> Peak Get QPS of one single RS during Singles' Day (11/11):
>>> 
>>> ​
>>> 
>>> And attach the files in case inline still not working:
>>> ​​​
>>> Performance_without_offheap.png
>>> <
>> https://drive.google.com/file/d/0B017Q40_F5uwbWEzUGktYVIya3JkcXVjRkFvVGNtM0VxWC1n/view?usp=drive_web
>>> 
>>> ​​
>>> Performance_with_offheap.png
>>> <
>> https://drive.google.com/file/d/0B017Q40_F5uweGR2cnJEU0M1MWwtRFJ5YkxUeFVrcUdPc2ww/view?usp=drive_web
>>> 
>>> ​​
>>> Peak_Get_QPS_of_Single_RS.png
>>> <
>> https://drive.google.com/file/d/0B017Q40_F5uwQ2FkR2k0ZmEtRVNGSFp5RUxHM3F6bHpNYnJz/view?usp=drive_web
>>> 
>>> ​
>>> 
>>> 
>>> Best Regards,
>>> Yu
>>> 
 On 18 November 2016 at 19:29, Ted Yu  wrote:
 
 Yu:
 With positive results, more hbase users would be asking for the backport
 of offheap read path patches.
 
 Do you think you or your coworker has the bandwidth to publish backport
 for branch-1 ?
 
 Thanks
 
> On Nov 18, 2016, at 12:11 AM, Yu Li  wrote:
> 
> Dear all,
> 
> We have backported read path offheap (HBASE-11425) to our customized
 hbase-1.1.2 (thanks @Anoop for the help/support) and run it online for
>> more
 than a month, and would like to share our experience, for what it's
>> worth
 (smile).
> 
> Generally speaking, we gained a better and more stable
 throughput/performance with offheap, and below are some details:
> 1. QPS become more stable with offheap
> 
> Performance w/o offheap:
> 
> 
> 
> Performance w/ offheap:
> 
> 
> 
> These data come from our online A/B test cluster (with 450 physical
 machines, and each with 256G memory + 64 core) with real world
>> workloads,
 it shows using offheap we could gain a more stable throughput as well as
 better performance
> 
> Not showing fully online data here because for online we published the
 version with both offheap and NettyRpcServer together, so no standalone
 comparison data for offheap
> 
> 2. Full GC frequency and cost
> 
> Average Full GC STW time reduce from 11s to 7s with offheap.
> 
> 3. Young GC frequency and cost
> 
> No performance degradation observed with offheap.
> 
> 4. Peak throughput of one single RS
> 
> On Singles Day (11/11), peak throughput of one single RS reached 100K,
 among which 90K from Get. Plus internet in/out data we could know the
 average result size of get request is ~1KB
> 
> 
> 
> Offheap are used on all online machines (more than 1600 nodes) instead
 of LruCache, so the above QPS is gained from offheap bucketcache, along
 with NettyRpcServer(HBASE-15756).
> 
> Just let us know if any comments. Thanks.
> 
> Best Regards,
> Yu
> 
> 
> 
> 
> 
> 
> 
 
>>> 
>>> 
>> 


Re: Use experience and performance data of offheap from Alibaba online cluster

2016-11-18 Thread Bryan Beaudreault
Is the backported patch available anywhere? Not seeing it on the referenced
JIRA. If it ends up not getting officially backported to branch-1 due to
2.0 around the corner, some of us who build our own deploy may want to
integrate into our builds. Thanks! These numbers look great

On Fri, Nov 18, 2016 at 12:20 PM Anoop John  wrote:

> Hi Yu Li
>Good to see that the off heap work help you..  The perf
> numbers looks great.  So this is a compare of on heap L1 cache vs off heap
> L2 cache(HBASE-11425 enabled).   So for 2.0 we should make L2 off heap
> cache ON by default I believe.  Will raise a jira for that we can discuss
> under that.   Seems like L2 off heap cache for data blocks and L1 cache for
> index blocks seems a right choice.
>
> Thanks for the backport and the help in testing the feature..  You were
> able to find some corner case bugs and helped community to fix them..
> Thanks goes to ur whole team.
>
> -Anoop-
>
>
> On Fri, Nov 18, 2016 at 10:14 PM, Yu Li  wrote:
>
> > Sorry guys, let me retry the inline images:
> >
> > Performance w/o offheap:
> >
> > ​
> > Performance w/ offheap:
> >
> > ​
> > Peak Get QPS of one single RS during Singles' Day (11/11):
> >
> > ​
> >
> > And attach the files in case inline still not working:
> > ​​​
> >  Performance_without_offheap.png
> > <
> https://drive.google.com/file/d/0B017Q40_F5uwbWEzUGktYVIya3JkcXVjRkFvVGNtM0VxWC1n/view?usp=drive_web
> >
> > ​​
> >  Performance_with_offheap.png
> > <
> https://drive.google.com/file/d/0B017Q40_F5uweGR2cnJEU0M1MWwtRFJ5YkxUeFVrcUdPc2ww/view?usp=drive_web
> >
> > ​​
> >  Peak_Get_QPS_of_Single_RS.png
> > <
> https://drive.google.com/file/d/0B017Q40_F5uwQ2FkR2k0ZmEtRVNGSFp5RUxHM3F6bHpNYnJz/view?usp=drive_web
> >
> > ​
> >
> >
> > Best Regards,
> > Yu
> >
> > On 18 November 2016 at 19:29, Ted Yu  wrote:
> >
> >> Yu:
> >> With positive results, more hbase users would be asking for the backport
> >> of offheap read path patches.
> >>
> >> Do you think you or your coworker has the bandwidth to publish backport
> >> for branch-1 ?
> >>
> >> Thanks
> >>
> >> > On Nov 18, 2016, at 12:11 AM, Yu Li  wrote:
> >> >
> >> > Dear all,
> >> >
> >> > We have backported read path offheap (HBASE-11425) to our customized
> >> hbase-1.1.2 (thanks @Anoop for the help/support) and run it online for
> more
> >> than a month, and would like to share our experience, for what it's
> worth
> >> (smile).
> >> >
> >> > Generally speaking, we gained a better and more stable
> >> throughput/performance with offheap, and below are some details:
> >> > 1. QPS become more stable with offheap
> >> >
> >> > Performance w/o offheap:
> >> >
> >> >
> >> >
> >> > Performance w/ offheap:
> >> >
> >> >
> >> >
> >> > These data come from our online A/B test cluster (with 450 physical
> >> machines, and each with 256G memory + 64 core) with real world
> workloads,
> >> it shows using offheap we could gain a more stable throughput as well as
> >> better performance
> >> >
> >> > Not showing fully online data here because for online we published the
> >> version with both offheap and NettyRpcServer together, so no standalone
> >> comparison data for offheap
> >> >
> >> > 2. Full GC frequency and cost
> >> >
> >> > Average Full GC STW time reduce from 11s to 7s with offheap.
> >> >
> >> > 3. Young GC frequency and cost
> >> >
> >> > No performance degradation observed with offheap.
> >> >
> >> > 4. Peak throughput of one single RS
> >> >
> >> > On Singles Day (11/11), peak throughput of one single RS reached 100K,
> >> among which 90K from Get. Plus internet in/out data we could know the
> >> average result size of get request is ~1KB
> >> >
> >> >
> >> >
> >> > Offheap are used on all online machines (more than 1600 nodes) instead
> >> of LruCache, so the above QPS is gained from offheap bucketcache, along
> >> with NettyRpcServer(HBASE-15756).
> >> >
> >> > Just let us know if any comments. Thanks.
> >> >
> >> > Best Regards,
> >> > Yu
> >> >
> >> >
> >> >
> >> >
> >> >
> >> >
> >> >
> >>
> >
> >
>


Re: Use experience and performance data of offheap from Alibaba online cluster

2016-11-18 Thread Anoop John
Hi Yu Li
   Good to see that the off-heap work helps you.  The perf
numbers look great.  So this is a comparison of the on-heap L1 cache vs the
off-heap L2 cache (HBASE-11425 enabled).   For 2.0 we should make the L2
off-heap cache ON by default, I believe; I will raise a JIRA so we can
discuss under that.   L2 off-heap cache for data blocks and the L1 cache for
index blocks seems the right choice.
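
For reference, a minimal sketch of what that L1 + off-heap L2 layout looks
like at the configuration level, assuming the standard BucketCache settings;
the sizes below are made-up placeholders, and in practice they would live in
hbase-site.xml / hbase-env.sh on the RegionServers rather than in code:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class OffheapL2CacheConfigSketch {
    public static void main(String[] args) {
        Configuration conf = HBaseConfiguration.create();

        // L2: off-heap BucketCache for data blocks (the HBASE-11425 read path).
        conf.set("hbase.bucketcache.ioengine", "offheap");
        conf.setInt("hbase.bucketcache.size", 32 * 1024);   // MB of direct memory (placeholder)

        // L1: keep a modest on-heap LRU cache. With the combined cache used once
        // BucketCache is enabled, index/bloom blocks stay in L1 while data blocks
        // go to the off-heap L2.
        conf.setFloat("hfile.block.cache.size", 0.2f);       // fraction of heap (placeholder)

        // The RegionServer JVM also needs enough direct memory for the bucket cache,
        // e.g. HBASE_OFFHEAPSIZE in hbase-env.sh (which feeds -XX:MaxDirectMemorySize).
        System.out.println("BucketCache engine: " + conf.get("hbase.bucketcache.ioengine"));
    }
}
```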

Thanks for the backport and the help in testing the feature.  You were
able to find some corner-case bugs and helped the community fix them.
Thanks go to your whole team.

-Anoop-


On Fri, Nov 18, 2016 at 10:14 PM, Yu Li  wrote:

> Sorry guys, let me retry the inline images:
>
> Performance w/o offheap:
>
> ​
> Performance w/ offheap:
>
> ​
> Peak Get QPS of one single RS during Singles' Day (11/11):
>
> ​
>
> And attach the files in case inline still not working:
> ​​​
>  Performance_without_offheap.png
> 
> ​​
>  Performance_with_offheap.png
> 
> ​​
>  Peak_Get_QPS_of_Single_RS.png
> 
> ​
>
>
> Best Regards,
> Yu
>
> On 18 November 2016 at 19:29, Ted Yu  wrote:
>
>> Yu:
>> With positive results, more hbase users would be asking for the backport
>> of offheap read path patches.
>>
>> Do you think you or your coworker has the bandwidth to publish backport
>> for branch-1 ?
>>
>> Thanks
>>
>> > On Nov 18, 2016, at 12:11 AM, Yu Li  wrote:
>> >
>> > Dear all,
>> >
>> > We have backported read path offheap (HBASE-11425) to our customized
>> hbase-1.1.2 (thanks @Anoop for the help/support) and run it online for more
>> than a month, and would like to share our experience, for what it's worth
>> (smile).
>> >
>> > Generally speaking, we gained a better and more stable
>> throughput/performance with offheap, and below are some details:
>> > 1. QPS become more stable with offheap
>> >
>> > Performance w/o offheap:
>> >
>> >
>> >
>> > Performance w/ offheap:
>> >
>> >
>> >
>> > These data come from our online A/B test cluster (with 450 physical
>> machines, and each with 256G memory + 64 core) with real world workloads,
>> it shows using offheap we could gain a more stable throughput as well as
>> better performance
>> >
>> > Not showing fully online data here because for online we published the
>> version with both offheap and NettyRpcServer together, so no standalone
>> comparison data for offheap
>> >
>> > 2. Full GC frequency and cost
>> >
>> > Average Full GC STW time reduce from 11s to 7s with offheap.
>> >
>> > 3. Young GC frequency and cost
>> >
>> > No performance degradation observed with offheap.
>> >
>> > 4. Peak throughput of one single RS
>> >
>> > On Singles Day (11/11), peak throughput of one single RS reached 100K,
>> among which 90K from Get. Plus internet in/out data we could know the
>> average result size of get request is ~1KB
>> >
>> >
>> >
>> > Offheap are used on all online machines (more than 1600 nodes) instead
>> of LruCache, so the above QPS is gained from offheap bucketcache, along
>> with NettyRpcServer(HBASE-15756).
>> >
>> > Just let us know if any comments. Thanks.
>> >
>> > Best Regards,
>> > Yu
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>>
>
>


Re: Use experience and performance data of offheap from Alibaba online cluster

2016-11-18 Thread Yu Li
Sorry guys, let me retry the inline images:

Performance w/o offheap:

[inline image]

Performance w/ offheap:

[inline image]

Peak Get QPS of one single RS during Singles' Day (11/11):

[inline image]

And attach the files in case inline still not working:

Performance_without_offheap.png
Performance_with_offheap.png
Peak_Get_QPS_of_Single_RS.png


Best Regards,
Yu

On 18 November 2016 at 19:29, Ted Yu  wrote:

> Yu:
> With positive results, more hbase users would be asking for the backport
> of offheap read path patches.
>
> Do you think you or your coworker has the bandwidth to publish backport
> for branch-1 ?
>
> Thanks
>
> > On Nov 18, 2016, at 12:11 AM, Yu Li  wrote:
> >
> > Dear all,
> >
> > We have backported read path offheap (HBASE-11425) to our customized
> hbase-1.1.2 (thanks @Anoop for the help/support) and run it online for more
> than a month, and would like to share our experience, for what it's worth
> (smile).
> >
> > Generally speaking, we gained a better and more stable
> throughput/performance with offheap, and below are some details:
> > 1. QPS become more stable with offheap
> >
> > Performance w/o offheap:
> >
> >
> >
> > Performance w/ offheap:
> >
> >
> >
> > These data come from our online A/B test cluster (with 450 physical
> machines, and each with 256G memory + 64 core) with real world workloads,
> it shows using offheap we could gain a more stable throughput as well as
> better performance
> >
> > Not showing fully online data here because for online we published the
> version with both offheap and NettyRpcServer together, so no standalone
> comparison data for offheap
> >
> > 2. Full GC frequency and cost
> >
> > Average Full GC STW time reduce from 11s to 7s with offheap.
> >
> > 3. Young GC frequency and cost
> >
> > No performance degradation observed with offheap.
> >
> > 4. Peak throughput of one single RS
> >
> > On Singles Day (11/11), peak throughput of one single RS reached 100K,
> among which 90K from Get. Plus internet in/out data we could know the
> average result size of get request is ~1KB
> >
> >
> >
> > Offheap are used on all online machines (more than 1600 nodes) instead
> of LruCache, so the above QPS is gained from offheap bucketcache, along
> with NettyRpcServer(HBASE-15756).
> >
> > Just let us know if any comments. Thanks.
> >
> > Best Regards,
> > Yu
> >
> >
> >
> >
> >
> >
> >
>


Re: Use experience and performance data of offheap from Alibaba online cluster

2016-11-18 Thread Ted Yu
Yu:
With positive results, more hbase users would be asking for the backport of 
offheap read path patches. 

Do you think you or your coworker has the bandwidth to publish backport for 
branch-1 ?

Thanks 

> On Nov 18, 2016, at 12:11 AM, Yu Li  wrote:
> 
> Dear all,
> 
> We have backported read path offheap (HBASE-11425) to our customized 
> hbase-1.1.2 (thanks @Anoop for the help/support) and run it online for more 
> than a month, and would like to share our experience, for what it's worth 
> (smile).
> 
> Generally speaking, we gained a better and more stable throughput/performance 
> with offheap, and below are some details:
> 1. QPS become more stable with offheap
> 
> Performance w/o offheap:
> 
> 
> 
> Performance w/ offheap:
> 
> 
> 
> These data come from our online A/B test cluster (with 450 physical machines, 
> and each with 256G memory + 64 core) with real world workloads, it shows 
> using offheap we could gain a more stable throughput as well as better 
> performance
> 
> Not showing fully online data here because for online we published the 
> version with both offheap and NettyRpcServer together, so no standalone 
> comparison data for offheap
> 
> 2. Full GC frequency and cost
> 
> Average Full GC STW time reduce from 11s to 7s with offheap.
> 
> 3. Young GC frequency and cost
> 
> No performance degradation observed with offheap.
> 
> 4. Peak throughput of one single RS
> 
> On Singles Day (11/11), peak throughput of one single RS reached 100K, among 
> which 90K from Get. Plus internet in/out data we could know the average 
> result size of get request is ~1KB
> 
> 
> 
> Offheap are used on all online machines (more than 1600 nodes) instead of 
> LruCache, so the above QPS is gained from offheap bucketcache, along with 
> NettyRpcServer(HBASE-15756).
> 
> Just let us know if any comments. Thanks.
> 
> Best Regards,
> Yu


Re: Use experience and performance data of offheap from Alibaba online cluster

2016-11-18 Thread 张铎
I cannot see the images either...

On Friday, 18 November 2016 at 16:57, Du, Jingcheng wrote:

> Thanks Yu for the sharing, great achievements.
> It seems the images cannot be displayed? Maybe just me?
>
> Regards,
> Jingcheng
>
> From: Yu Li [mailto:car...@gmail.com]
> Sent: Friday, November 18, 2016 4:11 PM
> To: user@hbase.apache.org; d...@hbase.apache.org
> Subject: Use experience and performance data of offheap from Alibaba
> online cluster
>
> Dear all,
>
> We have backported read path offheap (HBASE-11425) to our customized
> hbase-1.1.2 (thanks @Anoop for the help/support) and run it online for more
> than a month, and would like to share our experience, for what it's worth
> (smile).
>
> Generally speaking, we gained a better and more stable
> throughput/performance with offheap, and below are some details:
>
> 1. QPS become more stable with offheap
>
> Performance w/o offheap:
>
> [inline image]
>
> Performance w/ offheap:
>
> [inline image]
>
> These data come from our online A/B test cluster (with 450 physical
> machines, and each with 256G memory + 64 core) with real world workloads,
> it shows using offheap we could gain a more stable throughput as well as
> better performance
>
> Not showing fully online data here because for online we published the
> version with both offheap and NettyRpcServer together, so no standalone
> comparison data for offheap
>
> 2. Full GC frequency and cost
>
> Average Full GC STW time reduce from 11s to 7s with offheap.
>
> 3. Young GC frequency and cost
>
> No performance degradation observed with offheap.
>
> 4. Peak throughput of one single RS
>
> On Singles Day (11/11), peak throughput of one single RS reached 100K,
> among which 90K from Get. Plus internet in/out data we could know the
> average result size of get request is ~1KB
>
> [inline image]
>
> Offheap are used on all online machines (more than 1600 nodes) instead of
> LruCache, so the above QPS is gained from offheap bucketcache, along with
> NettyRpcServer(HBASE-15756).
> Just let us know if any comments. Thanks.
>
> Best Regards,
> Yu


Re: Use experience and performance data of offheap from Alibaba online cluster

2016-11-18 Thread Loïc Chanel
Nope, same here!

Loïc CHANEL
System Big Data engineer
MS - WASABI - Worldline (Villeurbanne, France)

2016-11-18 9:54 GMT+01:00 Du, Jingcheng :

> Thanks Yu for the sharing, great achievements.
> It seems the images cannot be displayed? Maybe just me?
>
> Regards,
> Jingcheng
>
> From: Yu Li [mailto:car...@gmail.com]
> Sent: Friday, November 18, 2016 4:11 PM
> To: user@hbase.apache.org; d...@hbase.apache.org
> Subject: Use experience and performance data of offheap from Alibaba
> online cluster
>
> Dear all,
>
> We have backported read path offheap (HBASE-11425) to our customized
> hbase-1.1.2 (thanks @Anoop for the help/support) and run it online for more
> than a month, and would like to share our experience, for what it's worth
> (smile).
>
> Generally speaking, we gained a better and more stable
> throughput/performance with offheap, and below are some details:
>
> 1. QPS become more stable with offheap
>
> Performance w/o offheap:
>
> [inline image]
>
> Performance w/ offheap:
>
> [inline image]
>
> These data come from our online A/B test cluster (with 450 physical
> machines, and each with 256G memory + 64 core) with real world workloads,
> it shows using offheap we could gain a more stable throughput as well as
> better performance
>
> Not showing fully online data here because for online we published the
> version with both offheap and NettyRpcServer together, so no standalone
> comparison data for offheap
>
> 2. Full GC frequency and cost
>
> Average Full GC STW time reduce from 11s to 7s with offheap.
>
> 3. Young GC frequency and cost
>
> No performance degradation observed with offheap.
>
> 4. Peak throughput of one single RS
>
> On Singles Day (11/11), peak throughput of one single RS reached 100K,
> among which 90K from Get. Plus internet in/out data we could know the
> average result size of get request is ~1KB
>
> [inline image]
>
> Offheap are used on all online machines (more than 1600 nodes) instead of
> LruCache, so the above QPS is gained from offheap bucketcache, along with
> NettyRpcServer(HBASE-15756).
> Just let us know if any comments. Thanks.
>
> Best Regards,
> Yu


RE: Use experience and performance data of offheap from Alibaba online cluster

2016-11-18 Thread Du, Jingcheng
Thanks Yu for sharing, great achievements.
It seems the images cannot be displayed? Or is it just me?

Regards,
Jingcheng

From: Yu Li [mailto:car...@gmail.com]
Sent: Friday, November 18, 2016 4:11 PM
To: user@hbase.apache.org; d...@hbase.apache.org
Subject: Use experience and performance data of offheap from Alibaba online 
cluster

Dear all,

We have backported read path offheap (HBASE-11425) to our customized
hbase-1.1.2 (thanks @Anoop for the help/support) and have run it online for more
than a month, and would like to share our experience, for what it's worth
(smile).

Generally speaking, we gained better and more stable throughput/performance
with offheap; below are some details:

1. QPS becomes more stable with offheap

Performance w/o offheap:

[inline image]

Performance w/ offheap:

[inline image]

These data come from our online A/B test cluster (450 physical machines, each
with 256GB memory and 64 cores) running real-world workloads; they show that
with offheap we gain more stable throughput as well as better performance.

We are not showing the full online data here because online we rolled out the
version with both offheap and NettyRpcServer together, so there is no standalone
comparison for offheap alone.

2. Full GC frequency and cost

Average Full GC STW time was reduced from 11s to 7s with offheap.
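
As a rough illustration of how such numbers can be collected (a minimal sketch
using the standard java.lang.management beans, not the exact tooling behind the
figures above; collector bean names depend on which GC you run), something like
this can be polled on a RegionServer:

import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class GcPauseSampler {
  public static void main(String[] args) throws InterruptedException {
    long lastCount = 0, lastTimeMs = 0;
    while (true) {
      long count = 0, timeMs = 0;
      // Sum collection counts and accumulated pause time over all collectors;
      // filter by bean name (e.g. the old-gen collector) to isolate Full GC.
      for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
        count += gc.getCollectionCount();
        timeMs += gc.getCollectionTime();
      }
      long deltaCount = count - lastCount;
      long deltaTimeMs = timeMs - lastTimeMs;
      if (deltaCount > 0) {
        System.out.printf("GCs in last minute: %d, avg pause: %d ms%n",
            deltaCount, deltaTimeMs / deltaCount);
      }
      lastCount = count;
      lastTimeMs = timeMs;
      Thread.sleep(60_000L); // sample once a minute
    }
  }
}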

3. Young GC frequency and cost

No performance degradation observed with offheap.

4. Peak throughput of one single RS

On Singles' Day (11/11), peak throughput of a single RS reached 100K QPS, of
which 90K were Gets. Combined with the network in/out data, we can derive that
the average result size of a Get request is ~1KB.

[inline image]
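
To spell out the arithmetic behind that estimate (the outbound network rate
below is an illustrative placeholder, not a published measurement): dividing
the RS outbound network bandwidth by the Get QPS gives the average response
size.

public class AvgResultSize {
  public static void main(String[] args) {
    // Illustrative numbers only: the peak Get QPS above, plus a hypothetical
    // outbound network rate as read from per-machine network metrics.
    double getQps = 90_000;                        // Gets per second on one RS
    double netOutBytesPerSec = 90L * 1024 * 1024;  // assume ~90 MB/s outbound
    double avgResultBytes = netOutBytesPerSec / getQps;
    System.out.printf("average Get result size ~= %.0f bytes (~1KB)%n", avgResultBytes);
  }
}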

Offheap is used on all online machines (more than 1600 nodes) instead of
LruCache, so the above QPS is gained from the offheap BucketCache, along with
NettyRpcServer (HBASE-15756).
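
For anyone who wants to try a similar setup, here is a minimal sketch of the
relevant configuration, expressed programmatically. The property names are the
upstream ones and the values are placeholders, so the exact keys and sizing may
differ in our customized 1.1.2 backport.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class OffheapReadPathConfigSketch {
  public static Configuration sketch() {
    // Equivalent entries would normally live in hbase-site.xml on each RegionServer.
    Configuration conf = HBaseConfiguration.create();
    // Put the L2 block cache off-heap; the size here is a placeholder (in MB).
    conf.set("hbase.bucketcache.ioengine", "offheap");
    conf.setInt("hbase.bucketcache.size", 32 * 1024);
    // Use the Netty RPC server implementation (HBASE-15756).
    conf.set("hbase.rpc.server.impl", "org.apache.hadoop.hbase.ipc.NettyRpcServer");
    // The JVM also needs enough direct memory for the bucket cache, e.g. via
    // HBASE_OFFHEAPSIZE (or -XX:MaxDirectMemorySize) in hbase-env.sh.
    return conf;
  }
}
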
Just let us know if you have any comments. Thanks.

Best Regards,
Yu