bq. If it were not "confidential" might you mention why there is such a large (several orders of magnitude) explosion of end user queries to backend ones?

For the index building and online machine learning systems, more information is recorded after each visit/trade, such as user query/click history, item stock updates, etc., and multiple pieces of user-specific feature data are read/updated for better recommendation. The flow is pretty much like:

user visits some items -> puts them into the shopping cart -> checkout / removes items from the shopping cart -> item stock update / recommend new items to the user -> user visits the new items

Not much more detail can be supplied, but I believe we can imagine how many queries/updates such loops generate at the backend, right? (smile)
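The amplification in such a loop lends itself to a back-of-envelope calculation. The sketch below uses purely hypothetical per-action fanout factors (none of these numbers are measured values from any real cluster); it only illustrates how a modest user-action rate times a few hundred backend operations per action quickly reaches tens of millions of backend QPS:

```python
# Back-of-envelope sketch of user-action -> backend-query amplification.
# All numbers below are hypothetical illustration values, NOT measured
# figures from any real cluster.

user_actions_per_sec = 50_000  # hypothetical peak user actions per second

# Each user action triggers reads/writes of feature data, index updates,
# recommendation lookups, etc.  Hypothetical per-action fanout:
fanout = {
    "feature_reads": 100,         # user/item feature vectors fetched
    "feature_updates": 20,        # click/trade history updates
    "index_updates": 30,          # incremental index building writes
    "recommendation_lookups": 50, # candidate item lookups
}

amplification = sum(fanout.values())
backend_qps = user_actions_per_sec * amplification
print(f"~{backend_qps:,} backend queries/sec ({amplification}x amplification)")
# prints: ~10,000,000 backend queries/sec (200x amplification)
```

With these placeholder factors, a 200x fanout already turns 50K user actions/sec into 10M backend queries/sec, i.e. two-plus orders of magnitude of amplification, which is the shape of the effect being described above.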
Thanks again for the interest and the questions, even if they derail the thread a little bit, and I hope we have strengthened our faith in HBase's capability through these discussions. :-)

Best Regards,
Yu

On 21 November 2016 at 01:26, Stephen Boesch <java...@gmail.com> wrote:

> Thanks Yu - given your apparent direct knowledge of the data that is
> helpful (my response earlier had been to 张铎). It is important so as to
> ensure informing colleagues of numbers that are "real".
>
> If it were not "confidential" might you mention why there is such a large
> (several orders of magnitude) explosion of end user queries to backend
> ones?
>
> 2016-11-20 7:51 GMT-08:00 Yu Li <car...@gmail.com>:
>
> > Thanks everyone for the feedback/comments, glad this data means something
> > and has drawn your interest. Let me answer the questions (and sorry for
> > the lag).
> >
> > For the backport patches, ours are based on a customized 1.1.2 version
> > and cannot be applied directly to any 1.x branch. It would be easy for us
> > to upload the existing patches somewhere, but obviously not that
> > useful... so maybe we should still get them into branch-1 and officially
> > support read-path offheap in a future 1.x release? Let me create a JIRA
> > about this and let's discuss it in the JIRA system. And to be very clear,
> > it's a big YES to sharing our patches with all rather than only numbers;
> > the question is just which way is better (smile).
> >
> > And answers for @Stephen Boesch:
> >
> > bq. In any case the data is marked as 9/25/16 not 11/11/16
> > It's specifically noted that the 9/25 data are from our online A/B test
> > cluster. We are not showing fully online data because we published
> > offheap together with NettyRpcServer online, so there is no standalone
> > comparison data for offheap. Please check my original email more
> > carefully (smile).
> >
> > bq. Repeating my earlier question: 20*Meg* queries per second?? Just
> > checked and *google* does 40*K* queries per second.
> > As you already noticed, the 20M QPS figure is from the A/B testing
> > cluster (450 nodes), and there is much more on the 11/11 online cluster
> > (1600+ nodes). Please note that this is NOT a cluster that directly
> > serves queries from end users; it serves the index building and online
> > machine learning systems. Refer to our talk at HBaseCon 2016 (slides
> > <http://www.slideshare.net/HBaseCon/improvements-to-apache-hbase-and-its-applications-in-alibaba-search>
> > / recording
> > <https://www.youtube.com/watch?v=UVGDd2JeIMg&list=PLe-h9HrA9qfDVOeNh1l_T5HvwvkO9raWy&index=10>)
> > for more details, if you're interested. And unlike Google, there is an
> > obvious "hot spot" for us, so I don't think the QPS of these two
> > different systems are comparable.
> >
> > bq. So maybe please check your numbers again.
> > The numbers come from our online monitoring system and are all real, not
> > fake, so there is no need to check. Maybe it just needs some more time to
> > take in and understand? (smile)
> >
> > Best Regards,
> > Yu
> >
> > On 20 November 2016 at 23:03, Stephen Boesch <java...@gmail.com> wrote:
> >
> > > Your arguments do not reflect direct knowledge of the numbers. (a)
> > > There is no super-spikiness in the graphs in the data. (b) In any case
> > > the data is marked as 9/25/16 not 11/11/16. (c) The number of internet
> > > users says little about the number of *concurrent* users.
> > >
> > > Overall it would be helpful for those who actually collected the data
> > > to comment - not just speculation from someone who does not. As I had
> > > mentioned already there *may* be a huge fanout from number of
> > > user/application queries to the backend: but *huge* it would seemingly
> > > need to be to generate the numbers shown.
> > >
> > > 2016-11-19 22:39 GMT-08:00 张铎 <palomino...@gmail.com>:
> > >
> > > > 11.11 is something like Black Friday. Almost every item on Alibaba
> > > > is discounted a lot on 11.11.
> > > > Alibaba earned 1 billion in revenue within 1 minute (52 seconds) and
> > > > 10 billion within 7 minutes (6 minutes 58 seconds) on 11.11. Chinese
> > > > people paid more than 120 billion Chinese yuan to Alibaba on 11.11.
> > > > And I remember that Jeff Dean once gave a slide deck showing that,
> > > > for Google, the amplification from user queries to storage system
> > > > queries is also very large (I cannot remember the exact number; the
> > > > slides were used to explain that hedged reads are very useful for
> > > > reducing latency). So I think the peak throughput is true.
> > > >
> > > > There are more than 600 million people in China that use the
> > > > internet. So if they decide to do something to your system at the
> > > > same time, it looks like a DDoS to your system...
> > > >
> > > > Thanks.
> > > >
> > > > 2016-11-20 12:56 GMT+08:00 Stephen Boesch <java...@gmail.com>:
> > > >
> > > > > Repeating my earlier question: 20*Meg* queries per second?? Just
> > > > > checked and *google* does 40*K* queries per second. Now maybe the
> > > > > "queries" are a decomposition of far fewer end-user queries that
> > > > > cause a fanout of backend queries. *But still .. *
> > > > >
> > > > > So maybe please check your numbers again.
> > > > >
> > > > > 2016-11-19 17:05 GMT-08:00 Heng Chen <heng.chen.1...@gmail.com>:
> > > > >
> > > > > > The performance looks great!
> > > > > >
> > > > > > 2016-11-19 18:03 GMT+08:00 Ted Yu <yuzhih...@gmail.com>:
> > > > > > > Opening a JIRA would be fine.
> > > > > > > This makes it easier for people to obtain the patch(es).
> > > > > > >
> > > > > > > Cheers
> > > > > > >
> > > > > > >> On Nov 18, 2016, at 11:35 PM, Anoop John <anoop.hb...@gmail.com> wrote:
> > > > > > >>
> > > > > > >> Because of some compatibility issues, we decided that this
> > > > > > >> will be done in 2.0 only.
> > > > > > >> Ya, as Andy said, it would be great to share the 1.x
> > > > > > >> backported patches. Is it a mega patch at your end? Or
> > > > > > >> issue-by-issue patches? The latter would be best. Please
> > > > > > >> share the patches in some place along with a list of the
> > > > > > >> issues backported. I can help with verifying the issues once,
> > > > > > >> so as to make sure we don't miss any...
> > > > > > >>
> > > > > > >> -Anoop-
> > > > > > >>
> > > > > > >>> On Sat, Nov 19, 2016 at 12:32 AM, Enis Söztutar <enis....@gmail.com> wrote:
> > > > > > >>> Thanks for sharing this. Great work.
> > > > > > >>>
> > > > > > >>> I don't see any reason why we cannot backport to branch-1.
> > > > > > >>>
> > > > > > >>> Enis
> > > > > > >>>
> > > > > > >>> On Fri, Nov 18, 2016 at 9:37 AM, Andrew Purtell <andrew.purt...@gmail.com> wrote:
> > > > > > >>>
> > > > > > >>>> Yes, please, the patches will be useful to the community
> > > > > > >>>> even if we decide not to backport into an official 1.x
> > > > > > >>>> release.
> > > > > > >>>>
> > > > > > >>>>> On Nov 18, 2016, at 12:25 PM, Bryan Beaudreault <bbeaudrea...@hubspot.com> wrote:
> > > > > > >>>>>
> > > > > > >>>>> Is the backported patch available anywhere? Not seeing it
> > > > > > >>>>> on the referenced JIRA. If it ends up not getting
> > > > > > >>>>> officially backported to branch-1 due to 2.0 being around
> > > > > > >>>>> the corner, some of us who build our own deploy may want to
> > > > > > >>>>> integrate it into our builds. Thanks! These numbers look
> > > > > > >>>>> great.
> > > > > > >>>>>
> > > > > > >>>>>> On Fri, Nov 18, 2016 at 12:20 PM Anoop John <anoop.hb...@gmail.com> wrote:
> > > > > > >>>>>>
> > > > > > >>>>>> Hi Yu Li,
> > > > > > >>>>>> Good to see that the off heap work helped you.. The perf
> > > > > > >>>>>> numbers look great.
> > > > > > >>>>>> So this is a comparison of the on-heap L1 cache vs the
> > > > > > >>>>>> off-heap L2 cache (HBASE-11425 enabled). So for 2.0 we
> > > > > > >>>>>> should make the L2 off-heap cache ON by default, I
> > > > > > >>>>>> believe. I will raise a JIRA for that and we can discuss
> > > > > > >>>>>> under that. An off-heap L2 cache for data blocks and the
> > > > > > >>>>>> L1 cache for index blocks seems the right choice.
> > > > > > >>>>>>
> > > > > > >>>>>> Thanks for the backport and the help in testing the
> > > > > > >>>>>> feature.. You were able to find some corner case bugs and
> > > > > > >>>>>> helped the community fix them.. Thanks goes to your whole
> > > > > > >>>>>> team.
> > > > > > >>>>>>
> > > > > > >>>>>> -Anoop-
> > > > > > >>>>>>
> > > > > > >>>>>>> On Fri, Nov 18, 2016 at 10:14 PM, Yu Li <car...@gmail.com> wrote:
> > > > > > >>>>>>>
> > > > > > >>>>>>> Sorry guys, let me retry the inline images:
> > > > > > >>>>>>>
> > > > > > >>>>>>> Performance w/o offheap:
> > > > > > >>>>>>>
> > > > > > >>>>>>> Performance w/ offheap:
> > > > > > >>>>>>>
> > > > > > >>>>>>> Peak Get QPS of one single RS during Singles' Day (11/11):
> > > > > > >>>>>>>
> > > > > > >>>>>>> And attaching the files in case inline still does not work:
> > > > > > >>>>>>>
> > > > > > >>>>>>> Performance_without_offheap.png
> > > > > > >>>>>>> <https://drive.google.com/file/d/0B017Q40_F5uwbWEzUGktYVIya3JkcXVjRkFvVGNtM0VxWC1n/view?usp=drive_web>
> > > > > > >>>>>>>
> > > > > > >>>>>>> Performance_with_offheap.png
> > > > > > >>>>>>> <https://drive.google.com/file/d/0B017Q40_F5uweGR2cnJEU0M1MWwtRFJ5YkxUeFVrcUdPc2ww/view?usp=drive_web>
> > > > > > >>>>>>>
> > > > > > >>>>>>> Peak_Get_QPS_of_Single_RS.png
> > > > > > >>>>>>> <https://drive.google.com/file/d/0B017Q40_F5uwQ2FkR2k0ZmEtRVNGSFp5RUxHM3F6bHpNYnJz/view?usp=drive_web>
> > > > > > >>>>>>>
> > > > > > >>>>>>> Best Regards,
> > > > > > >>>>>>> Yu
> > > > > > >>>>>>>
> > > > > > >>>>>>>> On 18 November 2016 at 19:29, Ted Yu <yuzhih...@gmail.com> wrote:
> > > > > > >>>>>>>>
> > > > > > >>>>>>>> Yu:
> > > > > > >>>>>>>> With positive results, more HBase users would be asking
> > > > > > >>>>>>>> for the backport of the offheap read path patches.
> > > > > > >>>>>>>>
> > > > > > >>>>>>>> Do you think you or your coworkers have the bandwidth
> > > > > > >>>>>>>> to publish a backport for branch-1?
> > > > > > >>>>>>>>
> > > > > > >>>>>>>> Thanks
> > > > > > >>>>>>>>
> > > > > > >>>>>>>>> On Nov 18, 2016, at 12:11 AM, Yu Li <car...@gmail.com> wrote:
> > > > > > >>>>>>>>>
> > > > > > >>>>>>>>> Dear all,
> > > > > > >>>>>>>>>
> > > > > > >>>>>>>>> We have backported the read path offheap work
> > > > > > >>>>>>>>> (HBASE-11425) to our customized hbase-1.1.2 (thanks
> > > > > > >>>>>>>>> @Anoop for the help/support) and have run it online for
> > > > > > >>>>>>>>> more than a month, and would like to share our
> > > > > > >>>>>>>>> experience, for what it's worth (smile).
> > > > > > >>>>>>>>>
> > > > > > >>>>>>>>> Generally speaking, we gained better and more stable
> > > > > > >>>>>>>>> throughput/performance with offheap, and below are some
> > > > > > >>>>>>>>> details:
> > > > > > >>>>>>>>> 1.
> > > > > > >>>>>>>>> QPS becomes more stable with offheap
> > > > > > >>>>>>>>>
> > > > > > >>>>>>>>> Performance w/o offheap:
> > > > > > >>>>>>>>>
> > > > > > >>>>>>>>> Performance w/ offheap:
> > > > > > >>>>>>>>>
> > > > > > >>>>>>>>> These data come from our online A/B test cluster (450
> > > > > > >>>>>>>>> physical machines, each with 256G memory + 64 cores)
> > > > > > >>>>>>>>> with real world workloads. They show that with offheap
> > > > > > >>>>>>>>> we gain more stable throughput as well as better
> > > > > > >>>>>>>>> performance.
> > > > > > >>>>>>>>>
> > > > > > >>>>>>>>> We are not showing fully online data here because
> > > > > > >>>>>>>>> online we published the version with both offheap and
> > > > > > >>>>>>>>> NettyRpcServer together, so there is no standalone
> > > > > > >>>>>>>>> comparison data for offheap.
> > > > > > >>>>>>>>>
> > > > > > >>>>>>>>> 2. Full GC frequency and cost
> > > > > > >>>>>>>>>
> > > > > > >>>>>>>>> Average Full GC STW time reduced from 11s to 7s with
> > > > > > >>>>>>>>> offheap.
> > > > > > >>>>>>>>>
> > > > > > >>>>>>>>> 3. Young GC frequency and cost
> > > > > > >>>>>>>>>
> > > > > > >>>>>>>>> No performance degradation observed with offheap.
> > > > > > >>>>>>>>>
> > > > > > >>>>>>>>> 4. Peak throughput of one single RS
> > > > > > >>>>>>>>>
> > > > > > >>>>>>>>> On Singles' Day (11/11), the peak throughput of a
> > > > > > >>>>>>>>> single RS reached 100K QPS, among which 90K were from
> > > > > > >>>>>>>>> Get.
> > > > > > >>>>>>>>> Combined with the network in/out data, we can tell the
> > > > > > >>>>>>>>> average result size of a Get request is ~1KB.
> > > > > > >>>>>>>>>
> > > > > > >>>>>>>>> Offheap is used on all online machines (more than 1600
> > > > > > >>>>>>>>> nodes) instead of LruCache, so the above QPS is gained
> > > > > > >>>>>>>>> from the offheap bucket cache, along with
> > > > > > >>>>>>>>> NettyRpcServer (HBASE-15756).
> > > > > > >>>>>>>>>
> > > > > > >>>>>>>>> Just let us know if you have any comments. Thanks.
> > > > > > >>>>>>>>>
> > > > > > >>>>>>>>> Best Regards,
> > > > > > >>>>>>>>> Yu
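For readers who want to try the off-heap (L2) bucket cache setup discussed in this thread, it is configured through hbase-site.xml. A minimal sketch follows; the size below is a placeholder to adapt to your hardware, and property names and defaults should be double-checked against the HBase Reference Guide for your version:

```xml
<!-- hbase-site.xml: minimal off-heap (L2) bucket cache sketch.
     The size below is a placeholder, not a recommendation. -->
<property>
  <name>hbase.bucketcache.ioengine</name>
  <value>offheap</value>
</property>
<property>
  <!-- Off-heap cache size in MB (here 16 GB, as an example) -->
  <name>hbase.bucketcache.size</name>
  <value>16384</value>
</property>
```

The JVM's direct memory limit must also be raised accordingly (e.g. via -XX:MaxDirectMemorySize in hbase-env.sh) so the RegionServer can actually allocate the off-heap space. With the default combined-cache mode, data blocks go to the off-heap L2 bucket cache while index and bloom blocks stay in the on-heap L1 cache, matching the L1/L2 split Anoop describes above.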