Many of us have been saying for a while that the client needs love (i.e. needs to be rewritten) and that a new client should follow an async API (maybe with a thin synchronous veneer on top of it).
The client is a big piece of HBase. Implementing all of its aspects, including security, is a major task, and nobody has committed the necessary resources to it yet. asynchbase is a start, but it does not support many of the HBase features (coprocessors, security, etc.).

-- Lars

________________________________
From: Andrew Purtell <[email protected]>
To: "[email protected]" <[email protected]>; lars hofhansl <[email protected]>
Sent: Thursday, August 30, 2012 2:41 PM
Subject: Re: [maybe off-topic?] article: Solving Big Data Challenges for Enterprise Application Performance Management

I do want to take a closer look at it. Not with the intent to replace the PB RPC with it, but it's odd to have two RPC stacks. What refactoring and code simplification/removal opportunities are here? Don't know (yet). More generally, to experiment with simple native async clients.

On Thursday, August 30, 2012, lars hofhansl wrote:

0.94+ has the option to run a thrift-server-thread inside the RegionServers. Maybe we should improve upon that?

>________________________________
> From: Andrew Purtell <[email protected]>
>To: Andrew Purtell <[email protected]>
>Cc: "[email protected]" <[email protected]>
>Sent: Thursday, August 30, 2012 9:41 AM
>Subject: Re: [maybe off-topic?] article: Solving Big Data Challenges for Enterprise Application Performance Management
>
>Just want to clarify: I mean experimenting with the approach of the Thrift client work, not the use of Thrift particularly.
>
>On Thursday, August 30, 2012, Andrew Purtell wrote:
>
>> This paper could very well have benchmarked the relative performance of the YCSB drivers. Some takeaways for me here are:
>>
>> - Cluster setup is still too difficult
>>
>> - There are opportunities for autotuning that would make it easier for users to get it right the first time and for academics and casual benchmarkers alike to get a good result without becoming experts with HBase configuration
>>
>> - The client library has been evolving toward fully async dispatch. We should focus on this, perhaps even consider reimplementing the sync client on a refactored async core. And look at making the Thrift-based work that FB put in front and center, because then native clients are possible.
>>
>> - Given the above client work, the YCSB HBase driver should have a rewrite.
>>
>> On Thu, Aug 30, 2012 at 4:49 PM, Dave Wang <[email protected]> wrote:
>>
>>> My reading of the paper is that they are actually not clear about whether or not HMasters were deployed on datanodes.
>>>
>>> I'm going to guess that they just used default configurations for HBase and YCSB, but the paper again is not specific enough.
>>>
>>> Why were they using 0.90.4 in 2012? Would have been nice to see some of the more recent work done in the area of performance.
>>>
>>> One thing the paper does touch on is the relative difficulty of standing up the cluster, which has not changed since 0.90.4. I think that's definitely something that could be improved upon.
>>>
>>> - Dave
>>>
>>> On Thu, Aug 30, 2012 at 6:27 AM, Cristofer Weber <[email protected]> wrote:
>>>
>>> > Just read this article, "Solving Big Data Challenges for Enterprise Application Performance Management," published this month in Volume 5, No. 12 of the Proceedings of the VLDB Endowment, where they measured 6 different databases - Project Voldemort, Redis, HBase, Cassandra, MySQL Cluster and VoltDB - with YCSB on two different kinds of clusters, Memory-bound and Disk-bound, and I'm in doubt about the results for HBase since:
>>> >
>>> > * HBase version was 0.90.4
>>> >
>>> > * Master nodes were deployed together with data nodes
>>> >
>>> > * They didn't report tuning parameters
>>> >
>>> > There's also a paragraph where they reported that HBase failed frequently in non-deterministic ways while running YCSB.
>>> >
>>> > My intention with this e-mail is to look for opinions from you, who are more experienced with HBase, on where this experiment's setup could be changed to improve read operations, since in this setup HBase did not perform as well as Cassandra and Project Voldemort.
>>> >
>>> > Here's the article: http://vldb.org/pvldb/vol5/p1724_tilmannrabl_vldb2012.pdf and Volume 5 home: http://vldb.org/pvldb/vol5.html
>>> >
>>
>> --
>> Best regards,
>>
>>    - Andy
>>
>> Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White)
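
For readers following the "async core with a thin synchronous veneer" idea discussed in this thread, here is a minimal sketch of the general pattern. It is only an illustration under stated assumptions: it is written in Python for brevity (the HBase client itself is Java), `AsyncClient` and `SyncClient` are hypothetical names rather than HBase or asynchbase APIs, and an in-memory dict stands in for a remote table, so no RPC is performed.

```python
import asyncio


class AsyncClient:
    """Hypothetical async core: every operation is a coroutine."""

    def __init__(self):
        # In-memory dict standing in for a remote table (assumption: no real RPC).
        self._rows = {}

    async def put(self, row, value):
        await asyncio.sleep(0)  # placeholder for a non-blocking network round trip
        self._rows[row] = value

    async def get(self, row):
        await asyncio.sleep(0)  # placeholder for a non-blocking network round trip
        return self._rows.get(row)


class SyncClient:
    """Thin synchronous veneer: each call blocks until the async call completes."""

    def __init__(self, async_client):
        self._async = async_client
        # Private event loop owned by this wrapper.
        self._loop = asyncio.new_event_loop()

    def put(self, row, value):
        self._loop.run_until_complete(self._async.put(row, value))

    def get(self, row):
        return self._loop.run_until_complete(self._async.get(row))

    def close(self):
        self._loop.close()


# Usage: blocking callers see an ordinary synchronous API.
client = SyncClient(AsyncClient())
client.put("row1", "value1")
result = client.get("row1")
missing = client.get("no-such-row")
client.close()
print(result, missing)
```

The point of the layering is that all operation logic lives in the async class, while the synchronous wrapper only waits on it. That is the sense in which the veneer is "thin"; features such as security would still have to be implemented in the core.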
