Re: [maybe off-topic?] article: Solving Big Data Challenges for Enterprise Application Performance Management

lars hofhansl Thu, 30 Aug 2012 15:01:59 -0700

Many of us have been saying for a while that the client needs love (i.e. needs 
to be rewritten) and that a new client should follow an async API (maybe with a 
thin synchronous veneer on top of it).


The client is a big piece of HBase. And implementing all the aspects including 
security is a major task and nobody has committed the necessary resources for 
it, yet.
asynchbase is a start, but it does not support many of the HBase features 
(coprocessors, security, etc).
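
To make the "async API with a thin synchronous veneer" idea concrete, here is 
a minimal sketch in Java. Every name in it (AsyncStore, InMemoryAsyncStore, 
SyncStore) is hypothetical and not part of HBase or asynchbase, and the 
in-memory map only stands in for a real RPC layer. It illustrates the layering 
and nothing more.

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical async core: every call returns a future immediately.
interface AsyncStore {
    CompletableFuture<Void> put(String row, byte[] value);
    CompletableFuture<byte[]> get(String row);
}

// Stand-in for the RPC layer (assumption): an in-memory map whose
// operations run on a background thread pool.
final class InMemoryAsyncStore implements AsyncStore {
    private final Map<String, byte[]> data = new ConcurrentHashMap<>();

    public CompletableFuture<Void> put(String row, byte[] value) {
        return CompletableFuture.runAsync(() -> data.put(row, value));
    }

    public CompletableFuture<byte[]> get(String row) {
        return CompletableFuture.supplyAsync(() -> data.get(row));
    }
}

// Thin synchronous veneer: each call delegates to the async core and
// blocks on the returned future, with a timeout.
final class SyncStore {
    private final AsyncStore async;
    private final long timeoutMillis;

    SyncStore(AsyncStore async, long timeoutMillis) {
        this.async = async;
        this.timeoutMillis = timeoutMillis;
    }

    void put(String row, byte[] value) {
        await(async.put(row, value));
    }

    byte[] get(String row) {
        return await(async.get(row));
    }

    // Block until the future completes, wrapping failures in an
    // unchecked exception so callers see a plain blocking API.
    private <T> T await(CompletableFuture<T> future) {
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        } catch (ExecutionException | TimeoutException e) {
            throw new IllegalStateException("operation failed or timed out", e);
        }
    }
}

class AsyncVeneerDemo {
    public static void main(String[] args) {
        SyncStore store = new SyncStore(new InMemoryAsyncStore(), 1000);
        store.put("row1", "value1".getBytes(StandardCharsets.UTF_8));
        System.out.println(new String(store.get("row1"), StandardCharsets.UTF_8));
    }
}
```

The point of the layering is that the blocking client contains no I/O logic 
of its own, so anything added to the async core is available to both APIs.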


-- Lars



________________________________
 From: Andrew Purtell <[email protected]>
To: "[email protected]" <[email protected]>; lars hofhansl 
<[email protected]> 
Sent: Thursday, August 30, 2012 2:41 PM
Subject: Re: [maybe off-topic?] article: Solving Big Data Challenges for 
Enterprise Application Performance Management
 

I do want to take a closer look at it. Not with the intent to replace the PB 
RPC with it, but it's odd to have two RPC stacks. What refactoring and code 
simplification/removal opportunities are here? Don't know (yet). More 
generally, to experiment with simple native async clients. 

On Thursday, August 30, 2012, lars hofhansl  wrote:

>0.94+ has the option to run a thrift-server-thread inside the RegionServers. 
>Maybe we should improve upon that?
>
>
>
>________________________________
> From: Andrew Purtell <[email protected]>
>To: Andrew Purtell <[email protected]>
>Cc: "[email protected]" <[email protected]>
>Sent: Thursday, August 30, 2012 9:41 AM
>Subject: Re: [maybe off-topic?] article: Solving Big Data Challenges for 
>Enterprise Application Performance Management
>
>Just want to clarify I mean experimenting with the approach of the Thrift
>client work not use of Thrift particularly.
>
>On Thursday, August 30, 2012, Andrew Purtell wrote:
>
>> This paper could very well have benchmarked the relative performance of
>> the YCSB drivers. Some takeaways for me here are:
>>
>>     - Cluster setup is too difficult still
>>
>>     - There are opportunities for autotuning that would make it easier for
>> users to get it right the first time and for academics and casual
>> benchmarkers alike to get a good result without becoming experts with HBase
>> configuration
>>
>>     - The client library has been evolving toward fully async dispatch. We
>> should focus on this, perhaps even consider reimplementing the sync client
>> on a refactored async core. And look at putting the Thrift-based stuff FB
>> contributed front and center, because then native clients are possible.
>>
>>     - Given the above client work, the YCSB HBase driver should have a
>> rewrite.
>>
>> On Thu, Aug 30, 2012 at 4:49 PM, Dave Wang <[email protected]> wrote:
>>
>>> My reading of the paper is that they are actually not clear about whether
>>> or not HMasters were deployed on datanodes.
>>>
>>> I'm going to guess that they just used default configurations for HBase
>>> and YCSB, but the paper again is not specific enough.
>>>
>>> Why were they using 0.90.4 in 2012?  Would have been nice to see some of
>>> the more recent work done in the area of performance.
>>>
>>> One thing the paper does touch on is the relative difficulty of standing
>>> up the cluster, which has not changed since 0.90.4.  I think that's
>>> definitely something that could be improved upon.
>>>
>>> - Dave
>>>
>>> On Thu, Aug 30, 2012 at 6:27 AM, Cristofer Weber <[email protected]>
>>> wrote:
>>>
>>> > Just read this article, "Solving Big Data Challenges for Enterprise
>>> > Application Performance Management", published this month in Volume 5,
>>> > No. 12 of Proceedings of the VLDB Endowment, where they measured 6
>>> > different databases - Project Voldemort, Redis, HBase, Cassandra, MySQL
>>> > Cluster and VoltDB - with YCSB on two different kinds of clusters,
>>> > Memory-bound and Disk-bound, and I have doubts about the results for
>>> > HBase since:
>>> >
>>> >
>>> > *         HBase version was 0.90.4
>>> >
>>> > *         Master nodes were deployed together with data nodes
>>> >
>>> > *         They didn't report tuning parameters
>>> >
>>> > There's also a paragraph where they reported that HBase failed
>>> > frequently in non-deterministic ways while running YCSB.
>>> >
>>> > My intention with this e-mail is to look for opinions from you, who are
>>> > more experienced with HBase, on where this experiment's setup could be
>>> > changed to improve read operations, since in this setup HBase did not
>>> > perform as well as Cassandra and Project Voldemort.
>>> >
>>> > Here's the article:
>>> > http://vldb.org/pvldb/vol5/p1724_tilmannrabl_vldb2012.pdf and Volume 5
>>> > home: http://vldb.org/pvldb/vol5.html
>>> >
>>> >
>>> >
>>> >
>>>
>>
>>
>>
>> --
>> Best regards,
>>
>>    - Andy
>>
>> Problems worthy of attack prove their worth by hitting back. - Piet Hein
>> (via Tom White)
>>
>>
>
>--
>Best regards,
>
>   - Andy
>
>Problems worthy of attack prove their worth by hitting back. - Piet Hein
>(via Tom White)

-- 
Best regards,

   - Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein (via 
Tom White)
