About HMasters, yes, it's not clear. 

In section 6.1 they say that "Since we focused on a setup with a maximum of 12 
nodes, we did not assign the master node and jobtracker to separate nodes 
instead we deployed them with data nodes." 

But in section 4.1 they say that "The configuration was done using a dedicated 
node for the running master processes (NameNode and SecondaryNameNode), 
therefore for all the benchmarks the specified number of servers correspond to 
nodes running slave processes (DataNodes and TaskTrackers) as well as HBase’s 
region server processes."

About configurations, the first paragraph on "6. EXPERIENCES" contains this: 
"In our initial test runs, we ran every system with the default configuration, 
and then tried to improve the performance by changing various tuning 
parameters. We dedicated at least a week for configuring and tuning each system 
(concentrating on one system at a time) to get a fair comparison." 
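
They don't say which parameters those ended up being. Purely as an 
illustration of the kind of read-path knobs I would expect to matter in 
hbase-site.xml (these names are real HBase 0.90 properties, but the values 
are my guess, not anything reported in the paper):

    <!-- Illustrative overrides only; the paper does not report its settings. -->
    <property>
      <name>hfile.block.cache.size</name>
      <!-- Fraction of region server heap given to the block cache;
           raising it helps read-heavy YCSB workloads. -->
      <value>0.4</value>
    </property>
    <property>
      <name>hbase.regionserver.handler.count</name>
      <!-- More RPC handler threads for many concurrent YCSB client threads. -->
      <value>50</value>
    </property>
    <property>
      <name>hbase.client.scanner.caching</name>
      <!-- Rows fetched per scanner RPC; matters for the scan workloads. -->
      <value>100</value>
    </property>

Of course, without knowing their heap sizes and workload mix, it's hard to 
say how much any of this would have changed the picture.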

I agree that it would be nice to see this experiment repeated with 0.94.1, but 
0.90.4 was released a year ago, so I understand that it was the official 
version when these experiments were conducted. 

Best regards,
Cristofer


-----Original Message-----
From: Dave Wang [mailto:[email protected]] 
Sent: Thursday, August 30, 2012 10:49
To: [email protected]
Subject: Re: [maybe off-topic?] article: Solving Big Data Challenges for 
Enterprise Application Performance Management

My reading of the paper is that they are actually not clear about whether or 
not HMasters were deployed on datanodes.

I'm going to guess that they just used default configurations for HBase and 
YCSB, but the paper again is not specific enough.

Why were they using 0.90.4 in 2012?  Would have been nice to see some of the 
more recent work done in the area of performance.

One thing the paper does touch on is the relative difficulty of standing up the 
cluster, which has not changed since 0.90.4.  I think that's definitely 
something that could be improved upon.

- Dave

On Thu, Aug 30, 2012 at 6:27 AM, Cristofer Weber <[email protected]> 
wrote:

> Just read this article, "Solving Big Data Challenges for Enterprise 
> Application Performance Management," published this month in Volume 5, 
> No. 12 of the Proceedings of the VLDB Endowment, where they measured 6 
> different databases - Project Voldemort, Redis, HBase, Cassandra, 
> MySQL Cluster and VoltDB - with YCSB on two different kinds of 
> clusters, Memory-bound and Disk-bound, and I'm in doubt about the results for 
> HBase since:
>
>
> *         HBase version was 0.90.4
>
> *         Master nodes were deployed together with data nodes
>
> *         They didn't report the tuning parameters
>
> There's also a paragraph where they reported that HBase failed 
> frequently in non-deterministic ways while running YCSB.
>
> My intention with this e-mail is to look for opinions from you, who 
> are more experienced with HBase, on where this experiment's setup 
> could be changed to improve read operations, since in this setup HBase 
> did not perform as well as Cassandra and Project Voldemort.
>
> Here's the article:
> http://vldb.org/pvldb/vol5/p1724_tilmannrabl_vldb2012.pdf and Volume 5
> home: http://vldb.org/pvldb/vol5.html
>
>
>
>
