About HMasters, yes, it's not clear. In section 6.1 they say that "Since we focused on a setup with a maximum of 12 nodes, we did not assign the master node and jobtracker to separate nodes instead we deployed them with data nodes."

But in section 4.1 they say that "The configuration was done using a dedicated node for the running master processes (NameNode and SecondaryNameNode), therefore for all the benchmarks the specified number of servers correspond to nodes running slave processes (DataNodes and TaskTrackers) as well as HBase's region server processes."
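If I read 4.1 correctly, the layout they describe is roughly the one sketched below. This is only my interpretation - the paper does not say how the daemons were started, and where the HMaster and JobTracker actually ran is exactly the unclear part:

    # Dedicated master node (as described in section 4.1):
    hadoop-daemon.sh start namenode
    hadoop-daemon.sh start secondarynamenode

    # Each of the up to 12 "server" nodes counted in the benchmarks:
    hadoop-daemon.sh start datanode
    hadoop-daemon.sh start tasktracker
    hbase-daemon.sh start regionserver

    # Section 6.1, however, reads as if the HMaster and JobTracker were
    # started on (some of) the data nodes instead of the dedicated node:
    hadoop-daemon.sh start jobtracker
    hbase-daemon.sh start master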
About configurations, the first paragraph of "6. EXPERIENCES" contains this: "In our initial test runs, we ran every system with the default configuration, and then tried to improve the performance by changing various tuning parameters. We dedicated at least a week for configuring and tuning each system (concentrating on one system at a time) to get a fair comparison." Since they don't report which parameters they actually changed for HBase, I've sketched the kind of read-oriented settings I would have expected them to at least mention at the very bottom of this message, below the quoted thread.

I agree that it would be nice to see this experiment with 0.94.1, but 0.90.4 was released a year ago, so I understand that it was the official version when these experiments were conducted.

Best regards,
Cristofer

-----Original Message-----
From: Dave Wang [mailto:[email protected]]
Sent: Thursday, August 30, 2012 10:49
To: [email protected]
Subject: Re: [maybe off-topic?] article: Solving Big Data Challenges for Enterprise Application Performance Management

My reading of the paper is that they are actually not clear about whether or not HMasters were deployed on datanodes. I'm going to guess that they just used default configurations for HBase and YCSB, but the paper again is not specific enough.

Why were they using 0.90.4 in 2012? Would have been nice to see some of the more recent work done in the area of performance.

One thing the paper does touch on is the relative difficulty of standing up the cluster, which has not changed since 0.90.4. I think that's definitely something that could be improved upon.

- Dave

On Thu, Aug 30, 2012 at 6:27 AM, Cristofer Weber <[email protected]> wrote:
> Just read this article, "Solving Big Data Challenges for Enterprise
> Application Performance Management", published this month in Volume 5,
> No. 12 of the Proceedings of the VLDB Endowment, where they measured six
> different databases - Project Voldemort, Redis, HBase, Cassandra,
> MySQL Cluster and VoltDB - with YCSB on two different kinds of
> clusters, memory-bound and disk-bound, and I'm in doubt about the
> results for HBase since:
>
> * The HBase version was 0.90.4
> * Master nodes were deployed together with data nodes
> * They didn't report tuning parameters
>
> There's also a paragraph where they report that HBase failed
> frequently in non-deterministic ways while running YCSB.
>
> My intention with this e-mail is to ask for opinions from you, who
> are more experienced with HBase, on where this experiment's setup
> could be changed to improve read operations, since in this setup HBase
> did not perform as well as Cassandra and Project Voldemort.
>
> Here's the article:
> http://vldb.org/pvldb/vol5/p1724_tilmannrabl_vldb2012.pdf and the Volume 5
> home: http://vldb.org/pvldb/vol5.html
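As mentioned above, this is the kind of read-oriented tuning I would have expected the paper to at least mention for HBase 0.90.4. The values are purely illustrative guesses on my part - nothing below comes from the paper:

    <!-- hbase-site.xml (illustrative values only) -->
    <property>
      <name>hfile.block.cache.size</name>
      <!-- fraction of the region server heap used for the block cache;
           the 0.90.x default is 0.2, IIRC, which is low for a read-heavy YCSB run -->
      <value>0.4</value>
    </property>
    <property>
      <name>hbase.regionserver.handler.count</name>
      <!-- the default of 10 RPC handlers is low for many concurrent YCSB client threads -->
      <value>50</value>
    </property>
    <!-- On top of that, a larger region server heap (HBASE_HEAPSIZE in
         hbase-env.sh) and pre-splitting the YCSB 'usertable' before the
         load phase usually matter at least as much as the settings above. -->

Whether anything like this was actually done is exactly what the paper doesn't tell us.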
