Being one of the guys "selling" the HBase idea at work (I presented a PoC this week, by the way!), I know that at some point I will have to explain the conclusions from articles like this one, and this kind of conclusion will probably be really hard to explain. I will try to reach the authors to ask what kind of failures they faced and what performance tuning they did on their clusters, but sadly that will not change the publication.
On the other hand, I think I can help in one way or another: documenting undocumented features, collecting more data on the effects of changing default values and relating those changes to different HBase use cases, etc. It's hard to start contributing to Open Source projects as sophisticated as HBase, but it can be a bit easier to contribute by documenting features and running experiments, and I think there are others wondering whether they can contribute to HBase as well, but - speaking for myself - a lot of guidance is needed. Hope to get this guidance here ;-)

Best regards,
Cristofer

________________________________________
From: [email protected] [[email protected]] on behalf of Stack [[email protected]]
Sent: Thursday, August 30, 2012 19:04
To: [email protected]
Subject: Re: [maybe off-topic?] article: Solving Big Data Challenges for Enterprise Application Performance Management

On Thu, Aug 30, 2012 at 7:51 AM, Cristofer Weber <[email protected]> wrote:
> About HMasters, yes, it's not clear.
>
> In section 6.1 they say that "Since we focused on a setup with a maximum of
> 12 nodes, we did not assign the master node and jobtracker to separate nodes;
> instead we deployed them with data nodes."
>
> But in section 4.1 they say that "The configuration was done using a dedicated
> node for the running master processes (NameNode and SecondaryNameNode),
> therefore for all the benchmarks the specified number of servers correspond to
> nodes running slave processes (DataNodes and TaskTrackers) as well as HBase's
> region server processes."
>
> About configurations, the first paragraph of "6. EXPERIENCES" contains this:
> "In our initial test runs, we ran every system with the default configuration,
> and then tried to improve the performance by changing various tuning
> parameters. We dedicated at least a week for configuring and tuning each
> system (concentrating on one system at a time) to get a fair comparison."
>
> I agree that it would be nice to see this experiment with 0.94.1, but 0.90.4 was
> released a year ago, so I understand that this version was the official
> version when these experiments were conducted.
>

It's a bit tough going back in time fixing 0.90.4 results. The "...failed frequently in non-deterministic ways..." is an ugly mark to have hanging over HBase in a paper like this that will probably be around a while. I wonder what the cause was (I don't think that's typical of 0.90.4, IIRC).

On how to improve read performance: if it's not in the refguide, http://hbase.apache.org/book.html#performance, then the tuning option might as well not exist (anyone see anything missing?).

We consistently do badly in these tests, though our operational, actual experience seems much better than what is shown in these benchmarks. As has been said elsewhere on this thread, the takeaway is improved defaults and auto-tuning, but the only time we get interested in addressing these issues is the once a year when one of these reports comes out; otherwise, we seem to have other priorities when messing in the HBase code base.

St.Ack
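For readers following the tuning discussion above: in the 0.90.x era, the read-path knobs the refguide covers mostly live in hbase-site.xml. A minimal sketch, assuming a 0.90.x cluster; the property names are real, but the values are illustrative starting points, not recommendations:

```xml
<!-- hbase-site.xml: illustrative read-tuning knobs (values are examples only) -->
<configuration>
  <!-- Fraction of region server heap given to the block cache;
       a larger cache helps read-heavy workloads at the cost of memstore room. -->
  <property>
    <name>hfile.block.cache.size</name>
    <value>0.4</value>
  </property>
  <!-- Rows fetched per scanner RPC; the old default of 1
       makes scan-heavy benchmarks very chatty on the wire. -->
  <property>
    <name>hbase.client.scanner.caching</name>
    <value>100</value>
  </property>
</configuration>
```

The scanner-caching setting in particular is a common culprit when benchmark scans look far slower than operational experience suggests they should be.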
