Being one of the guys "selling" the HBase idea at work (I presented a PoC this week, by the way!), I know that at some point I will have to explain the conclusions from articles like this one, and this kind of conclusion will probably be really hard to explain. I will try to reach the authors to ask what kind of failures they faced and what performance tuning they did on their clusters, but sadly that will not change the publication.
On the other hand, I think I can help in one way or another: documenting undocumented features, collecting more data on the effects of changing default values and relating those changes to different HBase use cases, etc. It's hard to start contributing to Open Source projects as sophisticated as HBase, but it can be a bit easier to contribute by documenting features and running experiments, and I think there are others wondering whether they can contribute to HBase as well, but - speaking for myself - a lot of guidance is needed. Hope to get this guidance here ;-)

Best regards,
Cristofer

________________________________________
From: [email protected] [[email protected]] on behalf of Stack [[email protected]]
Sent: Thursday, August 30, 2012 19:04
To: [email protected]
Subject: Re: [maybe off-topic?] article: Solving Big Data Challenges for Enterprise Application Performance Management

On Thu, Aug 30, 2012 at 7:51 AM, Cristofer Weber <[email protected]> wrote:
> About HMasters, yes, it's not clear.
>
> In section 6.1 they say that "Since we focused on a setup with a maximum of
> 12 nodes, we did not assign the master node and jobtracker to separate nodes;
> instead we deployed them with data nodes."
>
> But in section 4.1 they say that "The configuration was done using a dedicated
> node for the running master processes (NameNode and SecondaryNameNode),
> therefore for all the benchmarks the specified number of servers correspond to
> nodes running slave processes (DataNodes and TaskTrackers) as well as HBase's
> region server processes."
>
> About configurations, the first paragraph of "6. EXPERIENCES" contains this:
> "In our initial test runs, we ran every system with the default configuration,
> and then tried to improve the performance by changing various tuning
> parameters. We dedicated at least a week for configuring and tuning each
> system (concentrating on one system at a time) to get a fair comparison."
>
> I agree that it would be nice to see this experiment with 0.94.1, but 0.90.4 was
> released a year ago, so I understand that this version was the official
> version when these experiments were conducted.
>

It's a bit tough going back in time fixing 0.90.4 results. The "...failed frequently in non-deterministic ways..." is an ugly mark to have hanging over HBase in a paper like this that will probably be around a while. I wonder what the cause was (I don't think that's typical of 0.90.4, IIRC).

On how to improve read performance: if it's not in the refguide, http://hbase.apache.org/book.html#performance, then the tuning option might as well not exist (anyone see anything missing?).

We consistently do badly in these tests, though our operational, actual experience seems much better than what is shown in these benchmarks. As has been said elsewhere on this thread, the takeaway is improved defaults and auto-tuning, but the only time we get interested in addressing these issues is the once a year when one of these reports comes out; otherwise, we seem to have other priorities when messing in the HBase code base.

St.Ack
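For readers following the tuning discussion above: in the 0.90.x era, the read-path knobs the refguide covers mostly live in hbase-site.xml. A minimal sketch, assuming a 0.90.x cluster; the property names are real, but the values are illustrative starting points, not recommendations:

```xml
<!-- hbase-site.xml: illustrative read-tuning knobs (values are examples only) -->
<configuration>
  <!-- Fraction of region server heap given to the block cache;
       a larger cache helps read-heavy workloads at the cost of memstore room. -->
  <property>
    <name>hfile.block.cache.size</name>
    <value>0.4</value>
  </property>
  <!-- Rows fetched per scanner RPC; the old default of 1
       makes scan-heavy benchmarks very chatty on the wire. -->
  <property>
    <name>hbase.client.scanner.caching</name>
    <value>100</value>
  </property>
</configuration>
```

The scanner-caching setting in particular is a common culprit when benchmark scans look far slower than operational experience suggests they should be.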
