Re: Feedback on ACM SOCC paper about elasticity and scalability

Jean-Daniel Cryans Sat, 14 May 2011 08:17:14 -0700

On Sat, May 14, 2011 at 6:40 AM, Thibault Dory <[email protected]> wrote:
> I'm wondering what the possible bottlenecks of an HBase cluster are. Even if
> there are cache mechanisms, the fact that some data are centralized could
> lead to a bottleneck (even if it's quite theoretical given the load needed to
> achieve it).


Isn't that what your paper is about?

> Would it be right to say the following?
>
>   - The namenode stores all the metadata and must scale vertically if
> the cluster becomes very big

The fact that there's only one namenode is bad in multiple ways, but
generally people are more bothered by the fact that it's a single
point of failure. Larger companies do hit the limits of that single
machine, so Y! worked on "Federated Namenodes" as a way to circumvent
that. See http://www.slideshare.net/huguk/hdfs-federation-hadoop-summit2011

This work is already available in Hadoop's SVN trunk.

>   - There is only one node storing the -ROOT- table and only one node
> storing the .META. table. If I'm doing a lot of random accesses and my
> dataset is VERY large, could I overload those nodes?

Again, I believe this is the subject of your paper, right? Anyway, in
general -ROOT- has 1 row, and that row is cached. Even if you have
thousands of clients that need to update their .META. location (this
would only happen at the beginning of an MR job or if .META. moves),
serving from memory is fast.

Next you have .META. Again, the clients cache their region locations,
so once they have them they don't need to talk to .META. until a region
moves or gets split. Also, .META. isn't that big and is usually served
directly from memory.
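To make the caching argument concrete, here is a rough sketch of a client-side region location cache. It assumes a single .META. table keyed by region start key and collapses the -ROOT- step into one lookup. `RegionLocationCache`, `Region`, `locate` and `invalidate` are hypothetical names, not the real HBase client API, and the in-memory `meta` map only stands in for the server hosting .META. The point it illustrates is that repeated random reads hit .META. once per region, and again only after a move or split invalidates the cached entry.

```java
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

/**
 * Hypothetical sketch of client-side region location caching.
 * Not the HBase client API; the "meta" map stands in for the .META. server.
 */
public class RegionLocationCache {

    /** A region covers rows in [startKey, endKey); an empty endKey means "to the end of the table". */
    static final class Region {
        final String startKey;
        final String endKey;
        final String server;

        Region(String startKey, String endKey, String server) {
            this.startKey = startKey;
            this.endKey = endKey;
            this.server = server;
        }

        boolean contains(String rowKey) {
            return rowKey.compareTo(startKey) >= 0
                    && (endKey.isEmpty() || rowKey.compareTo(endKey) < 0);
        }
    }

    // Authoritative region locations (stand-in for .META.), keyed by start key.
    private final NavigableMap<String, Region> meta;
    // The client's own cache of locations it has already looked up.
    private final NavigableMap<String, Region> cache = new TreeMap<>();
    // How many times the client had to go to .META.
    private int metaLookups = 0;

    public RegionLocationCache(NavigableMap<String, Region> meta) {
        this.meta = meta;
    }

    /** Returns the server hosting rowKey, consulting .META. only on a cache miss. */
    public String locate(String rowKey) {
        Map.Entry<String, Region> cached = cache.floorEntry(rowKey);
        if (cached != null && cached.getValue().contains(rowKey)) {
            return cached.getValue().server;
        }
        metaLookups++;
        Map.Entry<String, Region> fromMeta = meta.floorEntry(rowKey);
        if (fromMeta == null || !fromMeta.getValue().contains(rowKey)) {
            throw new IllegalStateException("no region for row " + rowKey);
        }
        Region region = fromMeta.getValue();
        cache.put(region.startKey, region);
        return region.server;
    }

    /** Drops the cached entry, e.g. after the old server says the region moved or split. */
    public void invalidate(String rowKey) {
        Map.Entry<String, Region> cached = cache.floorEntry(rowKey);
        if (cached != null && cached.getValue().contains(rowKey)) {
            cache.remove(cached.getKey());
        }
    }

    public int metaLookups() {
        return metaLookups;
    }

    public static void main(String[] args) {
        NavigableMap<String, Region> meta = new TreeMap<>();
        meta.put("", new Region("", "m", "rs1"));
        meta.put("m", new Region("m", "", "rs2"));

        RegionLocationCache client = new RegionLocationCache(meta);
        for (int i = 0; i < 1000; i++) {
            client.locate("apple");
            client.locate("zebra");
        }
        // 2000 reads, but only the first read per region reached .META.
        System.out.println("meta lookups: " + client.metaLookups()); // meta lookups: 2

        // The first region moves to rs3: .META. is updated, the client drops its stale entry.
        meta.put("", new Region("", "m", "rs3"));
        client.invalidate("apple");
        System.out.println(client.locate("apple") + " after " + client.metaLookups() + " lookups"); // rs3 after 3 lookups
    }
}
```

Using a sorted map with `floorEntry` mirrors how a row key maps to the region whose start key is the greatest one not exceeding it; the `contains` check guards against matching a cached region whose end key the row has already passed.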

The Bigtable paper mentions that they allow .META. to split when it
grows too large; this is something we've blocked for the moment in
HBase.

J-D
