I'm wondering what the possible bottlenecks of an HBase cluster are. Even with caching mechanisms, the fact that some data is centralized could lead to a bottleneck (even if this is quite theoretical given the load needed to reach it). Would it be right to say the following?
- The namenode stores all the metadata and must scale vertically if the cluster becomes very big.
- There is only one node storing the -ROOT- table and only one node storing the .META. table. If I'm doing a lot of random accesses and my dataset is VERY large, could I overload those nodes?

On Sat, May 14, 2011 at 3:12 PM, Thibault Dory <[email protected]> wrote:

> On Fri, May 13, 2011 at 10:57 PM, Jean-Daniel Cryans <[email protected]> wrote:
>
>> It says:
>>
>> "The master and namenode are the entry points of
>> their respective levels, meaning that if an HBase client wants
>> a specific data, it first has to ask to the master that knows
>> which is the region server that stores it."
>>
>> Which is wrong, quoting the Bigtable paper (which your team should
>> consider reading):
>
> Yes I know, this is exactly what tsuna previously pointed out. This error
> is now corrected.
>
>> "As with many single-master distributed storage systems
>> [17, 21], client data does not move through the master:
>> clients communicate directly with tablet servers for
>> reads and writes. Because Bigtable clients do not rely on
>> the master for tablet location information, most clients
>> never communicate with the master. As a result, the master
>> is lightly loaded in practice."
>>
>> Which also impacts your conclusion:
>>
>> "For example it can be interesting to
>> see when a system based on an architecture using a single
>> point of entry, such as HBase and its master, would be overload"
>
> Indeed, I'm going to change that as well.
>
>> J-D
>>
>> On Fri, May 13, 2011 at 4:06 AM, Thibault Dory <[email protected]> wrote:
>> > Hello,
>> >
>> > I have written with a few other people a paper for the ACM Symposium
>> > On Cloud Computing. This paper describes the methodology,
>> > infrastructure and configuration used as well as the results obtained
>> > for elasticity and scalability of three noSQL databases, including
>> > HBase.
>> > The paper can be downloaded here:
>> > http://www.nosqlbenchmarking.com/wp-content/uploads/2011/05/paper.pdf
>> >
>> > Any feedback on the methodology used would be appreciated, we would
>> > like to know if HBase is used in a "fair" way in those tests.
>> >
>> > We also encountered a problem with the distribution of requests among region
>> > servers. This problem is described in section 5.4.2 and any hints on how to
>> > solve this problem would be appreciated. Please note that the request
>> > generation is independent of the specific database layer and that we did not
>> > observe this problem for the two other databases.
>> >
>> > Regards,
>> >
>> > Thibault Dory
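
P.S. To make my question about -ROOT- and .META. more concrete, here is a minimal sketch of how I understand the client side: a region location is looked up once and then cached, so the catalog table is only consulted on a cache miss. This is a toy model and not HBase code. `MetaTable`, `Client` and `simulate` are hypothetical names, row keys are plain integers, and region splits or moves (which would invalidate cached entries) are ignored.

```python
import bisect
import random


class MetaTable:
    """Toy stand-in for a catalog table: maps region start keys to servers."""

    def __init__(self, regions):
        # regions: list of (start_key, server_name), sorted by start_key.
        self.start_keys = [start for start, _ in regions]
        self.servers = [server for _, server in regions]
        self.lookups = 0  # number of requests that reached this table

    def locate(self, row_key):
        """Return (start_key, end_key, server) for the region holding row_key."""
        self.lookups += 1
        idx = bisect.bisect_right(self.start_keys, row_key) - 1
        is_last = idx + 1 == len(self.start_keys)
        end_key = None if is_last else self.start_keys[idx + 1]
        return self.start_keys[idx], end_key, self.servers[idx]


class Client:
    """Toy client that remembers every region location it has resolved."""

    def __init__(self, meta):
        self.meta = meta
        self.cached_starts = []  # sorted start keys of cached regions
        self.cached = {}         # start_key -> (end_key, server)

    def find_server(self, row_key):
        # Look for a cached region whose [start, end) range covers row_key.
        i = bisect.bisect_right(self.cached_starts, row_key) - 1
        if i >= 0:
            start = self.cached_starts[i]
            end, server = self.cached[start]
            if end is None or row_key < end:
                return server  # cache hit: the catalog is not contacted
        # Cache miss: ask the catalog once and remember the answer.
        start, end, server = self.meta.locate(row_key)
        bisect.insort(self.cached_starts, start)
        self.cached[start] = (end, server)
        return server


def simulate(num_regions, num_reads, seed=0):
    """Run random reads from one client; return the catalog lookup count."""
    regions = [(i * 100, f"rs{i % 4}") for i in range(num_regions)]
    meta = MetaTable(regions)
    client = Client(meta)
    rng = random.Random(seed)
    for _ in range(num_reads):
        client.find_server(rng.randrange(0, num_regions * 100))
    return meta.lookups
```

In this model, `simulate(10, 10000)` sends at most 10 lookups to the toy catalog (one per region), whatever the number of reads. If that picture is right, the load on the nodes holding -ROOT- and .META. would depend more on the number of clients and on how often cached locations are invalidated than on the number of random accesses. Is that correct?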
