I'm wondering what the possible bottlenecks of an HBase cluster are. Even
with caching mechanisms, the fact that some data is centralized could
lead to a bottleneck (even if it's quite theoretical given the load needed
to reach it).
Would it be right to say the following?

   - The namenode stores all the metadata and must scale vertically if
the cluster becomes very big.
   - Only one node stores the -ROOT- table and only one node stores the
.META. table. If I do a lot of random accesses and my dataset is VERY
large, could I overload those nodes?
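
To make the second point concrete, here is a minimal sketch in Python of the
access pattern I have in mind. It is a toy model, not HBase code: ToyMeta and
ToyClient are hypothetical names, and it assumes that a client caches each
region location after its first lookup, as the Bigtable paper describes for
tablet locations. Under that assumption, the number of lookups a client sends
to the node holding .META. is bounded by the number of regions it touches, not
by the number of requests it makes.

```python
import bisect


class ToyMeta:
    """Hypothetical stand-in for the .META. table (not the HBase API)."""

    def __init__(self, regions):
        # regions: list of (start_key, server) pairs, sorted by start_key.
        self.start_keys = [key for key, _ in regions]
        self.servers = [server for _, server in regions]
        self.lookups = 0  # number of location requests received

    def locate(self, row):
        """Return (start_key, end_key, server) for the region holding row."""
        self.lookups += 1
        i = bisect.bisect_right(self.start_keys, row) - 1
        end = self.start_keys[i + 1] if i + 1 < len(self.start_keys) else None
        return self.start_keys[i], end, self.servers[i]


class ToyClient:
    """Hypothetical client that remembers every region location it learns."""

    def __init__(self, meta):
        self.meta = meta
        self.cache = []  # list of (start_key, end_key, server)

    def server_for(self, row):
        # Serve from the local cache when the row falls in a known region.
        for start, end, server in self.cache:
            if start <= row and (end is None or row < end):
                return server
        # Cache miss: ask the metadata table once and remember the answer.
        location = self.meta.locate(row)
        self.cache.append(location)
        return location[2]
```

With three regions, a ToyClient issuing any number of random reads would send
at most three lookups to ToyMeta. What this sketch leaves out is what I am
unsure about: many clients starting at once, and regions moving or splitting,
which would invalidate cached entries and force new lookups.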



On Sat, May 14, 2011 at 3:12 PM, Thibault Dory <[email protected]> wrote:

>
>
> On Fri, May 13, 2011 at 10:57 PM, Jean-Daniel Cryans
> <[email protected]> wrote:
>
>> It says:
>>
>> "The master and namenode are the entry points of
>> their respective levels, meaning that if an HBase client wants
>> a specific data, it first has to ask to the master that knows
>> which is the region server that stores it."
>>
>> Which is wrong, quoting the Bigtable paper (which your team should
>> consider reading):
>>
>
> Yes I know, this is exactly what tsuna previously pointed out. This error
> is now corrected.
>
>
>>
>> "As with many single-master distributed storage systems [17, 21],
>> client data does not move through the master: clients communicate
>> directly with tablet servers for reads and writes. Because Bigtable
>> clients do not rely on the master for tablet location information,
>> most clients never communicate with the master. As a result, the
>> master is lightly loaded in practice."
>>
>> Which also impacts your conclusion:
>>
>> "For example it can be interesting to
>> see when a system based on an architecture using a single
>> point of entry, such as HBase and its master, would be overload"
>>
>>
> Indeed, I'm going to change that as well.
>
>
>> J-D
>>
>> On Fri, May 13, 2011 at 4:06 AM, Thibault Dory <[email protected]>
>> wrote:
>> > Hello,
>> >
>> > I have written, with a few other people, a paper for the ACM Symposium
>> > on Cloud Computing. This paper describes the methodology,
>> > infrastructure and configuration used, as well as the results obtained
>> > for the elasticity and scalability of three NoSQL databases, including
>> > HBase. The paper can be downloaded here:
>> > http://www.nosqlbenchmarking.com/wp-content/uploads/2011/05/paper.pdf
>> >
>> >
>> >
>> > Any feedback on the methodology used would be appreciated; we would
>> > like to know if HBase is used in a "fair" way in those tests.
>> >
>> > We also encountered a problem with the distribution of requests among
>> > region servers. This problem is described in section 5.4.2 and any
>> > hints on how to solve this problem would be appreciated. Please note
>> > that the request generation is independent of the specific database
>> > layer and that we did not observe this problem for the two other
>> > databases.
>> >
>> > Regards,
>> >
>> > Thibault Dory
>> >
>>
>
>
