On Sat, May 14, 2011 at 9:48 PM, tsuna <[email protected]> wrote:

> On Sat, May 14, 2011 at 9:02 AM, Thibault Dory <[email protected]>
> wrote:
> > Going back to my original problem, the fact that one region server was
> > always overloaded with requests while the others were only serving a few
> > requests even though my requests were generated using a uniform
> > distribution, I would like to know what you think about Ted Yu's idea
> > that it may be related to the fact that the overloaded region server
> > could be the one storing the .META. table.
>
> I haven't looked at your benchmarking code yet, but if that's the case
> then it generally indicates an implementation issue in your code
> rather than a problem in HBase.  Long-lived HBase clients tend to do a
> lot of .META. lookups at the beginning and after a while, even with
> large working sets, they need to consult .META. only occasionally.  If
> your code is doing something that causes the client to start with a
> fresh .META. every once in a while, then you could potentially end up
> bottlenecked on .META. – but then it's a performance bug in your code,
> not in HBase's.
>

In fact this is not a bug in my implementation but rather a problem in the
design of the test itself, since I restart my clients after each benchmark
run. That explains why there are so many lookups in the .META. table, and I
will mention that this is not an optimal way of using an HBase client.
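
Concretely, I suppose the fix is to keep a single HTable instance alive
across the benchmark runs instead of restarting the client, so that the
region locations learned from the .META. lookups stay cached. A rough
sketch of what I have in mind (the table name and key space are
placeholders, and I'm using the 0.90 HTable API):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

public class BenchmarkClient {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    // Create the table handle once: the region locations cached after the
    // .META. lookups survive across runs as long as the same client is
    // reused in the same process.
    HTable table = new HTable(conf, "benchmark_table");  // placeholder name
    for (int run = 0; run < 3; run++) {
      runBenchmark(table);  // reuse the same client instead of restarting it
    }
    table.close();
  }

  private static void runBenchmark(HTable table) throws Exception {
    for (int i = 0; i < 1000; i++) {
      // Uniformly random key over the 40 million rows of the data set.
      String row = "row-" + (int) (Math.random() * 40000000);
      Result r = table.get(new Get(Bytes.toBytes(row)));
      // ... measure latency, check r, etc.
    }
  }
}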


>
> As a data point, I'm currently inserting about 4000 items per second
> into thousands of different rows, in a table with almost 4000 regions,
> and my current working set is about 400 regions, and I only see .META.
> lookups every once in a while when a region gets split.
>
> > At that point in the tests, the cluster was made of 24 nodes and was
> > storing 40 million rows in HBase. As my requests are fully random, there
> > is a high probability, given the total number of entries, that a lot of
> > requests issued by a client are for entries it has never requested
> > before, leading to a lookup in the .META. table for almost every request.
>
> No, not for each request.
>
> > Of course this is valid only if the client does not know that an entry
> > it has never asked for is in a region it has already accessed before. Is
> > that the case? For example, if a client asks for row 10 and sees that it
> > is in region 2, will it know that row 15 is also in region 2 without
> > making a new lookup in the .META. table?
>
> The .META. table contains key ranges, so when you do a .META. lookup,
> you discover where a whole key range is.  If row 10 and row 15 are in
> the same region, accessing row 15 after accessing row 10 will not
> require a .META. lookup.
>
> As JD said, I also strongly recommend that you read the Bigtable
> paper.  It will help clear up many misunderstandings.


Indeed, I've read it again since it had been quite a long time, and things
are much clearer now.
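
If I understand the key range caching correctly, something like the sketch
below should only hit .META. once for the two rows (assuming they fall in
the same region; the table name is a placeholder and I'm using the HTable
API of 0.90):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HRegionLocation;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.util.Bytes;

public class RegionCacheCheck {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "benchmark_table");  // placeholder name

    // The first lookup goes to .META. and caches the key range of the
    // region containing row "10".
    HRegionLocation loc10 = table.getRegionLocation(Bytes.toBytes("10"));
    // If row "15" falls in the same key range, this should be answered
    // from the client-side cache without another .META. lookup.
    HRegionLocation loc15 = table.getRegionLocation(Bytes.toBytes("15"));

    System.out.println("row 10 -> " + loc10);
    System.out.println("row 15 -> " + loc15);
    table.close();
  }
}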


> Also if you can
> explain what led you to believe that the master is in the data path of
> clients, maybe we can help address the source of this common
> misconception.
>
>
Well, I don't really remember what led me to this confusion, maybe the wrong
idea that it would be similar to HDFS. The fact that the HBase book is quite
sparse does not help either. If you look at
http://hbase.apache.org/book.html#architecture

"The HBase client
HTable <http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HTable.html>
is responsible for finding RegionServers that are serving the particular row
range of interest. It does this by querying the .META. and -ROOT- catalog
tables (TODO: Explain)"

There is nothing said about the fact that the location of the -ROOT- table
is found in a znode. Moreover, the "Master" part is empty, so it's not easy
for HBase noobs to figure it out.
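
For other newcomers who end up on this thread: here is roughly how the
-ROOT- location can be read directly from ZooKeeper (just a sketch; the
quorum address is a placeholder and the znode path assumes the default
zookeeper.znode.parent of /hbase):

import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooKeeper;

public class RootLocationLookup {
  public static void main(String[] args) throws Exception {
    // Connect to the ZooKeeper quorum used by the HBase cluster
    // (the host:port below is a placeholder).
    ZooKeeper zk = new ZooKeeper("zk-host:2181", 30000, new Watcher() {
      public void process(WatchedEvent event) { /* no-op */ }
    });
    // The znode data encodes the address of the server holding -ROOT-;
    // the path assumes the default /hbase parent znode.
    byte[] data = zk.getData("/hbase/root-region-server", false, null);
    System.out.println("-ROOT- is served by: " + new String(data));
    zk.close();
  }
}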

Anyway, thank you all for your help.


> --
> Benoit "tsuna" Sigoure
> Software Engineer @ www.StumbleUpon.com
>
