Oh I see, I was looking at the jstack and thought that you must have a
ton of families, and you've confirmed that.

The fact that we store the table schema with EVERY meta row isn't
usually such a bad thing, but in your case I guess it's becoming huge
and it's taking a long time to deserialize!
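
If you want to see it for yourself, here's a rough sketch (0.90 client
API, my own example) that scans .META. and shows each region's
HRegionInfo carrying the full table descriptor, i.e. all 366 families
get deserialized for every single region row:

  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.HConstants;
  import org.apache.hadoop.hbase.HRegionInfo;
  import org.apache.hadoop.hbase.client.HTable;
  import org.apache.hadoop.hbase.client.Result;
  import org.apache.hadoop.hbase.client.ResultScanner;
  import org.apache.hadoop.hbase.client.Scan;
  import org.apache.hadoop.hbase.util.Writables;

  public class MetaFamilyCount {
    public static void main(String[] args) throws Exception {
      HTable meta = new HTable(HBaseConfiguration.create(), ".META.");
      ResultScanner scanner =
          meta.getScanner(new Scan().addFamily(HConstants.CATALOG_FAMILY));
      for (Result r : scanner) {
        byte[] bytes = r.getValue(HConstants.CATALOG_FAMILY,
            HConstants.REGIONINFO_QUALIFIER);
        if (bytes == null) continue;
        // This deserialization is what every client-side meta lookup pays for.
        HRegionInfo hri = Writables.getHRegionInfo(bytes);
        System.out.println(hri.getRegionNameAsString() + " families="
            + hri.getTableDesc().getFamilies().size());
      }
      scanner.close();
      meta.close();
    }
  }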

I think you should review your schema to use at most a handful of families.

> Seems the client-side metaCache (for region infos) is not working, and then
> every submit of puts does a metaScan.

My guess is that you're splitting a lot since you're inserting a lot
of data? If you still wish to continue with your current schema,
pre-splitting the table would probably help a lot (check out
HBaseAdmin). Also, the prefetching of .META. rows is killing your
client performance, so set hbase.client.prefetch.limit to something
like 1 or 2 instead of the default of 10.
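
Something along these lines, just a rough sketch against the 0.90
client API (the table name, family and split points below are made-up
placeholders, pick ones that match your own row keys):

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.HColumnDescriptor;
  import org.apache.hadoop.hbase.HTableDescriptor;
  import org.apache.hadoop.hbase.client.HBaseAdmin;
  import org.apache.hadoop.hbase.util.Bytes;

  public class PreSplitExample {
    public static void main(String[] args) throws Exception {
      Configuration conf = HBaseConfiguration.create();
      // Fetch fewer .META. rows per prefetch (the default is 10).
      conf.setInt("hbase.client.prefetch.limit", 2);

      // A handful of families instead of 366.
      HTableDescriptor desc = new HTableDescriptor("mytable");
      desc.addFamily(new HColumnDescriptor("d"));

      // Hand the regions to the cluster up front instead of letting it
      // split under insert load.
      byte[][] splits = new byte[][] {
          Bytes.toBytes("20110101"), Bytes.toBytes("20110401"),
          Bytes.toBytes("20110701"), Bytes.toBytes("20111001") };
      new HBaseAdmin(conf).createTable(desc, splits);
    }
  }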

J-D

On Thu, Feb 24, 2011 at 12:37 AM, Schubert Zhang <[email protected]> wrote:
> New clues:
>
> Seems the client-side metaCache (for region infos) is not working, and then
> every submit of puts does a metaScan.
> The specifics of my test are:
>
> The table has many column families (366 cfs, one for every day of a year),
> but only one column family is active for writing data now, so the memory
> usage for the memstore is ok.
>
> Then, when doing the metaScan for region infos, the code runs into a large
> loop to get and deserialize the info for every column family.
>
> When the number of regions increases (64 in my test), the loop is 366*64
> iterations for each put submit. Then the client threads become very busy.
>
>
> Now, we should determine why a metaScan is done for each submit of puts.
>
>
> On Thu, Feb 24, 2011 at 11:53 AM, Schubert Zhang <[email protected]> wrote:
>
>> Currently, with 0.90.1, this issue happens when there are only 8 regions
>> in each RS, and 64 regions in total across all 8 RSes.
>>
>> The CPU% of the client is very high.
>>
>>   On Thu, Feb 24, 2011 at 10:55 AM, Schubert Zhang <[email protected]> wrote:
>>
>>> Now I am trying 0.90.1, but this issue is still there.
>>>
>>> I have attached the jstack output. Could you please help me analyze it?
>>>
>>> Seems all the 8 client threads are doing metaScans!
>>>
>>>   On Sat, Jan 29, 2011 at 1:02 AM, Stack <[email protected]> wrote:
>>>
>>>> On Thu, Jan 27, 2011 at 10:33 PM, Schubert Zhang <[email protected]>
>>>> wrote:
>>>> > 1. The .META. table seems ok
>>>> >     I can read my data table (one thread for reading).
>>>> >     I can use hbase shell to scan my data table.
>>>> >     And I can use 1~4 threads to put more data into my data table.
>>>> >
>>>>
>>>> Good.  This would seem to say that .META. is not locked out (you are
>>>> doing these scans while your 8+ client process is hung?).
>>>>
>>>>
>>>> > Before this issue happened, about 800 million entities (columns) had
>>>> > been put into the table successfully, and there are 253 regions for
>>>> > this table.
>>>> >
>>>>
>>>>
>>>> So, you were running fine with 8+ clients until you hit the 800 million
>>>> entries?
>>>>
>>>>
>>>> > 3. All clients use HBaseConfiguration.create() for a new Configuration
>>>> > instance.
>>>> >
>>>>
>>>> Do you do this for each new instance of HTable or do you pass them all
>>>> the same Configuration instance?
>>>>
>>>>
>>>> > 4. The 8+ client threads are running on a single machine in a single JVM.
>>>> >
>>>>
>>>> How many instances of this process?  One or many?
>>>>
>>>>
>>>> > 5. Seems all 8+ threads are blocked in the same location waiting on a
>>>> > call to return.
>>>> >
>>>>
>>>> If you want to paste a thread dump of your client, one of us will
>>>> give it a gander.
>>>>
>>>> St.Ack
>>>>
>>>
>>>
>>
>
