On Sun, May 30, 2010 at 9:22 AM, Jacob Isaac <ja...@ebrary.com> wrote:
> On Sun, May 30, 2010 at 7:04 AM, Stack <st...@duboce.net> wrote:
> Our writes and reads are pretty random (we rely on HBase handling the
> distribution), except that we read a set almost immediately after it is
> written.
>
> Since our gets are for a set, we are scanning a bunch of rows at a time.
> We are also working on multiple sets at a time - don't know whether that
> would help?
>

So, you are scanning (looks like you can, given your key type, assuming
the sha-1 is the set identifier).
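If I follow you, each read-of-a-set is then a short scan bounded by the
set's sha-1. Something like the below is what I have in mind - a rough
sketch against the plain client API, where the table name, the caching
value, and the stop-row trick are my guesses (the stop row assumes the
bytes after the sha-1 prefix sort below 0xFF, which holds for hex/ascii
suffixes):

import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

public class SetScan {
  public static void main(String[] args) throws Exception {
    HTable table = new HTable("sets");   // made-up table name
    // args[0] is the hex sha-1 that identifies the set
    byte[] prefix = Bytes.toBytes(args[0]);
    Scan scan = new Scan();
    scan.setStartRow(prefix);
    // stop just past the prefix: append 0xFF so every member row is
    // covered (assumes suffix bytes sort below 0xFF, e.g. hex/ascii)
    scan.setStopRow(Bytes.add(prefix, new byte[] { (byte) 0xFF }));
    scan.setCaching(100);                // fetch rows in batches
    ResultScanner scanner = table.getScanner(scan);
    try {
      for (Result r : scanner) {
        System.out.println(Bytes.toString(r.getRow())); // one member row
      }
    } finally {
      scanner.close();
    }
  }
}

Upping the scanner caching like that means one RPC pulls back a batch of
rows rather than a round trip per row; it helps when a set spans more
than a handful of rows.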
> The Failed openScanner messages seem to suggest some region name cache
> is getting stale with so many splits taking place.

Paste the exception.

> We do see 'Forced flushing of XXXX because global memstore limit of
> 1.6g ....' every 3-4 min

Do these periods last a while or are they short? You think it's the
scenario described by Jon Gray over in HBASE-2375?

> We are trying to size up our capacity-handling metrics and wanted to
> get a sense that we are not way off the mark.

Well, you seem to have the basics right and you seem to have a good
handle on how the systems interact. All that is left, it would seem, is
to try lzo as J-D suggests.

Good stuff Jacob,
St.Ack

> We were also looking for ideas and suggestions that we may have missed.
>
> ~Jacob
>
> St.Ack
>>
>>
>> > ~jacob
>> >
>> >
>> > On Sat, May 29, 2010 at 12:04 PM, Stack <st...@duboce.net> wrote:
>> >> On Sat, May 29, 2010 at 10:53 AM, Stack <st...@duboce.net> wrote:
>> >>> On Fri, May 28, 2010 at 4:11 PM, Jacob Isaac <ja...@ebrary.com> wrote:
>> >>>> Here is the summary of the runs
>> >>>>
>> >>>> puts (~4-5k per row)
>> >>>> regionsize    #rows         Total time (ms)
>> >>>> 1G            82282053*2    301943742
>> >>>> 512M          82287593*2    313119378
>> >>>> 256M          82246314*2    433200105
>> >>>>
>> >>>
>> >>> So about 0.3ms per 5k write (presuming 100M writes?)?
>> >>>
>> >>
>> >> I just tried loading 100M 1k rows into a 4 regionserver cluster where
>> >> each node had two clients writing at any one time and it took just
>> >> over an hour. If you tell me more about your loading job and if
>> >> reading is happening concurrently, I can try and mock it here so we
>> >> can compare (no lzo and all defaults on my cluster).
>> >>
>> >> St.Ack
>> >>
>> >
>> >
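P.S. If it helps you mock up a comparison, my load above was nothing
fancier than the below run once per node - a rough sketch against the
plain client API; the table, family, and qualifier names are made up,
and no reads were running concurrently:

import java.util.Random;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class Loader implements Runnable {
  private final long rows;

  Loader(long rows) { this.rows = rows; }

  public void run() {
    try {
      // each thread gets its own HTable; HTable is not safe for
      // concurrent use from multiple threads
      HTable table = new HTable("test");       // made-up table name
      table.setAutoFlush(false);               // buffer puts client-side
      Random rnd = new Random();
      byte[] value = new byte[1024];           // ~1k per row
      byte[] family = Bytes.toBytes("f");
      byte[] qualifier = Bytes.toBytes("q");
      for (long i = 0; i < rows; i++) {
        rnd.nextBytes(value);
        Put put = new Put(Bytes.toBytes(rnd.nextLong())); // random keys
        put.add(family, qualifier, value);
        table.put(put);
      }
      table.flushCommits();                    // push whatever is buffered
    } catch (Exception e) {
      throw new RuntimeException(e);
    }
  }

  public static void main(String[] args) throws Exception {
    long rowsPerClient = Long.parseLong(args[0]);
    // two writers going at any one time, as on each of my nodes
    Thread a = new Thread(new Loader(rowsPerClient));
    Thread b = new Thread(new Loader(rowsPerClient));
    a.start(); b.start();
    a.join(); b.join();
  }
}

Turning off auto-flush lets the client buffer puts and ship them in
batches instead of making one RPC per row, which matters a lot in a
straight bulk load like this.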