Wow!! That's almost twice the throughput I got, with less than 1/4 the cluster size.
The general flow of the loading program is:

1. Reading/processing data from the source (a local file on the machine)
2. Writing the data to HBase
3. Reading the data back from HBase and processing it

Steps 1 and 2 happen on the same node; step 3 may or may not be on the same
machine that wrote the data. Yes, the reads and writes are happening
concurrently, and another thing to note is that the read for a particular
set comes almost immediately after it is written. (A rough sketch of the
per-row load/read-back loop is at the bottom of this mail.)

In the master UI there is a steady number of requests (typically around
~500 requests/RS). I must admit we have not monitored it closely enough to
say that is the steady rate throughout the 9 hr run - we manually refreshed
the UI during the first two hrs and that was the observation. The average
load on these machines is ~5, as reported by top/htop and our datacenter
monitoring UI.

The typical messages I see in the RS logs are below; the typical pattern is
a few of them in a sudden burst, periodically every 1-3 min:

  Finished snapshotting, commencing flushing stores
  Started memstore flush for region
  Finished memstore flush
  Starting compaction on region
  compaction completed on region
  Failed openScanner
  removing old hlog file
  hlogs to remove out of total
  Updates disabled for region

~jacob

On Sat, May 29, 2010 at 12:04 PM, Stack <st...@duboce.net> wrote:
> On Sat, May 29, 2010 at 10:53 AM, Stack <st...@duboce.net> wrote:
>> On Fri, May 28, 2010 at 4:11 PM, Jacob Isaac <ja...@ebrary.com> wrote:
>>> Here is the summary of the runs
>>>
>>> puts (~4-5k per row)
>>> regionsize    #rows          Total time (ms)
>>> 1G            82282053*2     301943742
>>> 512M          82287593*2     313119378
>>> 256M          82246314*2     433200105
>>>
>>
>> So about 0.3ms per 5k write (presuming 100M writes?)?
>>
>
> I just tried loading 100M 1k rows into a 4 regionserver cluster where
> each node had two clients writing at any one time and it took just
> over an hour.  If you tell me more about your loading job and if
> reading is happening concurrently, I can try and mock it here so we
> can compare (no lzo and all defaults on my cluster).
>
> St.Ack
>
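
P.S. For reference, here is a minimal sketch of what the loader does per
row, using the plain HBase client API. The table name, family, and
qualifier below are just illustrative placeholders, not our actual schema,
and the real source-file parsing is elided:

    // Illustrative only - "docs", "content", "data" are placeholder names.
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.Get;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.util.Bytes;

    public class LoadSketch {
      public static void main(String[] args) throws Exception {
        HBaseConfiguration conf = new HBaseConfiguration();
        HTable table = new HTable(conf, "docs");

        // Steps 1 and 2: read a record from the local source file
        // (elided here) and write it to HBase.
        byte[] row = Bytes.toBytes("row-00000001");
        byte[] value = new byte[4 * 1024];   // ~4-5k payload per row
        Put put = new Put(row);
        put.add(Bytes.toBytes("content"), Bytes.toBytes("data"), value);
        table.put(put);
        table.flushCommits();                // push the write out now

        // Step 3: the read of the same set follows almost immediately,
        // possibly from a different node than the one that wrote it.
        Get get = new Get(row);
        Result result = table.get(get);
        byte[] stored = result.getValue(Bytes.toBytes("content"),
                                        Bytes.toBytes("data"));
        // ... process "stored" ...
      }
    }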