On Sat, May 29, 2010 at 5:52 PM, Jacob Isaac <ja...@ebrary.com> wrote:
> Wow!! That's almost twice the throughput I got with less than 1/4 the
> cluster size.
>
I'm only doing writes, though (no concurrent reads).

> The general flow of the loading program is
>
> 1. Reading/processing data from source (a local file on the machine)
> 2. Writing data to HBase
> 3. Reading the data from HBase and processing it.
>
> steps  1 and 2 happen on the same node

OK.  So all 17 nodes have a local file?

The data is keyed?  Are the keys sorted?  The writing is not
necessarily to the local node, right?  We'll write to the region
responsible for the key, which could be anywhere out on the cluster.
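To make that concrete, here's roughly what a keyed write looks like --
an untested sketch against the 0.20-era client API; the table name,
family, and key are all made up:

  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.client.HTable;
  import org.apache.hadoop.hbase.client.Put;
  import org.apache.hadoop.hbase.util.Bytes;

  HTable table = new HTable(new HBaseConfiguration(), "docs"); // hypothetical table
  byte[] docKey = Bytes.toBytes("doc-00000001");  // placeholder row key
  byte[] docBytes = new byte[5 * 1024];           // stand-in for your ~5k document
  Put put = new Put(docKey);
  put.add(Bytes.toBytes("content"), Bytes.toBytes("body"), docBytes);
  // The client looks up which region hosts this key and sends the write
  // there -- that region could be on any node in the cluster.
  table.put(put);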

> step 3 may or may not be on the same machine that wrote it.
>
This is probably what's taking the time.

When you read, is it random access?  Does the processing take much
time?  Could you scan and process a batch of documents at a time instead?
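If the processing allows it, a caching scan amortizes the RPC cost over
a batch of rows.  Another untested sketch (0.20 API; the key range,
caching size, and process() are placeholders):

  import org.apache.hadoop.hbase.client.Result;
  import org.apache.hadoop.hbase.client.ResultScanner;
  import org.apache.hadoop.hbase.client.Scan;

  Scan scan = new Scan(startKey, stopKey); // bound the scan to the set just written
  scan.setCaching(100);                    // fetch 100 rows per RPC, not one Get each
  ResultScanner scanner = table.getScanner(scan);
  try {
    for (Result row : scanner) {
      process(row); // your per-document processing
    }
  } finally {
    scanner.close();
  }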

> Yes, the reads and writes are happening concurrently,
> and another thing to note is that the read for a particular set happens
> almost immediately after it is written
>
You'd think then that the data would still be up in the memstore, or
at least, ideally, when most of the reads come in they'd find the data
in the memstore and not have to go to the filesystem (reading from our
memstore is apparently not the best, speed-wise -- it needs some work --
but it's still better than going to the filesystem).
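i.e. for a read-back pattern like yours -- an untested sketch, reusing
the table and key from the write sketch above:

  import org.apache.hadoop.hbase.client.Get;
  import org.apache.hadoop.hbase.client.Result;

  Get get = new Get(docKey);        // same key as was just written
  Result result = table.get(get);   // served out of the memstore if the
                                    // region hasn't flushed it yet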


> In the master UI there is a steady # of requests (typically around
> ~500 requests/RS).
> I must admit we have not monitored it enough to say that's the steady
> rate throughout the 9 hr run --
> we manually refreshed the UI during the first two hrs and that's
> been the observation.
>
OK.  Steady is good.

> The average load on these machines is ~5, as reported by top/htop and
> the datacenter monitoring UI.
>

OK.  Can you figure out more about the load?  Is it mostly cpu or is
it i/o?  (The '%wa' iowait column in top, or iostat -x, will tell you
whether you're disk-bound.)

> The typical messages I see in the RS logs are -
>
> and the typical pattern is a few of them in a sudden burst, repeating
> every 1-3 min
>
> Finished snapshotting, commencing flushing stores -
> Started memstore flush for region
> Finished memstore flush
> Starting compaction on region
> compaction completed on region
> Failed openScanner
> removing old hlog file
> hlogs to remove  out of total
> Updates disabled for region,
>
Do you see any blocking because of too many storefiles, or because the
regionserver has hit the global memstore limit?  (Grep the RS logs for
'Blocking updates' messages, IIRC.)

If not, it might help to up your storefile size from 96M.  Perhaps
double it so flushes are less frequent (and reads are more likely to
find the data still in memory).
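If the 96M is your hbase.hregion.memstore.flush.size, doubling it would
look something like this in hbase-site.xml on the regionservers (value
is in bytes):

  <property>
    <name>hbase.hregion.memstore.flush.size</name>
    <value>201326592</value>
    <!-- 192M, double the current 96M: fewer, bigger flushes -->
  </property>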

What rate would make you happy?
St.Ack


> ~jacob
>
>
> On Sat, May 29, 2010 at 12:04 PM, Stack <st...@duboce.net> wrote:
>> On Sat, May 29, 2010 at 10:53 AM, Stack <st...@duboce.net> wrote:
>>> On Fri, May 28, 2010 at 4:11 PM, Jacob Isaac <ja...@ebrary.com> wrote:
>>>> Here is the summary of the runs
>>>>
>>>> puts (~4-5k per row)
>>>> regionsize   #rows         Total time (ms)
>>>> 1G           82282053*2    301943742
>>>> 512M         82287593*2    313119378
>>>> 256M         82246314*2    433200105
>>>>
>>>
>>> So about 0.3ms per 5k write (presuming 100M writes?)?
>>>
>>
>> I just tried loading 100M 1k rows into a 4 regionserver cluster where
>> each node had two clients writing at any one time and it took just
>> over an hour. If you tell me more about your loading job and if
>> reading is happening concurrently, I can try and mock it here so we
>> can compare (no lzo and all defaults on my cluster).
>>
>> St.Ack
>>
>
