On Sun, May 30, 2010 at 7:04 AM, Stack <st...@duboce.net> wrote:
> On Sat, May 29, 2010 at 5:52 PM, Jacob Isaac <ja...@ebrary.com> wrote:
> > Wow !! That's almost twice the throughput I got with less than 1/4 the
> > cluster size.
> >
> I'm just writing.
>
That is true. And I hear reading is not as efficient as writing?

> > The general flow of the loading program is
> >
> > 1. Reading/processing data from source (a local file on the machine)
> > 2. Writing data to HBase
> > 3. Reading the data from HBase and processing it.
> >
> > steps 1 and 2 happen on the same node
> >
> OK. So all 17 nodes have a file local?
>
Our source data (files) is uniformly distributed across all 20*8 disks.

> The data is keyed? Are the keys sorted? The writing is not
> necessarily to the local node, right? We'll write to the region
> responsible for the key which could be anywhere out on the cluster.
>
As explained in one of my earlier emails - we do gets and puts on a given
set; a set can contain anywhere from 1~20k elements (but 95% < 1000
elements). The key is a composite key <SHA1>:<element #>, so it is pretty
random and we see good distribution happening very soon.

> > step 3 may or may not be on the same machine that wrote it.
> >
> This is probably what's taking the time.
>
> When you read, is it random access? Does the processing take much
> time? You can't scan and process a batch of documents at a time?
>
Our writes and reads are pretty random (we rely on HBase handling the
distribution) except that we read a set almost immediately after it is
written. Since our gets are for a set, we are scanning a bunch of rows at a
time (see the sketch below). Working on multiple sets at a time - don't
know whether that would help?

> > Yes the reads and writes are happening concurrently
> > and another thing to note is that the read for a particular set is
> > almost immediately after it is written
> >
> You'd think then that the data would be up in the memstore still, or
> at least, it would be ideal if, when most of the reads came in,
> they'd find the data in memstore and not have to go to the
> filesystem (Reading from our memstore is not the best apparently,
> speed-wise -- it needs some work -- but still better than going to the
> filesystem).
>
The Failed openScanner messages seem to suggest some region name cache is
getting stale with so many splits taking place.

> > In the master UI - there is a steady # of requests (typically
> > ~500 requests/RS).
> > I must admit we have not monitored it to say that's the steady rate
> > throughout the 9 hr run -
> > we have manually refreshed the UI during the first two hrs and that's
> > been the observation.
> >
> OK. Steady is good.
>
> > The average load on these machines is ~5 as reported by top/htop and
> > the datacenter monitoring UI.
> >
> OK. Can you figure more about the load? Is it mostly cpu or is it i/o?
>
> > The typical messages I see in the RS logs are listed below -
> > the typical pattern is a few of them in a sudden burst and then
> > periodically every 1-3 min:
> >
> > Finished snapshotting, commencing flushing stores
> > Started memstore flush for region
> > Finished memstore flush
> > Starting compaction on region
> > compaction completed on region
> > Failed openScanner
> > removing old hlog file
> > hlogs to remove out of total
> > Updates disabled for region
> >
> You see any blocking because too many storefiles or because the
> regionserver has hit the global memory limit?
>
We do see 'Forced flushing of XXXX because global memstore limit of 1.6g ...'
every 3-4 min.

> If not, it might help upping your storefile size from 96M. Perhaps
> double it so less frequent flushes (more likely the reads will find
> the data out of memory).
>
> What rate would make you happy?
>
:-) I think from an acceptable threshold - we are good!!

We are trying to size up our capacity-handling metrics and wanted to get a
sense that we are not way off the mark. Also was looking for ideas and
suggestions that we may have missed.

~Jacob
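
About the sketch referenced above: the thread never shows the actual loading
code, so this is only a minimal illustration of the put-then-scan pattern
Jacob describes (rows keyed <SHA1>:<element #>, written as a batch, then read
back with a scan bounded by the SHA1 prefix), using the old HTable client
API. The table, family, and qualifier names, the zero-padding of the element
number, and the scanner caching value are assumptions, not details from the
thread.

    // Minimal sketch only -- names and constants below are illustrative.
    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.List;

    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.ResultScanner;
    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.util.Bytes;

    public class SetLoadSketch {
      private static final byte[] FAMILY = Bytes.toBytes("d");    // hypothetical
      private static final byte[] QUALIFIER = Bytes.toBytes("v"); // hypothetical

      // Write one set: rows keyed <sha1>:<element #>, so the whole set lands
      // in one contiguous key range (and usually in one region).
      static void writeSet(HTable table, String sha1, List<byte[]> elements)
          throws IOException {
        List<Put> puts = new ArrayList<Put>(elements.size());
        for (int i = 0; i < elements.size(); i++) {
          Put put = new Put(Bytes.toBytes(sha1 + ":" + String.format("%05d", i)));
          put.add(FAMILY, QUALIFIER, elements.get(i));
          puts.add(put);
        }
        table.put(puts);       // batched through the client write buffer
        table.flushCommits();  // the set is read back almost immediately
      }

      // Read the set back with one scan over the <sha1>: prefix; ':' is 0x3a
      // and ';' is 0x3b, so "<sha1>;" works as an exclusive stop row.
      static List<Result> readSet(HTable table, String sha1) throws IOException {
        Scan scan = new Scan(Bytes.toBytes(sha1 + ":"), Bytes.toBytes(sha1 + ";"));
        scan.setCaching(1000);  // 95% of sets are under 1000 elements
        List<Result> rows = new ArrayList<Result>();
        ResultScanner scanner = table.getScanner(scan);
        try {
          for (Result r : scanner) {
            rows.add(r);
          }
        } finally {
          scanner.close();
        }
        return rows;
      }
    }

Whether readSet() is served from the memstore or from HDFS depends on whether
the flush shown in the region server logs above has already happened by the
time the set is read back, which is what Stack's suggestion about less
frequent flushes is aimed at.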

> St.Ack
>
> > ~jacob
> >
> > On Sat, May 29, 2010 at 12:04 PM, Stack <st...@duboce.net> wrote:
> >> On Sat, May 29, 2010 at 10:53 AM, Stack <st...@duboce.net> wrote:
> >>> On Fri, May 28, 2010 at 4:11 PM, Jacob Isaac <ja...@ebrary.com> wrote:
> >>>> Here is the summary of the runs
> >>>>
> >>>> puts (~4-5k per row)
> >>>> regionsize    #rows         Total time (ms)
> >>>> 1G            82282053*2    301943742
> >>>> 512M          82287593*2    313119378
> >>>> 256M          82246314*2    433200105
> >>>>
> >>> So about 0.3ms per 5k write (presuming 100M writes?)?
> >>>
> >> I just tried loading 100M 1k rows into a 4 regionserver cluster where
> >> each node had two clients writing at any one time and it took just
> >> over an hour. If you tell me more about your loading job and if
> >> reading is happening concurrently, I can try and mock it here so we
> >> can compare (no lzo and all defaults on my cluster).
> >>
> >> St.Ack
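
For reference, and not something quoted from the thread: the "regionsize"
column in the runs above is the region split threshold
(hbase.hregion.max.filesize, or per table as in the sketch below), the 96M
figure Stack suggests doubling is presumably the memstore flush size
(hbase.hregion.memstore.flush.size), and the 1.6g in the 'Forced flushing'
messages is the global memstore limit, i.e. the
hbase.regionserver.global.memstore.upperLimit fraction of the region server
heap. Below is a minimal sketch of setting the split threshold per table with
the old client API; the table and family names are invented.

    import java.io.IOException;

    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HColumnDescriptor;
    import org.apache.hadoop.hbase.HTableDescriptor;
    import org.apache.hadoop.hbase.client.HBaseAdmin;

    public class CreateTableSketch {
      public static void main(String[] args) throws IOException {
        HTableDescriptor desc = new HTableDescriptor("sets");  // hypothetical
        desc.addFamily(new HColumnDescriptor("d"));            // hypothetical

        // Per-table region split threshold -- the "regionsize" column in the
        // quoted runs (256M / 512M / 1G); 512M shown here.
        desc.setMaxFileSize(512L * 1024 * 1024);

        HBaseAdmin admin = new HBaseAdmin(new HBaseConfiguration());
        admin.createTable(desc);
      }
    }

The same threshold can also be set cluster-wide in hbase-site.xml via
hbase.hregion.max.filesize.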