On Sun, May 30, 2010 at 7:04 AM, Stack <st...@duboce.net> wrote:
> On Sat, May 29, 2010 at 5:52 PM, Jacob Isaac <ja...@ebrary.com> wrote:
> > Wow !! That's almost twice the throughput I got with less than 1/4 the
> > cluster size.
> >
> I'm just writing.
>
That is true. And I hear reading is not as efficient as writing?

> > The general flow of the loading program is
> >
> > 1. Reading/processing data from source (a local file on the machine)
> > 2. Writing data to HBase
> > 3. Reading the data from HBase and processing it.
> >
> > steps 1 and 2 happen on the same node
> >
> OK. So all 17 nodes have a file local?
>
Our source data (files) is uniformly distributed across all 20*8 disks.

> The data is keyed? Are the keys sorted? The writing is not
> necessarily to the local node, right? We'll write to the region
> responsible for the key which could be anywhere out on the cluster.
>
As explained in one of my earlier emails - we do gets and puts on a given
set; a set can contain anywhere from 1~20k elements (but 95% < 1000
elements). The key is a composite key <SHA1>:<element #>, so it is pretty
random and we see good distribution happening very soon.

> > step 3 may or may not be on the same machine that wrote it.
> >
> This is probably what's taking the time.
>
> When you read, is it random access? Does the processing take much
> time? You can't scan and process a batch of documents at a time?
>
Our writes and reads are pretty random (we rely on HBase handling the
distribution) except that we read a set almost immediately after it is
written. Since our gets are for a set, we are scanning a bunch of rows at a
time (see the sketch below). Working on multiple sets at a time - don't
know whether that would help?

> > Yes the reads and writes are happening concurrently
> > and another thing to note is that the read for a particular set is
> > almost immediately after it is written
> >
> You'd think then that the data would be up in the memstore still, or
> at least, it would be ideal if, when most of the reads came in,
> they'd find the data in memstore and not have to go to the
> filesystem (Reading from our memstore is not the best apparently,
> speed-wise -- it needs some work -- but still better than going to the
> filesystem).
>
The Failed openScanner messages seem to suggest some region name cache is
getting stale with so many splits taking place.

> > In the master UI - there is a steady # of requests (typically
> > ~500 requests/RS).
> > I must admit we have not monitored it to say that's the steady rate
> > throughout the 9 hr run -
> > we have manually refreshed the UI during the first two hrs and that's
> > been the observation.
> >
> OK. Steady is good.
>
> > The average load on these machines is ~5 as reported by top/htop and
> > the datacenter monitoring UI.
> >
> OK. Can you figure more about the load? Is it mostly cpu or is it i/o?
>
> > The typical messages I see in the RS logs are listed below -
> > the typical pattern is a few of them in a sudden burst and then
> > periodically every 1-3 min:
> >
> > Finished snapshotting, commencing flushing stores
> > Started memstore flush for region
> > Finished memstore flush
> > Starting compaction on region
> > compaction completed on region
> > Failed openScanner
> > removing old hlog file
> > hlogs to remove out of total
> > Updates disabled for region
> >
> You see any blocking because too many storefiles or because the
> regionserver has hit the global memory limit?
>
We do see 'Forced flushing of XXXX because global memstore limit of 1.6g ...'
every 3-4 min.

> If not, it might help upping your storefile size from 96M. Perhaps
> double it so less frequent flushes (more likely the reads will find
> the data out of memory).
>
> What rate would make you happy?
>
:-) I think from an acceptable threshold - we are good!!

We are trying to size up our capacity-handling metrics and wanted to get a
sense that we are not way off the mark. Also was looking for ideas and
suggestions that we may have missed.

~Jacob
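
About the sketch referenced above: the thread never shows the actual loading
code, so this is only a minimal illustration of the put-then-scan pattern
Jacob describes (rows keyed <SHA1>:<element #>, written as a batch, then read
back with a scan bounded by the SHA1 prefix), using the old HTable client
API. The table, family, and qualifier names, the zero-padding of the element
number, and the scanner caching value are assumptions, not details from the
thread.

    // Minimal sketch only -- names and constants below are illustrative.
    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.List;

    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.ResultScanner;
    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.util.Bytes;

    public class SetLoadSketch {
      private static final byte[] FAMILY = Bytes.toBytes("d");    // hypothetical
      private static final byte[] QUALIFIER = Bytes.toBytes("v"); // hypothetical

      // Write one set: rows keyed <sha1>:<element #>, so the whole set lands
      // in one contiguous key range (and usually in one region).
      static void writeSet(HTable table, String sha1, List<byte[]> elements)
          throws IOException {
        List<Put> puts = new ArrayList<Put>(elements.size());
        for (int i = 0; i < elements.size(); i++) {
          Put put = new Put(Bytes.toBytes(sha1 + ":" + String.format("%05d", i)));
          put.add(FAMILY, QUALIFIER, elements.get(i));
          puts.add(put);
        }
        table.put(puts);       // batched through the client write buffer
        table.flushCommits();  // the set is read back almost immediately
      }

      // Read the set back with one scan over the <sha1>: prefix; ':' is 0x3a
      // and ';' is 0x3b, so "<sha1>;" works as an exclusive stop row.
      static List<Result> readSet(HTable table, String sha1) throws IOException {
        Scan scan = new Scan(Bytes.toBytes(sha1 + ":"), Bytes.toBytes(sha1 + ";"));
        scan.setCaching(1000);  // 95% of sets are under 1000 elements
        List<Result> rows = new ArrayList<Result>();
        ResultScanner scanner = table.getScanner(scan);
        try {
          for (Result r : scanner) {
            rows.add(r);
          }
        } finally {
          scanner.close();
        }
        return rows;
      }
    }

Whether readSet() is served from the memstore or from HDFS depends on whether
the flush shown in the region server logs above has already happened by the
time the set is read back, which is what Stack's suggestion about less
frequent flushes is aimed at.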

> St.Ack
>
> > ~jacob
> >
> > On Sat, May 29, 2010 at 12:04 PM, Stack <st...@duboce.net> wrote:
> >> On Sat, May 29, 2010 at 10:53 AM, Stack <st...@duboce.net> wrote:
> >>> On Fri, May 28, 2010 at 4:11 PM, Jacob Isaac <ja...@ebrary.com> wrote:
> >>>> Here is the summary of the runs
> >>>>
> >>>> puts (~4-5k per row)
> >>>> regionsize    #rows         Total time (ms)
> >>>> 1G            82282053*2    301943742
> >>>> 512M          82287593*2    313119378
> >>>> 256M          82246314*2    433200105
> >>>>
> >>> So about 0.3ms per 5k write (presuming 100M writes?)?
> >>>
> >> I just tried loading 100M 1k rows into a 4 regionserver cluster where
> >> each node had two clients writing at any one time and it took just
> >> over an hour. If you tell me more about your loading job and if
> >> reading is happening concurrently, I can try and mock it here so we
> >> can compare (no lzo and all defaults on my cluster).
> >>
> >> St.Ack
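
For reference, and not something quoted from the thread: the "regionsize"
column in the runs above is the region split threshold
(hbase.hregion.max.filesize, or per table as in the sketch below), the 96M
figure Stack suggests doubling is presumably the memstore flush size
(hbase.hregion.memstore.flush.size), and the 1.6g in the 'Forced flushing'
messages is the global memstore limit, i.e. the
hbase.regionserver.global.memstore.upperLimit fraction of the region server
heap. Below is a minimal sketch of setting the split threshold per table with
the old client API; the table and family names are invented.

    import java.io.IOException;

    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HColumnDescriptor;
    import org.apache.hadoop.hbase.HTableDescriptor;
    import org.apache.hadoop.hbase.client.HBaseAdmin;

    public class CreateTableSketch {
      public static void main(String[] args) throws IOException {
        HTableDescriptor desc = new HTableDescriptor("sets");  // hypothetical
        desc.addFamily(new HColumnDescriptor("d"));            // hypothetical

        // Per-table region split threshold -- the "regionsize" column in the
        // quoted runs (256M / 512M / 1G); 512M shown here.
        desc.setMaxFileSize(512L * 1024 * 1024);

        HBaseAdmin admin = new HBaseAdmin(new HBaseConfiguration());
        admin.createTable(desc);
      }
    }

The same threshold can also be set cluster-wide in hbase-site.xml via
hbase.hregion.max.filesize.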