Wow!! That's almost twice the throughput I got, with less than 1/4 the cluster size.
The general flow of the loading program is:

1. Reading/processing data from the source (a local file on the machine)
2. Writing the data to HBase
3. Reading the data back from HBase and processing it

Steps 1 and 2 happen on the same node; step 3 may or may not be on the same
machine that wrote the data. Yes, the reads and writes are happening
concurrently, and another thing to note is that the read for a particular
set comes almost immediately after it is written. (A rough sketch of the
per-row load/read-back loop is at the bottom of this mail.)

In the master UI there is a steady number of requests (typically around
~500 requests/RS). I must admit we have not monitored it closely enough to
say that is the steady rate throughout the 9 hr run - we manually refreshed
the UI during the first two hrs and that was the observation. The average
load on these machines is ~5, as reported by top/htop and our datacenter
monitoring UI.

The typical messages I see in the RS logs are below; the typical pattern is
a few of them in a sudden burst, periodically every 1-3 min:

  Finished snapshotting, commencing flushing stores
  Started memstore flush for region
  Finished memstore flush
  Starting compaction on region
  compaction completed on region
  Failed openScanner
  removing old hlog file
  hlogs to remove out of total
  Updates disabled for region

~jacob

On Sat, May 29, 2010 at 12:04 PM, Stack <st...@duboce.net> wrote:
> On Sat, May 29, 2010 at 10:53 AM, Stack <st...@duboce.net> wrote:
>> On Fri, May 28, 2010 at 4:11 PM, Jacob Isaac <ja...@ebrary.com> wrote:
>>> Here is the summary of the runs
>>>
>>> puts (~4-5k per row)
>>> regionsize    #rows          Total time (ms)
>>> 1G            82282053*2     301943742
>>> 512M          82287593*2     313119378
>>> 256M          82246314*2     433200105
>>>
>>
>> So about 0.3ms per 5k write (presuming 100M writes?)?
>>
>
> I just tried loading 100M 1k rows into a 4 regionserver cluster where
> each node had two clients writing at any one time and it took just
> over an hour.  If you tell me more about your loading job and if
> reading is happening concurrently, I can try and mock it here so we
> can compare (no lzo and all defaults on my cluster).
>
> St.Ack
>
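
P.S. For reference, here is a minimal sketch of what the loader does per
row, using the plain HBase client API. The table name, family, and
qualifier below are just illustrative placeholders, not our actual schema,
and the real source-file parsing is elided:

    // Illustrative only - "docs", "content", "data" are placeholder names.
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.Get;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.util.Bytes;

    public class LoadSketch {
      public static void main(String[] args) throws Exception {
        HBaseConfiguration conf = new HBaseConfiguration();
        HTable table = new HTable(conf, "docs");

        // Steps 1 and 2: read a record from the local source file
        // (elided here) and write it to HBase.
        byte[] row = Bytes.toBytes("row-00000001");
        byte[] value = new byte[4 * 1024];   // ~4-5k payload per row
        Put put = new Put(row);
        put.add(Bytes.toBytes("content"), Bytes.toBytes("data"), value);
        table.put(put);
        table.flushCommits();                // push the write out now

        // Step 3: the read of the same set follows almost immediately,
        // possibly from a different node than the one that wrote it.
        Get get = new Get(row);
        Result result = table.get(get);
        byte[] stored = result.getValue(Bytes.toBytes("content"),
                                        Bytes.toBytes("data"));
        // ... process "stored" ...
      }
    }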