On Sun, May 30, 2010 at 9:22 AM, Jacob Isaac <ja...@ebrary.com> wrote:
> On Sun, May 30, 2010 at 7:04 AM, Stack <st...@duboce.net> wrote:
> Our writes and reads are pretty random (we rely on HBase handling the
> distribution), except that we read a set almost immediately after it is
> written.
>
> Since our gets are for a set, we are scanning a bunch of rows at a time.
> We are also working on multiple sets at a time - don't know whether that
> would help?
>

So, you are scanning (looks like you can, given your key type, assuming
the sha-1 is the set identifier).
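If I follow you, each read-of-a-set is then a short scan bounded by the
set's sha-1. Something like the below is what I have in mind - a rough
sketch against the plain client API, where the table name, the caching
value, and the stop-row trick are my guesses (the stop row assumes the
bytes after the sha-1 prefix sort below 0xFF, which holds for hex/ascii
suffixes):

import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

public class SetScan {
  public static void main(String[] args) throws Exception {
    HTable table = new HTable("sets");   // made-up table name
    // args[0] is the hex sha-1 that identifies the set
    byte[] prefix = Bytes.toBytes(args[0]);
    Scan scan = new Scan();
    scan.setStartRow(prefix);
    // stop just past the prefix: append 0xFF so every member row is
    // covered (assumes suffix bytes sort below 0xFF, e.g. hex/ascii)
    scan.setStopRow(Bytes.add(prefix, new byte[] { (byte) 0xFF }));
    scan.setCaching(100);                // fetch rows in batches
    ResultScanner scanner = table.getScanner(scan);
    try {
      for (Result r : scanner) {
        System.out.println(Bytes.toString(r.getRow())); // one member row
      }
    } finally {
      scanner.close();
    }
  }
}

Upping the scanner caching like that means one RPC pulls back a batch of
rows rather than a round trip per row; it helps when a set spans more
than a handful of rows.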
> The Failed openScanner messages seem to suggest some region name cache
> is getting stale with so many splits taking place.

Paste the exception.

> We do see 'Forced flushing of XXXX because global memstore limit of
> 1.6g ....' every 3-4 min

Do these periods last a while or are they short? You think it's the
scenario described by Jon Gray over in HBASE-2375?

> We are trying to size up our capacity-handling metrics and wanted to
> get a sense that we are not way off the mark.

Well, you seem to have the basics right and you seem to have a good
handle on how the systems interact. All that is left, it would seem, is
to try lzo as J-D suggests.

Good stuff Jacob,
St.Ack

> We were also looking for ideas and suggestions that we may have missed.
>
> ~Jacob
>
> St.Ack
>>
>>
>> > ~jacob
>> >
>> >
>> > On Sat, May 29, 2010 at 12:04 PM, Stack <st...@duboce.net> wrote:
>> >> On Sat, May 29, 2010 at 10:53 AM, Stack <st...@duboce.net> wrote:
>> >>> On Fri, May 28, 2010 at 4:11 PM, Jacob Isaac <ja...@ebrary.com> wrote:
>> >>>> Here is the summary of the runs
>> >>>>
>> >>>> puts (~4-5k per row)
>> >>>> regionsize    #rows         Total time (ms)
>> >>>> 1G            82282053*2    301943742
>> >>>> 512M          82287593*2    313119378
>> >>>> 256M          82246314*2    433200105
>> >>>>
>> >>>
>> >>> So about 0.3ms per 5k write (presuming 100M writes?)?
>> >>>
>> >>
>> >> I just tried loading 100M 1k rows into a 4 regionserver cluster where
>> >> each node had two clients writing at any one time and it took just
>> >> over an hour. If you tell me more about your loading job and if
>> >> reading is happening concurrently, I can try and mock it here so we
>> >> can compare (no lzo and all defaults on my cluster).
>> >>
>> >> St.Ack
>> >>
>> >
>> >
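P.S. If it helps you mock up a comparison, my load above was nothing
fancier than the below run once per node - a rough sketch against the
plain client API; the table, family, and qualifier names are made up,
and no reads were running concurrently:

import java.util.Random;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class Loader implements Runnable {
  private final long rows;

  Loader(long rows) { this.rows = rows; }

  public void run() {
    try {
      // each thread gets its own HTable; HTable is not safe for
      // concurrent use from multiple threads
      HTable table = new HTable("test");       // made-up table name
      table.setAutoFlush(false);               // buffer puts client-side
      Random rnd = new Random();
      byte[] value = new byte[1024];           // ~1k per row
      byte[] family = Bytes.toBytes("f");
      byte[] qualifier = Bytes.toBytes("q");
      for (long i = 0; i < rows; i++) {
        rnd.nextBytes(value);
        Put put = new Put(Bytes.toBytes(rnd.nextLong())); // random keys
        put.add(family, qualifier, value);
        table.put(put);
      }
      table.flushCommits();                    // push whatever is buffered
    } catch (Exception e) {
      throw new RuntimeException(e);
    }
  }

  public static void main(String[] args) throws Exception {
    long rowsPerClient = Long.parseLong(args[0]);
    // two writers going at any one time, as on each of my nodes
    Thread a = new Thread(new Loader(rowsPerClient));
    Thread b = new Thread(new Loader(rowsPerClient));
    a.start(); b.start();
    a.join(); b.join();
  }
}

Turning off auto-flush lets the client buffer puts and ship them in
batches instead of making one RPC per row, which matters a lot in a
straight bulk load like this.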