Is your insert path multi-threaded?

On Thu, Mar 3, 2011 at 10:51 PM, Aditya Sharma <[email protected]> wrote:
> It was quite variable, as I said earlier, but in one sort of
> representative READs-only benchmark, it was 115 READs per second. For a
> READ + WRITE benchmark, it was 90 operations per second (with some
> primitive caching thrown in).
>
> Aditya
>
> On Fri, Mar 4, 2011 at 11:54 AM, Ted Dunning <[email protected]> wrote:
>
>> What kinds of speeds are you seeing?
>>
>> On Thu, Mar 3, 2011 at 10:19 PM, Aditya Sharma <[email protected]> wrote:
>>
>>> Hi All,
>>>
>>> I am working on benchmarking different data stores to find the best
>>> fit for our use case. I would like the views and suggestions of the
>>> HBase user and developer community on some of my findings, as the
>>> results I am getting are highly variable.
>>>
>>> My HBase setup has two EC2 Large hosts (each with 7.5 GB of memory,
>>> 4 CPU cores, etc.), on which both the HBase master and slaves reside.
>>> The HDFS master/slave and ZooKeeper instances are also split between
>>> these two hosts. I have three tables with one column family each;
>>> they have 100 million, 75 million, and 500 million rows respectively.
>>> The actual data consists of a String key and Long and String columns.
>>> The usual access pattern is GETs on individual keys and periodic
>>> batch PUTs.
>>>
>>> I ran my benchmark application on HBase in different scenarios to
>>> measure pure GET performance, mixed GET and PUT performance, etc.
>>> This was without enabling the HTable API's write buffer or any bloom
>>> filters. The results I got were quite unimpressive compared to
>>> similar benchmarking done with MySQL, Cassandra, etc.: the
>>> performance was anywhere from 40% to 100% worse. So I started using
>>> the write buffer in my code and also enabled bloom filters at ROW
>>> level. However, I then started seeing a lot of variance in the
>>> benchmarking results (though I would not be too sure about
>>> attributing this to the bloom filters/write buffering). Another
>>> concern was that the results were actually worse than before.
>>>
>>> Since we are using EC2 Large instances, it seems unlikely that a
>>> network or other virtualization-related resource crunch is affecting
>>> our performance measurements.
>>>
>>> What I would like to know is whether this rings a bell for anyone
>>> else here. Could I be missing some configuration knob, such that
>>> background compaction or some similar process starts at the wrong
>>> time and affects my benchmarks? Any comments or feedback are welcome.
>>>
>>> Thanks,
>>> Aditya
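
For context, the client-side write buffering discussed above looks
roughly like this with the 0.90-era HTable API. This is a minimal
sketch, not code from the thread; the table, column-family, and
qualifier names ("benchmark_table", "cf", "value") are illustrative.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class BufferedPuts {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        // Hypothetical table and column names; the thread gives none.
        HTable table = new HTable(conf, "benchmark_table");

        // Buffer Puts client-side instead of doing one RPC per Put.
        table.setAutoFlush(false);
        table.setWriteBufferSize(4 * 1024 * 1024); // 4 MB

        for (int i = 0; i < 100000; i++) {
            Put put = new Put(Bytes.toBytes("row-" + i));
            put.add(Bytes.toBytes("cf"), Bytes.toBytes("value"),
                    Bytes.toBytes((long) i));
            table.put(put); // queued locally, flushed when the buffer fills
        }

        table.flushCommits(); // push whatever is still buffered
        table.close();
    }
}

Row-level bloom filters, the other change Aditya mentions, are a
per-column-family setting (BLOOMFILTER => 'ROW' via the shell's alter
command); they mainly speed up GETs for keys that are absent from a
given store file, so their benefit depends heavily on the workload's
key distribution.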

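Ted's question at the top of the thread matters because HTable
instances are not thread-safe: a multi-threaded insert path normally
gives each thread its own HTable (or draws one from an HTablePool).
Below is a hedged sketch of that pattern under the same assumptions as
above (0.90-era API, illustrative names).

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class ParallelInserts {
    public static void main(String[] args) throws Exception {
        final Configuration conf = HBaseConfiguration.create();
        final int threads = 8;
        final int totalRows = 1000000;
        ExecutorService pool = Executors.newFixedThreadPool(threads);

        for (int t = 0; t < threads; t++) {
            final int shard = t;
            pool.submit(new Runnable() {
                public void run() {
                    try {
                        // One HTable per thread; HTable is not thread-safe.
                        HTable table = new HTable(conf, "benchmark_table");
                        table.setAutoFlush(false);
                        for (int i = shard; i < totalRows; i += threads) {
                            Put put = new Put(Bytes.toBytes("row-" + i));
                            put.add(Bytes.toBytes("cf"),
                                    Bytes.toBytes("value"),
                                    Bytes.toBytes((long) i));
                            table.put(put);
                        }
                        table.flushCommits();
                        table.close();
                    } catch (Exception e) {
                        e.printStackTrace();
                    }
                }
            });
        }

        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.HOURS);
    }
}

One caveat for benchmarks like this: sequential row keys concentrate
writes on one region at a time, so a single region server can
bottleneck the insert path no matter how many client threads run;
hashing the keys or pre-splitting the table avoids that hotspot.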