It was quite variable, as I said earlier, but in one roughly representative READ-only benchmark it was 115 READs per second. For a READ + WRITE benchmark it was 90 operations per second (with some primitive caching thrown in).
Aditya

On Fri, Mar 4, 2011 at 11:54 AM, Ted Dunning <[email protected]> wrote:

> What kinds of speeds are you seeing?
>
> On Thu, Mar 3, 2011 at 10:19 PM, Aditya Sharma <[email protected]> wrote:
>
>> Hi All,
>>
>> I am working on benchmarking different data stores to find the best fit
>> for our use case. I would like to know the views and suggestions of the
>> HBase user and developer community on some of my findings, as the results
>> I am getting are highly variable.
>>
>> My HBase setup has two EC2 Large hosts (each with 7.5 GB memory, 4 CPU
>> cores, etc.), on which both the HBase master and slaves reside. The HDFS
>> master and slave and the ZooKeeper instances are also split between these
>> two hosts. I have three tables with one column family each; they have
>> 100 million, 75 million, and 500 million rows respectively. The actual
>> data consists of a String key and Long and String columns. The usual
>> access pattern is GETs on individual keys and periodic batch PUTs.
>>
>> I ran my benchmark application on HBase for different scenarios to
>> measure pure GET performance, mixed GET and PUT performance, etc. This
>> was without enabling the HTable API's writeBuffer or any BloomFilters.
>> The results I got were quite unimpressive compared to similar
>> benchmarking done with MySQL, Cassandra, etc.; performance was anywhere
>> from 40% to 100% worse. So I started using writeBuffers in my code and
>> also enabled BloomFilters at ROW level. However, I then started seeing a
>> lot of variance in the benchmarking results (though I would not be too
>> sure about correlating this with BloomFilters/write buffering). Another
>> concern was that these results were actually worse than the earlier ones.
>>
>> Since we are using EC2 Large instances, it seems unlikely that a network
>> or some other virtualization-related resource crunch is affecting our
>> performance measurements.
>>
>> What I would like to know is whether this rings a bell for anyone else
>> here. Could I be missing some configuration knob, such that background
>> compaction or some similar process kicks in at the wrong time and
>> affects my benchmarks? Any comments or feedback are welcome.
>>
>> Thanks,
>> Aditya
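For anyone reproducing this kind of measurement, here is a minimal sketch of the throughput calculation and of the client-side write-buffering idea the thread refers to. It uses a plain in-memory dict as a stand-in for the store; the `BufferedStore` class and its method names are hypothetical illustrations, not the HBase API. Against a real cluster you would instead use an `HTable` with auto-flush disabled and a write buffer, and the numbers would of course be dominated by network and server behavior rather than this loop.

```python
import time

class BufferedStore:
    """Toy in-memory key-value store with client-side write buffering.
    Hypothetical stand-in for a real store client; the buffering mirrors
    the idea of batching PUTs to amortize round trips."""

    def __init__(self, buffer_size=1000):
        self.data = {}
        self.buffer = []
        self.buffer_size = buffer_size

    def put(self, key, value):
        # Buffer writes client-side; apply them in one batch when full.
        self.buffer.append((key, value))
        if len(self.buffer) >= self.buffer_size:
            self.flush()

    def flush(self):
        for key, value in self.buffer:
            self.data[key] = value
        self.buffer.clear()

    def get(self, key):
        return self.data.get(key)

def ops_per_second(op, n):
    """Run `op` n times and report throughput as operations per second."""
    start = time.perf_counter()
    for i in range(n):
        op(i)
    elapsed = time.perf_counter() - start
    return n / elapsed if elapsed > 0 else float("inf")

store = BufferedStore(buffer_size=1000)
write_rate = ops_per_second(lambda i: store.put("row%d" % i, i), 100_000)
store.flush()  # drain any partially filled buffer before reading back
read_rate = ops_per_second(lambda i: store.get("row%d" % i), 100_000)
print("PUT: %.0f ops/s, GET: %.0f ops/s" % (write_rate, read_rate))
```

The same `ops_per_second` harness works for a mixed GET/PUT scenario by passing a lambda that alternates operations; the key point is to flush the buffer before timing reads, or recent writes will appear missing.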
