I thought it was Doug Miel who said that HBase doesn't start to shine until you have at least 5 nodes. (Apologies if I misspelled Doug's name.)
I happen to concur, and if you want to start testing scalability, you will want to build a bigger test rig.

Just saying!

Oh, and you're going to have a hot spot on that row key. Maybe use a hashed UUID?

I would suggest that you consider the following:

Create N rows, where N is a very large number.
Then, to generate your random access, do a full table scan to get the N row keys into memory.
Using a random number generator, generate a random number between 1 and N and pop that row key off the stack, so that the next iteration picks between 1 and (N-1).
Do this 200K times.

Now time your 200K random fetches.

It would be interesting to see how it performs, taking the average of a couple of runs... then increase the key space by an order of magnitude. (Start with 1 million rows, then 10 million rows, 100 million rows...)

In theory, if properly tuned, one should expect near-linear results. That is to say, the time it takes to get() a row should stay consistent as the data space grows.

Although I wonder if you would have to somehow clear the cache?

Sorry, just a random thought...

-Mike

On Dec 22, 2012, at 10:06 AM, Ted Yu <[email protected]> wrote:

> By '3 datanodes', did you mean that you also increased the number of region
> servers to 3 ?
>
> When your test was running, did you look at Web UI to see whether load was
> balanced ? You can also use Ganglia for such purpose.
>
> What version of HBase are you using ?
>
> Thanks
>
> On Sat, Dec 22, 2012 at 7:43 AM, Dalia Sobhy
> <[email protected]>wrote:
>
>> Dear all,
>>
>> I am testing a simple hbase application on a cluster of multiple nodes.
>>
>> I am especially testing the scalability performance, by measuring the time
>> taken for random reads
>>
>> Data size: 200,000 row
>> Row key : 0,1,2 very simple row key incremental
>>
>> But i don't know why by increasing the cluster size, I see the same time.
>>
>> For ex:
>> 2 Datanodes: 1000 random read: 1.757 sec
>> 3 datanodes: 1000 random read: 1.7 sec
>>
>> So any help plzzz ??
>>
>>
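
A rough sketch of the benchmark procedure suggested in the reply above, in case it helps to see it written out. Everything here is an assumption for illustration: the function names are made up, `fetch` is a placeholder for whatever single-row get() call your client exposes, and `hashed_row_key` uses an MD5 digest of the sequential id (rather than a UUID) only to show one way of spreading incremental keys. The demo runs against an in-memory dict, so its timings say nothing about a real cluster.

```python
import hashlib
import random
import time


def hashed_row_key(n):
    """Hypothetical key scheme: MD5 hex digest of the sequential id, so
    consecutive ids no longer sort next to each other."""
    return hashlib.md5(str(n).encode("utf-8")).hexdigest()


def sample_keys(keys, count, rng):
    """Pick `count` distinct keys: draw a random index, swap that entry to
    the end and pop it, so each draw is over the keys not yet used."""
    pool = list(keys)
    picked = []
    for _ in range(count):
        i = rng.randrange(len(pool))
        pool[i], pool[-1] = pool[-1], pool[i]
        picked.append(pool.pop())
    return picked


def time_random_reads(fetch, keys, count, runs=3, seed=0):
    """Return the mean wall-clock seconds for `count` random fetches,
    averaged over `runs` runs. `fetch` stands in for a single-row get()."""
    rng = random.Random(seed)
    elapsed = []
    for _ in range(runs):
        targets = sample_keys(keys, count, rng)
        start = time.perf_counter()
        for key in targets:
            fetch(key)
        elapsed.append(time.perf_counter() - start)
    return sum(elapsed) / len(elapsed)


if __name__ == "__main__":
    # In-memory dict as a stand-in table; replace `table.get` with a real
    # client call to measure an actual cluster.
    table = {hashed_row_key(i): i for i in range(100_000)}
    mean = time_random_reads(table.get, list(table), count=20_000)
    print(f"mean time for 20,000 random reads: {mean:.3f} s")
```

To follow the suggestion in the thread, you would repeat the measurement with the key space grown by an order of magnitude each time and compare the means. The key list is sampled before the timer starts, so the scan and shuffling cost stays out of the measured read time.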
