Hi,

I'm running some performance tests on a cluster with 5 member servers (not counting the various masters), each node running a data node, a region server and a Thrift server. Each server has two quad-core CPUs and 16GB of RAM. The data set consists of 50 sets of consecutive keys, with 100 columns in each row, all under a single CF; each column holds 1KB of random data. The entire data set is around 40GB, so it should fit in RAM for caching purposes. I've enabled LZO compression for the table. I'm running the tests through Thrift because that's the way it will be used in our production environment.
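For reference, the table setup and one row of the load look roughly like this. This is just a sketch in Python, using client bindings generated from HBase's Hbase.thrift; 'perftest' and the column names are placeholders, and depending on your HBase version some of these calls may take an extra trailing "attributes" argument:

import os

from thrift.transport import TSocket, TTransport
from thrift.protocol import TBinaryProtocol
from hbase import Hbase                    # package generated by: thrift --gen py Hbase.thrift
from hbase.ttypes import ColumnDescriptor, Mutation

# 9090 is the default Thrift server port.
transport = TTransport.TBufferedTransport(TSocket.TSocket('localhost', 9090))
client = Hbase.Client(TBinaryProtocol.TBinaryProtocol(transport))
transport.open()

# Single column family with LZO compression, as in the test setup.
client.createTable('perftest', [ColumnDescriptor(name='cf:', compression='LZO')])

# One row of the data set: 100 columns of 1KB random data under one CF.
# Newer HBase versions add a trailing 'attributes' dict to mutateRow.
row_key = 'set00-row0000000'
mutations = [Mutation(column='cf:col%03d' % i, value=os.urandom(1024))
             for i in range(100)]
client.mutateRow('perftest', row_key, mutations)

transport.close()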
I started with a basic insert test, inserting rows with a single column of 1KB of data each. Initially, when the table was empty, I was getting around 300 inserts/sec with 50 writing threads. Then, when the region split and a second server was added, the rate suddenly jumped to 3,000 inserts/sec per server, so ~6,000 for the two servers. Over time, as more servers were added, the per-server rate actually went down and stabilized at around 2,000 inserts/sec. I'm not sure what the reason is for the jump in inserts/sec per server after the region was split. Maybe local splits on the same RS allowed more open files?

I also ran a random-column read test, reading a varying number of columns from randomly selected rows. First I read only one specific column (the first in each row); that started at around 60 reads/sec per server and gradually increased (I assume as more data was loaded into the cache) to ~800 reads/sec per server. When reading 5 random columns from each row, the rate dropped to around 400 rows/sec, and when fetching full rows (each with 100 columns) the rate remained about the same, at 400 rows/sec per server.

I'm not sure exactly what I should expect, but I was hoping for much higher numbers. I read somewhere that for small values it is reasonable to expect 10K inserts per core per second. I know 1KB isn't small, but these are 8-core machines and they are doing only about 2K inserts/sec each. The read rate also seems very low considering all the data should fit in RAM.

The interesting thing is that there doesn't seem to be any resource bottleneck: IO utilization on the servers is negligible, CPU is at around 40-50% utilization, the client generating the load is not loaded either (around 5% CPU), and the client's network was at only 30% utilization when reading full rows. So the only explanation I can see for the flat-lining is some sort of lock contention. Does this make sense? Is there any way to improve it, especially the read performance?

-eran
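P.S. In case it helps, the read tests do roughly the following, continuing with the same client and generated bindings as in the load sketch above. Again just a sketch; method names are from Hbase.thrift, and newer HBase versions add a trailing "attributes" argument to each call:

import random

# Read one specific column (the first in each row):
cells = client.get('perftest', row_key, 'cf:col000')

# Read 5 random columns from a row:
cols = ['cf:col%03d' % i for i in random.sample(range(100), 5)]
result = client.getRowWithColumns('perftest', row_key, cols)

# Fetch the full row (all 100 columns):
result = client.getRow('perftest', row_key)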
