Hi Ted,

> You appear to be running on about 10 disks total. Each disk should be
> capable of about 100 ops per second but they appear to be doing about 70.
> This is plausible overhead.
Each c1.xlarge instance has 4 ephemeral disks. However, I forgot to modify my script to mount the other 2 ephemeral disks and add them to dfs.data.dir, so it should actually be running on 20 disks total. That would make it 100 expected ops per second vs. 35 actual ops per second per disk. Is that still plausible overhead? Is there a difference in performance between adding the 4 disks to dfs.data.dir and setting up a RAID-0 of the 4 ephemeral disks with a single location for dfs.data.dir? I'll also try your suggestion of using multiple EBS stores.

> Is your actual load going to be completely uniformly random? Or will there
> be a Zipf distribution? Will there be bursts of repeated accesses?
>
> Uniform random can be a reasonably good approximation if you are running
> behind a cache large enough to cache all repeated accesses. If you aren't
> behind a cache, uniform access might be very unrealistic (and pessimistic).
>
> Do you have logs that you can use to model your actual read behaviors?

Right now, I'm just playing with a completely uniformly random load. However, I have also tried a Zipf distribution, and the throughput seems to saturate at around 1.2k ops per second. I don't actually have logs to model my read behavior, since I'm using HBase as part of a research project.

Thanks,
Harold
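P.S. For reference, this is roughly what I'm thinking of putting in hdfs-site.xml once all four ephemeral disks are mounted. The mount points below (/mnt, /mnt2, /mnt3, /mnt4) are just placeholders for whatever my setup script ends up using:

  <property>
    <name>dfs.data.dir</name>
    <!-- comma-separated list; the DataNode spreads block writes round-robin across these directories -->
    <value>/mnt/hdfs/data,/mnt2/hdfs/data,/mnt3/hdfs/data,/mnt4/hdfs/data</value>
  </property>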
