Answers in-line.

On Wed, Jun 1, 2011 at 12:42 AM, Harold Lim <[email protected]> wrote:
> Hi Ted,
>
> > You appear to be running on about 10 disks total. Each disk should be
> > capable of about 100 ops per second but they appear to be doing about
> > 70. This is plausible overhead.
>
> Each c1.xlarge instance has 4 ephemeral disks. However, I forgot to
> modify my script to mount the other 2 ephemeral disks and add them to
> dfs.data.dir. So, it should be running on 20 disks total. That would
> make it 100 ops per second vs. 35 ops per second? Is that still a
> plausible overhead?

Potentially. HBase may need to read several locations to access your data
since it effectively overlays multiple HFiles.

> Is there a difference in performance if I add the 4 disks to
> dfs.data.dir vs. setting up a RAID-0 of the 4 ephemeral disks and having
> a single location for dfs.data.dir?

I would avoid RAID-0 (see the rough dfs.data.dir sketch at the end of this
mail).

> > Uniform random can be a reasonably good approximation if you are
> > running behind a cache large enough to cache all repeated accesses. If
> > you aren't behind a cache, uniform access might be very unrealistic
> > (and pessimistic).
> >
> > Do you have logs that you can use to model your actual read behaviors?
>
> Right now, I'm just playing with a completely uniform random
> distribution. However, I have also tried a Zipf distribution and the
> throughput seems to saturate at around 1.2k ops per second.

Harumph. What about a workload that prefers recently accessed keys?
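To make that concrete, here is a rough sketch (just an illustration in the
spirit of YCSB-style zipfian/latest generators, not taken from any
particular tool; the key-space size and the 0.99 exponent are made-up
parameters) of three key choosers for a read test: uniform, Zipf over key
rank, and a "latest"-style chooser that applies the same Zipf skew to
recency rank so recently written keys stay hot:

    import java.util.Random;

    public class KeyChooser {
        private final int keySpace;      // number of rows loaded (assumed)
        private final double[] zipfCdf;  // precomputed CDF over ranks 1..keySpace
        private final Random rng = new Random();

        public KeyChooser(int keySpace, double exponent) {
            this.keySpace = keySpace;
            double[] w = new double[keySpace];
            double sum = 0.0;
            for (int k = 1; k <= keySpace; k++) {   // weight(rank k) = 1 / k^s
                w[k - 1] = 1.0 / Math.pow(k, exponent);
                sum += w[k - 1];
            }
            zipfCdf = new double[keySpace];
            double acc = 0.0;
            for (int i = 0; i < keySpace; i++) {
                acc += w[i] / sum;
                zipfCdf[i] = acc;
            }
        }

        // Uniform: every key equally likely -- the cache-hostile case.
        public int uniformKey() {
            return rng.nextInt(keySpace);
        }

        // Zipf: a handful of hot ranks get most of the traffic.
        public int zipfKey() {
            double u = rng.nextDouble();
            int lo = 0, hi = keySpace - 1;
            while (lo < hi) {                       // binary search the CDF
                int mid = (lo + hi) >>> 1;
                if (zipfCdf[mid] < u) lo = mid + 1; else hi = mid;
            }
            return lo;
        }

        // "Latest"-style: treat the Zipf rank as a recency rank, so the most
        // recently written keys (near newestKey) are the hottest.
        public int latestKey(int newestKey) {
            return newestKey - (zipfKey() % (newestKey + 1));
        }

        public static void main(String[] args) {
            KeyChooser c = new KeyChooser(1000000, 0.99); // sizes are illustrative
            System.out.println(c.uniformKey() + " " + c.zipfKey()
                    + " " + c.latestKey(999999));
        }
    }

Driving the same table with all three choosers gives a feel for how much
of the 1.2k ops/s ceiling is raw disk seeks versus re-reads that a block
cache could absorb.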

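Going back to the dfs.data.dir question: dfs.data.dir accepts a
comma-separated list of directories and the datanode spreads blocks across
them, so the four ephemeral disks can each be listed individually instead
of being striped into a RAID-0 volume. A minimal hdfs-site.xml sketch,
assuming the disks are mounted at /mnt1 through /mnt4 (the mount points
are placeholders for whatever your startup script actually uses):

    <!-- hdfs-site.xml: one entry per ephemeral disk (paths are examples) -->
    <property>
      <name>dfs.data.dir</name>
      <value>/mnt1/hdfs/data,/mnt2/hdfs/data,/mnt3/hdfs/data,/mnt4/hdfs/data</value>
    </property>

Compared with RAID-0, keeping the disks independent means one slow or
flaky ephemeral disk doesn't drag every read down to its speed the way a
stripe does.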