Re: High throughput input, low latency output?

Anthony Urso Sat, 08 Oct 2011 12:18:52 -0700

On Fri, Oct 7, 2011 at 8:58 PM, Stack <[email protected]> wrote:
> On Fri, Oct 7, 2011 at 12:43 PM, Anthony Urso <[email protected]> wrote:
>> We have a use case that will require a ten to twenty EC2 node HBase
>> cluster to take several hundred million rows of input from a larger
>> number of EMR instances in daily bursts, and then serve those rows via
>> low latency random reads, say on the order of 300 or so rows per
>> second. Before we start coding, I thought it best to ask the experts
>> for their advice.
>>
>> 1) Is this something that HBase will be able to handle gracefully?
>
> You might have some chance if you were not on EC2.
>


Is that because of the slow disk I/O?

> Any chance of caching working?  Are the reads totally random or will
> there be 'hot' areas?  If so, you might have some hope.
>

Hopefully.  Do you mean external caching like memcached or OS-level disk caching?

>
>> 2) Does anyone have any pointers on how to tune HBase for performance
>> and stability under this load?
>
> See the performance section of the book up on hbase.org (though there
> should probably be EC2 caveats...)

TY.

>
>> 3) Would HBase perform better under this sort of load on twelve large
>> EC2 instances, six xlarge or three xxlarge?
>>
>
> The more nodes the better.  And if those nodes are not virtualized,
> better still.  But then there is the network and if it's saturated....
>
>
> Can you run some tests before you start coding?

Good idea.

> St.Ack
>
