On Fri, Oct 19, 2012 at 5:22 PM, Amandeep Khurana <ama...@gmail.com> wrote:
> Answers inline
>
> On Fri, Oct 19, 2012 at 4:31 PM, Dave Latham <lat...@davelink.net> wrote:
>
>> I need to scale an internal service / datastore that is currently hosted on
>> an HBase cluster and wanted to ask for advice from anyone out there who may
>> have some to share.  The service does simple key value lookups on 20 byte
>> keys to 20-40 byte values.  It currently has about 5 billion entries
>> (200GB) and processes about 40k random reads per second and about 2k
>> random writes per second.  It currently delivers a median response at 2ms,
>> 90% at 20ms, 99% at 200ms, 99.5% at 5000ms - but the mean is 58ms, which is
>> no longer meeting our needs very well.  It is persistent and highly
>> available.  I need to measure its working set more closely, but I believe
>> that around 20-30% (randomly distributed) of the data is accessed each
>> day.  I want a system that can scale to at least 10x current levels (50
>> billion entries - 2TB, 400k requests per second) and achieve a mean < 5ms
>> (ideally 1-2ms) and 99.5% < 50ms response time for reads while maintaining
>> persistence and reasonably high availability (99.9%).  Writes would ideally
>> be in the same range, but we could probably tolerate a mean more in the
>> 20-30ms range.
>>
>> Clearly for that latency, spinning disks won't cut it.  The current service
>> is running out of an HBase cluster that is shared with many other things,
>> and it degrades when those other things hit the disk and network hard.
>> The cluster has hundreds of nodes and this data fits in a
>> small slice of block cache across most of them.  The concerns are that its
>> performance is impacted by other loads and that as it continues to grow
>> there may not be enough space in the current cluster's shared block cache.
>>
>> So I'm looking for something that will serve out of memory (backed by disk
>> for persistence) or from SSDs.  A few questions that I would love to hear
>> answers for:
>>
>>  - Does HBase sound like a good match as this grows?
>>
>
> Yes. The key to getting more predictable performance is to separate out
> the workloads. What are the other things that are using the same physical
> hardware and impacting performance? Have you measured performance when
> nothing else is running on the cluster?

There are several other things sharing the cluster and using it more
heavily than this service - both online request handling and some
large batch MapReduce jobs.  When the large jobs aren't running,
performance is acceptable, typically a 1-2ms mean for reads (served
out of block cache).
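
For concreteness, the read path for this service is essentially one
short-row Get per request.  A rough sketch of what I mean (the table,
family, and qualifier names are made up for illustration, not our
real schema):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

public class KvLookup {
    // Illustrative schema - single family/qualifier.
    private static final byte[] FAMILY = Bytes.toBytes("f");
    private static final byte[] QUALIFIER = Bytes.toBytes("v");

    private final HTable table;

    public KvLookup() throws Exception {
        Configuration conf = HBaseConfiguration.create();
        // One HTable instance per thread in the real service.
        this.table = new HTable(conf, "kv");
    }

    // 20-byte key in, 20-40 byte value out (or null if absent).
    public byte[] lookup(byte[] key) throws Exception {
        Get get = new Get(key);
        get.addColumn(FAMILY, QUALIFIER);
        Result result = table.get(get);
        return result.getValue(FAMILY, QUALIFIER);
    }
}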

>
>
>>  - Does anyone have experience running HBase over SSDs?  What sort of
>> latency and requests per second have you been able to achieve?
>>
>
> I don't believe many people are actually running this in production yet.
> Some folks have done some research on this topic and posted blogs (eg:
> http://hadoopblog.blogspot.com/2012/05/hadoop-and-solid-state-drives.html)
> but there's not a whole lot more than that to go by at this point.

Thanks, that's a really helpful reference.  It sounds like SSDs could
already be a big gain over disks, but that the bottleneck would move
from IO to CPU, and that significant work remains before the stack
can take full advantage of them.
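
To get comparable numbers once we stand up a test cluster (SSD vs
disk), I'll probably collect read latency percentiles with a simple
harness along these lines - just a sketch, with the actual lookup
stubbed out:

import java.util.Arrays;
import java.util.Random;

public class LatencyProbe {
    public static void main(String[] args) throws Exception {
        int n = 100000;                  // samples per run
        long[] micros = new long[n];
        Random rnd = new Random();

        for (int i = 0; i < n; i++) {
            byte[] key = new byte[20];   // random 20-byte key, as in production
            rnd.nextBytes(key);
            long start = System.nanoTime();
            lookup(key);                 // stub for the Get path sketched earlier
            micros[i] = (System.nanoTime() - start) / 1000;
        }

        Arrays.sort(micros);
        System.out.printf("median=%dus p90=%dus p99=%dus p99.5=%dus%n",
                micros[n / 2], micros[(int) (n * 0.90)],
                micros[(int) (n * 0.99)], micros[(int) (n * 0.995)]);
    }

    private static byte[] lookup(byte[] key) {
        return null; // replace with the real HBase Get
    }
}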

>
>
>>  - Is anyone using a row cache on top of (or built into) HBase?  I think
>> there's been a bit of discussion on occasion but it hasn't gone very far.
>> There would be some overhead for each row.  It seems that if we were to
>> continue to rely on memory + disks this could reduce the memory required.
>>  - Does anyone have alternate suggestions for such a service?
>>
>
> The biggest recommendation is to separate out the workloads and then start
> planning for more hardware or additional components to get better
> performance.

Right, that's why I'm looking to separate this service out.  However,
I'd like to go with a much smaller set of nodes for this particular
service rather than duplicating the large, expensive cluster.
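
On the row cache idea above: what I had in mind is a small in-process
LRU sitting in front of the Gets, something like this sketch (the
size and the coarse synchronization are placeholders to tune):

import java.nio.ByteBuffer;
import java.util.LinkedHashMap;
import java.util.Map;

// Minimal in-process LRU row cache in front of HBase Gets.
// ByteBuffer keys give content-based equals/hashCode for raw row keys;
// callers must not mutate a key array after handing it in.
public class RowCache {
    private static final int MAX_ENTRIES = 1000000; // tune to available heap

    private final Map<ByteBuffer, byte[]> cache =
        new LinkedHashMap<ByteBuffer, byte[]>(MAX_ENTRIES, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<ByteBuffer, byte[]> eldest) {
                return size() > MAX_ENTRIES; // evict least-recently-used
            }
        };

    public synchronized byte[] get(byte[] rowKey) {
        return cache.get(ByteBuffer.wrap(rowKey));
    }

    public synchronized void put(byte[] rowKey, byte[] value) {
        cache.put(ByteBuffer.wrap(rowKey), value);
    }

    // Writes must invalidate here too, or reads can serve stale values.
    public synchronized void invalidate(byte[] rowKey) {
        cache.remove(ByteBuffer.wrap(rowKey));
    }
}

With values of only 20-40 bytes, the per-entry bookkeeping (map
entry, buffer wrapper, object headers) would likely dominate the
payload, which is exactly the per-row overhead concern I mentioned.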
