1.) We use a different cluster for each app. To be honest, I don't know
whether this is best practice. We just wanted to isolate downtime and
potential damage to a single application.
2.) We usually use the HBase APIs directly. Having said that, we
recently started working on a new service and looked at Redis and
Hazelcast. We chose Hazelcast because we could easily write a MapStore
implementation that persists the data to HBase.

Here's the implementation that we have so far:
https://github.com/xstevens/bagheera/blob/master/src/java/com/mozilla/bagheera/hazelcast/persistence/HBaseMapStore.java

We are really only using this as a memory staging area before we do
batch puts into HBase.
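
To make the staging idea concrete, here is a minimal sketch in plain
Java. It is not the code at the link above and it does not use the
Hazelcast or HBase APIs. BatchSink, StagingBuffer and batchSize are
hypothetical names chosen for illustration, and BatchSink only stands in
for whatever actually performs the batch put.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical stand-in for the component that does the batch put. */
interface BatchSink {
    void writeBatch(Map<String, String> rows);
}

/** In-memory staging area that hands rows to the sink in batches. */
class StagingBuffer {
    private final Map<String, String> pending = new LinkedHashMap<>();
    private final BatchSink sink;
    private final int batchSize; // assumed tuning knob, not from the original

    StagingBuffer(BatchSink sink, int batchSize) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be >= 1");
        }
        this.sink = sink;
        this.batchSize = batchSize;
    }

    /** Stage one row; flush once enough rows have accumulated. */
    synchronized void put(String key, String value) {
        pending.put(key, value);
        if (pending.size() >= batchSize) {
            flush();
        }
    }

    /** Hand all staged rows to the sink as a single batch. */
    synchronized void flush() {
        if (pending.isEmpty()) {
            return;
        }
        // Copy first so the sink never sees the live buffer.
        sink.writeBatch(new LinkedHashMap<>(pending));
        // Cleared only after the sink returned without throwing,
        // so a failed write leaves the rows staged for a retry.
        pending.clear();
    }

    synchronized int pendingCount() {
        return pending.size();
    }
}
```

With batchSize set to 2, the first put only stages the row and the
second put triggers one writeBatch call with both rows. A real service
would also want a time-based flush so that a slow trickle of writes does
not sit in memory indefinitely.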

3.) The StumbleUpon guys have helped us a lot and we have a setup
similar to theirs. As of last night, we're using HBase replication to a
secondary "analysis" cluster.
4.) Can't really speak for MSLAB yet. We're using it on our analysis
cluster as a testing ground before using it in production. Todd's blog
posts were pretty thorough on the topic though.

Cheers,

-Xavier

On 5/5/11 2:26 PM, Matt Davies wrote:
> Afternoon everyone,
>
> I am researching what the best practice is for using HBase in user facing
> applications.  I do not know all of the applications that will be ported to
> use HBase, but they do share common characteristics such as
>
> - simple key / value data.  Not serving large files ATM. Perhaps a couple
> columns in a single column family
> - very tall tables
> - hundreds of millions of rows
> - need millisecond access times for a single row
> - random access
> - maintain very, very good query times while loading in new data
>
>
> The quick choice would be to use something like memcache or Redis, but the
> data is growing faster than the memory of a single box or even few boxes.
>  We also have a significant investment in Hadoop technologies so keeping
> HBase prime seems to make a lot of sense.
>
> So, some questions:
>
> 1. do you find that having a single HBase cluster to serve all applications
> vs smaller clusters to serve application specific data is better?
> 2. In the real world do people hook API's directly to HBase or is there some
> caching layer that is used?
> 3. I remember hearing people like StumbleUpon use different clusters for
> analytics vs customer apps.  Is this still best practice?
> 4. Anyone using MSLAB's to reduce GC pauses in production? Experiences /
> landmines?
> 5. What other considerations have you found when hooking HBase up for
> user-facing applications?
>
> Thanks in advance and I'd love to hear some bragging!
>
> -Matt
>
