At StumbleUpon we use two clusters because high throughput (like
MapReduce) will always kill low latency (like serving random reads).
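
For the two-cluster route, HBase can be pointed at its own HDFS by
setting hbase.rootdir in hbase-site.xml on the HBase cluster. A minimal
sketch (the NameNode hostname and path here are illustrative, not from
the thread):

```xml
<!-- hbase-site.xml on cluster B; hostname is an assumption -->
<property>
  <name>hbase.rootdir</name>
  <!-- 8020 is the default HDFS NameNode RPC port -->
  <value>hdfs://cluster-b-namenode:8020/hbase</value>
</property>
```

With that, cluster A's MapReduce jobs never touch the HDFS that backs
the HBase regionservers, which is what isolates the random-read latency.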

J-D

On Mon, Nov 28, 2011 at 11:37 PM, jingguo yao <[email protected]> wrote:
> I want to set up Hadoop clusters. There are two workloads. One is log
> analysis, which uses MapReduce to process big log files in HDFS. The
> other is HBase, which serves random table queries.
>
> I have two choices for setting up my Hadoop clusters. One is to use a
> single Hadoop cluster for both log analysis and HBase. Its advantages
> are:
>
> 1. There is only one Hadoop cluster to manage.
> 2. Both MapReduce and HBase can use this big cluster, which has more
>  storage and more computing power.
>
> Its disadvantages:
>
> 1. Running MapReduce jobs may slow down random HBase table
>  queries.
>
> The other choice is to use two clusters. Cluster A is for log analysis.
> Cluster B is for HBase. Its advantages are:
>
> 1. There is no interference between log analysis and HBase table
>  queries.
>
> Its disadvantages:
>
> 1. There are two Hadoop clusters which need to be managed.
> 2. Both log analysis and HBase queries can only use a smaller Hadoop
>   cluster, which has less storage and less computing power.
>
> I don't know which choice is better. Can anybody give me some advice
> on this? Thanks.
>
> --
> Jingguo
>
