I want to set up Hadoop clusters for two workloads. The first is log analysis, which uses MapReduce to process large log files stored in HDFS. The second is HBase, which serves random-access table queries.
I have two options for setting up my Hadoop clusters.

The first option is to use a single Hadoop cluster shared by log analysis and HBase. Its advantages are:
1. I only need to manage one Hadoop cluster.
2. Both MapReduce and HBase can use the whole cluster, which has more storage and more computing power.
Its disadvantage is:
1. Running MapReduce jobs may slow down random HBase table queries.

The second option is to use two clusters: cluster A for log analysis and cluster B for HBase. Its advantages are:
1. Log analysis and HBase table queries do not interfere with each other.
Its disadvantages are:
1. There are two Hadoop clusters to manage.
2. Log analysis and HBase queries each get only a smaller cluster, with less storage and less computing power.

I don't know which option is better. Can anybody give me some advice on this? Thanks.

-- Jingguo
