It depends. There are some reasons to do this; however, in general you don't need to.
The course is wrong to suggest this as a best practice.

Sent from my iPhone

On Jun 5, 2012, at 5:00 PM, "Atif Khan" <[email protected]> wrote:

>
> During a recent Cloudera course we were told that it is "Best practice" to
> isolate a MapReduce/HDFS cluster from an HBase/HDFS cluster, as the two, when
> sharing the same HDFS cluster, could lead to performance problems. I am not
> sure if this is entirely true given the fact that the main concept behind
> Hadoop is to export computation to the data and not import data to the
> computation. If I were to segregate HBase and MapReduce clusters, then when
> using MapReduce on HBase data would I not have to transfer large amounts of
> data from the HBase/HDFS cluster to the MapReduce/HDFS cluster?
>
> Cloudera on their best practice page
> (http://www.cloudera.com/blog/2011/04/hbase-dos-and-donts/) has the
> following:
> "Be careful when running mixed workloads on an HBase cluster. When you have
> SLAs on HBase access independent of any MapReduce jobs (for example, a
> transformation in Pig and serving data from HBase) run them on separate
> clusters. HBase is CPU and Memory intensive with sporadic large sequential
> I/O access while MapReduce jobs are primarily I/O bound with fixed memory
> and sporadic CPU. Combined these can lead to unpredictable latencies for
> HBase and CPU contention between the two. A shared cluster also requires
> fewer task slots per node to accommodate for HBase CPU requirements
> (generally half the slots on each node that you would allocate without
> HBase). Also keep an eye on memory swap. If HBase starts to swap there is a
> good chance it will miss a heartbeat and get dropped from the cluster. On a
> busy cluster this may overload another region, causing it to swap and a
> cascade of failures."
>
> All my initial investigation/reading led me to believe that I should create
> a common HDFS cluster and then I can run MapReduce and HBase against the
> common HDFS cluster. But from the above Cloudera best practice it seems
> like I should create two HDFS clusters, one for MapReduce and one for HBase,
> and then move data around when required. Something does not make sense with
> this best practice recommendation.
>
> Any thoughts and/or feedback will be much appreciated.
>
> --
> View this message in context:
> http://old.nabble.com/Shared-Cluster-between-HBase-and-MapReduce-tp33967219p33967219.html
> Sent from the HBase User mailing list archive at Nabble.com.
>
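
For anyone who does share the cluster, the "half the slots" rule of thumb in the quoted Cloudera text can be sketched as below. This is only a rough illustration, not a tuning recommendation: the function name `task_slots` and the baseline of 8 slots are made up for the example, and the right numbers depend on your hardware and workload.

```python
def task_slots(baseline_slots: int, shares_node_with_hbase: bool) -> int:
    """Return the MapReduce task slots to configure on one node.

    baseline_slots is whatever you would allocate on a node that does
    NOT run HBase (hypothetical input; pick it from your own sizing).
    When a RegionServer shares the node, the quoted guidance is to use
    roughly half of that, leaving CPU headroom for HBase.
    """
    if baseline_slots < 1:
        raise ValueError("baseline_slots must be at least 1")
    if not shares_node_with_hbase:
        return baseline_slots
    # Halve the slots, but never drop below one slot per node.
    return max(1, baseline_slots // 2)


# Hypothetical node that would get 8 slots without HBase.
baseline = 8
print("MapReduce-only node:", task_slots(baseline, False))
print("Shared with HBase:  ", task_slots(baseline, True))
```

On MRv1 the resulting total would be split between mapred.tasktracker.map.tasks.maximum and mapred.tasktracker.reduce.tasks.maximum in mapred-site.xml; how you divide it between map and reduce slots is a separate judgment call.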
