Try to confirm my understanding of HBase and MapReduce behavior.

2020-06-05 Thread Brian Hsu
I'm trying to do some processing on an HBase dataset at our company, but I'm pretty new to the HBase and Hadoop ecosystem. I would like to get some feedback from this community to see if my understanding of HBase and the MapReduce operations on it is correct. Some background here: 1. We have a

Re: Shared Cluster between HBase and MapReduce

2012-06-06 Thread Atif Khan
Thanks for the confirmation. There is also a good, detailed discussion thread on this issue at http://apache-hbase.679495.n3.nabble.com/Shared-HDFS-for-HBase-and-MapReduce-td4018856.html. Michael

Re: Shared Cluster between HBase and MapReduce

2012-06-06 Thread Michael Segel
> s to export computation to the data and not import data to the computation. If I were to segregate HBase and MapReduce clusters, then when using MapReduce on HBase data would I not have to transfer large amounts of data from HBase/HDFS cluster to MapReduce/HDFS cluster? Cloud

Re: Shared Cluster between HBase and MapReduce

2012-06-06 Thread Andrew Purtell
On Wed, Jun 6, 2012 at 2:20 AM, Amandeep Khurana wrote: >> These are general recommendations and definitely change based on the access patterns and the way you will be using HBase and MapReduce. In general, if you are building a latency sensitive application on top of

Re: Shared Cluster between HBase and MapReduce

2012-06-06 Thread Tim Robertson
github.com/lfrancke/puppet-cdh but there are others needed too On Wed, Jun 6, 2012 at 2:20 AM, Amandeep Khurana wrote: > Atif, These are general recommendations and definitely change based on the access patterns and the way you will be using HBase and MapReduce. In gener

Re: Shared Cluster between HBase and MapReduce

2012-06-05 Thread Amandeep Khurana
Atif, These are general recommendations and definitely change based on the access patterns and the way you will be using HBase and MapReduce. In general, if you are building a latency sensitive application on top of HBase, running a MapReduce job at the same time will impact performance due to

Re: Shared Cluster between HBase and MapReduce

2012-06-05 Thread Paul Mackles
> isolate a MapReduce/HDFS cluster from an HBase/HDFS cluster as the two when sharing the same HDFS cluster could lead to performance problems. I am not sure if this is entirely true given the fact that the main concept behind Hadoop is to export computation to the data and not im

Shared Cluster between HBase and MapReduce

2012-06-05 Thread Atif Khan
in concept behind Hadoop is to export computation to the data and not import data to the computation. If I were to segregate HBase and MapReduce clusters, then when using MapReduce on HBase data would I not have to transfer large amounts of data from HBase/HDFS cluster to MapReduce/HDFS cluster? C

Re: HBase and MapReduce

2012-05-23 Thread Dave Revell
> 1. HBase guarantees data locality of store files and Regionserver only if it stays up for long. If there are too many region movements or the server has been recycled recently, there is a high probability that store file blocks are not local to the region server. But the getSplits comman

HBase and MapReduce

2012-05-23 Thread Hemant Bhanawat
I have a couple of questions related to MapReduce over HBase. 1. HBase guarantees data locality of store files and Regionserver only if it stays up for long. If there are too many region movements or the server has been recycled recently, there is a high probability that store file blocks are not