Mohit,

I wrote that blog post.

EMR is just Hadoop MapReduce. HBase provides TableInputFormat. With
TableInputFormat and the TableMapReduceUtil class (
http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/mapreduce/TableMapReduceUtil.html),
you can specify HBase as your source, hosted anywhere, as long as it is
accessible over the internet. If HBase is not hosted on the same Hadoop
cluster as the job (which it won't be for an EMR job), you will be
sacrificing data locality (we are okay with that).
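
To make that concrete, here is a rough sketch of a map-only job that scans a
remote HBase table. It is not our production code. The table name "mytable",
the ZooKeeper host "zk1.example.com", the class names and the counter names
are all placeholders I made up for illustration. It assumes the HBase client
jars are on the job classpath and that the EMR nodes can reach the ZooKeeper
quorum and the region servers.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.NullOutputFormat;

// Hypothetical example class: counts the rows of a remote HBase table.
public class RemoteHBaseScanJob {

    // Called once per HBase row; only bumps a counter, emits nothing.
    public static class RowCountMapper extends TableMapper<Text, LongWritable> {
        @Override
        protected void map(ImmutableBytesWritable rowKey, Result row, Context context)
                throws IOException, InterruptedException {
            context.getCounter("sketch", "rows").increment(1);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        // Placeholder host: point the client at the remote cluster's ZooKeeper.
        conf.set("hbase.zookeeper.quorum", "zk1.example.com");

        // On older Hadoop releases use "new Job(conf, name)" instead.
        Job job = Job.getInstance(conf, "remote-hbase-scan");
        job.setJarByClass(RemoteHBaseScanJob.class);

        Scan scan = new Scan();
        scan.setCaching(500);        // fetch more rows per RPC; matters more without locality
        scan.setCacheBlocks(false);  // avoid churning the region server block cache

        // Sets TableInputFormat as the input format and wires in the mapper.
        TableMapReduceUtil.initTableMapperJob(
                "mytable", scan, RowCountMapper.class,
                Text.class, LongWritable.class, job);

        job.setNumReduceTasks(0);
        job.setOutputFormatClass(NullOutputFormat.class);

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

Since every row crosses the network, the scan caching value is worth tuning
for your row size.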

Regards,
Vaibhav

On Sat, Mar 3, 2012 at 9:21 AM, Andrew Purtell <[email protected]> wrote:

> I think there are a couple of things conflated here. Let me make four
> brief points and then feel free to follow up where you would like more
> information.
>
> 1) Many run HBase (and self-hosted Hadoop) on EC2. These clusters have
> their own HDFS on EBS or instance store volumes.
>
> 2) You cannot run HBase backed by S3. Search the HBase user list for
> other emails on the subject. But this of course does not mean you cannot
> run HBase on EC2. (See point 1.)
>
> 3) Your EMR jobs can talk to your other EC2 resources, such as an HBase
> cluster running off to the side.
>
> 4) You can perform custom setup-time actions for your EMR clusters, which
> can set up HBase to run (using the cluster's HDFS file system). Then your
> EMR job has a transient HBase for doing things like holding large
> intermediate representations (sparse matrix or whatever) that require
> random access. Of course, when the EMR job is complete, everything will
> be torn down.
>
> Best regards,
>
>    - Andy
>
>
> On Mar 3, 2012, at 3:45 AM, Mohit Gupta <[email protected]>
> wrote:
>
> > Hi,
> >
> > I am a bit confused about using HBase with EMR. In one of the previous
> > threads (and in the EMR documentation,
> > http://aws.amazon.com/elasticmapreduce/), it is said that S3 is the
> > only option available to be used as source/destination at the moment.
> > But I have come across a couple of blogs saying that those people are
> > actually using HBase with EMR (one is
> > http://whynosql.com/why-we-run-our-hbase-on-ec2/ ).
> > I have a scenario where running EMR with HBase would be really useful.
> > Please let me know if it's possible or if there is any workaround
> > available for this (like first transferring the data to S3 and then to
> > EMR).
> >
> >
> > --
> > Best Regards,
> >
> > Mohit Gupta
> > Software Engineer at Vdopia Inc.
>
