I think a few things are being conflated here. Let me make four brief 
points; feel free to follow up where you would like more information. 

1) Many run HBase (and self-hosted Hadoop) on EC2. These clusters have their 
own HDFS on EBS or instance store volumes. 

2) You cannot run HBase backed by S3; search the HBase user list archives for 
earlier threads on the subject. But this of course does not mean you cannot 
run HBase on EC2. (See point 1.)

3) Your EMR jobs can talk to your other EC2 resources, such as an HBase 
cluster running off to the side. 
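For that to work, the job side mostly just needs to know where the external 
cluster's ZooKeeper quorum lives. A minimal hbase-site.xml fragment for the 
EMR side might look like the sketch below (the hostname is a hypothetical 
placeholder; use your actual EC2 instance names, and make sure your security 
groups allow the EMR instances to reach the ZooKeeper and region server ports):

```xml
<!-- Sketch only: point the EMR job's HBase client at the EC2 cluster.
     "zk1.example.internal" is a made-up placeholder for your ZooKeeper
     host(s); list your real EC2 private DNS names here. -->
<configuration>
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>zk1.example.internal</value>
  </property>
  <property>
    <name>hbase.zookeeper.property.clientPort</name>
    <value>2181</value>
  </property>
</configuration>
```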

4) You can perform custom setup-time actions for your EMR clusters, which can 
set up HBase to run (using the cluster's HDFS file system). Your EMR job then 
has a transient HBase for doing things like holding large intermediate 
representations (a sparse matrix or whatever) that require random access. Of 
course, when the EMR job is complete, everything will be torn down. 
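The setup action itself can be a plain shell script you host in S3 and run on 
the master node. A rough sketch only; the HBase version, tarball location, and 
NameNode address below are all assumptions you would replace with your own:

```shell
#!/bin/bash
# Sketch of a setup-time script for a transient HBase on an EMR cluster.
# Assumptions: the tarball path and HBase version are placeholders, and
# the NameNode address should match core-site.xml on your cluster's master.
set -e

HBASE_VERSION=0.92.0                                       # hypothetical version
HBASE_TARBALL=s3://my-bucket/hbase-$HBASE_VERSION.tar.gz   # hypothetical path

# Fetch and unpack HBase on the node.
hadoop fs -copyToLocal "$HBASE_TARBALL" /tmp/hbase.tar.gz
tar -xzf /tmp/hbase.tar.gz -C /home/hadoop
cd /home/hadoop/hbase-$HBASE_VERSION

# Point hbase.rootdir at the cluster's own HDFS.
cat > conf/hbase-site.xml <<'EOF'
<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://localhost:9000/hbase</value>
  </property>
</configuration>
EOF

# Start HBase; it lives and dies with the cluster.
./bin/start-hbase.sh
```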

Best regards,

    - Andy


On Mar 3, 2012, at 3:45 AM, Mohit Gupta <[email protected]> wrote:

> Hi,
> 
> I am a bit confused about using HBase with EMR. In one of the previous
> threads (and in the EMR documentation,
> http://aws.amazon.com/elasticmapreduce/), it is said that S3 is the
> only option available as a source/destination at the moment. But I have
> come across a couple of blogs saying that people are actually using
> HBase with EMR (one is
> http://whynosql.com/why-we-run-our-hbase-on-ec2/ ).
> I have a scenario where running EMR with HBase would be really useful.
> Please let me know if it's possible or if there is any workaround
> available for this (like first transferring the data to S3 and then to EMR).
> 
> 
> -- 
> Best Regards,
> 
> Mohit Gupta
> Software Engineer at Vdopia Inc.
