I should add the disclaimer that this is not the best possible way! :)) There are some Ruby scripts too (in the same repo; look for the recipes directory) that bring your cluster up with just one .rb file. I didn't use them because Ruby is unknown territory for me and I wasn't entirely clear about how they work.
Himanshu

On Sat, Jun 4, 2011 at 1:02 PM, Himanshu Vashishtha <[email protected]> wrote:
> I used EC2, but just for experiments. Here is what I did:
> a) Used the ephemeral disks. My experiment datasets were persisted on S3, and I copied them onto the cluster.
> b) Used the hbase-ec2 scripts. Get this repo: https://github.com/ekoontz/hbase-ec2.git
> c) Consulted Andrew's pdf: hbase.s3.amazonaws.com/hbase/HBase-EC2-HUG9.pdf
>
> For the AMI, there is a create-hbase-image script in the above git repo. I created one for my stuff and it's public; search "himanshu-hbase" and you should find it. But it's always good to have your own AMI (I learned that the hard way).
>
> Consult the run scripts, like bin/launch-hbase-cluster, bin/launch-hbase-master, etc.
> One caveat: when you run launch-cluster, the cluster is all set, but I needed to manually add the regionserver's internal IP to the master's conf/regionservers list, and also the datanode's entry to conf/slaves in the hadoop directory. This can be done by a script, though.
>
> Hope this helps.
> Himanshu
>
> On Sat, Jun 4, 2011 at 12:49 PM, Jim R. Wilson <[email protected]> wrote:
>> Thanks Sean,
>>
>> That's helpful. I probably should have added some contextual info. In my case, I'm interested in providing instructions on how one can fire up an HBase cluster in EC2 in order to experiment with it. That is, load data, practice administration, etc. In that context, it's unlikely that the person following the instructions would start more than 5 nodes, and they would also not likely keep them on longer than an hour.
>>
>> I saw archived email threads where people recommended not running on EC2 for any length of time, since you can get better performance-per-cost characteristics from dedicated hardware (for example from Rackspace).
>>
>> So I guess my real question is this: what is the easiest possible way to start a 5-node HBase 0.90.x cluster in EC2? I'm thinking that S3 is better for storage, but I'm open to whatever is genuinely the easiest thing to do.
>>
>> Thanks again,
>>
>> -- Jim
>>
>> On Sat, Jun 4, 2011 at 2:40 PM, Sean Bigdatafun <[email protected]> wrote:
>>> Here are my thoughts:
>>>
>>> If your data storage is used long-term, then you may consider attaching the HDFS storage device to EBS rather than local disk (attaching the Namenode storage device to EBS as well). For this setup, I think we should consider dfs.replication=2 (or even 1), because EBS itself already provides enough reliability.
>>>
>>> If your datastore is used for ephemeral purposes (say, EMR computation), you may consider just using the EC2-provided ephemeral disks.
>>>
>>> On Sat, Jun 4, 2011 at 11:27 AM, Jim R. Wilson <[email protected]> wrote:
>>>> Hi HBase community,
>>>>
>>>> What are the current best practices with respect to starting up an HBase cluster in EC2? I don't see any public AMIs newer than 0.89.xxx, and starting up that one, it's clear that it's not configured for HDFS or clustering (empty hbase-site.xml).
>>>>
>>>> Do people generally keep data in S3 or HDFS? If the latter, is it persisted via EBS? Do the hadoop nodes have more than one EBS attached to distinguish HDFS from the OS?
>>>>
>>>> Any help is much appreciated. Thanks in advance!
>>>>
>>>> -- Jim R. Wilson (jimbojw)
>>>
>>> --
>>> --Sean
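[Editor's note] Himanshu's caveat above — adding the regionserver's internal IP to the master's conf/regionservers list and the datanode to conf/slaves by hand after bin/launch-hbase-cluster runs — might be scripted roughly like this. The conf paths and the IP address below are hypothetical examples, not values from the thread:

```shell
# Post-launch fix-up sketch for Himanshu's caveat: after
# bin/launch-hbase-cluster brings the nodes up, register the new
# slave's internal IP in the master's config files by hand.
# The paths and IP below are hypothetical examples.

HBASE_CONF=./hbase/conf        # assumed HBase conf dir on the master
HADOOP_CONF=./hadoop/conf      # assumed Hadoop conf dir on the master
SLAVE_INTERNAL_IP=10.254.1.23  # example internal IP of the new node

mkdir -p "$HBASE_CONF" "$HADOOP_CONF"  # only needed for this standalone sketch

# Add the regionserver to the master's conf/regionservers list
echo "$SLAVE_INTERNAL_IP" >> "$HBASE_CONF/regionservers"

# Add the datanode's entry to conf/slaves in the hadoop directory
echo "$SLAVE_INTERNAL_IP" >> "$HADOOP_CONF/slaves"
```

As the email notes, a small loop over the launch script's output could do the same for every slave in one pass.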
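[Editor's note] Sean's suggestion of a lower replication factor for EBS-backed HDFS corresponds to the dfs.replication property in hdfs-site.xml. A minimal sketch, assuming a hypothetical conf path:

```shell
# Sketch of Sean's EBS setup: with HDFS data directories on EBS,
# the default replication factor of 3 can be lowered, since EBS
# already replicates underneath. The path below is a hypothetical
# example; on a real node this would be the Hadoop conf directory.
HADOOP_CONF=./hadoop/conf
mkdir -p "$HADOOP_CONF"  # only needed for this standalone sketch

cat > "$HADOOP_CONF/hdfs-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <!-- 2 copies instead of the default 3; EBS provides durability -->
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
</configuration>
EOF
```

The datanodes need this file in place (and a restart) before the lower factor applies to newly written blocks.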
