Re: Best practices for HBase in EC2?

Jim R. Wilson Sat, 04 Jun 2011 12:26:02 -0700

Thanks Himanshu,

Sounds like I'll need to make my own AMIs :/


It's been a really long time since I've rolled HBase AMIs. The last time I did
it, though, one of the reasons was so I wouldn't have to deal with manual IP
configs.  I'll see if my AMIs can be flexible enough to join a cluster
through startup data alone.
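
Here's a rough sketch of what I mean, in Python. It's untested against a real
cluster and makes several assumptions: the startup data is a handful of
KEY=VALUE lines, the keys MASTER_HOST, REGIONSERVERS and ZOOKEEPER_QUORUM are
names I made up, and the namenode listens on port 8020. On a real instance the
text would come from the EC2 user-data URL rather than a literal string.

```python
import urllib.request
from xml.sax.saxutils import escape

# EC2 instance metadata endpoint that serves the launch-time user data.
USER_DATA_URL = "http://169.254.169.254/latest/user-data"


def fetch_user_data(url=USER_DATA_URL):
    """Read the raw startup data (only works from inside an EC2 instance)."""
    with urllib.request.urlopen(url, timeout=5) as resp:
        return resp.read().decode("utf-8")


def parse_user_data(text):
    """Parse KEY=VALUE lines (a hypothetical format) into a dict."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


def render_host_list(csv_hosts):
    """One host per line, as conf/regionservers and conf/slaves expect."""
    hosts = [h.strip() for h in csv_hosts.split(",") if h.strip()]
    return "\n".join(hosts) + "\n"


def render_hbase_site(settings):
    """Build a minimal hbase-site.xml from the parsed settings."""
    master = settings["MASTER_HOST"]  # hypothetical key name
    props = {
        # Port 8020 is an assumption; use whatever the namenode listens on.
        "hbase.rootdir": "hdfs://%s:8020/hbase" % master,
        "hbase.cluster.distributed": "true",
        "hbase.zookeeper.quorum": settings.get("ZOOKEEPER_QUORUM", master),
    }
    lines = ['<?xml version="1.0"?>', "<configuration>"]
    for name, value in props.items():
        lines.append("  <property>")
        lines.append("    <name>%s</name>" % escape(name))
        lines.append("    <value>%s</value>" % escape(value))
        lines.append("  </property>")
    lines.append("</configuration>")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    sample = "MASTER_HOST=10.0.0.1\nREGIONSERVERS=10.0.0.2,10.0.0.3\n"
    parsed = parse_user_data(sample)
    print(render_hbase_site(parsed))
    print(render_host_list(parsed["REGIONSERVERS"]))
```

A boot script on the AMI could write the first result to conf/hbase-site.xml
and the second to conf/regionservers (and to Hadoop's conf/slaves), which would
cover the manual IP step described below.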

-- Jim

On Sat, Jun 4, 2011 at 3:16 PM, Himanshu Vashishtha <[email protected]> wrote:

> I should add a disclaimer: this is not the best possible way! :))
> There are some Ruby scripts too (in the same repo, look for the recipes
> directory), and with them your cluster is up and running with just one .rb
> file. I didn't use them because Ruby is unknown territory for me and I was
> not entirely clear about how they work.
>
> Himanshu
>
> On Sat, Jun 4, 2011 at 1:02 PM, Himanshu Vashishtha <[email protected]> wrote:
>
> > I used EC2, but just for experiments. Here is what I did:
> > a) Used the ephemeral disks. My experiment datasets were persisted on S3,
> > and I copied them onto the cluster.
> > b) Used the hbase-ec2 scripts. Get this repo:
> > https://github.com/ekoontz/hbase-ec2.git
> > c) Consulted Andrew's PDF: hbase.s3.amazonaws.com/hbase/HBase-EC2-HUG9.pdf
> >
> > For the AMI, there is a create-hbase-image script in the above git repo. I
> > created one for my stuff and it's public; search for "himanshu-hbase" and
> > you should find it. But it's always good to have your own AMI (I learned
> > that the hard way).
> >
> > Consult the run scripts, like bin/launch-hbase-cluster,
> > bin/launch-hbase-master, etc.
> > One catch: after you run launch-cluster the cluster is all set, but I
> > still needed to manually add each regionserver's internal IP to the master's
> > conf/regionservers list, and each datanode's entry to conf/slaves in the
> > Hadoop directory. This can be done by a script, though.
> >
> > Hope this helps.
> > Himanshu
> >
> >
> > On Sat, Jun 4, 2011 at 12:49 PM, Jim R. Wilson <[email protected]> wrote:
> >
> >> Thanks Sean,
> >>
> >> That's helpful.  I probably should have added some contextual info.  In my
> >> case, I'm interested in providing instructions on how one can fire up an
> >> HBase cluster in EC2 in order to experiment with it.  That is, load data,
> >> practice administration, etc.  In that context, it's unlikely that the
> >> person following the instructions would start more than 5 nodes, and they
> >> would also not likely keep them on longer than an hour.
> >>
> >> I saw archived email threads where people recommended not running on EC2
> >> for any length of time, since you can get better performance-per-cost
> >> characteristics from dedicated hardware (for example from Rackspace).
> >>
> >> So I guess my real question is this: What is the easiest possible way to
> >> start a 5-node HBase 0.90.x cluster in EC2?  I'm thinking that S3 is
> >> better for storage, but I'm open to whatever is genuinely the easiest
> >> thing to do.
> >>
> >> Thanks again,
> >>
> >> -- Jim
> >>
> >> On Sat, Jun 4, 2011 at 2:40 PM, Sean Bigdatafun <[email protected]> wrote:
> >>
> >> > Here are my thoughts:
> >> >
> >> > If your data storage is for long-term use, you may consider putting the
> >> > HDFS storage on EBS rather than on local disk (and putting the Namenode
> >> > storage on EBS as well). For this setup, I think we should consider
> >> > dfs.replication=2 (or even 1), because EBS itself already provides
> >> > enough reliability.
> >> >
> >> > If your datastore is used for ephemeral purposes (say, EMR computation),
> >> > you may consider just using the EC2-provided ephemeral disks.
> >> >
> >> >
> >> >
> >> >
> >> > On Sat, Jun 4, 2011 at 11:27 AM, Jim R. Wilson <[email protected]> wrote:
> >> >
> >> > > Hi HBase community,
> >> > >
> >> > > What are the current best practices with respect to starting up an
> >> > > HBase cluster in EC2?  I don't see any public AMIs newer than 0.89.xxx,
> >> > > and starting up that one, it's clear that it's not configured for HDFS
> >> > > or clustering (empty hbase-site.xml).
> >> > >
> >> > > Do people generally keep data in S3 or HDFS?  If the latter, is it
> >> > > persisted via EBS?  Do the Hadoop nodes have more than one EBS volume
> >> > > attached to distinguish HDFS from the OS?
> >> > >
> >> > > Any help is much appreciated.  Thanks in advance!
> >> > >
> >> > > -- Jim R. Wilson (jimbojw)
> >> > >
> >> >
> >> >
> >> >
> >> > --
> >> > --Sean
> >> >
> >>
> >
> >
>
