Thanks Himanshu,

Sounds like I'll need to make my own AMIs :/

It's been a really long time since I've rolled HBase AMIs. Last time I did it, though, one of the reasons was so I wouldn't have to deal with manual IP configs. I'll see if my AMIs can be flexible enough to join a cluster through startup data alone.

-- Jim

On Sat, Jun 4, 2011 at 3:16 PM, Himanshu Vashishtha <[email protected]> wrote:

> Should add the disclaimer: this is not the best possible way! :))
> There are some ruby scripts too (in the same repo, look for the recipes
> directory), and your cluster is up and running with just 1 rb file. I didn't
> use it because ruby is unknown territory for me and I was not entirely
> clear about its working.
>
> Himanshu
>
> On Sat, Jun 4, 2011 at 1:02 PM, Himanshu Vashishtha <[email protected]> wrote:
>
> > I used ec2, but just for experiments. Here is what I did:
> > a) Used the ephemeral disks. My experiment datasets were persisted on S3,
> > and I copied them onto the cluster.
> > b) Used the hbase-ec2 scripts. Get this repo:
> > https://github.com/ekoontz/hbase-ec2.git
> > c) Consulted Andrew's pdf: hbase.s3.amazonaws.com/hbase/HBase-EC2-HUG9.pdf
> >
> > For the AMI, there is a create-hbase-image script in the above git repo. I
> > did create one for my stuff and it's public; search "himanshu-hbase" and you
> > should get it. But it's always good to have your own AMI (I learned it the
> > hard way).
> >
> > Consult the run scripts, like bin/launch-hbase-cluster,
> > bin/launch-hbase-master etc.
> > One thing: when you run the launch-cluster, the cluster is all set, but I
> > needed to manually add the regionserver's internal ip to the master's
> > conf/regionservers list, and also the datanode's entry to conf/slaves in the
> > hadoop directory. This can be done by a script though.
> >
> > Hope this helps.
> > Himanshu
> >
> > On Sat, Jun 4, 2011 at 12:49 PM, Jim R. Wilson <[email protected]> wrote:
> >
> >> Thanks Sean,
> >>
> >> That's helpful. I probably should have added some contextual info.
> >> In my case, I'm interested in providing instructions on how one can fire up an
> >> HBase cluster in EC2 in order to experiment with it. That is, load data,
> >> practice administration, etc. In that context, it's unlikely that the
> >> person following the instructions would start more than 5 nodes, and they
> >> would also not likely keep them on longer than an hour.
> >>
> >> I saw archived email threads where people recommended not running on EC2
> >> for any length of time since you can get better performance-per-cost
> >> characteristics from dedicated hardware (for example from Rackspace).
> >>
> >> So I guess my real question is this: What is the easiest possible way to
> >> start a 5-node HBase 0.90.x cluster in EC2? I'm thinking that S3 is
> >> better for storage, but I'm open to whatever is genuinely the easiest
> >> thing to do.
> >>
> >> Thanks again,
> >>
> >> -- Jim
> >>
> >> On Sat, Jun 4, 2011 at 2:40 PM, Sean Bigdatafun <[email protected]> wrote:
> >>
> >> > Here are my thoughts:
> >> >
> >> > If your data storage is for long-term use, then you may consider attaching
> >> > the HDFS storage device to EBS rather than local disk (attaching the
> >> > Namenode storage device to EBS as well). But for this setup, I think we
> >> > should consider dfs.replication=2 (or even 1) because EBS itself already
> >> > provides enough reliability.
> >> >
> >> > If your datastore is used for ephemeral purposes (say EMR computation),
> >> > you may consider just using the EC2-provided ephemeral disks.
> >> >
> >> > On Sat, Jun 4, 2011 at 11:27 AM, Jim R. Wilson <[email protected]> wrote:
> >> >
> >> > > Hi HBase community,
> >> > >
> >> > > What are the current best-practices with respect to starting up an HBase
> >> > > cluster in EC2?
> >> > > I don't see any public AMIs newer than 0.89.xxx, and
> >> > > starting up that one, it's clear that it's not configured for HDFS or
> >> > > clustering (empty hbase-site.xml).
> >> > >
> >> > > Do people generally keep data in S3 or HDFS? If the latter, is it
> >> > > persisted via EBS? Do the hadoop nodes have more than one EBS volume
> >> > > attached to distinguish HDFS from the OS?
> >> > >
> >> > > Any help is much appreciated. Thanks in advance!
> >> > >
> >> > > -- Jim R. Wilson (jimbojw)
> >> >
> >> > --
> >> > --Sean
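
For reference, the manual step Himanshu mentions in the quoted thread (adding each regionserver's internal IP to the master's regionservers list, and each datanode to Hadoop's conf/slaves) could probably be scripted along these lines. This is only a rough Python sketch, not something taken from the hbase-ec2 repo: the function name `add_hosts` is made up for illustration, and it assumes both files are plain text with one host per line.

```python
import os


def add_hosts(path, hosts):
    """Add each host to the file at `path`, one per line, skipping duplicates.

    Hypothetical helper (not from hbase-ec2). Returns the hosts actually added.
    """
    # Read whatever entries are already listed, ignoring blank lines.
    existing = []
    if os.path.exists(path):
        with open(path) as f:
            existing = [line.strip() for line in f if line.strip()]

    # Keep only hosts that are not already present, preserving order.
    added = []
    for host in hosts:
        if host not in existing and host not in added:
            added.append(host)

    # Rewrite the file only when there is something new to record.
    if added:
        with open(path, "w") as f:
            f.write("\n".join(existing + added) + "\n")
    return added
```

On the master you would call it once per file, for example `add_hosts("conf/regionservers", ips)` from the HBase directory and `add_hosts("conf/slaves", ips)` from the Hadoop directory, where `ips` is the list of internal IPs of the worker nodes. Those relative paths are assumptions about the layout, so adjust them to wherever your AMI installs HBase and Hadoop. Running it twice with the same IPs leaves the files unchanged.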
