I should add the disclaimer that this is not the best possible way! :)) There are some Ruby scripts too (in the same repo; look for the recipes directory) that bring your cluster up with just one .rb file. I didn't use them because Ruby is unknown territory for me and I wasn't entirely clear about how they work.
Himanshu

On Sat, Jun 4, 2011 at 1:02 PM, Himanshu Vashishtha <[email protected]> wrote:
> I used EC2, but just for experiments. Here is what I did:
> a) Used the ephemeral disks. My experiment datasets were persisted on S3, and I copied them onto the cluster.
> b) Used the hbase-ec2 scripts. Get this repo: https://github.com/ekoontz/hbase-ec2.git
> c) Consulted Andrew's pdf: hbase.s3.amazonaws.com/hbase/HBase-EC2-HUG9.pdf
>
> For the AMI, there is a create-hbase-image script in the above git repo. I created one for my stuff and it's public; search "himanshu-hbase" and you should find it. But it's always good to have your own AMI (I learned that the hard way).
>
> Consult the run scripts, like bin/launch-hbase-cluster, bin/launch-hbase-master, etc.
> One caveat: when you run launch-cluster, the cluster is all set, but I needed to manually add the regionserver's internal IP to the master's conf/regionservers list, and also the datanode's entry to conf/slaves in the hadoop directory. This can be done by a script, though.
>
> Hope this helps.
> Himanshu
>
> On Sat, Jun 4, 2011 at 12:49 PM, Jim R. Wilson <[email protected]> wrote:
>> Thanks Sean,
>>
>> That's helpful. I probably should have added some contextual info. In my case, I'm interested in providing instructions on how one can fire up an HBase cluster in EC2 in order to experiment with it. That is, load data, practice administration, etc. In that context, it's unlikely that the person following the instructions would start more than 5 nodes, and they would also not likely keep them on longer than an hour.
>>
>> I saw archived email threads where people recommended not running on EC2 for any length of time, since you can get better performance-per-cost characteristics from dedicated hardware (for example from Rackspace).
>>
>> So I guess my real question is this: what is the easiest possible way to start a 5-node HBase 0.90.x cluster in EC2? I'm thinking that S3 is better for storage, but I'm open to whatever is genuinely the easiest thing to do.
>>
>> Thanks again,
>>
>> -- Jim
>>
>> On Sat, Jun 4, 2011 at 2:40 PM, Sean Bigdatafun <[email protected]> wrote:
>>> Here are my thoughts:
>>>
>>> If your data storage is used long-term, then you may consider attaching the HDFS storage device to EBS rather than local disk (attaching the Namenode storage device to EBS as well). For this setup, I think we should consider dfs.replication=2 (or even 1), because EBS itself already provides enough reliability.
>>>
>>> If your datastore is used for ephemeral purposes (say, EMR computation), you may consider just using the EC2-provided ephemeral disks.
>>>
>>> On Sat, Jun 4, 2011 at 11:27 AM, Jim R. Wilson <[email protected]> wrote:
>>>> Hi HBase community,
>>>>
>>>> What are the current best practices with respect to starting up an HBase cluster in EC2? I don't see any public AMIs newer than 0.89.xxx, and starting up that one, it's clear that it's not configured for HDFS or clustering (empty hbase-site.xml).
>>>>
>>>> Do people generally keep data in S3 or HDFS? If the latter, is it persisted via EBS? Do the hadoop nodes have more than one EBS attached to distinguish HDFS from the OS?
>>>>
>>>> Any help is much appreciated. Thanks in advance!
>>>>
>>>> -- Jim R. Wilson (jimbojw)
>>>
>>> --
>>> --Sean
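[Editor's note] Himanshu's caveat above — adding the regionserver's internal IP to the master's conf/regionservers list and the datanode to conf/slaves by hand after bin/launch-hbase-cluster runs — might be scripted roughly like this. The conf paths and the IP address below are hypothetical examples, not values from the thread:

```shell
# Post-launch fix-up sketch for Himanshu's caveat: after
# bin/launch-hbase-cluster brings the nodes up, register the new
# slave's internal IP in the master's config files by hand.
# The paths and IP below are hypothetical examples.

HBASE_CONF=./hbase/conf        # assumed HBase conf dir on the master
HADOOP_CONF=./hadoop/conf      # assumed Hadoop conf dir on the master
SLAVE_INTERNAL_IP=10.254.1.23  # example internal IP of the new node

mkdir -p "$HBASE_CONF" "$HADOOP_CONF"  # only needed for this standalone sketch

# Add the regionserver to the master's conf/regionservers list
echo "$SLAVE_INTERNAL_IP" >> "$HBASE_CONF/regionservers"

# Add the datanode's entry to conf/slaves in the hadoop directory
echo "$SLAVE_INTERNAL_IP" >> "$HADOOP_CONF/slaves"
```

As the email notes, a small loop over the launch script's output could do the same for every slave in one pass.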
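[Editor's note] Sean's suggestion of a lower replication factor for EBS-backed HDFS corresponds to the dfs.replication property in hdfs-site.xml. A minimal sketch, assuming a hypothetical conf path:

```shell
# Sketch of Sean's EBS setup: with HDFS data directories on EBS,
# the default replication factor of 3 can be lowered, since EBS
# already replicates underneath. The path below is a hypothetical
# example; on a real node this would be the Hadoop conf directory.
HADOOP_CONF=./hadoop/conf
mkdir -p "$HADOOP_CONF"  # only needed for this standalone sketch

cat > "$HADOOP_CONF/hdfs-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <!-- 2 copies instead of the default 3; EBS provides durability -->
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
</configuration>
EOF
```

The datanodes need this file in place (and a restart) before the lower factor applies to newly written blocks.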
