Before I joined my current project and team, my colleagues did some trials with HBase on EC2. Apart from the financial side of things, I have been told that the internode networking is not what you'd hope for, both in terms of latency and bandwidth. Depending on your cluster size, high latency can really become a problem, because ZooKeeper decisions (which need quorum round trips) will take more time. For Hadoop this should be less of a problem, because you'd typically use S3 storage and leave the lower-level details of persistence up to the provider.
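
If you do run trials on EC2, it is easy to get a rough feel for the internode latency yourself before trusting it. Below is a minimal sketch in Python; the peer address is made up, and timing plain TCP connects to the ZooKeeper client port is only a crude stand-in for real request round trips (for bandwidth you would want a proper tool, this only covers latency):

#!/usr/bin/env python
# Crude internode latency check: time a series of TCP connects to another
# node. The address below is made up; 2181 is the usual ZooKeeper client
# port, but 22 (SSH) works just as well for this purpose.
import socket
import time

PEER = "10.0.0.12"   # some other node in the cluster (hypothetical)
PORT = 2181
SAMPLES = 50

rtts = []
for _ in range(SAMPLES):
    start = time.time()
    socket.create_connection((PEER, PORT), timeout=5).close()
    rtts.append((time.time() - start) * 1000.0)
    time.sleep(0.1)

rtts.sort()
print("connect time over %d samples: min %.2f ms, median %.2f ms, max %.2f ms"
      % (SAMPLES, rtts[0], rtts[len(rtts) // 2], rtts[-1]))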

Amazon nowadays offers a special instance type that actually has 10 Gbit internode networking, but that comes at a price, of course... Like I said, I did not experience these things myself, but I trust my colleagues on this. When doing trials, I think you should do some benchmarking on the networking before relying on it.

Right now we run a six node cluster for development (a primary and a secondary master node and four worker nodes). We have experienced network outages, processes crashing or shutting down (due to the network outages) and other unanticipated software errors. However, in all cases the cluster remained operational enough to respond to requests, so the problems have never been blocking. (We set up notifications, error reporting, etc. like we would in a production environment, to get familiar with that side of running a cluster during development; I advise anyone starting out with this kind of setup to do the same. A rough example of such a check is further down in this mail.) Generally, all the bad things could safely be attended to the next day, or on Monday when they happened at night or during the weekend.

So in conclusion I would say managing a small cluster (say up to 15 nodes or so) is not that much of a hassle. Making it too small obviously brings the disadvantage of less redundancy: losing one worker node in a four node cluster means losing 25% of your capacity, while losing a node in a larger cluster has less impact, even though the chance of failure per node is just the same. Also, you will have to put some effort into making sure that all of the tasks you need to do frequently are automated, or at least can be done with a single command (see the sketch below). If you find yourself SSH'ing into more than one machine, you're doing something wrong.

We run on server grade hardware, not junk: RAIDed OS disks (not the data disks), dual power supplies, etc. The master nodes we will use for production will come with battery-backed RAID controllers (the kind that disable their write cache when they no longer trust the battery level), and there is a highly available network storage option to also store the namenode data. I cannot think of an argument for running on junk in a business setting, unless you have nothing more than a 'reasonable effort' requirement. Getting decent hardware will always be cheaper once you take the cost of labour into consideration. In the end, availability requirements will also determine how much work goes into all of this.
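
To give an idea of what I mean by the single-command point: a minimal sketch of running one command on every node from one place. The host names are made up and it assumes key-based (passwordless) SSH is already in place; in practice a tool like pdsh or Fabric does the same job with less typing.

#!/usr/bin/env python
# Run the same command on every node from a single machine, so routine
# chores stay a one-command affair. Host names are hypothetical.
import subprocess
import sys

NODES = ["master1", "master2", "worker1", "worker2", "worker3", "worker4"]

def run_everywhere(command):
    for node in NODES:
        print("==== %s ====" % node)
        subprocess.call(["ssh", node, command])

if __name__ == "__main__":
    # e.g.: python run_everywhere.py "df -h /data"
    run_everywhere(" ".join(sys.argv[1:]) or "uptime")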
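
And as an example of the kind of notification I mentioned: a bare-bones check, meant for cron, that mails someone when the HBase master's info port stops answering. The host name, the recipient, the port (60010 is the default master info port, as far as I know) and the local MTA are all assumptions; a real monitoring system is the better answer, this only illustrates the idea.

#!/usr/bin/env python
# If the master's web UI port does not accept connections, send a mail via
# a local MTA. All names, addresses and ports here are placeholders.
import smtplib
import socket

MASTER = "master1"
INFO_PORT = 60010
NOTIFY = "[email protected]"

def port_open(host, port):
    try:
        socket.create_connection((host, port), timeout=10).close()
        return True
    except (socket.error, socket.timeout):
        return False

if not port_open(MASTER, INFO_PORT):
    message = ("Subject: HBase master %s not answering on port %d\n\n"
               "Time to go have a look." % (MASTER, INFO_PORT))
    server = smtplib.SMTP("localhost")
    server.sendmail(NOTIFY, [NOTIFY], message)
    server.quit()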

Friso


On 21 jul 2010, at 22:12, Hegner, Travis wrote:

> The biggest issue you'll likely have is hardware, so if you are running ec2,
> that is out the window. I run my datanodes on 'old' desktop grade hardware...
> Single Power Supply, 2GB RAM, Single HT P4 procs, and single 250GB disks. I
> know, it's bad, but for my current purposes, it works pretty well. Once the
> cluster is up and running, and I'm not changing configs and constantly
> restarting, it will run for weeks without intervention.
>
> If you run on server grade hardware, built to tighter specs, the chances of
> failure (therefore intervention for repair or replacement) are lower.
>
> If you run on ec2, then someone else is dealing with the hardware, and you
> can just use the cluster...
>
> Travis Hegner
> http://www.travishegner.com/
>
> -----Original Message-----
> From: S Ahmed [mailto:[email protected]]
> Sent: Wednesday, July 21, 2010 3:36 PM
> To: [email protected]
> Subject: Re: operations, how 'hard' is it to keep the servers humming
>
> Can you define what you mean by 'complete junk' ?
>
> I plan on using ec2.
>
> On Wed, Jul 21, 2010 at 3:23 PM, Hegner, Travis <[email protected]> wrote:
>
>> That question is completely dependant on the size of the cluster you are
>> looking at setting up, which is then dependant on how much data you want to
>> store and/or process.
>>
>> A one man show should be able to handle 10-20 machines without too much
>> trouble, unless you run complete junk. I run a 6 node cluster on complete
>> junk, and I rarely have had to tinker with it since setting it up.
>>
>> Travis Hegner
>> http://www.travishegner.com/
>>
>> -----Original Message-----
>> From: S Ahmed [mailto:[email protected]]
>> Sent: Wednesday, July 21, 2010 2:59 PM
>> To: [email protected]
>> Subject: operations, how 'hard' is it to keep the servers humming
>>
>> From a operations standpoint, is setup a hbase cluster and keeping them
>> running a fairly complex task?
>>
>> i.e. if I am a 1-man show, would it be a smart choice to build on top of
>> hbase or is a crazy idea?
