Before I joined my current project and team, my colleagues did some trials with HBase on EC2. Apart from the financial side of things, I have been told that the internode networking is not what you'd hope for, both in terms of latency and bandwidth. Depending on your cluster size, high latency can really become a problem, because ZooKeeper decisions (which need quorum round trips) will take more time. For Hadoop this should be less of a problem, because you'd typically use S3 storage and leave the lower-level details of persistence up to the provider.
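
If you do run trials on EC2, it is easy to get a rough feel for the internode latency yourself before trusting it. Below is a minimal sketch in Python; the peer address is made up, and timing plain TCP connects to the ZooKeeper client port is only a crude stand-in for real request round trips (for bandwidth you would want a proper tool, this only covers latency):

#!/usr/bin/env python
# Crude internode latency check: time a series of TCP connects to another
# node. The address below is made up; 2181 is the usual ZooKeeper client
# port, but 22 (SSH) works just as well for this purpose.
import socket
import time

PEER = "10.0.0.12"   # some other node in the cluster (hypothetical)
PORT = 2181
SAMPLES = 50

rtts = []
for _ in range(SAMPLES):
    start = time.time()
    socket.create_connection((PEER, PORT), timeout=5).close()
    rtts.append((time.time() - start) * 1000.0)
    time.sleep(0.1)

rtts.sort()
print("connect time over %d samples: min %.2f ms, median %.2f ms, max %.2f ms"
      % (SAMPLES, rtts[0], rtts[len(rtts) // 2], rtts[-1]))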

Amazon nowadays offers a special instance type that actually has 10 Gbit internode networking, but that comes at a price, of course... Like I said, I did not experience these things myself, but I trust my colleagues on this. When doing trials, I think you should do some benchmarking on the networking before relying on it.

Right now we run a six node cluster for development (a primary and a secondary master node and four worker nodes). We have experienced network outages, processes crashing or shutting down (due to the network outages) and other unanticipated software errors. However, in all cases the cluster remained operational enough to respond to requests, so the problems have never been blocking. (We set up notifications, error reporting, etc. like we would in a production environment, to get familiar with that side of running a cluster during development; I advise anyone starting out with this kind of setup to do the same. A rough example of such a check is further down in this mail.) Generally, all the bad things could safely be attended to the next day, or on Monday when they happened at night or during the weekend.

So in conclusion I would say managing a small cluster (say up to 15 nodes or so) is not that much of a hassle. Making it too small obviously brings the disadvantage of less redundancy: losing one worker node in a four node cluster means losing 25% of your capacity, while losing a node in a larger cluster has less impact, even though the chance of failure per node is just the same. Also, you will have to put some effort into making sure that all of the tasks you need to do frequently are automated, or at least can be done with a single command (see the sketch below). If you find yourself SSH'ing into more than one machine, you're doing something wrong.

We run on server grade hardware, not junk: RAIDed OS disks (not the data disks), dual power supplies, etc. The master nodes we will use for production will come with battery-backed RAID controllers (the kind that disable their write cache when they no longer trust the battery level), and there is a highly available network storage option to also store the namenode data. I cannot think of an argument for running on junk in a business setting, unless you have nothing more than a 'reasonable effort' requirement. Getting decent hardware will always be cheaper once you take the cost of labour into consideration. In the end, availability requirements will also determine how much work goes into all of this.
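
To give an idea of what I mean by the single-command point: a minimal sketch of running one command on every node from one place. The host names are made up and it assumes key-based (passwordless) SSH is already in place; in practice a tool like pdsh or Fabric does the same job with less typing.

#!/usr/bin/env python
# Run the same command on every node from a single machine, so routine
# chores stay a one-command affair. Host names are hypothetical.
import subprocess
import sys

NODES = ["master1", "master2", "worker1", "worker2", "worker3", "worker4"]

def run_everywhere(command):
    for node in NODES:
        print("==== %s ====" % node)
        subprocess.call(["ssh", node, command])

if __name__ == "__main__":
    # e.g.: python run_everywhere.py "df -h /data"
    run_everywhere(" ".join(sys.argv[1:]) or "uptime")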
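
And as an example of the kind of notification I mentioned: a bare-bones check, meant for cron, that mails someone when the HBase master's info port stops answering. The host name, the recipient, the port (60010 is the default master info port, as far as I know) and the local MTA are all assumptions; a real monitoring system is the better answer, this only illustrates the idea.

#!/usr/bin/env python
# If the master's web UI port does not accept connections, send a mail via
# a local MTA. All names, addresses and ports here are placeholders.
import smtplib
import socket

MASTER = "master1"
INFO_PORT = 60010
NOTIFY = "[email protected]"

def port_open(host, port):
    try:
        socket.create_connection((host, port), timeout=10).close()
        return True
    except (socket.error, socket.timeout):
        return False

if not port_open(MASTER, INFO_PORT):
    message = ("Subject: HBase master %s not answering on port %d\n\n"
               "Time to go have a look." % (MASTER, INFO_PORT))
    server = smtplib.SMTP("localhost")
    server.sendmail(NOTIFY, [NOTIFY], message)
    server.quit()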

Friso


On 21 jul 2010, at 22:12, Hegner, Travis wrote:

> The biggest issue you'll likely have is hardware, so if you are running ec2,
> that is out the window. I run my datanodes on 'old' desktop grade hardware...
> Single Power Supply, 2GB RAM, Single HT P4 procs, and single 250GB disks. I
> know, it's bad, but for my current purposes, it works pretty well. Once the
> cluster is up and running, and I'm not changing configs and constantly
> restarting, it will run for weeks without intervention.
>
> If you run on server grade hardware, built to tighter specs, the chances of
> failure (therefore intervention for repair or replacement) are lower.
>
> If you run on ec2, then someone else is dealing with the hardware, and you
> can just use the cluster...
>
> Travis Hegner
> http://www.travishegner.com/
>
> -----Original Message-----
> From: S Ahmed [mailto:[email protected]]
> Sent: Wednesday, July 21, 2010 3:36 PM
> To: [email protected]
> Subject: Re: operations, how 'hard' is it to keep the servers humming
>
> Can you define what you mean by 'complete junk' ?
>
> I plan on using ec2.
>
> On Wed, Jul 21, 2010 at 3:23 PM, Hegner, Travis <[email protected]> wrote:
>
>> That question is completely dependant on the size of the cluster you are
>> looking at setting up, which is then dependant on how much data you want to
>> store and/or process.
>>
>> A one man show should be able to handle 10-20 machines without too much
>> trouble, unless you run complete junk. I run a 6 node cluster on complete
>> junk, and I rarely have had to tinker with it since setting it up.
>>
>> Travis Hegner
>> http://www.travishegner.com/
>>
>> -----Original Message-----
>> From: S Ahmed [mailto:[email protected]]
>> Sent: Wednesday, July 21, 2010 2:59 PM
>> To: [email protected]
>> Subject: operations, how 'hard' is it to keep the servers humming
>>
>> From a operations standpoint, is setup a hbase cluster and keeping them
>> running a fairly complex task?
>>
>> i.e. if I am a 1-man show, would it be a smart choice to build on top of
>> hbase or is a crazy idea?
