HBase on EMR is fairly reliable but is still subject to hardware failures (which have happened to me before). Is there a best practice for adding backup masters to an EMR cluster?

I know this isn't technically a feature AWS supports, but we're already heavily invested in HBase on EMR and would like to investigate options for mitigating the risk of a master failure. In EMR, if the master dies the entire cluster is terminated, so we need failover for HBase, Hadoop/HDFS and ZooKeeper. The one idea I've had is to create a second (or third) EMR cluster with its HBase, ZooKeeper and Hadoop/HDFS configuration pointed at the primary cluster. This would in effect add its RegionServers and DataNodes to the primary cluster. I know that losing 1/3 to 1/2 of your DataNodes would most likely mean losing some WALs, but re-ingesting the last day's worth of data is an acceptable trade-off for us in exchange for not having downtime.
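
To make the idea concrete, here is a rough sketch of the configuration overrides the secondary cluster might be launched with, written as EMR configuration classifications. It is only an illustration under assumptions: the hostname is a placeholder, the helper name is hypothetical, port 8020 and the /user/hbase root directory are assumed defaults that should be checked against the primary cluster, and none of this is an AWS-supported setup.

```python
import json


def secondary_cluster_configurations(primary_master_host):
    """Build EMR configuration classifications that point a secondary
    cluster's HDFS and HBase settings at the primary cluster's master.

    Hypothetical helper for illustration only. Port 8020 and the
    /user/hbase root directory are assumptions; verify them against
    the primary cluster before using anything like this.
    """
    hdfs_uri = f"hdfs://{primary_master_host}:8020"
    return [
        {
            # DataNodes locate their NameNode through fs.defaultFS.
            "Classification": "core-site",
            "Properties": {"fs.defaultFS": hdfs_uri},
        },
        {
            # RegionServers (and any backup HMaster) find the cluster
            # through the ZooKeeper quorum and the shared root directory.
            "Classification": "hbase-site",
            "Properties": {
                "hbase.zookeeper.quorum": primary_master_host,
                "hbase.rootdir": f"{hdfs_uri}/user/hbase",
            },
        },
    ]


if __name__ == "__main__":
    # Placeholder hostname, not a real endpoint.
    configs = secondary_cluster_configurations("primary-master.example.internal")
    print(json.dumps(configs, indent=2))
```

Note that with a single host in the ZooKeeper quorum and a single NameNode address, the primary master is still a single point of failure, which is really the heart of my question.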

I realize this is a slightly crazy idea and that using something like Kubernetes is the 'correct' solution, but I have to work with what we have and mitigate possible issues. My question is: are there any big issues that anyone would foresee us having with this idea?

Thanks for the feedback,
Austin
