Hi all, could anyone direct me to a resource (or share their own thoughts) on best practices for deploying a robust, resilient Hadoop cluster (specifically CDH in this case) to AWS? The data is important to us and we expect to store it for a long time. We want to make sure we are not impacted by an outage in a single availability zone, and we want to implement a sensible backup and disaster recovery plan.
We are using HBase in addition to HDFS, but not Hive at present. Your thoughts are much appreciated. Regards, Trevor Smith
