Hi everyone,

Is it possible for the spark-ec2 scripts to deploy clusters set up with Cloudera's CDH4 Hadoop distribution, rather than the default Hadoop distribution?
Also, if an existing cluster is running CDH4, and Spark is compiled against the stock (non-Cloudera) Hadoop 2 to run the Spark daemons on that cluster, will there be any problems getting Spark to communicate with HDFS?

I'm fairly new to Hadoop versioning, and my team is trying to plan out our deployment strategy. We want our users to be able to use existing clusters backed by CDH4, or to easily spawn clusters with the spark-ec2 scripts, but we want Spark to be built against the same Hadoop jars in both cases.

Thanks,

-Matt Cheah
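For context, my understanding from the Spark build documentation is that the Hadoop version Spark links against is chosen at assembly time via the SPARK_HADOOP_VERSION variable, along these lines (the CDH4 version string below is only illustrative; it would need to match the cluster's actual CDH release):

```shell
# Sketch: build the Spark assembly against a CDH4 Hadoop artifact.
# The version string is an example, not a recommendation; it should
# match the Hadoop client jars installed on the target cluster.
SPARK_HADOOP_VERSION=2.0.0-mr1-cdh4.2.0 sbt/sbt assembly
```

If that is the right mechanism, the question is essentially whether the spark-ec2 launch path can be made to use the same Hadoop artifacts as our CDH4-backed clusters.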
