Hi everyone,

Is it possible for the spark-ec2 scripts to deploy clusters set up with 
Cloudera's CDH4 Hadoop distribution, as opposed to the default Hadoop 
distribution?

Also, if an existing cluster is running Hadoop with CDH4, and Spark is 
compiled against the (non-Cloudera) Hadoop 2 and used to run the Spark daemons 
on that cluster, will there be any problems getting Spark to communicate with 
HDFS?

I'm fairly new to Hadoop versioning, and my team is trying to plan out our 
deployment strategy. We want our users to be able to use existing clusters 
backed by CDH4, or to easily spawn new clusters with the spark-ec2 scripts, 
but we want Spark to be built against the same Hadoop jars in both cases.
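
For reference, here is the kind of build invocation we have been considering. 
The exact CDH4 version string below is just an example (it would need to match 
whatever our clusters actually run), and it assumes the documented 
SPARK_HADOOP_VERSION mechanism for selecting the Hadoop dependency:

```shell
# Build the Spark assembly against a specific Hadoop/CDH4 version.
# The version string here is a placeholder -- substitute the version
# reported by the target cluster (e.g. via `hadoop version`).
SPARK_HADOOP_VERSION=2.0.0-cdh4.2.0 sbt/sbt assembly
```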

Thanks,

-Matt Cheah
