I want to understand how best to deploy Spark close to a data source or sink.

Let's say I have a Vertica cluster that I need to run Spark jobs against. How
should the Spark cluster be set up in that case?
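
For context, this is roughly the kind of job I have in mind. The sketch below
assumes Spark's generic JDBC data source against Vertica; the host, database,
table, and credentials are placeholders, and Vertica's own Spark connector
would look somewhat different:

import org.apache.spark.sql.SparkSession

object VerticaReadSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("vertica-read-sketch")
      .getOrCreate()

    // Load one Vertica table over JDBC. Host, port, database, table,
    // and credentials below are placeholders.
    val df = spark.read
      .format("jdbc")
      .option("url", "jdbc:vertica://vertica-node-1:5433/mydb")
      .option("driver", "com.vertica.jdbc.Driver")
      .option("dbtable", "public.events")
      .option("user", "dbuser")
      .option("password", "dbpass")
      .load()

    df.show(10)
    spark.stop()
  }
}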

1. Should we run a Spark worker on each Vertica cluster node?
2. How does shuffling play out in that setup? (See the sketch after this list
for the kind of stage I mean.)
3. What would the deployment look like in a managed cluster environment such
as Kubernetes?
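
To make question 2 concrete, here is a minimal sketch continuing from the read
example above (column and table names are made up). A wide transformation such
as groupBy repartitions rows by key and exchanges them between executors over
the network:

// Continuing from the earlier sketch: `df` is the DataFrame read from Vertica.
// groupBy is a wide transformation, so Spark shuffles rows by user_id
// across executors before counting.
val perUser = df
  .groupBy("user_id")
  .count()

// Write the aggregated result back to Vertica over JDBC (the sink side).
perUser.write
  .format("jdbc")
  .option("url", "jdbc:vertica://vertica-node-1:5433/mydb")
  .option("driver", "com.vertica.jdbc.Driver")
  .option("dbtable", "public.events_per_user")
  .option("user", "dbuser")
  .option("password", "dbpass")
  .mode("append")
  .save()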


