I want to understand how best to deploy Spark close to a data source or sink.
Let's say I have a Vertica cluster that I need to run a Spark job against (a rough sketch of the kind of job I mean is below). How should the Spark cluster be set up in that case?

1. Should we run a Spark worker node on each Vertica cluster node?
2. How does shuffling play out in that setup?
3. What would the deployment look like on a managed cluster such as Kubernetes (example submission below)?
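For context, this is roughly the job I have in mind: a minimal Scala sketch that pulls a Vertica table through Spark's generic JDBC source. The hostname, database, table, credentials, and partition bounds are placeholders, and it assumes the Vertica JDBC driver jar is on the classpath (the dedicated Vertica Spark connector would be another option).

import org.apache.spark.sql.SparkSession

object VerticaReadSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("vertica-read-sketch")
      .getOrCreate()

    // Read a Vertica table over JDBC, split into parallel partitions on a
    // numeric column so each executor pulls its own slice of the table.
    val df = spark.read
      .format("jdbc")
      .option("url", "jdbc:vertica://vertica-node-1:5433/mydb") // placeholder host/db
      .option("dbtable", "public.events")                        // placeholder table
      .option("user", "dbadmin")                                 // placeholder credentials
      .option("password", sys.env.getOrElse("VERTICA_PASSWORD", ""))
      .option("partitionColumn", "event_id")                     // numeric column to split on
      .option("lowerBound", "1")                                 // placeholder bounds
      .option("upperBound", "100000000")
      .option("numPartitions", "16")
      .load()

    // An aggregation like this is what triggers the shuffle I ask about in question 2.
    df.groupBy("event_type").count().show()

    spark.stop()
  }
}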
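For question 3, the deployment I picture is the standard spark-submit-against-Kubernetes flow, something like the following (the API server address, container image, executor count, and jar path are all placeholders):

spark-submit \
  --master k8s://https://<api-server-host>:6443 \
  --deploy-mode cluster \
  --name vertica-read-sketch \
  --class VerticaReadSketch \
  --conf spark.executor.instances=4 \
  --conf spark.kubernetes.container.image=<registry>/spark:3.5.1 \
  local:///opt/spark/app/vertica-read-sketch.jar

Here the executors land on whichever Kubernetes nodes the scheduler picks, which is what makes me unsure how the "close to the data source" part works in that model.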