You said your HDFS cluster and Spark cluster are running on separate sets of machines. That is not ideal, because you lose data locality. Your Spark nodes also need the HDFS client configuration so they can reach the remote NameNode.
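For the client configuration, you can either set HADOOP_CONF_DIR on every Spark node so it picks up your cluster's core-site.xml / hdfs-site.xml, or pass the settings through Spark itself. A rough sketch in Scala; the NameNode host, port and path are placeholders you would replace with your own:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .appName("remote-hdfs-read")
      // Point the Hadoop client used by Spark at the remote NameNode.
      .config("spark.hadoop.fs.defaultFS", "hdfs://namenode-host:8020")
      .getOrCreate()

    // Fully-qualified URIs also work without changing fs.defaultFS.
    val lines = spark.read.textFile("hdfs://namenode-host:8020/data/input.txt")
    println(lines.count())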
A Spark job is composed of stages, and each stage processes one or more partitions. The parallelism of the job is determined by the number of partitions.
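For example (continuing from the session above; the path and partition counts are arbitrary):

    val rdd = spark.sparkContext.textFile("hdfs://namenode-host:8020/data/input.txt")

    // Each stage runs one task per partition, so this number bounds the
    // parallelism of the stage that reads the file.
    println(s"partitions: ${rdd.getNumPartitions}")

    // More partitions allow more tasks to run concurrently,
    // assuming you have enough executor cores.
    val wider = rdd.repartition(200)
    println(s"partitions after repartition: ${wider.getNumPartitions}")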
Whether a shuffle happens depends on the operators you use, such as reduceByKey, repartition, sortBy and so on.
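A rough illustration, continuing from the RDD above (word count is just a stand-in example):

    val pairs = wider.flatMap(_.split("\\s+")).map(word => (word, 1))

    // reduceByKey is a wide transformation: it triggers a shuffle and starts
    // a new stage. The second argument sets the number of partitions (and
    // thus the parallelism) of that post-shuffle stage.
    val counts = pairs.reduceByKey(_ + _, 100)

    counts.take(10).foreach(println)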


