You said your HDFS cluster and your Spark cluster run on different clusters. This is not a good idea, because you lose data locality: tasks can no longer be scheduled on the nodes that hold the HDFS blocks they read. Your Spark nodes also need the HDFS client configuration (core-site.xml and hdfs-site.xml) so they can reach the remote NameNode.

A Spark job is composed of stages, and each stage has one or more partitions; the parallelism of the job is determined by the number of partitions. Whether a shuffle happens is determined by the operators you use, such as reduceByKey, repartition, sortBy, and so on.
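As a minimal sketch, one way to point the Spark side at the remote HDFS is through spark.hadoop.* properties (the NameNode address "namenode-host:8020" below is just a placeholder for your cluster's address); alternatively, set HADOOP_CONF_DIR on the Spark nodes to a directory containing the remote cluster's core-site.xml and hdfs-site.xml:

  import org.apache.spark.sql.SparkSession

  // Point Spark's built-in Hadoop client at the remote HDFS NameNode.
  // "namenode-host:8020" is a placeholder; substitute your own address.
  val spark = SparkSession.builder()
    .appName("remote-hdfs-read")
    .config("spark.hadoop.fs.defaultFS", "hdfs://namenode-host:8020")
    .getOrCreate()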
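Continuing that sketch, you can see both points in action: the partition count of the input RDD sets the parallelism of the first stage, and a wide operator like reduceByKey introduces a shuffle and a new stage boundary (the input path is again a placeholder):

  // Each HDFS block becomes roughly one partition; the partition
  // count drives the parallelism of this stage.
  val lines = spark.sparkContext
    .textFile("hdfs://namenode-host:8020/data/input.txt")
  println(s"partitions: ${lines.getNumPartitions}")

  // reduceByKey repartitions data by key, so Spark inserts a
  // shuffle here and starts a new stage.
  val counts = lines
    .flatMap(_.split("\\s+"))
    .map(word => (word, 1))
    .reduceByKey(_ + _)
  counts.take(10).foreach(println)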