Let’s say I have an RDD that represents users’ behavior data. I can split the RDD into several partitions by user ID using HashPartitioner. Is there any way to control which machine each partition goes to? Or, how does Spark distribute partitions across the machines? Thanks!
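
To make the setup concrete, here is a minimal plain-Java sketch of how I understand hash partitioning to map a user ID to a partition index: the key's `hashCode()` taken modulo the number of partitions, kept non-negative, with null keys going to partition 0. The class and method names (`HashPartitionSketch`, `partitionFor`) are my own and not part of Spark. It only shows which partition index a key lands in, not which machine ends up hosting that partition.

```java
public class HashPartitionSketch {

    // Modulo that never returns a negative value, even for negative hash codes.
    static int nonNegativeMod(int x, int mod) {
        int raw = x % mod;
        return raw < 0 ? raw + mod : raw;
    }

    // Hypothetical helper: partition index for a key under hash partitioning.
    // Assumption: null keys map to partition 0.
    static int partitionFor(Object key, int numPartitions) {
        if (key == null) {
            return 0;
        }
        return nonNegativeMod(key.hashCode(), numPartitions);
    }

    public static void main(String[] args) {
        int numPartitions = 4;
        long[] userIds = {42L, 1001L, 7L};
        for (long id : userIds) {
            // Long.valueOf(id).hashCode() is the value the key is hashed by here.
            System.out.println("user " + id + " -> partition "
                    + partitionFor(Long.valueOf(id), numPartitions));
        }
    }
}
```

All records for the same user ID end up in the same partition index this way, since the index depends only on the key and the partition count.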
Yishu