Dear all,

Regarding the index of each partition of an RDD, I am wondering whether
the mapping from partition index to physical machine can be preserved
across a hash partitioning step. For example, take a cluster of three
physical machines A, B and C (all workers) and an RDD with six
partitions. Assume that partitions 0 and 3 are on A, partitions 1 and 4
are on B, and partitions 2 and 5 are on C. If I then hash partition the
RDD using "partitionBy(new HashPartitioner(6))", will each partition
index of the newly created RDD be on the same machine as before? Or is
it possible that partitions 0 and 3 are now on B rather than A? If so,
is there any way to make both RDDs keep the same partition numbering
on each physical machine?
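
To make the question concrete, here is a minimal sketch in plain Java
(no Spark dependency) of how I understand a hash partitioner to choose a
partition index: the key's hashCode modulo the number of partitions,
kept non-negative, with null keys going to partition 0. The class and
method names are hypothetical and written only for illustration; this is
not Spark's own code. It shows that the index is derived from the key
alone, and says nothing about which machine ends up hosting a given
index, which is the part I am asking about.

```java
public class HashPartitionSketch {

    // Modulo that never returns a negative value, so that keys with
    // negative hash codes still map into the range [0, mod).
    static int nonNegativeMod(int x, int mod) {
        int raw = x % mod;
        return raw < 0 ? raw + mod : raw;
    }

    // Hypothetical helper: partition index for a key, assuming the
    // partitioner uses hashCode modulo numPartitions (null -> 0).
    static int partitionIndex(Object key, int numPartitions) {
        return key == null ? 0 : nonNegativeMod(key.hashCode(), numPartitions);
    }

    public static void main(String[] args) {
        int numPartitions = 6; // same count as in the example above
        for (int key = 0; key < 12; key++) {
            System.out.println("key " + key + " -> partition "
                    + partitionIndex(key, numPartitions));
        }
    }
}
```

With six partitions, keys 1 and 7 both get index 1 in this sketch,
whichever machine that partition is placed on.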

Thanks in advance.

