I am finding that partitionBy is hanging - and it is not clear whether the custom partitioner is even being invoked (i put an exception in there and can not see it in the worker logs).
The structure is similar to the following: inputPairedRdd = sc.parallelize([{0:"Entry1",1,"Entry2"}]) def identityPartitioner(key): # just use the id as the partition number # I am uncertain how to code this partedRdd = inputPairedRdd.partitionBy(newNumPartitions, identityPartitioner)