I am finding that partitionBy is hanging - and it is not clear whether the
custom partitioner is even being invoked (i put an exception in there and
can not see it in the worker logs).

The structure is similar to the following:

inputPairedRdd = sc.parallelize([{0:"Entry1",1,"Entry2"}])

def identityPartitioner(key):
   # just use the id as the partition number
   # I am uncertain how to code this

partedRdd = inputPairedRdd.partitionBy(newNumPartitions,
identityPartitioner)

Reply via email to