Example of partitionBy in pyspark

Stephen Boesch Wed, 11 Mar 2015 01:43:04 -0700

I am finding that partitionBy is hanging - and it is not clear whether the
custom partitioner is even being invoked (i put an exception in there and
can not see it in the worker logs).


The structure is similar to the following:

inputPairedRdd = sc.parallelize([{0:"Entry1",1,"Entry2"}])

def identityPartitioner(key):
   # just use the id as the partition number
   # I am uncertain how to code this

partedRdd = inputPairedRdd.partitionBy(newNumPartitions,
identityPartitioner)

Example of partitionBy in pyspark

Reply via email to