Hi All,

I got past the first problem: I am now able to create partitions in which all keys share the same sub-string. I got there by setting the number of worker threads to more than 1, since I am running the application from Eclipse on localhost.
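For readers following the thread, here is a minimal sketch of the property being checked: keys that share the same leading sub-string should map to the same partition index. It is plain Java with no Spark dependency. The class name `PrefixPartitionIndex`, the helper `partitionFor`, and the sample keys are hypothetical and chosen only for illustration. The sketch uses `Math.floorMod` because Java's `%` operator returns a negative result when `hashCode()` is negative, which is an assumption about a possible pitfall rather than a confirmed cause of anything reported here.

```java
class PrefixPartitionIndex {

    // Hypothetical helper: derives a partition index from the hash of the key's prefix.
    static int partitionFor(String key, int prefixLength, int numPartitions) {
        // Guard against keys that are shorter than the requested prefix.
        String prefix = key.substring(0, Math.min(prefixLength, key.length()));
        // floorMod keeps the index in [0, numPartitions) even for negative hash codes.
        return Math.floorMod(prefix.hashCode(), numPartitions);
    }

    public static void main(String[] args) {
        int first = partitionFor("DEPT001-alice", 7, 8);
        int second = partitionFor("DEPT001-bob", 7, 8);
        // Both keys start with "DEPT001", so they get the same index.
        System.out.println(first == second); // prints true
    }
}
```

The same index calculation could be dropped into a `getPartition` implementation. The sample keys above are made up and do not come from the original data set.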
But the issue with sorting still remains. After I have partitioned the RDD, I invoke partitionedRdd.sortByKey(), but afterwards each partition contains only pairs that have the same key.

One thing I wanted to mention is that I am using CassandraJavaRDD for this.

Thanks
- Ankur

On Wed, Oct 1, 2014 at 10:12 PM, Ankur Srivastava <ankur.srivast...@gmail.com> wrote:

> Hi,
>
> I am using a custom partitioner to partition my JavaPairRDD, where the key
> is a String.
>
> I use the hashCode of a sub-string of the key to derive the partition index,
> but I have noticed that my partitions contain keys for which the partitioner
> returns a different partition index.
>
> Another issue I am facing is that when I sort the RDD after partitioning,
> each partition contains only keys that are equal.
>
> My Partitioner is as below:
>
> public class BlockPartitioner extends Partitioner {
>
>     private int numPartitions = 8;
>
>     @Override
>     public int numPartitions() {
>         return numPartitions;
>     }
>
>     @Override
>     public int getPartition(Object key) {
>         // Keys are Strings; the first 7 characters identify the department.
>         String dept = ((String) key).substring(0, 7);
>         int partitionId = dept.hashCode();
>         return partitionId % numPartitions;
>     }
> }
>
> I am using "foreachPartition" of the Java pair RDD to verify my
> partitions.
>
> Thanks
> Ankur
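
A possible explanation for the sorting behaviour, offered as an assumption rather than a confirmed diagnosis: `sortByKey()` shuffles the data again using its own range partitioner, so the layout produced by a custom partitioner is not kept, and with few distinct keys each resulting partition can end up holding a single key. The plain-Java sketch below shows the result that seems to be wanted instead, namely keys bucketed by the hash of their 7-character prefix and then sorted inside each bucket. It has no Spark dependency. The class `SortWithinPartitions`, its methods, and the sample keys are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class SortWithinPartitions {

    // Hypothetical helper: non-negative partition index from the 7-character prefix.
    static int partitionFor(String key, int numPartitions) {
        String prefix = key.substring(0, Math.min(7, key.length()));
        return Math.floorMod(prefix.hashCode(), numPartitions);
    }

    // Buckets keys by partition index, then sorts each bucket independently.
    static Map<Integer, List<String>> partitionAndSort(List<String> keys, int numPartitions) {
        Map<Integer, List<String>> partitions = new TreeMap<>();
        for (String key : keys) {
            int index = partitionFor(key, numPartitions);
            partitions.computeIfAbsent(index, p -> new ArrayList<>()).add(key);
        }
        for (List<String> bucket : partitions.values()) {
            Collections.sort(bucket); // local sort; bucket membership is unchanged
        }
        return partitions;
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList(
                "DEPT002-zoe", "DEPT001-bob", "DEPT002-amy", "DEPT001-alice");
        partitionAndSort(keys, 8)
                .forEach((index, bucket) -> System.out.println(index + " -> " + bucket));
    }
}
```

In Spark itself, releases from 1.2 onward provide `repartitionAndSortWithinPartitions` on pair RDDs, which takes a partitioner and sorts inside each partition. Whether it is available depends on the Spark version in use, and whether it combines cleanly with CassandraJavaRDD is not something this sketch verifies.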