Hi All,

I got past the first problem: I am now able to create partitions in which
all keys share the same sub-string. I got that working by setting the
number of worker threads to more than 1, since I am running the
application from Eclipse on localhost.

But the issue with sorting still remains.

After I have partitioned the RDD, I invoke partitionedRdd.sortByKey(),
but then each partition contains only pairs that have the same key.
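
To make the intended result concrete, here is a minimal plain-Java sketch
(no Spark involved) of what I expect: keys are grouped by an index derived
from their 7-character prefix, and each group is then sorted on its own.
The class name, method names and sample keys are hypothetical and only
illustrate the expected outcome; this is not how Spark implements
sortByKey().

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class PartitionThenSortSketch {

    // Hypothetical helper: derive a partition index from the first 7 characters of the key.
    static int partitionIndex(String key, int numPartitions) {
        String prefix = key.substring(0, Math.min(7, key.length()));
        // floorMod keeps the index in [0, numPartitions) even for negative hash codes.
        return Math.floorMod(prefix.hashCode(), numPartitions);
    }

    // Group keys by partition index, then sort the keys inside each group.
    static Map<Integer, List<String>> partitionAndSort(List<String> keys, int numPartitions) {
        Map<Integer, List<String>> partitions = new TreeMap<>();
        for (String key : keys) {
            int index = partitionIndex(key, numPartitions);
            partitions.computeIfAbsent(index, i -> new ArrayList<>()).add(key);
        }
        for (List<String> bucket : partitions.values()) {
            Collections.sort(bucket);
        }
        return partitions;
    }

    public static void main(String[] args) {
        // Made-up sample keys sharing two different 7-character prefixes.
        List<String> keys = Arrays.asList("DEPT001-b", "DEPT002-a", "DEPT001-a", "DEPT002-c");
        System.out.println(partitionAndSort(keys, 8));
    }
}
```

In this sketch every key with the same prefix stays in the same group, and
the keys inside a group come out in sorted order, which is the layout I am
trying to get from the partitioned RDD.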

One thing I wanted to mention is that I am using CassandraJavaRDD for this.

Thanks
- Ankur

On Wed, Oct 1, 2014 at 10:12 PM, Ankur Srivastava <
ankur.srivast...@gmail.com> wrote:

> Hi,
>
> I am using a custom partitioner to partition my JavaPairRDD, where the
> key is a String.
>
> I use the hashCode of a sub-string of the key to derive the partition
> index, but I have noticed that my partition contains keys which have a
> different partitionIndex returned by the partitioner.
>
> Another issue I am facing is that when I sort the rdd further after
> partitioning, my partition has only keys which are equal.
>
> My Partitioner is as below:
>
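> import org.apache.spark.Partitioner;
>
> public class BlockPartitioner extends Partitioner {
>
>     private int numPartitions = 8;
>
>     @Override
>     public int numPartitions() {
>         return numPartitions;
>     }
>
>     @Override
>     public int getPartition(Object key) {
>         // Partition on the first 7 characters of the key.
>         String dept = ((String) key).substring(0, 7);
>         int partitionId = dept.hashCode();
>         // hashCode() can be negative; floorMod keeps the index in [0, numPartitions).
>         return Math.floorMod(partitionId, numPartitions);
>     }
> }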
>
> I am using "foreachPartition" of the Java pair RDD to verify my
> partitions.
>
> Thanks
> Ankur
>
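
A side note on deriving a partition index from hashCode(), as getPartition
does: in Java the % operator keeps the sign of its left operand, so a
negative hash code gives a negative remainder, whereas a partition index
has to fall in the range 0 to numPartitions - 1. The following is a small
stand-alone sketch of the difference; the class and method names are
hypothetical, and Math.floorMod assumes Java 8 or later.

```java
public class HashIndexSketch {

    // Plain remainder: the result has the sign of 'hash', so it can be negative.
    static int remainderIndex(int hash, int numPartitions) {
        return hash % numPartitions;
    }

    // Floor modulus: the result is always in [0, numPartitions) for numPartitions > 0.
    static int floorModIndex(int hash, int numPartitions) {
        return Math.floorMod(hash, numPartitions);
    }

    public static void main(String[] args) {
        int hash = -7; // stand-in for a negative String.hashCode() value
        System.out.println(remainderIndex(hash, 8)); // prints -7
        System.out.println(floorModIndex(hash, 8));  // prints 1
    }
}
```

On older Java versions, ((hash % n) + n) % n gives the same non-negative
result.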
