Re: Partition Case Class RDD without PairRDDFunctions

Night Wolf Thu, 07 May 2015 05:18:10 -0700

MyClass is a basic Scala case class (using Spark 1.3.1):

case class Result(crn: Long, pid: Int, promoWk: Int, windowKey: Int,
                  ipi: Double) {
  // Hash on crn only, so all records sharing a crn get the same hash code.
  override def hashCode(): Int = crn.hashCode()
}
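
For reference, the pair-RDD route discussed in the quoted question below can be sketched as follows. This is only a minimal illustration under stated assumptions, not a way to avoid the tuple: the object name PartitionByCrn, the helper partitionByCrn, the sample records and the local[2] master are all hypothetical, and it assumes Spark 1.3 or later, where the pair-RDD implicits are picked up without an extra import. It keys each record by crn with keyBy, shuffles with a HashPartitioner, then drops the key again.

```scala
import org.apache.spark.{HashPartitioner, SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

case class Result(crn: Long, pid: Int, promoWk: Int, windowKey: Int, ipi: Double)

// Hypothetical helper object, for illustration only.
object PartitionByCrn {
  // Key each record by crn, shuffle with a HashPartitioner, then drop the key.
  // The records stay in the partitions the shuffle put them in, but the
  // returned RDD no longer reports a partitioner, because values is a plain map.
  def partitionByCrn(rdd: RDD[Result], numPartitions: Int): RDD[Result] =
    rdd.keyBy(_.crn)
      .partitionBy(new HashPartitioner(numPartitions))
      .values

  def main(args: Array[String]): Unit = {
    // local[2] is an assumption so the sketch runs without a cluster.
    val conf = new SparkConf().setAppName("partition-by-crn").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      val results = sc.parallelize(Seq(
        Result(1L, 10, 1, 1, 0.5),
        Result(1L, 11, 2, 1, 0.7),
        Result(2L, 12, 1, 2, 0.1)))
      val partitioned = partitionByCrn(results, 4)
      // Print each record together with the index of the partition it landed in.
      partitioned
        .mapPartitionsWithIndex((idx, it) => it.map(r => (idx, r)))
        .collect()
        .foreach(println)
    } finally {
      sc.stop()
    }
  }
}
```

The tuple only exists for the duration of the shuffle here, so the extra space is transient. Whether that cost matters for a given job is something to measure rather than assume.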



On Wed, May 6, 2015 at 8:09 PM, ayan guha <[email protected]> wrote:

> What does your MyClass look like? I was experimenting with the Row class in
> Python, and apparently partitionBy automatically takes the first column as the key.
> However, I am not sure how you can access part of an object without
> deserializing it (either explicitly or with Spark doing it for you).
>
> On Wed, May 6, 2015 at 7:14 PM, Night Wolf <[email protected]> wrote:
>
>> Hi,
>>
>> If I have an RDD[MyClass] and I want to partition it by the hash code of
>> MyClass for performance reasons, is there any way to do this without
>> converting it into a pair RDD, RDD[(K,V)], and calling partitionBy?
>>
>> Mapping it to a Tuple2 seems like a waste of space and computation.
>>
>> It looks like PairRDDFunctions.partitionBy() uses a
>> ShuffledRDD[K,V,C], which requires K, V and C. Could I create a new
>> ShuffledRDD[MyClass,MyClass,MyClass](caseClassRdd, new HashPartitioner)?
>>
>> Cheers,
>> N
>>
>
>
>
> --
> Best Regards,
> Ayan Guha
>
