MyClass is a basic Scala case class (using Spark 1.3.1):

    case class Result(crn: Long, pid: Int, promoWk: Int, windowKey: Int, ipi: Double) {
      override def hashCode(): Int = crn.hashCode()
    }

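
To make the intent concrete, here is a minimal plain-Scala sketch (no Spark dependency) of how a hash-based partitioner would place these objects. It assumes the usual scheme of taking the key's hashCode modulo the number of partitions, adjusted to be non-negative. The names PartitionSketch, nonNegativeMod and partitionFor are hypothetical helpers for illustration only, not Spark API, and the sample rows are made up.

```scala
// Same case class as above: the hash code depends only on crn.
case class Result(crn: Long, pid: Int, promoWk: Int, windowKey: Int, ipi: Double) {
  override def hashCode(): Int = crn.hashCode()
}

// Hypothetical helpers (not part of Spark) sketching hash partitioning.
object PartitionSketch {
  // Modulo that never returns a negative index, even for negative hash codes.
  def nonNegativeMod(x: Int, mod: Int): Int = {
    val raw = x % mod
    if (raw < 0) raw + mod else raw
  }

  // Partition index for a key, derived from its hashCode.
  def partitionFor(key: Any, numPartitions: Int): Int =
    nonNegativeMod(key.hashCode, numPartitions)

  def main(args: Array[String]): Unit = {
    // Made-up sample rows; the first two share a crn.
    val rows = Seq(
      Result(10L, 1, 201501, 7, 0.5),
      Result(10L, 2, 201502, 8, 1.5),
      Result(13L, 3, 201503, 9, 2.5)
    )
    // Group rows by the partition they would land in (4 partitions assumed).
    rows.groupBy(r => partitionFor(r, 4)).toSeq.sortBy(_._1).foreach {
      case (p, rs) => println(s"partition $p: pids ${rs.map(_.pid).mkString(", ")}")
    }
  }
}
```

Because hashCode only looks at crn, rows with the same crn always map to the same partition index, which is the co-location I am after. In Spark itself the usual route would still be something like rdd.keyBy(...) followed by partitionBy(new HashPartitioner(n)), which is the pair-RDD conversion the original question was hoping to avoid.
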
On Wed, May 6, 2015 at 8:09 PM, ayan guha <guha.a...@gmail.com> wrote:

> What does your MyClass look like? I was experimenting with the Row class in
> Python, and apparently partitionBy automatically takes the first column as
> the key. However, I am not sure how you can access part of an object without
> deserializing it (either explicitly or with Spark doing it for you).
>
> On Wed, May 6, 2015 at 7:14 PM, Night Wolf <nightwolf...@gmail.com> wrote:
>
>> Hi,
>>
>> If I have an RDD[MyClass] and I want to partition it by the hash code of
>> MyClass for performance reasons, is there any way to do this without
>> converting it into a pair RDD, RDD[(K, V)], and calling partitionBy?
>>
>> Mapping it to a Tuple2 seems like a waste of space and computation.
>>
>> It looks like PairRDDFunctions.partitionBy() uses a
>> ShuffledRDD[K, V, C], which requires K, V and C. Could I create a new
>> ShuffledRDD[MyClass, MyClass, MyClass](caseClassRdd, new HashPartitioner)?
>>
>> Cheers,
>> N
>
> --
> Best Regards,
> Ayan Guha