There's an old JIRA issue proposing to make RDD covariant in T: https://spark-project.atlassian.net/browse/SPARK-697
I think that I tried making RDD covariant in T at some point, but ran into compiler errors.

On Thu, Sep 26, 2013 at 2:57 PM, Reynold Xin <[email protected]> wrote:

> You can do a cast
>
> val rdd = some RDD[SomeData]
>
> rdd.asInstanceOf[RDD[Tuple2[Int, Data]]].reduceByKey(...)
>
> It's invariant because of historic reasons I think. It is fairly hard to
> change it now.
>
> --
> Reynold Xin, AMPLab, UC Berkeley
> http://rxin.org
>
> On Thu, Sep 26, 2013 at 6:25 AM, Han JU <[email protected]> wrote:
>
>> Hi,
>>
>> I have some classes like
>>
>> abstract class RawData[+K, +V](id: K, data: V) extends Tuple2[K, V](uid, data)
>>
>> case class SomeData(id: Int, data: Data) extends RawData[Int, Data](id, data)
>>
>> to model some input data.
>>
>> Then I find out that RDD[SomeData] doesn't have access to
>> pairRDDFunctions, like join. But SomeData is indeed a subclass of Tuple2.
>>
>> I guess that the problem comes from the invariance of T in RDD[T], and
>> RDD[SomeData] is not a subclass of RDD[Tuple2] so the implicit conversion
>> won't work.
>>
>> So,
>>
>> 1) how could I work this around? How do you model data of lots of fields
>> that need to be joined? I don't really want to have things like "_._2._2"
>> but rather "_.id" or "_.data.someFields".
>>
>> 2) is there some reason for invariance of T in RDD? could it be covariant?
>>
>> Thanks!
>>
>> --
>> *JU Han*
>>
>> Data Engineer @ Botify.com
>>
>> +33 0619608888
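As a rough illustration of the invariance issue discussed in this thread, here is a minimal sketch that does not depend on Spark. `Coll`, `PairOps`, `toPairOps`, and `Demo.mergeById` are hypothetical stand-ins (for an invariant `RDD[T]`, for the pair functions, for the implicit conversion, and for a job, respectively), and `data` is assumed to be a `String` for brevity. It shows an alternative to the cast: keep the record as a plain case class and key it by `id` with an explicit `map`, so the element type is exactly a `Tuple2` and the implicit conversion applies while fields stay accessible by name.

```scala
import scala.language.implicitConversions

// Hypothetical stand-in for RDD: invariant in T.
class Coll[T](val items: Seq[T]) {
  def map[U](f: T => U): Coll[U] = new Coll(items.map(f))
}

// Hypothetical stand-in for the pair functions, defined only for Coll[(K, V)].
class PairOps[K, V](self: Coll[(K, V)]) {
  def reduceByKey(f: (V, V) => V): Map[K, V] =
    self.items.groupBy(_._1).map { case (k, kvs) => k -> kvs.map(_._2).reduce(f) }
}

object Coll {
  // The conversion needs exactly Coll[(K, V)]. Because Coll is invariant,
  // a Coll of some other element type is never accepted here.
  implicit def toPairOps[K, V](c: Coll[(K, V)]): PairOps[K, V] = new PairOps(c)
}

// Record with named fields; `data` is a String here by assumption.
case class SomeData(id: Int, data: String)

object Demo {
  def mergeById(records: Coll[SomeData]): Map[Int, SomeData] = {
    // records.reduceByKey(...) would not compile: Coll[SomeData] is not a Coll[(K, V)].
    val keyed: Coll[(Int, SomeData)] = records.map(d => (d.id, d))
    // Fields are still reachable by name (x.id, x.data) inside the function.
    keyed.reduceByKey((x, y) => SomeData(x.id, x.data + y.data))
  }

  def main(args: Array[String]): Unit = {
    val records = new Coll(Seq(SomeData(1, "a"), SomeData(1, "b"), SomeData(2, "c")))
    println(mergeById(records))
  }
}
```

With a real RDD the same idea would be an explicit `map` to a `(key, record)` pair before calling the pair functions. Unlike the cast, it does not rely on the record class extending `Tuple2`.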
