Hi,

I have some classes like:

    abstract class RawData[+K, +V](id: K, data: V) extends Tuple2[K, V](id, data)
    case class SomeData(id: Int, data: Data) extends RawData[Int, Data](id, data)

to model some input data. I then found out that RDD[SomeData] doesn't have access to the PairRDDFunctions methods, such as join, even though SomeData is indeed a subclass of Tuple2. My guess is that the problem comes from the invariance of T in RDD[T]: RDD[SomeData] is not a subtype of RDD[Tuple2[Int, Data]], so the implicit conversion doesn't apply.

So:

1) How can I work around this? How do you model data with lots of fields that needs to be joined? I'd rather not write things like "_._2._2", but rather "_.id" or "_.data.someFields".

2) Is there a reason for the invariance of T in RDD? Could it be covariant?

Thanks!

--
*JU Han*
Data Engineer @ Botify.com
+33 0619608888
