Koert's answer is very likely correct. The implicit definition that
converts an RDD[(K, V)] into PairRDDFunctions requires that a ClassTag be
available for K:
https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/SparkContext.scala#L1124
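
For reference, the conversion is roughly of this shape (paraphrased from
the linked file; check the link for the exact definition in your Spark
version):

  import scala.reflect.ClassTag
  import org.apache.spark.rdd.{PairRDDFunctions, RDD}

  // Only kicks in when ClassTags for K and V can be found at the call
  // site, which is exactly what is missing for an unbounded K.
  implicit def rddToPairRDDFunctions[K: ClassTag, V: ClassTag](
      rdd: RDD[(K, V)]): PairRDDFunctions[K, V] =
    new PairRDDFunctions(rdd)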

To fully understand what's going on from a Scala beginner's point of view,
you'll have to look up ClassTags, context bounds (the "K : ClassTag"
syntax), and implicit conversions. Fortunately, you don't have to
understand monads...
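
If it helps, here is a minimal, self-contained sketch of the same
mechanism using made-up names (Wrapped, keysOnly) instead of Spark's real
ones. It is just a context bound gating an implicit conversion:

  import scala.language.implicitConversions
  import scala.reflect.ClassTag

  object ClassTagDemo {
    // Adds an extra method to Seq[(K, Int)], analogous to the way
    // PairRDDFunctions adds join() to RDD[(K, V)].
    class Wrapped[K](pairs: Seq[(K, Int)]) {
      def keysOnly: Seq[K] = pairs.map(_._1)
    }

    // The ClassTag bound mirrors Spark's requirement: the conversion is
    // only considered when a ClassTag[K] is available at the call site.
    implicit def toWrapped[K: ClassTag](pairs: Seq[(K, Int)]): Wrapped[K] =
      new Wrapped(pairs)

    def worksFine(pairs: Seq[(String, Int)]): Seq[String] =
      pairs.keysOnly  // ClassTag[String] exists, so the conversion fires

    def alsoWorks[K: ClassTag](pairs: Seq[(K, Int)]): Seq[K] =
      pairs.keysOnly  // the context bound supplies the ClassTag[K]

    // Without the bound this fails the same way the RDD version does:
    //   def broken[K](pairs: Seq[(K, Int)]): Seq[K] = pairs.keysOnly
    //   error: value keysOnly is not a member of Seq[(K, Int)]
  }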


On Tue, Apr 1, 2014 at 2:06 PM, Koert Kuipers <ko...@tresata.com> wrote:

>   import org.apache.spark.SparkContext._
>   import org.apache.spark.rdd.RDD
>   import scala.reflect.ClassTag
>
>   def joinTest[K: ClassTag](rddA: RDD[(K, Int)],
>                             rddB: RDD[(K, Int)]): RDD[(K, Int)] = {
>     rddA.join(rddB).map { case (k, (a, b)) => (k, a + b) }
>   }
>
>
> On Tue, Apr 1, 2014 at 4:55 PM, Daniel Siegmann <daniel.siegm...@velos.io> wrote:
>
>> When my tuple type includes a generic type parameter, the pair RDD
>> functions aren't available. Take for example the following (a join on two
>> RDDs, taking the sum of the values):
>>
>> def joinTest(rddA: RDD[(String, Int)],
>>              rddB: RDD[(String, Int)]): RDD[(String, Int)] = {
>>     rddA.join(rddB).map { case (k, (a, b)) => (k, a + b) }
>> }
>>
>> That works fine, but let's say I replace the type of the key with a
>> generic type:
>>
>> def joinTest[K](rddA: RDD[(K, Int)],
>>                 rddB: RDD[(K, Int)]): RDD[(K, Int)] = {
>>     rddA.join(rddB).map { case (k, (a, b)) => (k, a + b) }
>> }
>>
>> This latter function gets the compiler error "value join is not a member
>> of org.apache.spark.rdd.RDD[(K, Int)]".
>>
>> The reason is probably obvious, but I don't have much Scala experience.
>> Can anyone explain what I'm doing wrong?
>>
>> --
>> Daniel Siegmann, Software Developer
>> Velos
>> Accelerating Machine Learning
>>
>> 440 NINTH AVENUE, 11TH FLOOR, NEW YORK, NY 10001
>> E: daniel.siegm...@velos.io W: www.velos.io
>>
>
>
