The additional methods on RDDs of pairs are defined in a class called
PairRDDFunctions
(https://spark.incubator.apache.org/docs/latest/api/core/index.html#org.apache.spark.rdd.PairRDDFunctions).
SparkContext provides an implicit conversion from RDD[(K, V)] to
PairRDDFunctions[K, V] to make this transparent to users.
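
Roughly, the conversion defined in SparkContext's companion object looks
like this (a sketch; the exact signature in 0.8 may differ slightly):

implicit def rddToPairRDDFunctions[K: ClassManifest, V: ClassManifest](
    rdd: RDD[(K, V)]): PairRDDFunctions[K, V] =
  new PairRDDFunctions(rdd)  // adds reduceByKey, groupByKey, join, etc.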

To import those implicit conversions, use

import org.apache.spark.SparkContext._


These conversions are imported automatically by the Spark shell, but you'll
have to import them yourself in standalone programs.
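
For example, a minimal standalone word count (a sketch; the "local" master
and the elided HDFS paths are placeholders):

import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._  // brings the RDD[(K, V)] => PairRDDFunctions conversion into scope

object WordCount {
  def main(args: Array[String]) {
    val sc = new SparkContext("local", "WordCount")
    val file = sc.textFile("hdfs://...")
    val counts = file.flatMap(line => line.split(" "))
                     .map(word => (word, 1))
                     .reduceByKey(_ + _)  // resolves once the implicit is in scope
    counts.saveAsTextFile("hdfs://...")
    sc.stop()
  }
}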


On Thu, Nov 7, 2013 at 11:54 AM, Philip Ogren <[email protected]> wrote:

> On the front page <http://spark.incubator.apache.org/> of the Spark
> website there is the following simple word count implementation:
>
> file = spark.textFile("hdfs://...")
> file.flatMap(line => line.split(" ")).map(word => (word, 1)).reduceByKey(_ + _)
>
> The same code can be found in the Quick Start guide
> <http://spark.incubator.apache.org/docs/latest/quick-start.html>. When I
> follow the steps in my spark-shell (version 0.8.0) it works fine. The
> reduceByKey method is also shown in the list of transformations
> <http://spark.incubator.apache.org/docs/latest/scala-programming-guide.html#transformations>
> in the Spark Programming Guide. The bottom of this list directs the reader
> to the API docs for the class RDD (this link is broken, BTW). The API docs
> for RDD
> <http://spark.incubator.apache.org/docs/latest/api/core/index.html#org.apache.spark.rdd.RDD>
> do not list a reduceByKey method for RDD. Also, when I try to compile the
> above code in a Scala class definition, I get the following compile
> error:
>
> value reduceByKey is not a member of
> org.apache.spark.rdd.RDD[(java.lang.String, Int)]
>
> I am compiling with maven using the following dependency definition:
>
>         <dependency>
>             <groupId>org.apache.spark</groupId>
>             <artifactId>spark-core_2.9.3</artifactId>
>             <version>0.8.0-incubating</version>
>         </dependency>
>
> Can someone help me understand why this code works fine from the
> spark-shell but doesn't seem to exist in the API docs and won't compile?
>
> Thanks,
> Philip
>
