I remember running into something very similar when trying to perform a
foreach on java.util.List and I fixed it by adding the following import:
import scala.collection.JavaConversions._
And my foreach loop magically compiled - presumably due to another
implicit conversion. Now this is the second time I've run into this
problem and I didn't recognize it, and I'm not sure I would know what
to do the next time it comes up. Do you have any advice on how I could
have recognized that the problem was a missing import of implicit
conversions, and how I would know what to import? This strikes me as
code obfuscation. I guess this is more of a Scala question....
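For context, here is a minimal sketch of the java.util.List case I mean
(the list and its contents are made up for illustration):

    import scala.collection.JavaConversions._

    val javaList: java.util.List[String] = new java.util.ArrayList[String]()
    javaList.add("foo")
    javaList.add("bar")

    // java.util.List has no foreach method of its own; this line only
    // compiles because the import above supplies an implicit conversion
    // to a Scala collection wrapper that does have one.
    javaList.foreach(println)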
Thanks,
Philip
On 11/7/2013 2:01 PM, Josh Rosen wrote:
The additional methods on RDDs of pairs are defined in a class called
PairRDDFunctions
(https://spark.incubator.apache.org/docs/latest/api/core/index.html#org.apache.spark.rdd.PairRDDFunctions).
SparkContext provides an implicit conversion from RDD[(K, V)] to
PairRDDFunctions[K, V] to make this transparent to users.
To import those implicit conversions, use
import org.apache.spark.SparkContext._
These conversions are automatically imported by Spark Shell, but
you'll have to import them yourself in standalone programs.
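For example, here is a minimal standalone sketch (the object name, master
URL, and paths are just placeholders) with that import in place, so that
reduceByKey resolves through the conversion to PairRDDFunctions:

    import org.apache.spark.SparkContext
    import org.apache.spark.SparkContext._ // brings the RDD-to-PairRDDFunctions conversion into scope

    object WordCount {
      def main(args: Array[String]) {
        val sc = new SparkContext("local", "Word Count")
        val file = sc.textFile("hdfs://...")
        val counts = file.flatMap(line => line.split(" "))
                         .map(word => (word, 1))
                         .reduceByKey(_ + _) // compiles only with the import above
        counts.saveAsTextFile("hdfs://...")
      }
    }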
On Thu, Nov 7, 2013 at 11:54 AM, Philip Ogren <[email protected]
<mailto:[email protected]>> wrote:
On the front page <http://spark.incubator.apache.org/> of the
Spark website there is the following simple word count implementation:
file = spark.textFile("hdfs://...")
file.flatMap(line => line.split(" ")).map(word => (word, 1)).reduceByKey(_ + _)
The same code can be found in the Quick Start
<http://spark.incubator.apache.org/docs/latest/quick-start.html>
guide. When I follow the steps in my spark-shell (version 0.8.0)
it works fine. The reduceByKey method is also shown in the list
of transformations
<http://spark.incubator.apache.org/docs/latest/scala-programming-guide.html#transformations>
in the Spark Programming Guide. The bottom of this list directs
the reader to the API docs for the class RDD (this link is broken,
BTW). The API docs for RDD
<http://spark.incubator.apache.org/docs/latest/api/core/index.html#org.apache.spark.rdd.RDD>
do not list a reduceByKey method for RDD. Also, when I try to
compile the above code in a Scala class definition I get the
following compile error:
value reduceByKey is not a member of
org.apache.spark.rdd.RDD[(java.lang.String, Int)]
I am compiling with maven using the following dependency definition:
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-core_2.9.3</artifactId>
  <version>0.8.0-incubating</version>
</dependency>
Can someone help me understand why this code works fine from the
spark-shell, yet reduceByKey doesn't appear in the API docs for RDD and
the code won't compile in a standalone program?
Thanks,
Philip