I remember running into something very similar when trying to perform a foreach on java.util.List and I fixed it by adding the following import:

import scala.collection.JavaConversions._

And my foreach loop magically compiled, presumably due to another implicit conversion. This is now the second time I've run into this problem, and I didn't recognize it. I'm not sure I would know what to do the next time I run into it. Do you have any advice on how I should have recognized that an import providing implicit conversions was missing, and how I would know what to import? This strikes me as code obfuscation. I guess this is more of a Scala question...
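For reference, here is a minimal sketch of the same java.util.List case with the conversion written out explicitly, so it is visible at the call site. It assumes scala.collection.JavaConverters (which requires an explicit .asScala) rather than the JavaConversions import above; newer Scala versions offer scala.jdk.CollectionConverters for the same purpose. The object and method names are hypothetical.

```scala
import java.util.{List => JList}
import scala.collection.JavaConverters._

object JavaListForeachSketch {
  // Hypothetical helper: sums a java.util.List using a Scala-style foreach.
  def sum(javaList: JList[Int]): Int = {
    var total = 0
    // asScala wraps the Java list in a scala.collection.mutable.Buffer,
    // which is where the Scala foreach method comes from.
    javaList.asScala.foreach(n => total += n)
    total
  }
}
```

With JavaConversions._ in scope the .asScala call can be omitted, which is why the loop compiled without any visible conversion.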

Thanks,
Philip



On 11/7/2013 2:01 PM, Josh Rosen wrote:
The additional methods on RDDs of pairs are defined in a class called PairRDDFunctions (https://spark.incubator.apache.org/docs/latest/api/core/index.html#org.apache.spark.rdd.PairRDDFunctions). SparkContext provides an implicit conversion from RDD[(K, V)] to PairRDDFunctions[K, V] to make this transparent to users.

To import those implicit conversions, use

    import org.apache.spark.SparkContext._


These conversions are automatically imported by Spark Shell, but you'll have to import them yourself in standalone programs.
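
To illustrate the mechanism without a Spark dependency, here is a rough sketch of the same pattern. MiniRDD, MiniPairFunctions, MiniContext and WordCountSketch are hypothetical stand-ins, not Spark classes, and the real implementation differs in detail (for one thing, the real reduceByKey returns another RDD rather than a Map).

```scala
// Needed on Scala 2.10 and later to avoid a feature warning;
// drop this import on Scala 2.9.x, where scala.language does not exist.
import scala.language.implicitConversions

// Hypothetical stand-in for RDD[T]: it has map but no reduceByKey.
class MiniRDD[T](val items: Seq[T]) {
  def map[U](f: T => U): MiniRDD[U] = new MiniRDD(items.map(f))
}

// Hypothetical stand-in for PairRDDFunctions: the pair-only methods live here.
class MiniPairFunctions[K, V](self: MiniRDD[(K, V)]) {
  def reduceByKey(f: (V, V) => V): Map[K, V] =
    self.items.groupBy(_._1).map { case (k, kvs) => k -> kvs.map(_._2).reduce(f) }
}

// Hypothetical stand-in for the SparkContext object that holds the conversions.
object MiniContext {
  implicit def toPairFunctions[K, V](rdd: MiniRDD[(K, V)]): MiniPairFunctions[K, V] =
    new MiniPairFunctions(rdd)
}

object WordCountSketch {
  // Without this import, the reduceByKey call below fails to compile
  // with a "value reduceByKey is not a member of ..." error.
  import MiniContext._

  def count(words: Seq[String]): Map[String, Int] =
    new MiniRDD(words).map(word => (word, 1)).reduceByKey(_ + _)
}
```

The compiler only considers the conversion when it is in scope, which is why the shell (which imports it for you) and a standalone program behave differently.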


On Thu, Nov 7, 2013 at 11:54 AM, Philip Ogren wrote:

    On the front page <http://spark.incubator.apache.org/> of the
    Spark website there is the following simple word count implementation:

    file = spark.textFile("hdfs://...")
    file.flatMap(line => line.split(" ")).map(word => (word, 1)).reduceByKey(_ + _)

    The same code can be found in the Quick Start
    <http://spark.incubator.apache.org/docs/latest/quick-start.html>
    guide.  When I follow the steps in my spark-shell (version 0.8.0)
    it works fine.  The reduceByKey method is also shown in the list
    of transformations
    <http://spark.incubator.apache.org/docs/latest/scala-programming-guide.html#transformations>
    in the Spark Programming Guide.  The bottom of this list directs
    the reader to the API docs for the class RDD (this link is broken,
    BTW).  The API docs for RDD
    <http://spark.incubator.apache.org/docs/latest/api/core/index.html#org.apache.spark.rdd.RDD>
    do not list a reduceByKey method for RDD.  Also, when I try to
    compile the above code in a Scala class definition I get the
    following compile error:

    value reduceByKey is not a member of
    org.apache.spark.rdd.RDD[(java.lang.String, Int)]

    I am compiling with maven using the following dependency definition:

    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_2.9.3</artifactId>
        <version>0.8.0-incubating</version>
    </dependency>

    Can someone help me understand why this code works fine from the
    spark-shell but doesn't seem to exist in the API docs and won't
    compile?

    Thanks,
    Philip


