I remember running into something very similar when trying to perform a
foreach on java.util.List and I fixed it by adding the following import:
import scala.collection.JavaConversions._
And my foreach loop magically compiled - presumably due to another
implicit conversion. Now this is the second time I've run into this
problem and I didn't recognize it, and I'm not sure I would know what
to do the next time it comes up. Do you have any advice on how I could
have recognized that the problem was a missing import of implicit
conversions, and how I would know what to import? This strikes me as
code obfuscation. I guess this is more of a Scala question....
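For context, here is a minimal sketch of the java.util.List case I mean
(the list and its contents are made up for illustration):

    import scala.collection.JavaConversions._

    val javaList: java.util.List[String] = new java.util.ArrayList[String]()
    javaList.add("foo")
    javaList.add("bar")

    // java.util.List has no foreach method of its own; this line only
    // compiles because the import above supplies an implicit conversion
    // to a Scala collection wrapper that does have one.
    javaList.foreach(println)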
Thanks,
Philip
On 11/7/2013 2:01 PM, Josh Rosen wrote:
The additional methods on RDDs of pairs are defined in a class called
PairRDDFunctions
(https://spark.incubator.apache.org/docs/latest/api/core/index.html#org.apache.spark.rdd.PairRDDFunctions).
SparkContext provides an implicit conversion from RDD[(K, V)] to
PairRDDFunctions[K, V] to make this transparent to users.
To import those implicit conversions, use
import org.apache.spark.SparkContext._
These conversions are automatically imported by Spark Shell, but
you'll have to import them yourself in standalone programs.
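For example, here is a minimal standalone sketch (the object name, master
URL, and paths are just placeholders) with that import in place, so that
reduceByKey resolves through the conversion to PairRDDFunctions:

    import org.apache.spark.SparkContext
    import org.apache.spark.SparkContext._ // brings the RDD-to-PairRDDFunctions conversion into scope

    object WordCount {
      def main(args: Array[String]) {
        val sc = new SparkContext("local", "Word Count")
        val file = sc.textFile("hdfs://...")
        val counts = file.flatMap(line => line.split(" "))
                         .map(word => (word, 1))
                         .reduceByKey(_ + _) // compiles only with the import above
        counts.saveAsTextFile("hdfs://...")
      }
    }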
On Thu, Nov 7, 2013 at 11:54 AM, Philip Ogren <[email protected]
<mailto:[email protected]>> wrote:
On the front page <http://spark.incubator.apache.org/> of the
Spark website there is the following simple word count implementation:
file = spark.textFile("hdfs://...")
file.flatMap(line => line.split(" ")).map(word => (word, 1)).reduceByKey(_ + _)
The same code can be found in the Quick Start
<http://spark.incubator.apache.org/docs/latest/quick-start.html>
guide. When I follow the steps in my spark-shell (version 0.8.0)
it works fine. The reduceByKey method is also shown in the list
of transformations
<http://spark.incubator.apache.org/docs/latest/scala-programming-guide.html#transformations>
in the Spark Programming Guide. The bottom of this list directs
the reader to the API docs for the class RDD (this link is broken,
BTW). The API docs for RDD
<http://spark.incubator.apache.org/docs/latest/api/core/index.html#org.apache.spark.rdd.RDD>
do not list a reduceByKey method for RDD. Also, when I try to
compile the above code in a Scala class definition I get the
following compile error:
value reduceByKey is not a member of
org.apache.spark.rdd.RDD[(java.lang.String, Int)]
I am compiling with maven using the following dependency definition:
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-core_2.9.3</artifactId>
  <version>0.8.0-incubating</version>
</dependency>
Can someone help me understand why this code works fine from the
spark-shell, yet reduceByKey doesn't appear in the API docs for RDD and
the code won't compile in a standalone program?
Thanks,
Philip