Yeah, it’s true that this feature doesn’t provide any way to give good error 
messages. Maybe some IDEs will support it eventually, though I haven’t seen it.

Matei

On Nov 7, 2013, at 3:46 PM, Philip Ogren <philip.og...@oracle.com> wrote:

> Thanks - I think this would be a helpful note to add to the docs.  I went and 
> read a few things about Scala implicit conversions (I'm obviously new to the 
> language), and it seems like a very powerful language feature.  Now that I 
> know about them, it will certainly be easier to identify when they are missing 
> (i.e., a missing conversion is the first thing to suspect when you see a 
> "not a member" compilation error).  I'm still a bit mystified as to how you 
> would go about finding the appropriate imports, except that I suppose you 
> aren't very likely to use methods that you don't already know about!  Unless 
> you are copying code verbatim that doesn't have the necessary import 
> statements....
> 
> 
> On 11/7/2013 4:05 PM, Matei Zaharia wrote:
>> Yeah, this is confusing, and unfortunately, as far as I know, it’s 
>> API-specific. Maybe we should add this to the documentation page for RDD.
>> 
>> The reason for these conversions is to allow some operations only when the 
>> underlying data type of the collection supports them. For example, Scala 
>> collections support sum() as long as they contain numeric types. That’s fine 
>> for the Scala collection library, since its conversions are imported by 
>> default, but I guess it makes things confusing for third-party apps.
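>> For instance, here’s a minimal sketch of what I mean (nothing Spark-specific, 
>> just plain Scala collections):
>> 
>>     val nums = List(1, 2, 3)
>>     nums.sum            // compiles: an implicit Numeric[Int] is available
>> 
>>     val words = List("a", "b", "c")
>>     // words.sum        // does not compile: no implicit Numeric[String]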
>> 
>> Matei
>> 
>> On Nov 7, 2013, at 1:15 PM, Philip Ogren <philip.og...@oracle.com> wrote:
>> 
>>> I remember running into something very similar when trying to perform a 
>>> foreach on java.util.List and I fixed it by adding the following import:
>>> 
>>> import scala.collection.JavaConversions._
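>>> 
>>> Something roughly like this, reconstructing from memory (so the details may 
>>> be a little off):
>>> 
>>>     import scala.collection.JavaConversions._
>>> 
>>>     val javaList: java.util.List[String] = java.util.Arrays.asList("a", "b", "c")
>>>     javaList.foreach(println)   // only compiles with the conversion in scope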
>>> 
>>> And my foreach loop magically compiled - presumably due to another 
>>> implicit conversion.  Now this is the second time I've run into this 
>>> problem and I didn't recognize it.  I'm not sure that I would know what to 
>>> do the next time I run into this.  Do you have some advice on how I should 
>>> have recognized a missing import that provides implicit conversions and how 
>>> I would know what to import?  This strikes me as code obfuscation.  I guess 
>>> this is more of a Scala question....
>>> 
>>> Thanks,
>>> Philip
>>> 
>>> 
>>> 
>>> On 11/7/2013 2:01 PM, Josh Rosen wrote:
>>>> The additional methods on RDDs of pairs are defined in a class called 
>>>> PairRDDFunctions 
>>>> (https://spark.incubator.apache.org/docs/latest/api/core/index.html#org.apache.spark.rdd.PairRDDFunctions).
>>>>   The SparkContext object provides an implicit conversion from RDD[(K, V)] to 
>>>> PairRDDFunctions[K, V] to make this transparent to users.
>>>> 
>>>> To import those implicit conversions, use
>>>> 
>>>> import org.apache.spark.SparkContext._
>>>> 
>>>> These conversions are automatically imported by Spark Shell, but you'll 
>>>> have to import them yourself in standalone programs.
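>>>> 
>>>> For example, here's a minimal standalone sketch (the master, app name, and 
>>>> paths are just placeholders):
>>>> 
>>>>     import org.apache.spark.SparkContext
>>>>     import org.apache.spark.SparkContext._  // brings the pair-RDD conversions into scope
>>>> 
>>>>     object WordCount {
>>>>       def main(args: Array[String]) {
>>>>         val sc = new SparkContext("local", "WordCount")
>>>>         val counts = sc.textFile("hdfs://...")
>>>>           .flatMap(line => line.split(" "))
>>>>           .map(word => (word, 1))
>>>>           .reduceByKey(_ + _)   // resolves via the implicit conversion to PairRDDFunctions
>>>>         counts.saveAsTextFile("hdfs://...")
>>>>       }
>>>>     }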
>>>> 
>>>> 
>>>> On Thu, Nov 7, 2013 at 11:54 AM, Philip Ogren <philip.og...@oracle.com> 
>>>> wrote:
>>>> On the front page of the Spark website there is the following simple word 
>>>> count implementation:
>>>> 
>>>> val file = spark.textFile("hdfs://...")
>>>> file.flatMap(line => line.split(" ")).map(word => (word, 1)).reduceByKey(_ + _)
>>>> 
>>>> The same code can be found in the Quick Start guide.  When I follow the 
>>>> steps in my spark-shell (version 0.8.0) it works fine.  The reduceByKey 
>>>> method is also shown in the list of transformations in the Spark 
>>>> Programming Guide.  The bottom of that list directs the reader to the API 
>>>> docs for the RDD class (this link is broken, BTW), but the API docs for RDD 
>>>> do not list a reduceByKey method.  Also, when I try to compile 
>>>> the above code in a Scala class definition I get the following compile 
>>>> error:
>>>> 
>>>> value reduceByKey is not a member of 
>>>> org.apache.spark.rdd.RDD[(java.lang.String, Int)]
>>>> 
>>>> I am compiling with maven using the following dependency definition:
>>>> 
>>>>         <dependency>
>>>>             <groupId>org.apache.spark</groupId>
>>>>             <artifactId>spark-core_2.9.3</artifactId>
>>>>             <version>0.8.0-incubating</version>
>>>>         </dependency>
>>>> 
>>>> Can someone help me understand why this code works fine from the 
>>>> spark-shell, but reduceByKey doesn't seem to exist in the API docs and the 
>>>> code won't compile in my project?  
>>>> 
>>>> Thanks,
>>>> Philip
>>>> 
>>> 
>> 
> 
