[jira] [Commented] (SPARK-4459) JavaRDDLike.groupBy[K](f: JFunction[T, K]) may fail with typechecking errors

2014-11-17 Thread Alok Saldanha (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14215202#comment-14215202
 ] 

Alok Saldanha commented on SPARK-4459:
--

I created a standalone gist to demonstrate the problem, please see 
https://gist.github.com/alokito/40878fc25af21984463f

 JavaRDDLike.groupBy[K](f: JFunction[T, K]) may fail with typechecking errors
 

 Key: SPARK-4459
 URL: https://issues.apache.org/jira/browse/SPARK-4459
 Project: Spark
  Issue Type: Bug
  Components: Java API
Affects Versions: 1.0.2, 1.1.0
Reporter: Alok Saldanha

 I believe this issue is essentially the same as SPARK-668.
 Original error: 
 {code}
 [ERROR] /Users/saldaal1/workspace/JavaSparkSimpleApp/src/main/java/SimpleApp.java:[29,105]
   no suitable method found for groupBy(org.apache.spark.api.java.function.Function<scala.Tuple2<java.lang.String,java.lang.Long>,java.lang.Long>)
 [ERROR] method org.apache.spark.api.java.JavaPairRDD.<K>groupBy(org.apache.spark.api.java.function.Function<scala.Tuple2<K,java.lang.Long>,K>) is not applicable
 [ERROR] (inferred type does not conform to equality constraint(s)
 {code}
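 The failing call is of this shape (a simplified sketch reconstructed from the error 
 message, not the exact SimpleApp.java; class and variable names are illustrative):
 {code}
 import java.util.Arrays;
 import org.apache.spark.api.java.JavaPairRDD;
 import org.apache.spark.api.java.JavaSparkContext;
 import org.apache.spark.api.java.function.Function;
 import scala.Tuple2;

 public class GroupByRepro {
   public static void main(String[] args) {
     JavaSparkContext sc = new JavaSparkContext("local", "groupBy-repro");

     // An RDD of (word, count) pairs.
     JavaPairRDD<String, Long> counts = sc.parallelizePairs(Arrays.asList(
         new Tuple2<String, Long>("a", 1L),
         new Tuple2<String, Long>("b", 2L)));

     // Group the pairs by their Long value. This is the call javac rejects:
     // the inherited groupBy appears to require the grouping key type to equal
     // the pair's first type parameter (String here).
     counts.groupBy(new Function<Tuple2<String, Long>, Long>() {
       public Long call(Tuple2<String, Long> t) {
         return t._2();
       }
     });

     sc.stop();
   }
 }
 {code}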
 From core/src/main/scala/org/apache/spark/api/java/JavaRDDLike.scala:
 {code}
 /**
  * Return an RDD of grouped elements. Each group consists of a key and
  * a sequence of elements mapping to that key.
  */
 def groupBy[K](f: JFunction[T, K]): JavaPairRDD[K, JIterable[T]] = {
   implicit val ctagK: ClassTag[K] = fakeClassTag
   implicit val ctagV: ClassTag[JList[T]] = fakeClassTag
   JavaPairRDD.fromRDD(groupByResultToJava(rdd.groupBy(f)(fakeClassTag)))
 }
 {code}
 Then in core/src/main/scala/org/apache/spark/api/java/JavaPairRDD.scala:
 {code}
 class JavaPairRDD[K, V](val rdd: RDD[(K, V)])
     (implicit val kClassTag: ClassTag[K], implicit val vClassTag: ClassTag[V])
   extends JavaRDDLike[(K, V), JavaPairRDD[K, V]] {
 {code}
 The problem is that, within JavaPairRDD, the type parameter T of JavaRDDLike is 
 Tuple2[K,V], so the combined signature of groupBy in JavaPairRDD is 
 {code}
 groupBy[K](f: JFunction[Tuple2[K,V], K])
 {code}
 Because the method's type parameter reuses the name K, it is conflated (as seen 
 from Java) with the K of the class, which imposes an unfortunate constraint: the 
 return type of the grouping function must equal the first type parameter of the 
 JavaPairRDD.
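 Concretely, on the JavaPairRDD[String, Long] from the error above, the inherited 
 member that javac reports (see the "is not applicable" line; this is a sketch of 
 the effective signature, not Spark source) is:
 {code}
 // Effective member of JavaPairRDD<String, Long> as javac renders it:
 <K> JavaPairRDD<K, Iterable<Tuple2<K, Long>>> groupBy(Function<Tuple2<K, Long>, K> f);
 // A Function<Tuple2<String, Long>, Long> argument would need K = String and
 // K = Long at the same time, which is the unmet equality constraint in the error.
 {code}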
 If we compare the method signature to flatMap:
 {code}
 /**
  * Return a new RDD by first applying a function to all elements of this
  * RDD, and then flattening the results.
  */
 def flatMap[U](f: FlatMapFunction[T, U]): JavaRDD[U] = {
   import scala.collection.JavaConverters._
   def fn = (x: T) => f.call(x).asScala
   JavaRDD.fromRDD(rdd.flatMap(fn)(fakeClassTag[U]))(fakeClassTag[U])
 }
 {code}
 we see that the fix should be straightforward: rename the type parameter of the 
 groupBy function from K to U, so that it no longer clashes with the K of 
 JavaPairRDD.
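 After such a rename, the reproduction above (reusing the counts RDD from that 
 sketch) would be expected to typecheck, since the key type is no longer tied to 
 the pair's first type parameter; this is a sketch of the expected behaviour, not 
 code from a patch:
 {code}
 // Expected to compile once groupBy's type parameter no longer shadows the
 // class's K: javac would then see, on a JavaPairRDD<String, Long>,
 //   <U> JavaPairRDD<U, Iterable<Tuple2<String, Long>>> groupBy(Function<Tuple2<String, Long>, U> f)
 JavaPairRDD<Long, Iterable<Tuple2<String, Long>>> byValue =
     counts.groupBy(new Function<Tuple2<String, Long>, Long>() {
       public Long call(Tuple2<String, Long> t) {
         return t._2();
       }
     });
 {code}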



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-4459) JavaRDDLike.groupBy[K](f: JFunction[T, K]) may fail with typechecking errors

2014-11-17 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14215301#comment-14215301
 ] 

Apache Spark commented on SPARK-4459:
-

User 'alokito' has created a pull request for this issue:
https://github.com/apache/spark/pull/3327

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org