[jira] [Commented] (SPARK-2620) case class cannot be used as key for reduce
[ https://issues.apache.org/jira/browse/SPARK-2620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17083606#comment-17083606 ] Alexandre Archambault commented on SPARK-2620: -- FYI, adding "final" in front of "case class P(name: String)" fixes that. {{final case class P(name: String)}} Beware that this works thanks to a bug, described around [https://github.com/scala/bug/issues/4440#issuecomment-365858185] in particular, that makes final case classes ignore their outer reference in equals. It's still not "fixed" as of scala 2.12.10 / 2.13.1, so adding final still works as a workaround. [https://github.com/scala/bug/issues/11940] discusses maybe allowing to tweak outer references comparisons, which could offer a way to properly fix the issue here. (One could add an equals method to the outer wrapper, if [https://github.com/scala/bug/issues/11940] gets fixed, fixing the issue here.) > case class cannot be used as key for reduce > --- > > Key: SPARK-2620 > URL: https://issues.apache.org/jira/browse/SPARK-2620 > Project: Spark > Issue Type: Bug > Components: Spark Shell >Affects Versions: 1.0.0, 1.1.0, 1.3.0, 1.4.0, 1.5.0, 1.6.0, 2.0.0, 2.1.0, > 2.2.0, 2.3.0 > Environment: reproduced on spark-shell local[4] >Reporter: Gerard Maas >Assignee: Tobias Schlatter >Priority: Critical > Labels: bulk-closed, case-class, core > > Using a case class as a key doesn't seem to work properly on Spark 1.0.0 > A minimal example: > case class P(name:String) > val ps = Array(P("alice"), P("bob"), P("charly"), P("bob")) > sc.parallelize(ps).map(x=> (x,1)).reduceByKey((x,y) => x+y).collect > [Spark shell local mode] res : Array[(P, Int)] = Array((P(bob),1), > (P(bob),1), (P(abe),1), (P(charly),1)) > In contrast to the expected behavior, that should be equivalent to: > sc.parallelize(ps).map(x=> (x.name,1)).reduceByKey((x,y) => x+y).collect > Array[(String, Int)] = Array((charly,1), (abe,1), (bob,2)) > groupByKey and distinct also present the same behavior. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-2620) case class cannot be used as key for reduce
[ https://issues.apache.org/jira/browse/SPARK-2620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16482099#comment-16482099 ] Stavros Kontopoulos commented on SPARK-2620: I can reproduce this with 2.4 snapshot. Overriding equals and hashCode changes the behavior. > case class cannot be used as key for reduce > --- > > Key: SPARK-2620 > URL: https://issues.apache.org/jira/browse/SPARK-2620 > Project: Spark > Issue Type: Bug > Components: Spark Shell >Affects Versions: 1.0.0, 1.1.0, 1.3.0, 1.4.0, 1.5.0, 1.6.0, 2.0.0, 2.1.0, > 2.2.0, 2.3.0 > Environment: reproduced on spark-shell local[4] >Reporter: Gerard Maas >Assignee: Tobias Schlatter >Priority: Critical > Labels: case-class, core > > Using a case class as a key doesn't seem to work properly on Spark 1.0.0 > A minimal example: > case class P(name:String) > val ps = Array(P("alice"), P("bob"), P("charly"), P("bob")) > sc.parallelize(ps).map(x=> (x,1)).reduceByKey((x,y) => x+y).collect > [Spark shell local mode] res : Array[(P, Int)] = Array((P(bob),1), > (P(bob),1), (P(abe),1), (P(charly),1)) > In contrast to the expected behavior, that should be equivalent to: > sc.parallelize(ps).map(x=> (x.name,1)).reduceByKey((x,y) => x+y).collect > Array[(String, Int)] = Array((charly,1), (abe,1), (bob,2)) > groupByKey and distinct also present the same behavior. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-2620) case class cannot be used as key for reduce
[ https://issues.apache.org/jira/browse/SPARK-2620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15822726#comment-15822726 ] Hyukjin Kwon commented on SPARK-2620: - ^ I can still reproduce this. > case class cannot be used as key for reduce > --- > > Key: SPARK-2620 > URL: https://issues.apache.org/jira/browse/SPARK-2620 > Project: Spark > Issue Type: Bug > Components: Spark Shell >Affects Versions: 1.0.0, 1.1.0, 1.3.0, 1.4.0, 1.5.0, 1.6.0, 2.0.0, 2.1.0 > Environment: reproduced on spark-shell local[4] >Reporter: Gerard Maas >Assignee: Tobias Schlatter >Priority: Critical > Labels: case-class, core > > Using a case class as a key doesn't seem to work properly on Spark 1.0.0 > A minimal example: > case class P(name:String) > val ps = Array(P("alice"), P("bob"), P("charly"), P("bob")) > sc.parallelize(ps).map(x=> (x,1)).reduceByKey((x,y) => x+y).collect > [Spark shell local mode] res : Array[(P, Int)] = Array((P(bob),1), > (P(bob),1), (P(abe),1), (P(charly),1)) > In contrast to the expected behavior, that should be equivalent to: > sc.parallelize(ps).map(x=> (x.name,1)).reduceByKey((x,y) => x+y).collect > Array[(String, Int)] = Array((charly,1), (abe,1), (bob,2)) > groupByKey and distinct also present the same behavior. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-2620) case class cannot be used as key for reduce
[ https://issues.apache.org/jira/browse/SPARK-2620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15803407#comment-15803407 ] Subhankar Dey Sarkar commented on SPARK-2620: - I am using spark 2 and just used the above commands to test is case classes work as key in case of reducebykey or joins, seems like it does not, please see below:- scala> case class P(name:String) defined class P scala> val ps = Array(P("alice"), P("bob"), P("charly"), P("bob")) ps: Array[P] = Array(P(alice), P(bob), P(charly), P(bob)) scala> sc.parallelize(ps).map(x=> (x,1)).reduceByKey((x,y) => x+y).collect res6: Array[(P, Int)] = Array((P(charly),1), (P(alice),1), (P(bob),1), (P(bob),1)) > case class cannot be used as key for reduce > --- > > Key: SPARK-2620 > URL: https://issues.apache.org/jira/browse/SPARK-2620 > Project: Spark > Issue Type: Bug > Components: Spark Shell >Affects Versions: 1.0.0, 1.1.0, 1.3.0, 1.4.0, 1.5.0, 1.6.0, 2.0.0, 2.1.0 > Environment: reproduced on spark-shell local[4] >Reporter: Gerard Maas >Assignee: Tobias Schlatter >Priority: Critical > Labels: case-class, core > > Using a case class as a key doesn't seem to work properly on Spark 1.0.0 > A minimal example: > case class P(name:String) > val ps = Array(P("alice"), P("bob"), P("charly"), P("bob")) > sc.parallelize(ps).map(x=> (x,1)).reduceByKey((x,y) => x+y).collect > [Spark shell local mode] res : Array[(P, Int)] = Array((P(bob),1), > (P(bob),1), (P(abe),1), (P(charly),1)) > In contrast to the expected behavior, that should be equivalent to: > sc.parallelize(ps).map(x=> (x.name,1)).reduceByKey((x,y) => x+y).collect > Array[(String, Int)] = Array((charly,1), (abe,1), (bob,2)) > groupByKey and distinct also present the same behavior. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-2620) case class cannot be used as key for reduce
[ https://issues.apache.org/jira/browse/SPARK-2620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15674308#comment-15674308 ] holdenk commented on SPARK-2620: I don't think its been resolved, does your code need to be in the repl or can you compile it? Do you have a small repro case so we can see if it might be the same issue? If your working in Jupyter its possible that the issue might be fixed by using the new scala kernel which has a different REPL backend but I haven't had a chance to investigate it and see if the same issue is present there. > case class cannot be used as key for reduce > --- > > Key: SPARK-2620 > URL: https://issues.apache.org/jira/browse/SPARK-2620 > Project: Spark > Issue Type: Bug > Components: Spark Shell >Affects Versions: 1.0.0, 1.1.0, 1.3.0, 1.4.0, 1.5.0, 1.6.0, 2.0.0, 2.1.0 > Environment: reproduced on spark-shell local[4] >Reporter: Gerard Maas >Assignee: Tobias Schlatter >Priority: Critical > Labels: case-class, core > > Using a case class as a key doesn't seem to work properly on Spark 1.0.0 > A minimal example: > case class P(name:String) > val ps = Array(P("alice"), P("bob"), P("charly"), P("bob")) > sc.parallelize(ps).map(x=> (x,1)).reduceByKey((x,y) => x+y).collect > [Spark shell local mode] res : Array[(P, Int)] = Array((P(bob),1), > (P(bob),1), (P(abe),1), (P(charly),1)) > In contrast to the expected behavior, that should be equivalent to: > sc.parallelize(ps).map(x=> (x.name,1)).reduceByKey((x,y) => x+y).collect > Array[(String, Int)] = Array((charly,1), (abe,1), (bob,2)) > groupByKey and distinct also present the same behavior. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-2620) case class cannot be used as key for reduce
[ https://issues.apache.org/jira/browse/SPARK-2620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15654830#comment-15654830 ] Shirish commented on SPARK-2620: Has this been resolved in 1.6? I am facing similar issue - not sure if my problem is because of this issue. > case class cannot be used as key for reduce > --- > > Key: SPARK-2620 > URL: https://issues.apache.org/jira/browse/SPARK-2620 > Project: Spark > Issue Type: Bug > Components: Spark Shell >Affects Versions: 1.0.0, 1.1.0, 1.3.0, 1.4.0, 1.5.0, 1.6.0, 2.0.0, 2.1.0 > Environment: reproduced on spark-shell local[4] >Reporter: Gerard Maas >Assignee: Tobias Schlatter >Priority: Critical > Labels: case-class, core > > Using a case class as a key doesn't seem to work properly on Spark 1.0.0 > A minimal example: > case class P(name:String) > val ps = Array(P("alice"), P("bob"), P("charly"), P("bob")) > sc.parallelize(ps).map(x=> (x,1)).reduceByKey((x,y) => x+y).collect > [Spark shell local mode] res : Array[(P, Int)] = Array((P(bob),1), > (P(bob),1), (P(abe),1), (P(charly),1)) > In contrast to the expected behavior, that should be equivalent to: > sc.parallelize(ps).map(x=> (x.name,1)).reduceByKey((x,y) => x+y).collect > Array[(String, Int)] = Array((charly,1), (abe,1), (bob,2)) > groupByKey and distinct also present the same behavior. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-2620) case class cannot be used as key for reduce
[ https://issues.apache.org/jira/browse/SPARK-2620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15538581#comment-15538581 ] Maciej Szymkiewicz commented on SPARK-2620: --- With Scala 2.11 we get around by creating a package: {code} scala> :paste -raw // Entering paste mode (ctrl-D to finish) package domain case class P(name:String) // Exiting paste mode, now interpreting. scala> import domain._ import domain._ scala> val ps = Array(P("alice"), P("bob"), P("charly"), P("bob")) ps: Array[domain.P] = Array(P(alice), P(bob), P(charly), P(bob)) scala> sc.parallelize(ps).map(x=> (x,1)).reduceByKey((x,y) => x+y).collect res0: Array[(domain.P, Int)] = Array((P(bob),2), (P(alice),1), (P(charly),1)) {code} Maybe this is a hint how this can be resolved. > case class cannot be used as key for reduce > --- > > Key: SPARK-2620 > URL: https://issues.apache.org/jira/browse/SPARK-2620 > Project: Spark > Issue Type: Bug > Components: Spark Shell >Affects Versions: 1.0.0, 1.1.0, 1.3.0, 1.4.0, 1.5.0, 1.6.0, 2.0.0, 2.1.0 > Environment: reproduced on spark-shell local[4] >Reporter: Gerard Maas >Assignee: Tobias Schlatter >Priority: Critical > Labels: case-class, core > > Using a case class as a key doesn't seem to work properly on Spark 1.0.0 > A minimal example: > case class P(name:String) > val ps = Array(P("alice"), P("bob"), P("charly"), P("bob")) > sc.parallelize(ps).map(x=> (x,1)).reduceByKey((x,y) => x+y).collect > [Spark shell local mode] res : Array[(P, Int)] = Array((P(bob),1), > (P(bob),1), (P(abe),1), (P(charly),1)) > In contrast to the expected behavior, that should be equivalent to: > sc.parallelize(ps).map(x=> (x.name,1)).reduceByKey((x,y) => x+y).collect > Array[(String, Int)] = Array((charly,1), (abe,1), (bob,2)) > groupByKey and distinct also present the same behavior. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-2620) case class cannot be used as key for reduce
[ https://issues.apache.org/jira/browse/SPARK-2620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14735557#comment-14735557 ] Glenn Strycker commented on SPARK-2620: --- I am finding similar behavior for a non-case-class RDD... see SPARK-10493 > case class cannot be used as key for reduce > --- > > Key: SPARK-2620 > URL: https://issues.apache.org/jira/browse/SPARK-2620 > Project: Spark > Issue Type: Bug > Components: Spark Shell >Affects Versions: 1.0.0, 1.1.0 > Environment: reproduced on spark-shell local[4] >Reporter: Gerard Maas >Assignee: Tobias Schlatter >Priority: Critical > Labels: case-class, core > > Using a case class as a key doesn't seem to work properly on Spark 1.0.0 > A minimal example: > case class P(name:String) > val ps = Array(P("alice"), P("bob"), P("charly"), P("bob")) > sc.parallelize(ps).map(x=> (x,1)).reduceByKey((x,y) => x+y).collect > [Spark shell local mode] res : Array[(P, Int)] = Array((P(bob),1), > (P(bob),1), (P(abe),1), (P(charly),1)) > In contrast to the expected behavior, that should be equivalent to: > sc.parallelize(ps).map(x=> (x.name,1)).reduceByKey((x,y) => x+y).collect > Array[(String, Int)] = Array((charly,1), (abe,1), (bob,2)) > groupByKey and distinct also present the same behavior. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-2620) case class cannot be used as key for reduce
[ https://issues.apache.org/jira/browse/SPARK-2620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14341895#comment-14341895 ] Marko Bonaci commented on SPARK-2620: - *Spark 1.2 shell local:* {code:java} scala> case class P(name:String) defined class P scala> val ps = Array(P("alice"), P("bob"), P("charly"), P("bob")) ps: Array[P] = Array(P(alice), P(bob), P(charly), P(bob)) scala> sc.parallelize(ps).map(x=> (x,1)).reduceByKey((x,y) => x+y).collect res8: Array[(P, Int)] = Array((P(alice),1), (P(charly),1), (P(bob),2)) {code} > case class cannot be used as key for reduce > --- > > Key: SPARK-2620 > URL: https://issues.apache.org/jira/browse/SPARK-2620 > Project: Spark > Issue Type: Bug > Components: Spark Shell >Affects Versions: 1.0.0, 1.1.0 > Environment: reproduced on spark-shell local[4] >Reporter: Gerard Maas >Assignee: Tobias Schlatter >Priority: Critical > Labels: case-class, core > > Using a case class as a key doesn't seem to work properly on Spark 1.0.0 > A minimal example: > case class P(name:String) > val ps = Array(P("alice"), P("bob"), P("charly"), P("bob")) > sc.parallelize(ps).map(x=> (x,1)).reduceByKey((x,y) => x+y).collect > [Spark shell local mode] res : Array[(P, Int)] = Array((P(bob),1), > (P(bob),1), (P(abe),1), (P(charly),1)) > In contrast to the expected behavior, that should be equivalent to: > sc.parallelize(ps).map(x=> (x.name,1)).reduceByKey((x,y) => x+y).collect > Array[(String, Int)] = Array((charly,1), (abe,1), (bob,2)) > groupByKey and distinct also present the same behavior. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-2620) case class cannot be used as key for reduce
[ https://issues.apache.org/jira/browse/SPARK-2620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14286069#comment-14286069 ] Tobias Schlatter commented on SPARK-2620: - It is a hack in an attempt to have anonymous functions (and other classes) close over REPL state and send them over the wire (rather than re-running the static initialization code on the executor). It's up to you if you want to take the [red pill|https://github.com/apache/spark/blob/b6cf1348170951396a6a5d8a65fb670382304f5b/repl/src/main/scala/org/apache/spark/repl/ExecutorClassLoader.scala#L100]. > case class cannot be used as key for reduce > --- > > Key: SPARK-2620 > URL: https://issues.apache.org/jira/browse/SPARK-2620 > Project: Spark > Issue Type: Bug > Components: Spark Shell >Affects Versions: 1.0.0, 1.1.0 > Environment: reproduced on spark-shell local[4] >Reporter: Gerard Maas >Assignee: Tobias Schlatter >Priority: Critical > Labels: case-class, core > > Using a case class as a key doesn't seem to work properly on Spark 1.0.0 > A minimal example: > case class P(name:String) > val ps = Array(P("alice"), P("bob"), P("charly"), P("bob")) > sc.parallelize(ps).map(x=> (x,1)).reduceByKey((x,y) => x+y).collect > [Spark shell local mode] res : Array[(P, Int)] = Array((P(bob),1), > (P(bob),1), (P(abe),1), (P(charly),1)) > In contrast to the expected behavior, that should be equivalent to: > sc.parallelize(ps).map(x=> (x.name,1)).reduceByKey((x,y) => x+y).collect > Array[(String, Int)] = Array((charly,1), (abe,1), (bob,2)) > groupByKey and distinct also present the same behavior. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-2620) case class cannot be used as key for reduce
[ https://issues.apache.org/jira/browse/SPARK-2620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14285381#comment-14285381 ] Frank Rosner commented on SPARK-2620: - Why is the Spark REPL wrapping user code into classes instead of objects? Is this related to how the stuff gets send to the nodes? Can't we change it? > case class cannot be used as key for reduce > --- > > Key: SPARK-2620 > URL: https://issues.apache.org/jira/browse/SPARK-2620 > Project: Spark > Issue Type: Bug > Components: Spark Shell >Affects Versions: 1.0.0, 1.1.0 > Environment: reproduced on spark-shell local[4] >Reporter: Gerard Maas >Assignee: Tobias Schlatter >Priority: Critical > Labels: case-class, core > > Using a case class as a key doesn't seem to work properly on Spark 1.0.0 > A minimal example: > case class P(name:String) > val ps = Array(P("alice"), P("bob"), P("charly"), P("bob")) > sc.parallelize(ps).map(x=> (x,1)).reduceByKey((x,y) => x+y).collect > [Spark shell local mode] res : Array[(P, Int)] = Array((P(bob),1), > (P(bob),1), (P(abe),1), (P(charly),1)) > In contrast to the expected behavior, that should be equivalent to: > sc.parallelize(ps).map(x=> (x.name,1)).reduceByKey((x,y) => x+y).collect > Array[(String, Int)] = Array((charly,1), (abe,1), (bob,2)) > groupByKey and distinct also present the same behavior. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-2620) case class cannot be used as key for reduce
[ https://issues.apache.org/jira/browse/SPARK-2620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14285074#comment-14285074 ] Tobias Schlatter commented on SPARK-2620: - [SI-1698|https://issues.scala-lang.org/browse/SI-1698] TL;DR Yet another Scala fail: Officially by design > case class cannot be used as key for reduce > --- > > Key: SPARK-2620 > URL: https://issues.apache.org/jira/browse/SPARK-2620 > Project: Spark > Issue Type: Bug > Components: Spark Shell >Affects Versions: 1.0.0, 1.1.0 > Environment: reproduced on spark-shell local[4] >Reporter: Gerard Maas >Assignee: Tobias Schlatter >Priority: Critical > Labels: case-class, core > > Using a case class as a key doesn't seem to work properly on Spark 1.0.0 > A minimal example: > case class P(name:String) > val ps = Array(P("alice"), P("bob"), P("charly"), P("bob")) > sc.parallelize(ps).map(x=> (x,1)).reduceByKey((x,y) => x+y).collect > [Spark shell local mode] res : Array[(P, Int)] = Array((P(bob),1), > (P(bob),1), (P(abe),1), (P(charly),1)) > In contrast to the expected behavior, that should be equivalent to: > sc.parallelize(ps).map(x=> (x.name,1)).reduceByKey((x,y) => x+y).collect > Array[(String, Int)] = Array((charly,1), (abe,1), (bob,2)) > groupByKey and distinct also present the same behavior. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-2620) case class cannot be used as key for reduce
[ https://issues.apache.org/jira/browse/SPARK-2620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14285058#comment-14285058 ] Derrick Burns commented on SPARK-2620: -- Thanks for the info! It would seem to me that the latter is a bug in the Scala compiler. Specifically, if one wanted an isInstanceOf check that ignored the outer class, it would seem natural to encode that as: {code} x.isInstanceOf[a#B] {code} On Tue, Jan 20, 2015 at 5:21 PM, Tobias Schlatter (JIRA) > case class cannot be used as key for reduce > --- > > Key: SPARK-2620 > URL: https://issues.apache.org/jira/browse/SPARK-2620 > Project: Spark > Issue Type: Bug > Components: Spark Shell >Affects Versions: 1.0.0, 1.1.0 > Environment: reproduced on spark-shell local[4] >Reporter: Gerard Maas >Assignee: Tobias Schlatter >Priority: Critical > Labels: case-class, core > > Using a case class as a key doesn't seem to work properly on Spark 1.0.0 > A minimal example: > case class P(name:String) > val ps = Array(P("alice"), P("bob"), P("charly"), P("bob")) > sc.parallelize(ps).map(x=> (x,1)).reduceByKey((x,y) => x+y).collect > [Spark shell local mode] res : Array[(P, Int)] = Array((P(bob),1), > (P(bob),1), (P(abe),1), (P(charly),1)) > In contrast to the expected behavior, that should be equivalent to: > sc.parallelize(ps).map(x=> (x.name,1)).reduceByKey((x,y) => x+y).collect > Array[(String, Int)] = Array((charly,1), (abe,1), (bob,2)) > groupByKey and distinct also present the same behavior. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-2620) case class cannot be used as key for reduce
[ https://issues.apache.org/jira/browse/SPARK-2620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14284969#comment-14284969 ] Tobias Schlatter commented on SPARK-2620: - I am currently looking into the various issues in the REPL. This one is caused by the fact that the Spark REPL (unlike the Scala REPL) uses classes instead of objects to wrap user code. This leads to serialized case classes having different outer pointers and therefore do not equal. Fun fact: Given: {code} class A { class B } val a = new A {code} {code} x match { case _: a.B => true case _ => false } {code} and {code} x.isInstanceOf[a.B] {code} are not equivalent (former checks outer pointer, latter does not). > case class cannot be used as key for reduce > --- > > Key: SPARK-2620 > URL: https://issues.apache.org/jira/browse/SPARK-2620 > Project: Spark > Issue Type: Bug > Components: Spark Shell >Affects Versions: 1.0.0, 1.1.0 > Environment: reproduced on spark-shell local[4] >Reporter: Gerard Maas >Assignee: Tobias Schlatter >Priority: Critical > Labels: case-class, core > > Using a case class as a key doesn't seem to work properly on Spark 1.0.0 > A minimal example: > case class P(name:String) > val ps = Array(P("alice"), P("bob"), P("charly"), P("bob")) > sc.parallelize(ps).map(x=> (x,1)).reduceByKey((x,y) => x+y).collect > [Spark shell local mode] res : Array[(P, Int)] = Array((P(bob),1), > (P(bob),1), (P(abe),1), (P(charly),1)) > In contrast to the expected behavior, that should be equivalent to: > sc.parallelize(ps).map(x=> (x.name,1)).reduceByKey((x,y) => x+y).collect > Array[(String, Int)] = Array((charly,1), (abe,1), (bob,2)) > groupByKey and distinct also present the same behavior. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-2620) case class cannot be used as key for reduce
[ https://issues.apache.org/jira/browse/SPARK-2620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14282608#comment-14282608 ] Derrick Burns commented on SPARK-2620: -- I think we need a solution not a workaround that requires end user action. I suspect that there is a solution to this problem in scalaz. Alternatively, I suspect that one can write a macro to generate the proper equality check. Sent from my iPhone > case class cannot be used as key for reduce > --- > > Key: SPARK-2620 > URL: https://issues.apache.org/jira/browse/SPARK-2620 > Project: Spark > Issue Type: Bug > Components: Spark Shell >Affects Versions: 1.0.0, 1.1.0 > Environment: reproduced on spark-shell local[4] >Reporter: Gerard Maas >Assignee: Tobias Schlatter >Priority: Critical > Labels: case-class, core > > Using a case class as a key doesn't seem to work properly on Spark 1.0.0 > A minimal example: > case class P(name:String) > val ps = Array(P("alice"), P("bob"), P("charly"), P("bob")) > sc.parallelize(ps).map(x=> (x,1)).reduceByKey((x,y) => x+y).collect > [Spark shell local mode] res : Array[(P, Int)] = Array((P(bob),1), > (P(bob),1), (P(abe),1), (P(charly),1)) > In contrast to the expected behavior, that should be equivalent to: > sc.parallelize(ps).map(x=> (x.name,1)).reduceByKey((x,y) => x+y).collect > Array[(String, Int)] = Array((charly,1), (abe,1), (bob,2)) > groupByKey and distinct also present the same behavior. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-2620) case class cannot be used as key for reduce
[ https://issues.apache.org/jira/browse/SPARK-2620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14282323#comment-14282323 ] Frank Rosner commented on SPARK-2620: - The issue is caused by the fact that pattern matching of case classes does not work in the Spark shell. Reducing by key (same as grouping, distinc operators, etc.) rely on equality checking of objects. Case classes implement equality checking by using pattern matching to perform type checking and conversion. When I implement a custom equals method for the case class that checks the type with {{isInstanceOf}} and casts with {{asInstanceOf}}, then {{==}} works and so do distinct and key-based operations. Does it make sense to break this down and rework this issue to cover the actual problem rather than a symptom? Are there any plans to fix this issue? It requires some extra effort when working with case classes and the REPL, especially blocking rapid data exploration and analytics. I will try to dig into the code and see whether I can find a solution but there seems to be an ongoing discussion for quite a while now. > case class cannot be used as key for reduce > --- > > Key: SPARK-2620 > URL: https://issues.apache.org/jira/browse/SPARK-2620 > Project: Spark > Issue Type: Bug > Components: Spark Shell >Affects Versions: 1.0.0, 1.1.0 > Environment: reproduced on spark-shell local[4] >Reporter: Gerard Maas >Assignee: Tobias Schlatter >Priority: Critical > Labels: case-class, core > > Using a case class as a key doesn't seem to work properly on Spark 1.0.0 > A minimal example: > case class P(name:String) > val ps = Array(P("alice"), P("bob"), P("charly"), P("bob")) > sc.parallelize(ps).map(x=> (x,1)).reduceByKey((x,y) => x+y).collect > [Spark shell local mode] res : Array[(P, Int)] = Array((P(bob),1), > (P(bob),1), (P(abe),1), (P(charly),1)) > In contrast to the expected behavior, that should be equivalent to: > sc.parallelize(ps).map(x=> (x.name,1)).reduceByKey((x,y) => x+y).collect > Array[(String, Int)] = Array((charly,1), (abe,1), (bob,2)) > groupByKey and distinct also present the same behavior. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-2620) case class cannot be used as key for reduce
[ https://issues.apache.org/jira/browse/SPARK-2620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14209255#comment-14209255 ] Derrick Burns commented on SPARK-2620: -- I also hit the bug when running Spark 1.1.0 in local mode. > case class cannot be used as key for reduce > --- > > Key: SPARK-2620 > URL: https://issues.apache.org/jira/browse/SPARK-2620 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.0.0, 1.1.0 > Environment: reproduced on spark-shell local[4] >Reporter: Gerard Maas >Priority: Critical > Labels: case-class, core > > Using a case class as a key doesn't seem to work properly on Spark 1.0.0 > A minimal example: > case class P(name:String) > val ps = Array(P("alice"), P("bob"), P("charly"), P("bob")) > sc.parallelize(ps).map(x=> (x,1)).reduceByKey((x,y) => x+y).collect > [Spark shell local mode] res : Array[(P, Int)] = Array((P(bob),1), > (P(bob),1), (P(abe),1), (P(charly),1)) > In contrast to the expected behavior, that should be equivalent to: > sc.parallelize(ps).map(x=> (x.name,1)).reduceByKey((x,y) => x+y).collect > Array[(String, Int)] = Array((charly,1), (abe,1), (bob,2)) > groupByKey and distinct also present the same behavior. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-2620) case class cannot be used as key for reduce
[ https://issues.apache.org/jira/browse/SPARK-2620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14194576#comment-14194576 ] Andre Schumacher commented on SPARK-2620: - I also bumped into this issue (on Spark 1.1.0) and it is kind of extremely annoying although it only affects the REPL. Is anybody actively working on reolving this? Given it's already a few months old: are there any blockers for making this work? Matei mentioned the way code is wrapped inside the REPL. > case class cannot be used as key for reduce > --- > > Key: SPARK-2620 > URL: https://issues.apache.org/jira/browse/SPARK-2620 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.0.0, 1.1.0 > Environment: reproduced on spark-shell local[4] >Reporter: Gerard Maas >Priority: Critical > Labels: case-class, core > > Using a case class as a key doesn't seem to work properly on Spark 1.0.0 > A minimal example: > case class P(name:String) > val ps = Array(P("alice"), P("bob"), P("charly"), P("bob")) > sc.parallelize(ps).map(x=> (x,1)).reduceByKey((x,y) => x+y).collect > [Spark shell local mode] res : Array[(P, Int)] = Array((P(bob),1), > (P(bob),1), (P(abe),1), (P(charly),1)) > In contrast to the expected behavior, that should be equivalent to: > sc.parallelize(ps).map(x=> (x.name,1)).reduceByKey((x,y) => x+y).collect > Array[(String, Int)] = Array((charly,1), (abe,1), (bob,2)) > groupByKey and distinct also present the same behavior. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-2620) case class cannot be used as key for reduce
[ https://issues.apache.org/jira/browse/SPARK-2620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14153283#comment-14153283 ] matthias commented on SPARK-2620: - I have it too with 1.1.0 > case class cannot be used as key for reduce > --- > > Key: SPARK-2620 > URL: https://issues.apache.org/jira/browse/SPARK-2620 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.0.0, 1.1.0 > Environment: reproduced on spark-shell local[4] >Reporter: Gerard Maas >Priority: Critical > Labels: case-class, core > > Using a case class as a key doesn't seem to work properly on Spark 1.0.0 > A minimal example: > case class P(name:String) > val ps = Array(P("alice"), P("bob"), P("charly"), P("bob")) > sc.parallelize(ps).map(x=> (x,1)).reduceByKey((x,y) => x+y).collect > [Spark shell local mode] res : Array[(P, Int)] = Array((P(bob),1), > (P(bob),1), (P(abe),1), (P(charly),1)) > In contrast to the expected behavior, that should be equivalent to: > sc.parallelize(ps).map(x=> (x.name,1)).reduceByKey((x,y) => x+y).collect > Array[(String, Int)] = Array((charly,1), (abe,1), (bob,2)) > groupByKey and distinct also present the same behavior. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-2620) case class cannot be used as key for reduce
[ https://issues.apache.org/jira/browse/SPARK-2620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14143895#comment-14143895 ] Grega Kespret commented on SPARK-2620: -- We have this issue on Spark 1.1.0. > case class cannot be used as key for reduce > --- > > Key: SPARK-2620 > URL: https://issues.apache.org/jira/browse/SPARK-2620 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.0.0, 1.1.0 > Environment: reproduced on spark-shell local[4] >Reporter: Gerard Maas >Priority: Critical > Labels: case-class, core > > Using a case class as a key doesn't seem to work properly on Spark 1.0.0 > A minimal example: > case class P(name:String) > val ps = Array(P("alice"), P("bob"), P("charly"), P("bob")) > sc.parallelize(ps).map(x=> (x,1)).reduceByKey((x,y) => x+y).collect > [Spark shell local mode] res : Array[(P, Int)] = Array((P(bob),1), > (P(bob),1), (P(abe),1), (P(charly),1)) > In contrast to the expected behavior, that should be equivalent to: > sc.parallelize(ps).map(x=> (x.name,1)).reduceByKey((x,y) => x+y).collect > Array[(String, Int)] = Array((charly,1), (abe,1), (bob,2)) > groupByKey and distinct also present the same behavior. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-2620) case class cannot be used as key for reduce
[ https://issues.apache.org/jira/browse/SPARK-2620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14134101#comment-14134101 ] Daniel Siegmann commented on SPARK-2620: I have tested the case in spark-shell on Spark 1.1.0. It is still broken. > case class cannot be used as key for reduce > --- > > Key: SPARK-2620 > URL: https://issues.apache.org/jira/browse/SPARK-2620 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.0.0 > Environment: reproduced on spark-shell local[4] >Reporter: Gerard Maas >Priority: Critical > Labels: case-class, core > > Using a case class as a key doesn't seem to work properly on Spark 1.0.0 > A minimal example: > case class P(name:String) > val ps = Array(P("alice"), P("bob"), P("charly"), P("bob")) > sc.parallelize(ps).map(x=> (x,1)).reduceByKey((x,y) => x+y).collect > [Spark shell local mode] res : Array[(P, Int)] = Array((P(bob),1), > (P(bob),1), (P(abe),1), (P(charly),1)) > In contrast to the expected behavior, that should be equivalent to: > sc.parallelize(ps).map(x=> (x.name,1)).reduceByKey((x,y) => x+y).collect > Array[(String, Int)] = Array((charly,1), (abe,1), (bob,2)) > groupByKey and distinct also present the same behavior. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-2620) case class cannot be used as key for reduce
[ https://issues.apache.org/jira/browse/SPARK-2620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14134096#comment-14134096 ] Sean Owen commented on SPARK-2620: -- FWIW, here is a mailing list comment that suggests 1.1 works with these case classes, although this is not a case where the REPL is being used: http://apache-spark-user-list.1001560.n3.nabble.com/Compiler-issues-for-multiple-map-on-RDD-td14248.html > case class cannot be used as key for reduce > --- > > Key: SPARK-2620 > URL: https://issues.apache.org/jira/browse/SPARK-2620 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.0.0 > Environment: reproduced on spark-shell local[4] >Reporter: Gerard Maas >Priority: Critical > Labels: case-class, core > > Using a case class as a key doesn't seem to work properly on Spark 1.0.0 > A minimal example: > case class P(name:String) > val ps = Array(P("alice"), P("bob"), P("charly"), P("bob")) > sc.parallelize(ps).map(x=> (x,1)).reduceByKey((x,y) => x+y).collect > [Spark shell local mode] res : Array[(P, Int)] = Array((P(bob),1), > (P(bob),1), (P(abe),1), (P(charly),1)) > In contrast to the expected behavior, that should be equivalent to: > sc.parallelize(ps).map(x=> (x.name,1)).reduceByKey((x,y) => x+y).collect > Array[(String, Int)] = Array((charly,1), (abe,1), (bob,2)) > groupByKey and distinct also present the same behavior. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-2620) case class cannot be used as key for reduce
[ https://issues.apache.org/jira/browse/SPARK-2620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14108998#comment-14108998 ] Prashant Sharma commented on SPARK-2620: I was wrong, this only fails in REPL. But when I test {noformat} scala> P("bob") == P("bob") P("bob") == P("bob") res19: Boolean = true {noformat} I am confused why this happens, I will dig deeper some other time. > case class cannot be used as key for reduce > --- > > Key: SPARK-2620 > URL: https://issues.apache.org/jira/browse/SPARK-2620 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.0.0 > Environment: reproduced on spark-shell local[4] >Reporter: Gerard Maas >Priority: Critical > Labels: case-class, core > > Using a case class as a key doesn't seem to work properly on Spark 1.0.0 > A minimal example: > case class P(name:String) > val ps = Array(P("alice"), P("bob"), P("charly"), P("bob")) > sc.parallelize(ps).map(x=> (x,1)).reduceByKey((x,y) => x+y).collect > [Spark shell local mode] res : Array[(P, Int)] = Array((P(bob),1), > (P(bob),1), (P(abe),1), (P(charly),1)) > In contrast to the expected behavior, that should be equivalent to: > sc.parallelize(ps).map(x=> (x.name,1)).reduceByKey((x,y) => x+y).collect > Array[(String, Int)] = Array((charly,1), (abe,1), (bob,2)) > groupByKey and distinct also present the same behavior. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-2620) case class cannot be used as key for reduce
[ https://issues.apache.org/jira/browse/SPARK-2620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14108958#comment-14108958 ] Prashant Sharma commented on SPARK-2620: I just tried these test code snippets on the Spark Repl(built from master), and they pass with expected results. > case class cannot be used as key for reduce > --- > > Key: SPARK-2620 > URL: https://issues.apache.org/jira/browse/SPARK-2620 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.0.0 > Environment: reproduced on spark-shell local[4] >Reporter: Gerard Maas >Priority: Critical > Labels: case-class, core > > Using a case class as a key doesn't seem to work properly on Spark 1.0.0 > A minimal example: > case class P(name:String) > val ps = Array(P("alice"), P("bob"), P("charly"), P("bob")) > sc.parallelize(ps).map(x=> (x,1)).reduceByKey((x,y) => x+y).collect > [Spark shell local mode] res : Array[(P, Int)] = Array((P(bob),1), > (P(bob),1), (P(abe),1), (P(charly),1)) > In contrast to the expected behavior, that should be equivalent to: > sc.parallelize(ps).map(x=> (x.name,1)).reduceByKey((x,y) => x+y).collect > Array[(String, Int)] = Array((charly,1), (abe,1), (bob,2)) > groupByKey and distinct also present the same behavior. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-2620) case class cannot be used as key for reduce
[ https://issues.apache.org/jira/browse/SPARK-2620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14074673#comment-14074673 ] Matei Zaharia commented on SPARK-2620: -- The problem is that case class is compiled differently in the spark shell than in local tests. in spark-shell it currently becomes an inner class, which is unlike the behavior in plain Scala or the plain Scala REPL, but is due to the way we create wrapper objects for each command. You could add a test to ReplSuite, then that will fail. > case class cannot be used as key for reduce > --- > > Key: SPARK-2620 > URL: https://issues.apache.org/jira/browse/SPARK-2620 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.0.0 > Environment: reproduced on spark-shell local[4] >Reporter: Gerard Maas >Priority: Critical > Labels: case-class, core > > Using a case class as a key doesn't seem to work properly on Spark 1.0.0 > A minimal example: > case class P(name:String) > val ps = Array(P("alice"), P("bob"), P("charly"), P("bob")) > sc.parallelize(ps).map(x=> (x,1)).reduceByKey((x,y) => x+y).collect > [Spark shell local mode] res : Array[(P, Int)] = Array((P(bob),1), > (P(bob),1), (P(abe),1), (P(charly),1)) > In contrast to the expected behavior, that should be equivalent to: > sc.parallelize(ps).map(x=> (x.name,1)).reduceByKey((x,y) => x+y).collect > Array[(String, Int)] = Array((charly,1), (abe,1), (bob,2)) > groupByKey and distinct also present the same behavior. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (SPARK-2620) case class cannot be used as key for reduce
[ https://issues.apache.org/jira/browse/SPARK-2620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14074196#comment-14074196 ] Apache Spark commented on SPARK-2620: - User 'ash211' has created a pull request for this issue: https://github.com/apache/spark/pull/1588 > case class cannot be used as key for reduce > --- > > Key: SPARK-2620 > URL: https://issues.apache.org/jira/browse/SPARK-2620 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.0.0 > Environment: reproduced on spark-shell local[4] >Reporter: Gerard Maas >Priority: Critical > Labels: case-class, core > > Using a case class as a key doesn't seem to work properly on Spark 1.0.0 > A minimal example: > case class P(name:String) > val ps = Array(P("alice"), P("bob"), P("charly"), P("bob")) > sc.parallelize(ps).map(x=> (x,1)).reduceByKey((x,y) => x+y).collect > [Spark shell local mode] res : Array[(P, Int)] = Array((P(bob),1), > (P(bob),1), (P(abe),1), (P(charly),1)) > In contrast to the expected behavior, that should be equivalent to: > sc.parallelize(ps).map(x=> (x.name,1)).reduceByKey((x,y) => x+y).collect > Array[(String, Int)] = Array((charly,1), (abe,1), (bob,2)) > groupByKey and distinct also present the same behavior. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (SPARK-2620) case class cannot be used as key for reduce
[ https://issues.apache.org/jira/browse/SPARK-2620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14074195#comment-14074195 ] Andrew Ash commented on SPARK-2620: --- I attempted to write a unit test to demonstrate this failure, but my unit tests passed for both distinct and groupByKey... https://github.com/apache/spark/pull/1588 [~gmaas] can you repro either 1) with something other than MASTER=local[4] or 2) an application rather than in spark-shell ? > case class cannot be used as key for reduce > --- > > Key: SPARK-2620 > URL: https://issues.apache.org/jira/browse/SPARK-2620 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.0.0 > Environment: reproduced on spark-shell local[4] >Reporter: Gerard Maas >Priority: Critical > Labels: case-class, core > > Using a case class as a key doesn't seem to work properly on Spark 1.0.0 > A minimal example: > case class P(name:String) > val ps = Array(P("alice"), P("bob"), P("charly"), P("bob")) > sc.parallelize(ps).map(x=> (x,1)).reduceByKey((x,y) => x+y).collect > [Spark shell local mode] res : Array[(P, Int)] = Array((P(bob),1), > (P(bob),1), (P(abe),1), (P(charly),1)) > In contrast to the expected behavior, that should be equivalent to: > sc.parallelize(ps).map(x=> (x.name,1)).reduceByKey((x,y) => x+y).collect > Array[(String, Int)] = Array((charly,1), (abe,1), (bob,2)) > groupByKey and distinct also present the same behavior. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (SPARK-2620) case class cannot be used as key for reduce
[ https://issues.apache.org/jira/browse/SPARK-2620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14070604#comment-14070604 ] Aaron commented on SPARK-2620: -- If you look at the diff of distinct from branch-0.9 to master you see - def distinct(numPartitions: Int): RDD[T] = + def distinct(numPartitions: Int)(implicit ord: Ordering[T] = null): RDD[T] = Is it possible that case classes don't have an implicit ordering and that is why this fails? > case class cannot be used as key for reduce > --- > > Key: SPARK-2620 > URL: https://issues.apache.org/jira/browse/SPARK-2620 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.0.0 > Environment: reproduced on spark-shell local[4] >Reporter: Gerard Maas >Priority: Critical > Labels: case-class, core > > Using a case class as a key doesn't seem to work properly on Spark 1.0.0 > A minimal example: > case class P(name:String) > val ps = Array(P("alice"), P("bob"), P("charly"), P("bob")) > sc.parallelize(ps).map(x=> (x,1)).reduceByKey((x,y) => x+y).collect > [Spark shell local mode] res : Array[(P, Int)] = Array((P(bob),1), > (P(bob),1), (P(abe),1), (P(charly),1)) > In contrast to the expected behavior, that should be equivalent to: > sc.parallelize(ps).map(x=> (x.name,1)).reduceByKey((x,y) => x+y).collect > Array[(String, Int)] = Array((charly,1), (abe,1), (bob,2)) > groupByKey and distinct also present the same behavior. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (SPARK-2620) case class cannot be used as key for reduce
[ https://issues.apache.org/jira/browse/SPARK-2620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14070543#comment-14070543 ] Daniel Siegmann commented on SPARK-2620: I have confirmed this on Spark 1.0.1 as well. As Gerard noted on the mailing list, this bug does NOT affect Spark 0.9.1 (confirmed in Spark shell). > case class cannot be used as key for reduce > --- > > Key: SPARK-2620 > URL: https://issues.apache.org/jira/browse/SPARK-2620 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.0.0 > Environment: reproduced on spark-shell local[4] >Reporter: Gerard Maas >Priority: Critical > Labels: case-class, core > > Using a case class as a key doesn't seem to work properly on Spark 1.0.0 > A minimal example: > case class P(name:String) > val ps = Array(P("alice"), P("bob"), P("charly"), P("bob")) > sc.parallelize(ps).map(x=> (x,1)).reduceByKey((x,y) => x+y).collect > [Spark shell local mode] res : Array[(P, Int)] = Array((P(bob),1), > (P(bob),1), (P(abe),1), (P(charly),1)) > In contrast to the expected behavior, that should be equivalent to: > sc.parallelize(ps).map(x=> (x.name,1)).reduceByKey((x,y) => x+y).collect > Array[(String, Int)] = Array((charly,1), (abe,1), (bob,2)) > groupByKey and distinct also present the same behavior. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (SPARK-2620) case class cannot be used as key for reduce
[ https://issues.apache.org/jira/browse/SPARK-2620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14070443#comment-14070443 ] Gerard Maas commented on SPARK-2620: [~sowen] No, doesn't look like it is. > case class cannot be used as key for reduce > --- > > Key: SPARK-2620 > URL: https://issues.apache.org/jira/browse/SPARK-2620 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.0.0 > Environment: reproduced on spark-shell local[4] >Reporter: Gerard Maas >Priority: Critical > Labels: case-class, core > > Using a case class as a key doesn't seem to work properly on Spark 1.0.0 > A minimal example: > case class P(name:String) > val ps = Array(P("alice"), P("bob"), P("charly"), P("bob")) > sc.parallelize(ps).map(x=> (x,1)).reduceByKey((x,y) => x+y).collect > [Spark shell local mode] res : Array[(P, Int)] = Array((P(bob),1), > (P(bob),1), (P(abe),1), (P(charly),1)) > In contrast to the expected behavior, that should be equivalent to: > sc.parallelize(ps).map(x=> (x.name,1)).reduceByKey((x,y) => x+y).collect > Array[(String, Int)] = Array((charly,1), (abe,1), (bob,2)) > groupByKey and distinct also present the same behavior. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (SPARK-2620) case class cannot be used as key for reduce
[ https://issues.apache.org/jira/browse/SPARK-2620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14070439#comment-14070439 ] Sean Owen commented on SPARK-2620: -- Duplicate of https://issues.apache.org/jira/browse/SPARK-1199 I think? > case class cannot be used as key for reduce > --- > > Key: SPARK-2620 > URL: https://issues.apache.org/jira/browse/SPARK-2620 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.0.0 > Environment: reproduced on spark-shell local[4] >Reporter: Gerard Maas >Priority: Critical > Labels: case-class, core > > Using a case class as a key doesn't seem to work properly on Spark 1.0.0 > A minimal example: > case class P(name:String) > val ps = Array(P("alice"), P("bob"), P("charly"), P("bob")) > sc.parallelize(ps).map(x=> (x,1)).reduceByKey((x,y) => x+y).collect > [Spark shell local mode] res : Array[(P, Int)] = Array((P(bob),1), > (P(bob),1), (P(abe),1), (P(charly),1)) > In contrast to the expected behavior, that should be equivalent to: > sc.parallelize(ps).map(x=> (x.name,1)).reduceByKey((x,y) => x+y).collect > Array[(String, Int)] = Array((charly,1), (abe,1), (bob,2)) > groupByKey and distinct also present the same behavior. -- This message was sent by Atlassian JIRA (v6.2#6252)