AFAIK it's a known issue in the Scala REPL, which the Spark REPL is built on. The PR that was closed was only adding tests to demonstrate that it's a bug, not fixing it. I don't know of an official workaround at the moment.
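That said, one thing that may sidestep it (an untested sketch on my part, not an official fix) is to avoid relying on REPL-defined case-class equality altogether: map each record to a plain tuple before calling distinct, since tuple equality doesn't depend on how the REPL compiled your case class.

```scala
// Sketch of the tuple workaround. In spark-shell you would write:
//   val distinctCount = sc.parallelize(peopleList)
//                         .map(p => (p.name, p.age))
//                         .distinct
//                         .count
// Shown here on the plain List so the snippet runs without Spark.
case class Person(name: String, age: Int)

val peopleList = List(Person("Alice", 35), Person("Bob", 47),
                      Person("Alice", 35), Person("Bob", 15))

// Tuples use standard structural equality, so distinct behaves as expected.
val distinctCount = peopleList.map(p => (p.name, p.age)).distinct.size
println(distinctCount)  // 3: ("Alice",35), ("Bob",47), ("Bob",15)
```

You lose the case class on the other side, so you may need to map back (`case (n, a) => Person(n, a)`) after the distinct, but it keeps the shuffle comparing tuples rather than REPL-compiled classes.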
On Fri, Dec 19, 2014 at 7:21 PM, Jay Hutfles <jayhutf...@gmail.com> wrote:
> Found a problem in the spark-shell, but can't confirm that it's related to
> open issues on Spark's JIRA page. I was wondering if anyone could help
> identify if this is an issue or if it's already being addressed.
>
> Test: (in spark-shell)
>   case class Person(name: String, age: Int)
>   val peopleList = List(Person("Alice", 35), Person("Bob", 47),
>                         Person("Alice", 35), Person("Bob", 15))
>   val peopleRDD = sc.parallelize(peopleList)
>   assert(peopleList.distinct.size == peopleRDD.distinct.count)
>
> At first I thought it was related to issue SPARK-2620
> (https://issues.apache.org/jira/browse/SPARK-2620), which says case classes
> can't be used as keys in spark-shell due to how case classes are compiled by
> the REPL. It lists .reduceByKey, .groupByKey and .distinct as being
> affected. But the associated pull request for adding tests to cover this
> (https://github.com/apache/spark/pull/1588) was closed.
>
> Is this something I just have to live with when using the REPL? Or is this
> covered by something bigger that's being addressed?
>
> Thanks in advance
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/spark-shell-bug-with-RDDs-and-case-classes-tp20789.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> For additional commands, e-mail: user-h...@spark.apache.org
> ---------------------------------------------------------------------