AFAIK it's a known issue in the Scala REPL, which is what the Spark
REPL is built on. The PR that was closed was just adding tests to
show it's a bug. I don't know of a confirmed workaround at the moment.
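That said, one mitigation people sometimes suggest is to not rely on
case-class equality inside the REPL at all, and instead round-trip
through tuples (whose equality is structural and doesn't depend on the
REPL-generated class) before deduplicating. I haven't verified this
against your exact case; here's a minimal sketch on a plain List rather
than an RDD, reusing the Person class and data from your example:

```scala
// Sketch of the tuple round-trip mitigation (assumption, not a confirmed fix):
// project each case-class instance to a tuple, dedupe on the tuples, then
// rebuild the case classes. In spark-shell this would be
// peopleRDD.map(...).distinct().map(...) instead of List operations.
case class Person(name: String, age: Int)

val people = List(Person("Alice", 35), Person("Bob", 47),
                  Person("Alice", 35), Person("Bob", 15))

val distinctPeople = people
  .map(p => (p.name, p.age))            // project to a tuple
  .distinct                             // tuple equality is stable
  .map { case (n, a) => Person(n, a) }  // rebuild the case class

println(distinctPeople.size)
```

On this data there are three distinct people, so the size printed should
match what peopleList.distinct.size gives you outside the REPL.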

On Fri, Dec 19, 2014 at 7:21 PM, Jay Hutfles <jayhutf...@gmail.com> wrote:
> Found a problem in the spark-shell, but I can't confirm whether it's related
> to any open issue on Spark's JIRA page.  I was wondering if anyone could help
> determine whether this is a new issue or one that's already being addressed.
>
> Test:  (in spark-shell)
> case class Person(name: String, age: Int)
> val peopleList = List(Person("Alice", 35), Person("Bob", 47),
>                       Person("Alice", 35), Person("Bob", 15))
> val peopleRDD = sc.parallelize(peopleList)
> assert(peopleList.distinct.size == peopleRDD.distinct.count)
>
>
> At first I thought it was related to issue SPARK-2620
> (https://issues.apache.org/jira/browse/SPARK-2620), which says case classes
> can't be used as keys in spark-shell due to how case classes are compiled by
> the REPL.  It lists .reduceByKey, .groupByKey and .distinct as being
> affected.  But the associated pull request for adding tests to cover this
> (https://github.com/apache/spark/pull/1588) was closed.
>
> Is this something I just have to live with when using the REPL?  Or is this
> covered by something bigger that's being addressed?
>
> Thanks in advance
>
>
>
> --
> View this message in context: 
> http://apache-spark-user-list.1001560.n3.nabble.com/spark-shell-bug-with-RDDs-and-case-classes-tp20789.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> For additional commands, e-mail: user-h...@spark.apache.org
>
