[jira] [Commented] (TOREE-428) Can't use case class in the Scala notebook
[ https://issues.apache.org/jira/browse/TOREE-428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17084653#comment-17084653 ]

Marcello Leida commented on TOREE-428:
--

Confirming here as well that the workaround is not working.

> Can't use case class in the Scala notebook
> ------------------------------------------
>
>          Key: TOREE-428
>          URL: https://issues.apache.org/jira/browse/TOREE-428
>      Project: TOREE
>   Issue Type: Bug
>   Components: Build
>     Reporter: Haifeng Li
>     Priority: Major
>      Fix For: 0.2.0
>
> The Docker image: jupyter/all-spark-notebook:latest
> How the container is started:
> docker run -it --rm -p : jupyter/all-spark-notebook:latest
> or
> docker ps -a
> docker start -i containerID
> Steps to reproduce:
> 1. Visit http://localhost:
> 2. Start a Toree notebook
> 3. Run the code below
> {code:java}
> import spark.implicits._
> val p = spark.sparkContext.textFile("../Data/person.txt")
> val pmap = p.map(_.split(","))
> pmap.collect()
> {code}
> The output:
> res0: Array[Array[String]] = Array(Array(Barack, Obama, 53), Array(George, Bush, 68), Array(Bill, Clinton, 68))
> {code:java}
> case class Persons(first_name: String, last_name: String, age: Int)
> val personRDD = pmap.map(p => Persons(p(0), p(1), p(2).toInt))
> personRDD.take(1)
> {code}
> The error message:
> {code:java}
> org.apache.spark.SparkDriverExecutionException: Execution error
>   at org.apache.spark.scheduler.DAGScheduler.handleTaskCompletion(DAGScheduler.scala:1186)
>   at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1711)
>   at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1669)
>   at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1658)
>   at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
>   at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:630)
>   at org.apache.spark.SparkContext.runJob(SparkContext.scala:2022)
>   at org.apache.spark.SparkContext.runJob(SparkContext.scala:2043)
>   at org.apache.spark.SparkContext.runJob(SparkContext.scala:2062)
>   at org.apache.spark.rdd.RDD$$anonfun$take$1.apply(RDD.scala:1354)
>   at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
>   at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
>   at org.apache.spark.rdd.RDD.withScope(RDD.scala:362)
>   at org.apache.spark.rdd.RDD.take(RDD.scala:1327)
>   ... 39 elided
> Caused by: java.lang.ArrayStoreException: [LPersons;
>   at scala.runtime.ScalaRunTime$.array_update(ScalaRunTime.scala:90)
>   at org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:2043)
>   at org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:2043)
>   at org.apache.spark.scheduler.JobWaiter.taskSucceeded(JobWaiter.scala:59)
>   at org.apache.spark.scheduler.DAGScheduler.handleTaskCompletion(DAGScheduler.scala:1182)
>   at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1711)
>   at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1669)
>   at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1658)
>   at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
> {code}
> The above code works in the spark-shell. From the error message, I speculate that the driver program does not correctly handle the case class Persons across the RDD partitions.

--
This message was sent by Atlassian Jira (v8.3.4#803005)
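The {{Caused by: java.lang.ArrayStoreException: [LPersons;}} frame points at {{scala.runtime.ScalaRunTime.array_update}}, the call Spark's {{runJob}} uses to copy each finished task's result into a pre-allocated result array on the driver. The sketch below is a standalone illustration of that failure mode, not Toree or Spark code; the names {{TakePathDemo}} and {{updateResult}} are invented for the demo. It shows that {{array_update}} throws {{ArrayStoreException}} whenever the incoming object's class does not match the array's runtime element type, which is plausibly what happens when the REPL's recompiled {{Persons}} class differs from the one the driver allocated the array for.

```scala
import scala.runtime.ScalaRunTime

object TakePathDemo {
  case class Persons(firstName: String, lastName: String, age: Int)

  // Mimics the failing step inside SparkContext.runJob: writing a task
  // result into slot 0 of a pre-allocated result array via
  // ScalaRunTime.array_update. Returns "stored" on success, or the name
  // of the exception when the element type is incompatible.
  def updateResult(resultArray: AnyRef, value: Any): String =
    try {
      ScalaRunTime.array_update(resultArray, 0, value)
      "stored"
    } catch {
      case _: ArrayStoreException => "ArrayStoreException"
    }

  def main(args: Array[String]): Unit = {
    // A String[] cannot hold a Persons instance: same failure as the trace.
    println(updateResult(new Array[String](1), Persons("Barack", "Obama", 53)))
    // An Object[] (Array[Any]) accepts anything, so the store succeeds.
    println(updateResult(new Array[Any](1), Persons("Barack", "Obama", 53)))
  }
}
```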
[jira] [Commented] (TOREE-428) Can't use case class in the Scala notebook
[ https://issues.apache.org/jira/browse/TOREE-428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16310413#comment-16310413 ]

George Hoffman commented on TOREE-428:
--

I can confirm that the workaround also does not work for me. In what context has the workaround been tested?
[jira] [Commented] (TOREE-428) Can't use case class in the Scala notebook
[ https://issues.apache.org/jira/browse/TOREE-428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16294798#comment-16294798 ]

Harri Hämäläinen commented on TOREE-428:
--

Can't confirm the fix from [~blue_impala_48d6]: even with separate cells, the issue still exists with jupyter/all-spark-notebook:latest.
[jira] [Commented] (TOREE-428) Can't use case class in the Scala notebook
[ https://issues.apache.org/jira/browse/TOREE-428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16240452#comment-16240452 ]

Paul Balm commented on TOREE-428:
--

I confirm this issue. This is my test case (slightly simpler):

* Create the test data from the terminal:
{noformat}
F=arraystore.csv ; echo a > $F; echo b >> $F; echo c >> $F
{noformat}
* Read the file into an RDD and map it through a case class:
{noformat}
case class IdClass(id: String)
sc.textFile("arraystore.csv").map(IdClass).collect()
{noformat}

This produces the stack trace in the description. I don't have any particular insight into the problem, but an {{ArrayStoreException}} is normally an indication that an object is being stored in an array of an incompatible type, for example when you have an array of Strings and you try to put an {{IdClass}} or {{Persons}} object into it.
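The array-store behaviour described in the last comment can be reproduced on a plain JVM, with no Spark involved. The sketch below is a standalone illustration (the names {{ArrayStoreDemo}} and {{storeIntoStringArray}} are invented for the demo): because JVM arrays are covariant, a {{String[]}} may be viewed as {{Object[]}}, but at runtime it still accepts only Strings, so storing a case-class instance into it throws {{ArrayStoreException}}.

```scala
object ArrayStoreDemo {
  case class IdClass(id: String)

  // Stores `value` into slot 0 of a String[] that is statically typed as
  // Object[] (legal, because JVM arrays are covariant). Returns "none" when
  // the store succeeds, or the name of the exception caught.
  def storeIntoStringArray(value: AnyRef): String = {
    val backing: Array[Object] =
      Array[String]("a", "b", "c").asInstanceOf[Array[Object]]
    try {
      backing(0) = value // the runtime element-type check happens here
      "none"
    } catch {
      case _: ArrayStoreException => "ArrayStoreException"
    }
  }

  def main(args: Array[String]): Unit = {
    println(storeIntoStringArray("a string"))   // a String fits
    println(storeIntoStringArray(IdClass("x"))) // incompatible element type
  }
}
```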