[jira] [Commented] (TOREE-428) Can't use case class in the Scala notebook

2020-04-16 Thread Marcello Leida (Jira)


[ https://issues.apache.org/jira/browse/TOREE-428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17084653#comment-17084653 ]

Marcello Leida commented on TOREE-428:
--

Confirming here as well that the workaround is not working.

> Can't use case class in the Scala notebook
> --
>
> Key: TOREE-428
> URL: https://issues.apache.org/jira/browse/TOREE-428
> Project: TOREE
> Issue Type: Bug
> Components: Build
> Reporter: Haifeng Li
> Priority: Major
> Fix For: 0.2.0
>
>
> The Docker image:
> jupyter/all-spark-notebook:latest
> How the container is started:
> docker run -it --rm -p : jupyter/all-spark-notebook:latest
> or
> docker ps -a
> docker start -i containerID
> Steps to reproduce:
> Visit http://localhost:
> Start a Toree notebook
> Enter the code below:
> {code:scala}
> import spark.implicits._
> val p = spark.sparkContext.textFile("../Data/person.txt")
> val pmap = p.map(_.split(","))
> pmap.collect()
> {code}
> The output:
> res0: Array[Array[String]] = Array(Array(Barack, Obama, 53),
> Array(George, Bush, 68), Array(Bill, Clinton, 68))
> {code:scala}
> case class Persons(first_name: String, last_name: String, age: Int)
> val personRDD = pmap.map(p => Persons(p(0), p(1), p(2).toInt))
> personRDD.take(1)
> {code}
> The error message:
> {code:java}
> org.apache.spark.SparkDriverExecutionException: Execution error
>   at org.apache.spark.scheduler.DAGScheduler.handleTaskCompletion(DAGScheduler.scala:1186)
>   at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1711)
>   at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1669)
>   at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1658)
>   at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
>   at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:630)
>   at org.apache.spark.SparkContext.runJob(SparkContext.scala:2022)
>   at org.apache.spark.SparkContext.runJob(SparkContext.scala:2043)
>   at org.apache.spark.SparkContext.runJob(SparkContext.scala:2062)
>   at org.apache.spark.rdd.RDD$$anonfun$take$1.apply(RDD.scala:1354)
>   at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
>   at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
>   at org.apache.spark.rdd.RDD.withScope(RDD.scala:362)
>   at org.apache.spark.rdd.RDD.take(RDD.scala:1327)
>   ... 39 elided
> Caused by: java.lang.ArrayStoreException: [LPersons;
>   at scala.runtime.ScalaRunTime$.array_update(ScalaRunTime.scala:90)
>   at org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:2043)
>   at org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:2043)
>   at org.apache.spark.scheduler.JobWaiter.taskSucceeded(JobWaiter.scala:59)
>   at org.apache.spark.scheduler.DAGScheduler.handleTaskCompletion(DAGScheduler.scala:1182)
>   at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1711)
>   at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1669)
>   at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1658)
>   at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
> {code}
> The above code works with the spark-shell. From the error message, I suspect
> that the driver program does not correctly handle the case class Persons when
> storing results collected from the RDD partitions.
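
A hedged workaround sketch (not verified against the Toree builds discussed in
this thread): the exception is thrown on the driver while deserialized Persons
instances are stored into the result array, so collecting plain tuples (whose
classes come from the standard classpath rather than the REPL) and rebuilding
the case-class instances locally may sidestep it. The file path and field
layout below are taken from the reproduction above.

{code:scala}
// Possible workaround sketch: collect tuples instead of instances of a
// notebook-defined case class. Assumes ../Data/person.txt has lines of the
// form first_name,last_name,age as in the reproduction above.
case class Persons(first_name: String, last_name: String, age: Int)

val p = spark.sparkContext.textFile("../Data/person.txt")
val pmap = p.map(_.split(","))

// Convert on the executors to (String, String, Int) tuples, which do not
// depend on the REPL-compiled Persons class, then collect on the driver.
val rows: Array[(String, String, Int)] =
  pmap.map(a => (a(0), a(1), a(2).toInt)).collect()

// Rebuild the Persons instances driver-side, where only one version of the
// class definition is involved.
val persons: Array[Persons] = rows.map { case (f, l, a) => Persons(f, l, a) }
{code}

If the tuple round trip succeeds where personRDD.take(1) fails, that supports
the suspicion above about driver-side handling of the case class; if it fails
the same way, the problem likely lies elsewhere.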



[jira] [Commented] (TOREE-428) Can't use case class in the Scala notebook

2018-01-03 Thread George Hoffman (JIRA)

[ https://issues.apache.org/jira/browse/TOREE-428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16310413#comment-16310413 ]

George Hoffman commented on TOREE-428:
--

I can confirm that the workaround also does not work for me. In what context
has the workaround been tested?



[jira] [Commented] (TOREE-428) Can't use case class in the Scala notebook

2017-12-18 Thread Harri Hämäläinen (JIRA)

[ https://issues.apache.org/jira/browse/TOREE-428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16294798#comment-16294798 ]

Harri Hämäläinen commented on TOREE-428:
--

I can't confirm the fix from [~blue_impala_48d6]: even with separate cells, the
issue still exists with jupyter/all-spark-notebook:latest.



[jira] [Commented] (TOREE-428) Can't use case class in the Scala notebook

2017-11-06 Thread Paul Balm (JIRA)

[ https://issues.apache.org/jira/browse/TOREE-428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16240452#comment-16240452 ]

Paul Balm commented on TOREE-428:
--

I confirm this issue. This is my test case (slightly simpler):

* Create the test data from the terminal: {{F=arraystore.csv ; echo a > $F; 
echo b >> $F; echo c >> $F}}
* Read the file into an RDD with a case class: 

{noformat}
case class IdClass(id: String)
sc.textFile("arraystore.csv").map(IdClass).collect()
{noformat}

This produces the stacktrace in the description.

I don't have any particular insight into the problem, but an
{{ArrayStoreException}} is normally an indication that an object is being
stored in an array of an incompatible type, for example when you have an array
of Strings and you try to put an {{IdClass}} or {{Person}} object into it.
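
To make that concrete, here is a minimal, Spark-free sketch of the JVM
behaviour described above (plain Scala; nothing here is from the Toree code
base): arrays are covariant at runtime, so a cast can succeed while a later
store of an element with an incompatible runtime class fails at the store
site.

{code:scala}
// Minimal illustration of java.lang.ArrayStoreException on the JVM.
val strings = new Array[String](1)

// At runtime a String[] is an Object[], so this cast succeeds...
val objects = strings.asInstanceOf[Array[Object]]

// ...but the JVM checks the element's runtime class on every array store,
// so storing a non-String element throws:
objects(0) = Integer.valueOf(42) // java.lang.ArrayStoreException
{code}

In this issue the array and the element are nominally both Persons, which
suggests the driver and the deserialized task results may be using two
different Persons classes (for example, compiled by different REPL
classloaders), so the runtime store check fails even though the names match.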

