[jira] [Commented] (SPARK-21418) NoSuchElementException: None.get on DataFrame.rdd

    [ https://issues.apache.org/jira/browse/SPARK-21418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16152742#comment-16152742 ]

Daniel Darabos commented on SPARK-21418:
----------------------------------------

Sorry for the delay. I can confirm that removing {{-Dsun.io.serialization.extendedDebugInfo=true}} is the fix. We only use this flag when running unit tests, but it's very useful for debugging serialization issues. It happens often in Spark that you accidentally include something in a closure that cannot be serialized. It's hard to figure out without this flag what caused that.

> NoSuchElementException: None.get on DataFrame.rdd
> -------------------------------------------------
>
>                 Key: SPARK-21418
>                 URL: https://issues.apache.org/jira/browse/SPARK-21418
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.2.0
>            Reporter: Daniel Darabos
>
> I don't have a minimal reproducible example yet, sorry. I have the following
> lines in a unit test for our Spark application:
> {code}
> val df = mySparkSession.read.format("jdbc")
>   .options(Map("url" -> url, "dbtable" -> "test_table"))
>   .load()
> df.show
> println(df.rdd.collect)
> {code}
> The output shows the DataFrame contents from {{df.show}}. But the {{collect}}
> fails:
> {noformat}
> org.apache.spark.SparkException: Job aborted due to stage failure: Task serialization failed: java.util.NoSuchElementException: None.get
> java.util.NoSuchElementException: None.get
>     at scala.None$.get(Option.scala:347)
>     at scala.None$.get(Option.scala:345)
>     at org.apache.spark.sql.execution.DataSourceScanExec$class.org$apache$spark$sql$execution$DataSourceScanExec$$redact(DataSourceScanExec.scala:70)
>     at org.apache.spark.sql.execution.DataSourceScanExec$$anonfun$4.apply(DataSourceScanExec.scala:54)
>     at org.apache.spark.sql.execution.DataSourceScanExec$$anonfun$4.apply(DataSourceScanExec.scala:52)
>     at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>     at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>     at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>     at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
>     at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
>     at scala.collection.AbstractTraversable.map(Traversable.scala:104)
>     at org.apache.spark.sql.execution.DataSourceScanExec$class.simpleString(DataSourceScanExec.scala:52)
>     at org.apache.spark.sql.execution.RowDataSourceScanExec.simpleString(DataSourceScanExec.scala:75)
>     at org.apache.spark.sql.catalyst.plans.QueryPlan.verboseString(QueryPlan.scala:349)
>     at org.apache.spark.sql.execution.RowDataSourceScanExec.org$apache$spark$sql$execution$DataSourceScanExec$$super$verboseString(DataSourceScanExec.scala:75)
>     at org.apache.spark.sql.execution.DataSourceScanExec$class.verboseString(DataSourceScanExec.scala:60)
>     at org.apache.spark.sql.execution.RowDataSourceScanExec.verboseString(DataSourceScanExec.scala:75)
>     at org.apache.spark.sql.catalyst.trees.TreeNode.generateTreeString(TreeNode.scala:556)
>     at org.apache.spark.sql.execution.WholeStageCodegenExec.generateTreeString(WholeStageCodegenExec.scala:451)
>     at org.apache.spark.sql.catalyst.trees.TreeNode.generateTreeString(TreeNode.scala:576)
>     at org.apache.spark.sql.catalyst.trees.TreeNode.treeString(TreeNode.scala:480)
>     at org.apache.spark.sql.catalyst.trees.TreeNode.treeString(TreeNode.scala:477)
>     at org.apache.spark.sql.catalyst.trees.TreeNode.toString(TreeNode.scala:474)
>     at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1421)
>     at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
>     at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548)
>     at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509)
>     at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
>     at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
>     at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548)
>     at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509)
>     at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
>     at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
>     at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548)
>     at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509)
>     at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
>     at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
>     at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548)
>     at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509)
>     at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
>     at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
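
For readers who have not hit the "something non-serializable sneaked into a closure" problem mentioned in the comment above, here is a minimal plain-Java sketch of it. It does not use Spark; {{SerializationCheck}}, {{Connection}} and {{Task}} are hypothetical names made up for illustration. The {{Task}} object drags a non-serializable field along, roughly the way a closure can capture one by accident.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical stand-ins; none of these names come from Spark or from the report.
class SerializationCheck {
    /** A resource that is deliberately NOT Serializable. */
    static class Connection {
        final String url;
        Connection(String url) { this.url = url; }
    }

    /** A task that accidentally carries a Connection along as a field. */
    static class Task implements Serializable {
        private static final long serialVersionUID = 1L;
        final Connection conn;
        Task(Connection conn) { this.conn = conn; }
    }

    /** Returns null if obj serializes, or the exception message if it does not. */
    static String trySerialize(Object obj) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(obj);
            return null;
        } catch (NotSerializableException e) {
            // The message starts with the name of the offending class.
            return e.getMessage();
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(trySerialize(new Task(new Connection("jdbc:example"))));
    }
}
```

Without the flag, the message names only the offending class. When the JVM is started with {{-Dsun.io.serialization.extendedDebugInfo=true}}, the same exception should additionally describe the chain of fields that led to it, which is why the flag is handy in tests.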
[jira] [Commented] (SPARK-21418) NoSuchElementException: None.get on DataFrame.rdd

    [ https://issues.apache.org/jira/browse/SPARK-21418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16088952#comment-16088952 ]

Daniel Darabos commented on SPARK-21418:
----------------------------------------

I'm on holiday without a computer through the coming week, but I'll try to dig deeper after that. I do recall that we enable a JVM flag for printing extra details on serialization errors. Now I wonder if that flag collects string forms even when no error happens. I guess I should not be surprised: if it did not, there would be no reason to ever disable this feature. That already suggests an easy workaround :). Thanks!

On Jul 15, 2017 6:44 PM, "Kazuaki Ishizaki (JIRA)" wrote:

> [ https://issues.apache.org/jira/browse/SPARK-21418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16088659#comment-16088659 ]
>
> Kazuaki Ishizaki commented on SPARK-21418:
> ------------------------------------------
>
> I am curious why {java.io.ObjectOutputStream.writeOrdinaryObject} calls `toString` method. Do you specify some option to run this program for JVM?
>
> [...]
> writeObject(List.scala:468)
> NativeMethodAccessorImpl.java:62)
> DelegatingMethodAccessorImpl.java:43)
> ObjectStreamClass.java:1028)
> ObjectOutputStream.java:1496)
> ObjectOutputStream.java:1432)
> ObjectOutputStream.java:1548)
> ObjectOutputStream.java:1509)
> ObjectOutputStream.java:1432)
> ObjectOutputStream.java:1548)
> ObjectOutputStream.java:1509)
> ObjectOutputStream.java:1432)
> writeObject(JavaSerializer.scala:43)
> serialize(JavaSerializer.scala:100)
> DAGScheduler.scala:1003)
> scheduler$DAGScheduler$$submitStage(DAGScheduler.scala:930)
> DAGScheduler.scala:874)
> doOnReceive(DAGScheduler.scala:1677)
> onReceive(DAGScheduler.scala:1669)
> onReceive(DAGScheduler.scala:1658)
> [...]
> 91fa80fe8a2480d64c430bd10f97b3d44c007bcc#diff-2a91a9a59953aa82fa132aaf45bd731bR69
> from https://issues.apache.org/jira/browse/SPARK-20070. It tries to redact
> sensitive information from {{explain}} output. (We are not trying to explain
> anything here, so I doubt it is meant to be running in this case.) When it
> needs to access some configuration, it tries to take it from the "current"
> Spark session, which it just reads from a thread-local variable. We appear to
> be on a thread where this variable is not set.
> [...] constraint on multi-threaded Spark applications.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
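
The thread-local explanation quoted above can be reproduced without Spark. The following is only a sketch of the general pattern, not Spark's code: {{ThreadLocalSession}}, {{ACTIVE}} and {{currentSession}} are hypothetical names, and Java's {{Optional.get}} stands in for Scala's {{Option.get}}. A value set on one thread is invisible on another, so code that unconditionally unwraps it fails there.

```java
import java.util.NoSuchElementException;
import java.util.Optional;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical model of a per-thread "current session" slot; not Spark's implementation.
class ThreadLocalSession {
    static final ThreadLocal<Optional<String>> ACTIVE =
        ThreadLocal.withInitial(() -> Optional.<String>empty());

    /** Mimics code that assumes the slot is always populated. */
    static String currentSession() {
        // Optional.get throws NoSuchElementException when the slot is empty,
        // much like None.get does in Scala.
        return ACTIVE.get().get();
    }

    /** Calls currentSession() on a fresh thread and reports what happened there. */
    static String readOnNewThread() {
        AtomicReference<String> result = new AtomicReference<>();
        Thread worker = new Thread(() -> {
            try {
                result.set(currentSession());
            } catch (NoSuchElementException e) {
                result.set("NoSuchElementException");
            }
        });
        worker.start();
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return result.get();
    }

    public static void main(String[] args) {
        ACTIVE.set(Optional.of("session-1"));
        System.out.println(currentSession());   // value set on this thread
        System.out.println(readOnNewThread());  // the new thread never saw it
    }
}
```

In the report the failing call happens under the DAGScheduler frames of the trace, which would be consistent with it running on a thread other than the one that created the session, though the thread is not named in the report.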
[jira] [Commented] (SPARK-21418) NoSuchElementException: None.get on DataFrame.rdd

    [ https://issues.apache.org/jira/browse/SPARK-21418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16088659#comment-16088659 ]

Kazuaki Ishizaki commented on SPARK-21418:
------------------------------------------

I am curious why {java.io.ObjectOutputStream.writeOrdinaryObject} calls `toString` method. Do you specify some option to run this program for JVM?
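
One way to check the question above on a given JVM is a small probe that counts {{toString}} calls made while an object is being written. This is a sketch with hypothetical names ({{ToStringProbe}}, {{Payload}}); the only thing taken from this thread is the system property {{sun.io.serialization.extendedDebugInfo}}. The JDK appears to read that property once, when the serialization classes are initialized, so it is safest to pass it on the command line rather than set it at run time.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical probe; run it with and without
// -Dsun.io.serialization.extendedDebugInfo=true and compare the counts.
class ToStringProbe {
    static final AtomicInteger TO_STRING_CALLS = new AtomicInteger();

    static class Payload implements Serializable {
        private static final long serialVersionUID = 1L;
        final int value = 42;

        @Override
        public String toString() {
            TO_STRING_CALLS.incrementAndGet(); // record every call
            return "Payload(" + value + ")";
        }
    }

    /** Serializes one Payload and returns how often its toString was invoked. */
    static int countToStringCalls() {
        TO_STRING_CALLS.set(0);
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(new Payload());
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
        return TO_STRING_CALLS.get();
    }

    public static void main(String[] args) {
        System.out.println("extendedDebugInfo = "
            + System.getProperty("sun.io.serialization.extendedDebugInfo"));
        System.out.println("toString calls during writeObject: " + countToStringCalls());
    }
}
```

Without the flag the count should be zero. With the flag, a nonzero count would match the trace in this issue, where {{TreeNode.toString}} is called directly from {{ObjectOutputStream.writeOrdinaryObject}}.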