[jira] [Commented] (SPARK-21418) NoSuchElementException: None.get on DataFrame.rdd

2017-09-04 Thread Daniel Darabos (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16152742#comment-16152742
 ] 

Daniel Darabos commented on SPARK-21418:


Sorry for the delay. I can confirm that removing 
{{-Dsun.io.serialization.extendedDebugInfo=true}} is the fix. We only use this 
flag when running unit tests, but it's very useful for debugging serialization 
issues. In Spark it often happens that you accidentally include something in a 
closure that cannot be serialized, and without this flag it's hard to figure 
out what pulled it in.
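
For context, the kind of accidental capture this flag helps diagnose looks 
like the following minimal sketch ({{Logger}} and {{JobRunner}} are 
hypothetical, not from our code base):

{code}
import org.apache.spark.SparkContext

// Hypothetical dependency that is not Serializable.
class Logger {
  def log(s: String): Unit = println(s)
}

class JobRunner(sc: SparkContext) {
  val logger = new Logger

  def run(): Unit =
    // The lambda reads the `logger` field, which drags the enclosing
    // JobRunner (and its Logger) into the task closure. Neither is
    // Serializable, so task serialization fails; with
    // -Dsun.io.serialization.extendedDebugInfo=true the JVM reports the
    // chain of fields that pulled them in.
    sc.parallelize(1 to 10).foreach(n => logger.log(n.toString))
}
{code}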

> NoSuchElementException: None.get on DataFrame.rdd
> -------------------------------------------------
>
> Key: SPARK-21418
> URL: https://issues.apache.org/jira/browse/SPARK-21418
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.0
>Reporter: Daniel Darabos
>
> I don't have a minimal reproducible example yet, sorry. I have the following 
> lines in a unit test for our Spark application:
> {code}
> val df = mySparkSession.read.format("jdbc")
>   .options(Map("url" -> url, "dbtable" -> "test_table"))
>   .load()
> df.show
> println(df.rdd.collect)
> {code}
> The output shows the DataFrame contents from {{df.show}}. But the {{collect}} 
> fails:
> {noformat}
> org.apache.spark.SparkException: Job aborted due to stage failure: Task 
> serialization failed: java.util.NoSuchElementException: None.get
> java.util.NoSuchElementException: None.get
>   at scala.None$.get(Option.scala:347)
>   at scala.None$.get(Option.scala:345)
>   at 
> org.apache.spark.sql.execution.DataSourceScanExec$class.org$apache$spark$sql$execution$DataSourceScanExec$$redact(DataSourceScanExec.scala:70)
>   at 
> org.apache.spark.sql.execution.DataSourceScanExec$$anonfun$4.apply(DataSourceScanExec.scala:54)
>   at 
> org.apache.spark.sql.execution.DataSourceScanExec$$anonfun$4.apply(DataSourceScanExec.scala:52)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>   at 
> scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>   at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
>   at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
>   at scala.collection.AbstractTraversable.map(Traversable.scala:104)
>   at 
> org.apache.spark.sql.execution.DataSourceScanExec$class.simpleString(DataSourceScanExec.scala:52)
>   at 
> org.apache.spark.sql.execution.RowDataSourceScanExec.simpleString(DataSourceScanExec.scala:75)
>   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.verboseString(QueryPlan.scala:349)
>   at 
> org.apache.spark.sql.execution.RowDataSourceScanExec.org$apache$spark$sql$execution$DataSourceScanExec$$super$verboseString(DataSourceScanExec.scala:75)
>   at 
> org.apache.spark.sql.execution.DataSourceScanExec$class.verboseString(DataSourceScanExec.scala:60)
>   at 
> org.apache.spark.sql.execution.RowDataSourceScanExec.verboseString(DataSourceScanExec.scala:75)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.generateTreeString(TreeNode.scala:556)
>   at 
> org.apache.spark.sql.execution.WholeStageCodegenExec.generateTreeString(WholeStageCodegenExec.scala:451)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.generateTreeString(TreeNode.scala:576)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.treeString(TreeNode.scala:480)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.treeString(TreeNode.scala:477)
>   at org.apache.spark.sql.catalyst.trees.TreeNode.toString(TreeNode.scala:474)
>   at 
> java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1421)
>   at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
>   at 
> java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548)
>   at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509)
>   at 
> java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
>   at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
>   at 
> java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548)
>   at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509)
>   at 
> java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
>   at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
>   at 
> java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548)
>   at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509)
>   at 
> java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
>   at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
>   at 
> 

[jira] [Commented] (SPARK-21418) NoSuchElementException: None.get on DataFrame.rdd

2017-07-16 Thread Daniel Darabos (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16088952#comment-16088952
 ] 

Daniel Darabos commented on SPARK-21418:


I'm on holiday without a computer through the coming week, but I'll try to
dig deeper after that.

I do recall that we enable a JVM flag for printing extra details on
serialization errors. Now I wonder if that flag collects string forms even
when no error happens. I guess I should not be surprised: if it did not,
there would be no reason to ever disable this feature.

That already suggests an easy workaround :). Thanks!
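
For reference, a sketch of how the flag might be confined to test runs in an 
sbt build (assuming sbt with forked test JVMs; our actual build setup isn't 
shown in this thread):

{code}
// build.sbt (sketch): enable the extra serialization diagnostics only
// for forked test JVMs, so production jobs never run with the flag.
fork in Test := true
javaOptions in Test += "-Dsun.io.serialization.extendedDebugInfo=true"
{code}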

> The {{None.get}} comes from 
> 91fa80fe8a2480d64c430bd10f97b3d44c007bcc#diff-2a91a9a59953aa82fa132aaf45bd731bR69 
> from https://issues.apache.org/jira/browse/SPARK-20070. It tries to 
> redact sensitive information from {{explain}} output. (We are not trying to 
> explain anything here, so I doubt it is meant to be running in this case.) 
> When it needs to access some configuration, it tries to take it from the 
> "current" Spark session, which it just reads from a thread-local variable. 
> We appear to be on a thread where this variable is not set. This looks like 
> an undocumented constraint on multi-threaded Spark applications.
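
To illustrate that constraint, a minimal sketch ({{ActiveSessionDemo}} is 
hypothetical; it relies on {{SparkSession.getActiveSession}}, which is public 
as of Spark 2.2):

{code}
import java.util.concurrent.Executors

import org.apache.spark.sql.SparkSession

object ActiveSessionDemo extends App {
  // Spawn the pool's single worker thread *before* any session exists,
  // by running a no-op task. Long-lived thread pools in real
  // applications are often created this early.
  val pool = Executors.newSingleThreadExecutor()
  pool.submit(new Runnable { override def run(): Unit = () }).get()

  val spark = SparkSession.builder()
    .master("local[2]")
    .appName("active-session-demo")
    .getOrCreate()

  // getOrCreate() stores the session in a thread-local on the calling
  // thread, so it is visible here.
  println(SparkSession.getActiveSession.isDefined) // true

  pool.submit(new Runnable {
    override def run(): Unit = {
      // The worker thread predates the session, so the thread-local was
      // never set for it. Code that calls getActiveSession.get on such
      // a thread throws NoSuchElementException: None.get.
      println(SparkSession.getActiveSession.isDefined) // false
    }
  }).get()

  pool.shutdown()
  spark.stop()
}
{code}

Thread pools created at application startup, before any SparkSession exists, 
hit exactly this case.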




[jira] [Commented] (SPARK-21418) NoSuchElementException: None.get on DataFrame.rdd

2017-07-15 Thread Kazuaki Ishizaki (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16088659#comment-16088659
 ] 

Kazuaki Ishizaki commented on SPARK-21418:
--

I am curious why {{java.io.ObjectOutputStream.writeOrdinaryObject}} calls the 
{{toString}} method. Do you specify some JVM option when running this program?
