[jira] [Commented] (SPARK-22714) Spark API Not responding when Fatal exception occurred in event loop

2018-01-05 Thread todesking (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-22714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16312872#comment-16312872
 ] 

todesking commented on SPARK-22714:
---

> is this reproducible outside of the Spark REPL? 

Yes. Reproduction code here: 
https://github.com/todesking/sandbox/tree/master/SPARK-22714

To reproduce: 

{noformat}
sbt assembly && spark-submit \
  target/scala-2.11/spark-22714-assembly-0.1-SNAPSHOT.jar \
  --config spark.driver.memory=1g
{noformat}
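
For reference, the standalone program does essentially the same thing as the REPL session quoted below. A minimal sketch along those lines (the object name and exact code are illustrative; the actual code is in the repository above):

{noformat}
import org.apache.spark.sql.SparkSession

object Spark22714Repro {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("SPARK-22714 reproduction")
      .getOrCreate()
    import spark.implicits._

    // With spark.driver.memory=1g, building and zipping this dataset is enough to
    // kill the "dispatcher-event-loop" thread with OutOfMemoryError; the driver then
    // hangs in DAGScheduler.runJob, waiting on a promise that is never completed.
    val a = new Array[Int](4 * 1000 * 1000)
    val ds = spark.createDataset(a)
    ds.rdd.zipWithIndex  // zipWithIndex runs a job eagerly, so this call never returns

    spark.stop()
  }
}
{noformat}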


> Spark API Not responding when Fatal exception occurred in event loop
> 
>
> Key: SPARK-22714
> URL: https://issues.apache.org/jira/browse/SPARK-22714
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.2.0
>Reporter: todesking
>Priority: Critical
>
> To reproduce, make Spark throw an OutOfMemoryError in the event loop:
> {noformat}
> scala> spark.sparkContext.getConf.get("spark.driver.memory")
> res0: String = 1g
> scala> val a = new Array[Int](4 * 1000 * 1000)
> scala> val ds = spark.createDataset(a)
> scala> ds.rdd.zipWithIndex
> [Stage 0:>  (0 + 0) / 3]
> Exception in thread "dispatcher-event-loop-1" java.lang.OutOfMemoryError: Java heap space
> [Stage 0:>  (0 + 0) / 3]
> // Spark is not responding
> {noformat}
> While it is not responding, Spark is waiting on a Promise that is never completed: the
> promise depends on work done in the event loop thread, but that thread dies when a fatal
> exception is thrown.
> {noformat}
> "main" #1 prio=5 os_prio=31 tid=0x7ffc9300b000 nid=0x1703 waiting on 
> condition [0x70216000]
>java.lang.Thread.State: WAITING (parking)
> at sun.misc.Unsafe.park(Native Method)
> - parking to wait for  <0x0007ad978eb8> (a 
> scala.concurrent.impl.Promise$CompletionLatch)
> at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
> at 
> scala.concurrent.impl.Promise$DefaultPromise.tryAwait(Promise.scala:202)
> at 
> scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:218)
> at 
> scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:153)
> at 
> org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:619)
> at org.apache.spark.SparkContext.runJob(SparkContext.scala:1918)
> at org.apache.spark.SparkContext.runJob(SparkContext.scala:1931)
> at org.apache.spark.SparkContext.runJob(SparkContext.scala:1944)
> at 
> org.apache.spark.rdd.ZippedWithIndexRDD.(ZippedWithIndexRDD.scala:50)
> at 
> org.apache.spark.rdd.RDD$$anonfun$zipWithIndex$1.apply(RDD.scala:1293)
> at 
> org.apache.spark.rdd.RDD$$anonfun$zipWithIndex$1.apply(RDD.scala:1293)
> at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
> at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
> at org.apache.spark.rdd.RDD.withScope(RDD.scala:362)
> at org.apache.spark.rdd.RDD.zipWithIndex(RDD.scala:1292)
> {noformat}
> I don't know how to fix it properly, but it seems we need to add fatal error
> handling to EventLoop.run() in core/EventLoop.scala.
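
For illustration only, the kind of handling suggested above might look like the sketch below. This is a simplified stand-in for an event loop, not the actual core/EventLoop.scala code; the names (onReceive, onError, stop) mirror Spark's EventLoop API, but everything else is an assumption:

{noformat}
import java.util.concurrent.LinkedBlockingDeque
import scala.util.control.NonFatal

// Simplified sketch of an event loop whose processing thread also reports fatal
// errors instead of dying silently.
abstract class SketchEventLoop[E](name: String) {
  private val eventQueue = new LinkedBlockingDeque[E]()
  @volatile private var stopped = false

  private val eventThread = new Thread(name) {
    override def run(): Unit = {
      while (!stopped) {
        val event = eventQueue.take()
        try {
          onReceive(event)
        } catch {
          case NonFatal(e) => onError(e)    // recoverable error: report and keep running
          case t: Throwable =>              // fatal error such as OutOfMemoryError:
            try onError(t) finally stop()   // report it so pending work can be failed, then shut down
        }
      }
    }
  }

  def start(): Unit = eventThread.start()
  def stop(): Unit = { stopped = true; eventThread.interrupt() }
  def post(event: E): Unit = eventQueue.put(event)

  protected def onReceive(event: E): Unit
  protected def onError(e: Throwable): Unit
}
{noformat}

The difference from a NonFatal-only catch is that an OutOfMemoryError thrown by a handler no longer kills the event thread silently, so whatever is blocked on the corresponding promise can be notified instead of parking forever.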






[jira] [Commented] (SPARK-22714) Spark API Not responding when Fatal exception occurred in event loop

2018-01-03 Thread Sujith Jay Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-22714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16309366#comment-16309366
 ] 

Sujith Jay Nair commented on SPARK-22714:
-

Hi [~todesking], is this reproducible outside of the Spark REPL? I'm trying to 
understand whether this is specific to the Spark shell.



