[jira] [Commented] (SPARK-28340) Noisy exceptions when tasks are killed: "DiskBlockObjectWriter: Uncaught exception while reverting partial writes to file: java.nio.channels.ClosedByInterruptException"

2019-09-04 Thread Colin Ma (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-28340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16922207#comment-16922207
 ] 

Colin Ma commented on SPARK-28340:
--

In general, the exception will be caught and processed in Executor.run, and 
no stack trace is logged for an "expected interrupt exception" such as 
TaskKilledException, InterruptedException, etc.

ClosedByInterruptException should also be treated as an "expected interrupt 
exception" whose stack trace is not logged. It's hard to find every place 
that can throw it, but we can start by fixing the reported places first.
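
Roughly the shape of the check I have in mind, as a hedged sketch (the 
helper name and its placement are hypothetical, not the actual patch):

{code:scala}
import java.nio.channels.ClosedByInterruptException

// Hypothetical helper: classify exceptions that are expected when a task
// is killed, so cleanup paths can skip logging a full stack trace.
object InterruptHelper {
  def isExpectedInterruptException(t: Throwable): Boolean = t match {
    case _: InterruptedException       => true
    case _: ClosedByInterruptException => true
    // In Spark itself, TaskKilledException would also match here.
    case _                             => false
  }
}
{code}

A cleanup path such as DiskBlockObjectWriter.revertPartialWritesAndClose 
could then consult this check before emitting its ERROR log.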

 

> Noisy exceptions when tasks are killed: "DiskBlockObjectWriter: Uncaught 
> exception while reverting partial writes to file: 
> java.nio.channels.ClosedByInterruptException"
> 
>
> Key: SPARK-28340
> URL: https://issues.apache.org/jira/browse/SPARK-28340
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.0.0
>Reporter: Josh Rosen
>Priority: Minor
>
> If a Spark task is killed while writing blocks to disk (due to intentional 
> job kills, automated killing of redundant speculative tasks, etc.), then 
> Spark may log exceptions like:
> {code:java}
> 19/07/10 21:31:08 ERROR storage.DiskBlockObjectWriter: Uncaught exception 
> while reverting partial writes to file /
> java.nio.channels.ClosedByInterruptException
>   at 
> java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:202)
>   at sun.nio.ch.FileChannelImpl.truncate(FileChannelImpl.java:372)
>   at 
> org.apache.spark.storage.DiskBlockObjectWriter$$anonfun$revertPartialWritesAndClose$2.apply$mcV$sp(DiskBlockObjectWriter.scala:218)
>   at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1369)
>   at 
> org.apache.spark.storage.DiskBlockObjectWriter.revertPartialWritesAndClose(DiskBlockObjectWriter.scala:214)
>   at 
> org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.stop(BypassMergeSortShuffleWriter.java:237)
>   at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:105)
>   at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:55)
>   at org.apache.spark.scheduler.Task.run(Task.scala:121)
>   at 
> org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:402)
>   at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:408)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748){code}
> If {{BypassMergeSortShuffleWriter}} is being used then a single cancelled 
> task can result in hundreds of these stacktraces being logged.
> Here are some StackOverflow questions asking about this:
>  * [https://stackoverflow.com/questions/40027870/spark-jobserver-job-crash]
>  * 
> [https://stackoverflow.com/questions/50646953/why-is-java-nio-channels-closedbyinterruptexceptio-called-when-caling-multiple]
>  * 
> [https://stackoverflow.com/questions/41867053/java-nio-channels-closedbyinterruptexception-in-spark]
>  * 
> [https://stackoverflow.com/questions/56845041/are-closedbyinterruptexception-exceptions-expected-when-spark-speculation-kills]
>  
> Can we prevent this exception from occurring? If not, can we treat this 
> "expected exception" in a special manner to avoid log spam? My concern is 
> that the presence of large numbers of spurious exceptions is confusing to 
> users when they are inspecting Spark logs to diagnose other issues.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-28340) Noisy exceptions when tasks are killed: "DiskBlockObjectWriter: Uncaught exception while reverting partial writes to file: java.nio.channels.ClosedByInterruptException"

2019-08-29 Thread Saisai Shao (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-28340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16918386#comment-16918386
 ] 

Saisai Shao commented on SPARK-28340:
-

My one concern is that there may be other places that can potentially throw 
this "ClosedByInterruptException" during task killing; it seems hard to 
figure out all of them.
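
One way to avoid enumerating every call site would be a single check that 
also walks the cause chain, since the exception can arrive wrapped; a rough 
sketch (the object and method names are illustrative only):

{code:scala}
import java.nio.channels.ClosedByInterruptException
import scala.annotation.tailrec

object KillRelated {
  // Walk the cause chain so wrapped ClosedByInterruptExceptions are also
  // recognized as kill-related, wherever in the stack they were thrown.
  @tailrec
  def causedByInterrupt(t: Throwable): Boolean = t match {
    case null => false
    case _: ClosedByInterruptException | _: InterruptedException => true
    case other => causedByInterrupt(other.getCause)
  }
}
{code}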



[jira] [Commented] (SPARK-28340) Noisy exceptions when tasks are killed: "DiskBlockObjectWriter: Uncaught exception while reverting partial writes to file: java.nio.channels.ClosedByInterruptException"

2019-08-29 Thread Saisai Shao (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-28340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16918306#comment-16918306
 ] 

Saisai Shao commented on SPARK-28340:
-

We also saw a bunch of these exceptions in our production environment. It 
looks hard to prevent unless we stop using `interrupt`; maybe we can just 
skip logging such exceptions.
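
The simplest version of "ignore logging" would swallow the exception in the 
cleanup path itself; a minimal sketch of the idea (the wrapper is 
illustrative, not the real revert code):

{code:scala}
import java.nio.channels.ClosedByInterruptException

// Sketch: run a revert/cleanup action and drop the kill-related exception
// instead of logging an ERROR with a stack trace. A DEBUG-level log could
// be kept here if some visibility is still wanted.
def revertQuietly(revert: () => Unit): Unit =
  try revert()
  catch {
    case _: ClosedByInterruptException => () // expected during task kill
  }
{code}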



[jira] [Commented] (SPARK-28340) Noisy exceptions when tasks are killed: "DiskBlockObjectWriter: Uncaught exception while reverting partial writes to file: java.nio.channels.ClosedByInterruptException"

2019-07-15 Thread Josh Rosen (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-28340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16885550#comment-16885550
 ] 

Josh Rosen commented on SPARK-28340:


SPARK-23816 is a related issue about fetch failures caused by killing of 
speculative tasks.



[jira] [Commented] (SPARK-28340) Noisy exceptions when tasks are killed: "DiskBlockObjectWriter: Uncaught exception while reverting partial writes to file: java.nio.channels.ClosedByInterruptException"

2019-07-15 Thread Josh Rosen (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-28340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16885547#comment-16885547
 ] 

Josh Rosen commented on SPARK-28340:


Another variant of this issue, this time on the shuffle read side:

{code:java}
19/07/15 15:08:50 ERROR storage.ShuffleBlockFetcherIterator: Error occurred 
while fetching local blocks
java.nio.channels.ClosedByInterruptException
at 
java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:202)
at sun.nio.ch.FileChannelImpl.position(FileChannelImpl.java:293)
at sun.nio.ch.FileChannelImpl.position(FileChannelImpl.java:50)
at 
org.apache.spark.shuffle.IndexShuffleBlockResolver.getBlockData(IndexShuffleBlockResolver.scala:205)
at 
org.apache.spark.storage.BlockManager.getBlockData(BlockManager.scala:382)
at 
org.apache.spark.storage.ShuffleBlockFetcherIterator.fetchLocalBlocks(ShuffleBlockFetcherIterator.scala:336)
at 
org.apache.spark.storage.ShuffleBlockFetcherIterator.initialize(ShuffleBlockFetcherIterator.scala:371)
at 
org.apache.spark.storage.ShuffleBlockFetcherIterator.<init>(ShuffleBlockFetcherIterator.scala:156)
at 
org.apache.spark.shuffle.BlockStoreShuffleReader.read(BlockStoreShuffleReader.scala:45)
at 
org.apache.spark.sql.execution.ShuffledRowRDD.compute(ShuffledRowRDD.scala:165)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
at 
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
at 
org.apache.spark.rdd.ZippedPartitionsRDD2.compute(ZippedPartitionsRDD.scala:89)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
at 
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
at 
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
at 
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
at 
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:55)
at org.apache.spark.scheduler.Task.run(Task.scala:121)
at 
org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:402)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:408)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
19/07/15 15:08:50 INFO executor.Executor: Executor interrupted and killed task 
0.1 in stage 122.0 (TID 28968), reason: another attempt succeeded

 {code}
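
Since the same pattern shows up on the read side, a fix probably wants to 
key off the task's kill status rather than the exception type alone; a 
hedged sketch using TaskContext (the guard and its placement are 
hypothetical):

{code:scala}
import org.apache.spark.TaskContext

// Sketch: before logging an ERROR in a fetch or cleanup path, check whether
// the running task has been interrupted (killed); if it has, a
// ClosedByInterruptException is expected and the stack trace is just noise.
def shouldLogAsError(): Boolean = {
  val ctx = TaskContext.get()          // null outside a running task
  ctx == null || !ctx.isInterrupted()
}
{code}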

