[GitHub] [spark] skonto edited a comment on issue #24796: [SPARK-27900][CORE] Add uncaught exception handler to the driver
skonto edited a comment on issue #24796: [SPARK-27900][CORE] Add uncaught exception handler to the driver
URL: https://github.com/apache/spark/pull/24796#issuecomment-513038825

OK, if we all agree I can open a PR to add `-XX:OnOutOfMemoryError` at the entrypoint with the old flag (although Java 11 is coming and people are encouraged to use the latest version for security reasons, so we should do the same in our images).

@squito all executors get a handler here: https://github.com/apache/spark/blob/453cbf3dd8df5ec4da844c93eb6000610b551541/core/src/main/scala/org/apache/spark/executor/Executor.scala#L62 and set it here: https://github.com/apache/spark/blob/453cbf3dd8df5ec4da844c93eb6000610b551541/core/src/main/scala/org/apache/spark/executor/Executor.scala#L88 CoarseGrainedExecutorBackend, which is used by K8s, creates an executor instance that sets it.

@felixcheung @squito @mccheah @ifilonenko The problem is https://issues.apache.org/jira/browse/SPARK-27812, which can appear due to another type of exception besides OOM (the only solution there is to add a handler, AFAIK).

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
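Making the driver behave like the executors essentially means installing a JVM-wide default uncaught exception handler early in startup. A minimal Java sketch (the class name, messages, and the exit-policy comment are illustrative, not Spark's actual driver code):

```java
import java.util.concurrent.atomic.AtomicReference;

public class DriverHandlerSketch {
    static final AtomicReference<Throwable> caught = new AtomicReference<>();

    // Install a JVM-wide default handler, similar in spirit to the
    // executor-side handler linked above.
    public static void install() {
        Thread.setDefaultUncaughtExceptionHandler((thread, throwable) -> {
            caught.set(throwable);
            System.err.println("Uncaught exception in " + thread.getName()
                    + ": " + throwable);
            // A real driver handler would decide an exit code here and call
            // System.exit(...) (or Runtime.getRuntime().halt(...) to skip
            // shutdown hooks) rather than leaving a half-dead JVM hanging.
        });
    }

    public static void main(String[] args) throws InterruptedException {
        install();
        // An exception on any thread with no per-thread handler now reaches
        // the default handler instead of silently killing only that thread.
        Thread worker = new Thread(() -> {
            throw new IllegalStateException("boom");
        }, "worker");
        worker.start();
        worker.join();
        System.out.println("handler saw: " + caught.get());
    }
}
```

Since `Thread.setDefaultUncaughtExceptionHandler` is process-wide, installing it once at the driver entrypoint covers all threads that do not set their own handler.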
[GitHub] [spark] skonto edited a comment on issue #24796: [SPARK-27900][CORE] Add uncaught exception handler to the driver
skonto edited a comment on issue #24796: [SPARK-27900][CORE] Add uncaught exception handler to the driver
URL: https://github.com/apache/spark/pull/24796#issuecomment-510849683

@squito @srowen @dongjoon-hyun By adding a handler (as mentioned in the ticket by HenryYu) that exits without running the shutdown hooks, we could also solve https://issues.apache.org/jira/browse/SPARK-27812
[GitHub] [spark] skonto edited a comment on issue #24796: [SPARK-27900][CORE] Add uncaught exception handler to the driver
skonto edited a comment on issue #24796: [SPARK-27900][CORE] Add uncaught exception handler to the driver
URL: https://github.com/apache/spark/pull/24796#issuecomment-509017090

@squito `-XX:OnOutOfMemoryError="kill -9 %p"`: I think there are more options now, after Java 8u92 (https://stackoverflow.com/questions/5792049/xxonoutofmemoryerror-kill-9-p-problem). I use:
```
-XX:+ExitOnOutOfMemoryError -XX:+CrashOnOutOfMemoryError
```
Anyway, if that is the consensus, let's all agree. What about the executors? They have a shutdown handler... should they get one too? I think this issue goes beyond K8s; it affects all deployments. Also, sometimes you may want to collect the crash report, so `CrashOnOutOfMemoryError` may be a better option in some scenarios.
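For context, a hedged sketch of how either flag could be passed to the driver JVM via `spark-submit` (assumes `spark.driver.extraJavaOptions`; the trailing `...` stands for the application jar and arguments):

```shell
# Illustrative only -- the exact wiring depends on the entrypoint/image.
# Old flag (works on any JVM version):
spark-submit \
  --conf "spark.driver.extraJavaOptions=-XX:OnOutOfMemoryError='kill -9 %p'" \
  ...
# Java 8u92 and later:
spark-submit \
  --conf "spark.driver.extraJavaOptions=-XX:+ExitOnOutOfMemoryError" \
  ...
```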
[GitHub] [spark] skonto edited a comment on issue #24796: [SPARK-27900][CORE] Add uncaught exception handler to the driver
skonto edited a comment on issue #24796: [SPARK-27900][CORE] Add uncaught exception handler to the driver
URL: https://github.com/apache/spark/pull/24796#issuecomment-508061578

@srowen @zsxwing should I fall back to the initial approach of clearing the shutdown hooks and exiting immediately? How should I proceed? The only thing I haven't tried is running the stop logic in a high-priority thread with a dedicated thread pool, just in case that works and the stop logic gets a chance to run. I don't see a lot of alternatives here (still waiting for @shipilev).
[GitHub] [spark] skonto edited a comment on issue #24796: [SPARK-27900][CORE] Add uncaught exception handler to the driver
skonto edited a comment on issue #24796: [SPARK-27900][CORE] Add uncaught exception handler to the driver
URL: https://github.com/apache/spark/pull/24796#issuecomment-504558356

@squito I tried `-XX:+ExitOnOutOfMemoryError` (described in the JIRA), but the thing is that executors do have proper uncaught exception handling; the driver does not. So initially I tried to add something similar so things shut down gracefully if possible, given the JVM state, but without the shutdown hooks. If shutdown hooks are enabled, we hit this issue with joins (which is 100% reproducible). Also, if you run the Pi example on minikube, Spark will get stuck, so the initial issue, where there is no uncaught exception handler, is also 100% reproducible. Regarding the interrupts, I haven't seen them work so far, but who knows, maybe I got unlucky; it also depends on the JDK version, because there are fixes happening, as described in my last comment, which changed the interrupt behavior. I called out the OpenJDK folks to see if their fix relates to this; no response yet. I will try to do the join in another thread and make sure it happens with higher priority; so far there is no guarantee of this.
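The "join in another thread with higher priority" idea can be sketched in Java (names like `runStop` and the timeout are illustrative; thread priority is only a scheduler hint, so there is still no hard guarantee):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class StopInDedicatedThread {
    // Run stopLogic on a dedicated max-priority daemon thread and wait a
    // bounded amount of time, so a hung stop path cannot block exit forever.
    public static boolean runStop(Runnable stopLogic, long timeoutMs)
            throws InterruptedException {
        ExecutorService pool = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "driver-stop");
            t.setPriority(Thread.MAX_PRIORITY); // a hint only, no guarantee
            t.setDaemon(true);                  // never keeps a dying JVM alive
            return t;
        });
        pool.submit(stopLogic);
        pool.shutdown();
        boolean finished = pool.awaitTermination(timeoutMs, TimeUnit.MILLISECONDS);
        if (!finished) {
            pool.shutdownNow(); // interrupt the stop thread and move on
        }
        return finished;
    }

    public static void main(String[] args) throws InterruptedException {
        boolean ok = runStop(
            () -> System.out.println("stop logic ran on "
                    + Thread.currentThread().getName()),
            1000);
        System.out.println("completed in time: " + ok);
    }
}
```

The bounded `awaitTermination` is the important part: it keeps the exit path from inheriting whatever deadlock the stop logic itself might run into.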
[GitHub] [spark] skonto edited a comment on issue #24796: [SPARK-27900][CORE] Add uncaught exception handler to the driver
skonto edited a comment on issue #24796: [SPARK-27900][CORE] Add uncaught exception handler to the driver
URL: https://github.com/apache/spark/pull/24796#issuecomment-501593451

@zsxwing we can only fix the related issue here: just avoid the deadlock so shutdown can finish. As for the generic case, I don't see why this thread is not interrupted; maybe this is a special case where an uncaught exception is handled via a handler running on the thread that caused it. I will check what the JVM does in this case, but if anyone knows more, feel free to chime in here. We are probably hitting this one: https://bugs.openjdk.java.net/browse/JDK-8154017 mentioned here: https://github.com/jacoco/jacoco/issues/394#issuecomment-208531845. The fix in there ignores all interrupts until the hooks are completed (we call exit as well), so since the uncaught exception handler executes a shutdown hook from the event loop thread, that thread cannot be interrupted. @shipilev Hi! I saw you reported that error and also discussed the related fix; any help would be great, as I don't have the rights to comment on the ticket directly. Another question: do we have this pattern elsewhere, like Master, Worker, or Executor, where there is already a handler? @srowen any thoughts?
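The circular wait described above can be sketched without touching the real JVM shutdown machinery (a simplified simulation; the thread names and the 500 ms bound are illustrative):

```java
public class ShutdownDeadlockSketch {
    // Returns whether the "event loop" thread survived the hook's bounded
    // join, i.e. whether the circular wait held.
    public static boolean simulate() throws InterruptedException {
        Thread eventLoop = new Thread(() -> {
            // After an uncaught exception, the real event-loop thread calls
            // System.exit(), which blocks until all shutdown hooks complete.
            // We model that open-ended wait with a sleep. (In this sketch an
            // interrupt ends it so the demo JVM can exit; with the
            // JDK-8154017 fix the real wait ignores interrupts until the
            // hooks finish.)
            try {
                Thread.sleep(Long.MAX_VALUE);
            } catch (InterruptedException ignored) { }
        }, "event-loop");
        eventLoop.start();

        // The shutdown hook tries to stop the event loop by joining it, but
        // the event loop is itself waiting for this hook to finish: a
        // circular wait. The join is bounded so the sketch terminates.
        Thread hook = new Thread(() -> {
            try {
                eventLoop.join(500);
            } catch (InterruptedException ignored) { }
        }, "shutdown-hook");
        hook.start();
        hook.join();

        boolean stuck = eventLoop.isAlive(); // true: the join timed out
        eventLoop.interrupt(); // unblock the sketch so the JVM can exit
        return stuck;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("deadlocked: " + simulate());
    }
}
```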
[GitHub] [spark] skonto edited a comment on issue #24796: [SPARK-27900][CORE] Add uncaught exception handler to the driver
skonto edited a comment on issue #24796: [SPARK-27900][CORE] Add uncaught exception handler to the driver URL: https://github.com/apache/spark/pull/24796#issuecomment-501593451 @zsxwing we can only fix the related issue here, just avoid the deadlock so shutdown is finished. As for the generic case I dont see why this thread is not interrupted maybe because this is a special case when handling an Uncaught Exception via a handler coming from the thread it caused it. I will check what jvm does in this case but if there is anyone who knows more feel free to call him here. Probably hitting this one: https://bugs.openjdk.java.net/browse/JDK-8154017 mentioned here:https://github.com/jacoco/jacoco/issues/394#issuecomment-208531845. The fix in there ignores all interrupts until the hooks are completed (we call exit as well) so since the uncaught exception handler executes a shutdownhook from the event loop thread, that thread cannot be interrupted. @shipilev Hi ! I saw you reported that error and also discussed the related fix, any help would be great as I dont have the rights to comment to the ticket directly. Another question is do we have this pattern elsewhere like Master,Worker or Executor where there is already a handler? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] skonto edited a comment on issue #24796: [SPARK-27900][CORE] Add uncaught exception handler to the driver
skonto edited a comment on issue #24796: [SPARK-27900][CORE] Add uncaught exception handler to the driver URL: https://github.com/apache/spark/pull/24796#issuecomment-501593451 @zsxwing we can only fix the related issue here, just avoid the deadlock so shutdown is finished. As for the generic case I dont see why this thread is not interrupted maybe because this is a special case when handling an Uncaught Exception via a handler coming from the thread it caused it. I will check what jvm does in this case but if there is anyone who knows more feel free to call him here. Probably hitting this one: https://bugs.openjdk.java.net/browse/JDK-8154017 mentioned here:https://github.com/jacoco/jacoco/issues/394#issuecomment-208531845. The fix in there ignores all interrupts until the hooks are completed (we call exit as well) so since the uncaught exception handler executes a shutdownhook from the event loop thread, that thread cannot be interrupted. @shipilev Hi ! I saw you reported that error and also discussed the related fix, any help would be great as I dont have the rights to comment to the ticket directly. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] skonto edited a comment on issue #24796: [SPARK-27900][CORE] Add uncaught exception handler to the driver
skonto edited a comment on issue #24796: [SPARK-27900][CORE] Add uncaught exception handler to the driver URL: https://github.com/apache/spark/pull/24796#issuecomment-500779047 @zsxwing @srowen any decision on how to approach this? I can do the fix if we make a call.
[GitHub] [spark] skonto edited a comment on issue #24796: [SPARK-27900][CORE] Add uncaught exception handler to the driver
skonto edited a comment on issue #24796: [SPARK-27900][CORE] Add uncaught exception handler to the driver URL: https://github.com/apache/spark/pull/24796#issuecomment-499837796 @zsxwing @srowen one solution that works is running the logic of stop in another thread:
```
def stop(): Unit = {
  if (stopped.compareAndSet(false, true)) {
    @volatile var onStopCalled = false
    val stopThread = new Thread(new Runnable {
      override def run(): Unit = {
        try {
          // Call onStop after the event thread exits to make sure onReceive happens before onStop
          eventThread.join()
          onStopCalled = true
          onStop()
        } catch {
          case ie: InterruptedException =>
            Thread.currentThread().interrupt()
            if (!onStopCalled) {
              // ie is thrown from `eventThread.join()`. Otherwise, we should not call `onStop`
              // since it's already called.
              onStop()
            }
        }
      }
    })
    stopThread.setDaemon(true)
    stopThread.start()
  } else {
    // Keep quiet to allow calling `stop` multiple times.
  }
}
```
Assuming that is safe, we let the shutdown hook proceed... Are there any issues with letting that run in the background, or is there a strict order in which things should run? One concern is that it may never run unless the thread priority is high enough...
```
19/06/07 10:31:21 ERROR SparkUncaughtExceptionHandler: Uncaught exception in thread Thread[dag-scheduler-event-loop,5,main]
java.lang.OutOfMemoryError: Java heap space
	at scala.collection.mutable.ResizableArray.ensureSize(ResizableArray.scala:106)
	at scala.collection.mutable.ResizableArray.ensureSize$(ResizableArray.scala:96)
	at scala.collection.mutable.ArrayBuffer.ensureSize(ArrayBuffer.scala:49)
	at scala.collection.mutable.ArrayBuffer.$plus$eq(ArrayBuffer.scala:85)
	at org.apache.spark.scheduler.TaskSetManager.addPendingTask(TaskSetManager.scala:264)
	at org.apache.spark.scheduler.TaskSetManager.$anonfun$addPendingTasks$2(TaskSetManager.scala:194)
	at org.apache.spark.scheduler.TaskSetManager$$Lambda$1106/850290016.apply$mcVI$sp(Unknown Source)
	at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:158)
	at org.apache.spark.scheduler.TaskSetManager.$anonfun$addPendingTasks$1(TaskSetManager.scala:193)
	at org.apache.spark.scheduler.TaskSetManager$$Lambda$1105/1310826901.apply$mcV$sp(Unknown Source)
	at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
	at org.apache.spark.util.Utils$.timeTakenMs(Utils.scala:534)
	at org.apache.spark.scheduler.TaskSetManager.<init>(TaskSetManager.scala:189)
	at org.apache.spark.scheduler.TaskSchedulerImpl.createTaskSetManager(TaskSchedulerImpl.scala:252)
	at org.apache.spark.scheduler.TaskSchedulerImpl.submitTasks(TaskSchedulerImpl.scala:210)
	at org.apache.spark.scheduler.DAGScheduler.submitMissingTasks(DAGScheduler.scala:1233)
	at org.apache.spark.scheduler.DAGScheduler.submitStage(DAGScheduler.scala:1084)
	at org.apache.spark.scheduler.DAGScheduler.handleJobSubmitted(DAGScheduler.scala:1028)
	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2126)
	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2118)
	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2107)
	at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49)
19/06/07 10:31:21 INFO SparkContext: Invoking stop() from shutdown hook
19/06/07 10:31:21 INFO SparkUI: Stopped Spark web UI at http://spark-pi2-1559903354898-driver-svc.spark.svc:4040
19/06/07 10:31:21 INFO BlockManagerInfo: Removed broadcast_0_piece0 on spark-pi2-1559903354898-driver-svc.spark.svc:7079 in memory (size: 1765.0 B, free: 110.0 MiB)
19/06/07 10:31:21 INFO KubernetesClusterSchedulerBackend: Shutting down all executors
19/06/07 10:31:21 INFO KubernetesClusterSchedulerBackend$KubernetesDriverEndpoint: Asking each executor to shut down
19/06/07 10:31:21 WARN ExecutorPodsWatchSnapshotSource: Kubernetes client has been closed (this is expected if the application is shutting down.)
19/06/07 10:31:22 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
19/06/07 10:31:22 INFO MemoryStore: MemoryStore cleared
19/06/07 10:31:22 INFO BlockManager: BlockManager stopped
19/06/07 10:31:22 INFO BlockManagerMaster: BlockManagerMaster stopped
19/06/07 10:31:22 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
```
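The proposal above can be sketched outside Spark as a plain Java program (all class and variable names here are hypothetical stand-ins, not Spark code): the blocking `eventThread.join()` moves onto a daemon thread, so the thread that triggered shutdown is never stuck behind an uninterruptible join and can bound its wait instead.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class AsyncStopDemo {
    public static void main(String[] args) throws Exception {
        // Stand-in for the event loop thread that stop() must wait for.
        Thread eventThread = new Thread(() -> {
            try { Thread.sleep(100); } catch (InterruptedException ignored) { }
        });
        eventThread.start();

        CountDownLatch onStopDone = new CountDownLatch(1);
        Thread stopper = new Thread(() -> {
            try {
                eventThread.join();   // may block, but off the caller's thread
            } catch (InterruptedException ie) {
                Thread.currentThread().interrupt();
            }
            onStopDone.countDown();   // stand-in for onStop()
        });
        stopper.setDaemon(true);      // must not keep the JVM alive on its own
        stopper.start();

        // The caller (e.g. a shutdown hook) waits with a timeout rather than
        // joining uninterruptibly, so shutdown can always make progress.
        boolean finished = onStopDone.await(5, TimeUnit.SECONDS);
        System.out.println(finished);
    }
}
```

This keeps the ordering guarantee the comment in the Scala snippet cares about (`onStop` only after the event thread exits) while removing the possibility of the shutdown path blocking forever.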
[GitHub] [spark] skonto edited a comment on issue #24796: [SPARK-27900][CORE] Add uncaught exception handler to the driver
skonto edited a comment on issue #24796: [SPARK-27900][CORE] Add uncaught exception handler to the driver URL: https://github.com/apache/spark/pull/24796#issuecomment-499837796 @zsxwing @srowen one solution that works is running the logic of stop in another thread: ``` def stop() { if (stopped.compareAndSet(false, true)) { @volatile var onStopCalled = false try { new Thread() { setDaemon(true) new Runnable { override def run(): Unit = { try { eventThread.join() // Call onStop after the event thread exits to make sure onReceive happens before onStop onStopCalled = true onStop() } catch { case ie: InterruptedException => Thread.currentThread().interrupt() if (!onStopCalled) { // ie is thrown from `eventThread.join()`. Otherwise, we should not call `onStop` since // it's already called. onStop() } } } } }.start() } else { // Keep quiet to allow calling `stop` multiple times. } } ``` Assuming that is safe we let the shutdownhook proceed... any issues with that if we let that run in the background or there is a strict order for how things should run? One thing is that it may never run... 
``` 19/06/07 10:31:21 ERROR SparkUncaughtExceptionHandler: Uncaught exception in thread Thread[dag-scheduler-event-loop,5,main] java.lang.OutOfMemoryError: Java heap space at scala.collection.mutable.ResizableArray.ensureSize(ResizableArray.scala:106) at scala.collection.mutable.ResizableArray.ensureSize$(ResizableArray.scala:96) at scala.collection.mutable.ArrayBuffer.ensureSize(ArrayBuffer.scala:49) at scala.collection.mutable.ArrayBuffer.$plus$eq(ArrayBuffer.scala:85) at org.apache.spark.scheduler.TaskSetManager.addPendingTask(TaskSetManager.scala:264) at org.apache.spark.scheduler.TaskSetManager.$anonfun$addPendingTasks$2(TaskSetManager.scala:194) at org.apache.spark.scheduler.TaskSetManager$$Lambda$1106/850290016.apply$mcVI$sp(Unknown Source) at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:158) at org.apache.spark.scheduler.TaskSetManager.$anonfun$addPendingTasks$1(TaskSetManager.scala:193) at org.apache.spark.scheduler.TaskSetManager$$Lambda$1105/1310826901.apply$mcV$sp(Unknown Source) at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) at org.apache.spark.util.Utils$.timeTakenMs(Utils.scala:534) at org.apache.spark.scheduler.TaskSetManager.addPendingTasks(TaskSetManager.scala:192) at org.apache.spark.scheduler.TaskSetManager.(TaskSetManager.scala:189) at org.apache.spark.scheduler.TaskSchedulerImpl.createTaskSetManager(TaskSchedulerImpl.scala:252) at org.apache.spark.scheduler.TaskSchedulerImpl.submitTasks(TaskSchedulerImpl.scala:210) at org.apache.spark.scheduler.DAGScheduler.submitMissingTasks(DAGScheduler.scala:1233) at org.apache.spark.scheduler.DAGScheduler.submitStage(DAGScheduler.scala:1084) at org.apache.spark.scheduler.DAGScheduler.handleJobSubmitted(DAGScheduler.scala:1028) at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2126) at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2118) at 
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2107) at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49) 19/06/07 10:31:21 INFO SparkContext: Invoking stop() from shutdown hook 19/06/07 10:31:21 INFO SparkUI: Stopped Spark web UI at http://spark-pi2-1559903354898-driver-svc.spark.svc:4040 19/06/07 10:31:21 INFO BlockManagerInfo: Removed broadcast_0_piece0 on spark-pi2-1559903354898-driver-svc.spark.svc:7079 in memory (size: 1765.0 B, free: 110.0 MiB) 19/06/07 10:31:21 INFO KubernetesClusterSchedulerBackend: Shutting down all executors 19/06/07 10:31:21 INFO KubernetesClusterSchedulerBackend$KubernetesDriverEndpoint: Asking each executor to shut down 19/06/07 10:31:21 WARN ExecutorPodsWatchSnapshotSource: Kubernetes client has been closed (this is expected if the application is shutting down.) 19/06/07 10:31:22 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped! 19/06/07 10:31:22 INFO MemoryStore: MemoryStore cleared 19/06/07 10:31:22 INFO BlockManager: BlockManager stopped 19/06/07 10:31:22 INFO BlockManagerMaster: BlockManagerMaster stopped 19/06/07 10:31:22 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped! 19/06/07 10:31:22 INFO
[GitHub] [spark] skonto edited a comment on issue #24796: [SPARK-27900][CORE] Add uncaught exception handler to the driver
skonto edited a comment on issue #24796: [SPARK-27900][CORE] Add uncaught exception handler to the driver URL: https://github.com/apache/spark/pull/24796#issuecomment-499837796 @zsxwing @srowen one solution that works is running the logic of stop in another thread: ``` def stop() { if (stopped.compareAndSet(false, true)) { @volatile var onStopCalled = false try { new Thread() { new Runnable { setDaemon(true) override def run(): Unit = { try { eventThread.join() // Call onStop after the event thread exits to make sure onReceive happens before onStop onStopCalled = true onStop() } catch { case ie: InterruptedException => Thread.currentThread().interrupt() if (!onStopCalled) { // ie is thrown from `eventThread.join()`. Otherwise, we should not call `onStop` since // it's already called. onStop() } } } } }.start() } else { // Keep quiet to allow calling `stop` multiple times. } } ``` Assuming that is safe we let the shutdownhook proceed... any issues with that if we let that run in the background or there is a strict order for how things should run? One thing is that it may never run... 
``` 19/06/07 10:31:21 ERROR SparkUncaughtExceptionHandler: Uncaught exception in thread Thread[dag-scheduler-event-loop,5,main] java.lang.OutOfMemoryError: Java heap space at scala.collection.mutable.ResizableArray.ensureSize(ResizableArray.scala:106) at scala.collection.mutable.ResizableArray.ensureSize$(ResizableArray.scala:96) at scala.collection.mutable.ArrayBuffer.ensureSize(ArrayBuffer.scala:49) at scala.collection.mutable.ArrayBuffer.$plus$eq(ArrayBuffer.scala:85) at org.apache.spark.scheduler.TaskSetManager.addPendingTask(TaskSetManager.scala:264) at org.apache.spark.scheduler.TaskSetManager.$anonfun$addPendingTasks$2(TaskSetManager.scala:194) at org.apache.spark.scheduler.TaskSetManager$$Lambda$1106/850290016.apply$mcVI$sp(Unknown Source) at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:158) at org.apache.spark.scheduler.TaskSetManager.$anonfun$addPendingTasks$1(TaskSetManager.scala:193) at org.apache.spark.scheduler.TaskSetManager$$Lambda$1105/1310826901.apply$mcV$sp(Unknown Source) at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) at org.apache.spark.util.Utils$.timeTakenMs(Utils.scala:534) at org.apache.spark.scheduler.TaskSetManager.addPendingTasks(TaskSetManager.scala:192) at org.apache.spark.scheduler.TaskSetManager.(TaskSetManager.scala:189) at org.apache.spark.scheduler.TaskSchedulerImpl.createTaskSetManager(TaskSchedulerImpl.scala:252) at org.apache.spark.scheduler.TaskSchedulerImpl.submitTasks(TaskSchedulerImpl.scala:210) at org.apache.spark.scheduler.DAGScheduler.submitMissingTasks(DAGScheduler.scala:1233) at org.apache.spark.scheduler.DAGScheduler.submitStage(DAGScheduler.scala:1084) at org.apache.spark.scheduler.DAGScheduler.handleJobSubmitted(DAGScheduler.scala:1028) at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2126) at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2118) at 
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2107) at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49) 19/06/07 10:31:21 INFO SparkContext: Invoking stop() from shutdown hook 19/06/07 10:31:21 INFO SparkUI: Stopped Spark web UI at http://spark-pi2-1559903354898-driver-svc.spark.svc:4040 19/06/07 10:31:21 INFO BlockManagerInfo: Removed broadcast_0_piece0 on spark-pi2-1559903354898-driver-svc.spark.svc:7079 in memory (size: 1765.0 B, free: 110.0 MiB) 19/06/07 10:31:21 INFO KubernetesClusterSchedulerBackend: Shutting down all executors 19/06/07 10:31:21 INFO KubernetesClusterSchedulerBackend$KubernetesDriverEndpoint: Asking each executor to shut down 19/06/07 10:31:21 WARN ExecutorPodsWatchSnapshotSource: Kubernetes client has been closed (this is expected if the application is shutting down.) 19/06/07 10:31:22 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped! 19/06/07 10:31:22 INFO MemoryStore: MemoryStore cleared 19/06/07 10:31:22 INFO BlockManager: BlockManager stopped 19/06/07 10:31:22 INFO BlockManagerMaster: BlockManagerMaster stopped 19/06/07 10:31:22 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped! 19/06/07 10:31:22 INFO
[GitHub] [spark] skonto edited a comment on issue #24796: [SPARK-27900][CORE] Add uncaught exception handler to the driver
skonto edited a comment on issue #24796: [SPARK-27900][CORE] Add uncaught exception handler to the driver URL: https://github.com/apache/spark/pull/24796#issuecomment-499837796 @zsxwing @srowen one solution that works is running the logic of stop in another thread: ``` def stop() { if (stopped.compareAndSet(false, true)) { @volatile var onStopCalled = false try { new Thread() { new Runnable { setDaemon(true) override def run(): Unit = { try { eventThread.join() // Call onStop after the event thread exits to make sure onReceive happens before onStop onStopCalled = true onStop() } catch { case ie: InterruptedException => Thread.currentThread().interrupt() if (!onStopCalled) { // ie is thrown from `eventThread.join()`. Otherwise, we should not call `onStop` since // it's already called. onStop() } } } } }.start() } else { // Keep quiet to allow calling `stop` multiple times. } } ``` Assuming that is safe we let the shutdownhook proceed... any issues with that if we let that run in the background or there is a strict order for how things should run? One thing is that i may never run... 
``` 19/06/07 10:31:21 ERROR SparkUncaughtExceptionHandler: Uncaught exception in thread Thread[dag-scheduler-event-loop,5,main] java.lang.OutOfMemoryError: Java heap space at scala.collection.mutable.ResizableArray.ensureSize(ResizableArray.scala:106) at scala.collection.mutable.ResizableArray.ensureSize$(ResizableArray.scala:96) at scala.collection.mutable.ArrayBuffer.ensureSize(ArrayBuffer.scala:49) at scala.collection.mutable.ArrayBuffer.$plus$eq(ArrayBuffer.scala:85) at org.apache.spark.scheduler.TaskSetManager.addPendingTask(TaskSetManager.scala:264) at org.apache.spark.scheduler.TaskSetManager.$anonfun$addPendingTasks$2(TaskSetManager.scala:194) at org.apache.spark.scheduler.TaskSetManager$$Lambda$1106/850290016.apply$mcVI$sp(Unknown Source) at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:158) at org.apache.spark.scheduler.TaskSetManager.$anonfun$addPendingTasks$1(TaskSetManager.scala:193) at org.apache.spark.scheduler.TaskSetManager$$Lambda$1105/1310826901.apply$mcV$sp(Unknown Source) at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) at org.apache.spark.util.Utils$.timeTakenMs(Utils.scala:534) at org.apache.spark.scheduler.TaskSetManager.addPendingTasks(TaskSetManager.scala:192) at org.apache.spark.scheduler.TaskSetManager.(TaskSetManager.scala:189) at org.apache.spark.scheduler.TaskSchedulerImpl.createTaskSetManager(TaskSchedulerImpl.scala:252) at org.apache.spark.scheduler.TaskSchedulerImpl.submitTasks(TaskSchedulerImpl.scala:210) at org.apache.spark.scheduler.DAGScheduler.submitMissingTasks(DAGScheduler.scala:1233) at org.apache.spark.scheduler.DAGScheduler.submitStage(DAGScheduler.scala:1084) at org.apache.spark.scheduler.DAGScheduler.handleJobSubmitted(DAGScheduler.scala:1028) at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2126) at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2118) at 
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2107) at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49) 19/06/07 10:31:21 INFO SparkContext: Invoking stop() from shutdown hook 19/06/07 10:31:21 INFO SparkUI: Stopped Spark web UI at http://spark-pi2-1559903354898-driver-svc.spark.svc:4040 19/06/07 10:31:21 INFO BlockManagerInfo: Removed broadcast_0_piece0 on spark-pi2-1559903354898-driver-svc.spark.svc:7079 in memory (size: 1765.0 B, free: 110.0 MiB) 19/06/07 10:31:21 INFO KubernetesClusterSchedulerBackend: Shutting down all executors 19/06/07 10:31:21 INFO KubernetesClusterSchedulerBackend$KubernetesDriverEndpoint: Asking each executor to shut down 19/06/07 10:31:21 WARN ExecutorPodsWatchSnapshotSource: Kubernetes client has been closed (this is expected if the application is shutting down.) 19/06/07 10:31:22 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped! 19/06/07 10:31:22 INFO MemoryStore: MemoryStore cleared 19/06/07 10:31:22 INFO BlockManager: BlockManager stopped 19/06/07 10:31:22 INFO BlockManagerMaster: BlockManagerMaster stopped 19/06/07 10:31:22 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped! 19/06/07 10:31:22 INFO
[GitHub] [spark] skonto edited a comment on issue #24796: [SPARK-27900][CORE] Add uncaught exception handler to the driver
skonto edited a comment on issue #24796: [SPARK-27900][CORE] Add uncaught exception handler to the driver URL: https://github.com/apache/spark/pull/24796#issuecomment-499837796 @zsxwing @srowen One solution that works is running the logic of stop() in another thread:

```
def stop(): Unit = {
  if (stopped.compareAndSet(false, true)) {
    @volatile var onStopCalled = false
    new Thread() {
      setDaemon(true)
      override def run(): Unit = {
        try {
          eventThread.join()
          // Call onStop after the event thread exits to make sure onReceive
          // happens before onStop.
          onStopCalled = true
          onStop()
        } catch {
          case ie: InterruptedException =>
            Thread.currentThread().interrupt()
            if (!onStopCalled) {
              // ie was thrown from `eventThread.join()`; otherwise we should
              // not call `onStop` since it has already been called.
              onStop()
            }
        }
      }
    }.start()
  } else {
    // Keep quiet to allow calling `stop` multiple times.
  }
}
```

Assuming that is safe, we let the shutdown hook proceed. Are there any issues with letting this run in the background, or is there a strict order in which things must run?
```
19/06/07 10:31:21 ERROR SparkUncaughtExceptionHandler: Uncaught exception in thread Thread[dag-scheduler-event-loop,5,main]
java.lang.OutOfMemoryError: Java heap space
	at scala.collection.mutable.ResizableArray.ensureSize(ResizableArray.scala:106)
	at scala.collection.mutable.ResizableArray.ensureSize$(ResizableArray.scala:96)
	at scala.collection.mutable.ArrayBuffer.ensureSize(ArrayBuffer.scala:49)
	at scala.collection.mutable.ArrayBuffer.$plus$eq(ArrayBuffer.scala:85)
	at org.apache.spark.scheduler.TaskSetManager.addPendingTask(TaskSetManager.scala:264)
	at org.apache.spark.scheduler.TaskSetManager.$anonfun$addPendingTasks$2(TaskSetManager.scala:194)
	at org.apache.spark.scheduler.TaskSetManager$$Lambda$1106/850290016.apply$mcVI$sp(Unknown Source)
	at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:158)
	at org.apache.spark.scheduler.TaskSetManager.$anonfun$addPendingTasks$1(TaskSetManager.scala:193)
	at org.apache.spark.scheduler.TaskSetManager$$Lambda$1105/1310826901.apply$mcV$sp(Unknown Source)
	at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
	at org.apache.spark.util.Utils$.timeTakenMs(Utils.scala:534)
	at org.apache.spark.scheduler.TaskSetManager.addPendingTasks(TaskSetManager.scala:192)
	at org.apache.spark.scheduler.TaskSetManager.<init>(TaskSetManager.scala:189)
	at org.apache.spark.scheduler.TaskSchedulerImpl.createTaskSetManager(TaskSchedulerImpl.scala:252)
	at org.apache.spark.scheduler.TaskSchedulerImpl.submitTasks(TaskSchedulerImpl.scala:210)
	at org.apache.spark.scheduler.DAGScheduler.submitMissingTasks(DAGScheduler.scala:1233)
	at org.apache.spark.scheduler.DAGScheduler.submitStage(DAGScheduler.scala:1084)
	at org.apache.spark.scheduler.DAGScheduler.handleJobSubmitted(DAGScheduler.scala:1028)
	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2126)
	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2118)
	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2107)
	at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49)
19/06/07 10:31:21 INFO SparkContext: Invoking stop() from shutdown hook
19/06/07 10:31:21 INFO SparkUI: Stopped Spark web UI at http://spark-pi2-1559903354898-driver-svc.spark.svc:4040
19/06/07 10:31:21 INFO BlockManagerInfo: Removed broadcast_0_piece0 on spark-pi2-1559903354898-driver-svc.spark.svc:7079 in memory (size: 1765.0 B, free: 110.0 MiB)
19/06/07 10:31:21 INFO KubernetesClusterSchedulerBackend: Shutting down all executors
19/06/07 10:31:21 INFO KubernetesClusterSchedulerBackend$KubernetesDriverEndpoint: Asking each executor to shut down
19/06/07 10:31:21 WARN ExecutorPodsWatchSnapshotSource: Kubernetes client has been closed (this is expected if the application is shutting down.)
19/06/07 10:31:22 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
19/06/07 10:31:22 INFO MemoryStore: MemoryStore cleared
19/06/07 10:31:22 INFO BlockManager: BlockManager stopped
19/06/07 10:31:22 INFO BlockManagerMaster: BlockManagerMaster stopped
19/06/07 10:31:22 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
19/06/07 10:31:22 INFO SparkContext: Successfully stopped SparkContext
```
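For context, the kind of driver-side handler this PR discusses can be sketched roughly as follows. This is a minimal illustration, not Spark's actual SparkUncaughtExceptionHandler: the object name and log message are made up here, and the exit codes merely mirror the executor-side conventions (52 for OOM, 50 for other uncaught exceptions).

```scala
// Minimal sketch: install a JVM-wide handler so a fatal error in a
// non-main driver thread (e.g. dag-scheduler-event-loop) terminates the
// JVM instead of leaving it hung after the shutdown hook blocks.
object DriverUncaughtHandlerSketch {
  def install(): Unit = {
    Thread.setDefaultUncaughtExceptionHandler(new Thread.UncaughtExceptionHandler {
      override def uncaughtException(t: Thread, e: Throwable): Unit = {
        System.err.println(s"Uncaught exception in thread ${t.getName}: $e")
        e match {
          // halt() skips shutdown hooks: after an OOM it may not be safe
          // (or even possible) to run them.
          case _: OutOfMemoryError => Runtime.getRuntime.halt(52)
          // exit() runs the shutdown hooks, so SparkContext.stop() still
          // gets a chance to clean up.
          case _                   => sys.exit(50)
        }
      }
    })
  }
}
```

The key design point is that the default handler fires for any thread without its own handler, which covers internal threads like the DAGScheduler event loop.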
[GitHub] [spark] skonto edited a comment on issue #24796: [SPARK-27900][CORE] Add uncaught exception handler to the driver
skonto edited a comment on issue #24796: [SPARK-27900][CORE] Add uncaught exception handler to the driver URL: https://github.com/apache/spark/pull/24796#issuecomment-499791740 @zsxwing I describe how this happened in the JIRA ticket. I simply ran SparkPi on K8s with 1M as the input parameter. This creates 1M tasks (held in an array), which causes an OOM error in the DAGScheduler event-loop thread, since that is the thread that eventually tries to submit the actual job. My JVM memory settings are, of course, chosen so that it reproduces; for the exact values please see the JIRA ticket. The same thing could happen in any other case where the JVM is running low on memory and this thread needs to allocate more at some point. Btw, I can reproduce it on K8s consistently; it fails every time. One other thing: there are other places in the code base where we join on a thread that will be stopped via the shutdown hook, e.g. ContextCleaner, and as I mentioned above the shutdown hook does a lot of work, e.g. the SparkContext stop() method stops a lot of components (not to mention there is a separate hook for Streaming as well).
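A reproduction along the lines described above might look roughly like the following command template. The master URL, image name, jar path, and memory setting are placeholders, not the values from the JIRA ticket; SparkPi's trailing argument sets the number of partitions, so 1000000 forces the DAGScheduler to build a huge pending-task array on a deliberately small driver heap.

```
bin/spark-submit \
  --master k8s://https://<k8s-apiserver>:6443 \
  --deploy-mode cluster \
  --name spark-pi-oom-repro \
  --class org.apache.spark.examples.SparkPi \
  --conf spark.driver.memory=500m \
  --conf spark.kubernetes.container.image=<your-spark-image> \
  local:///opt/spark/examples/jars/spark-examples_<version>.jar \
  1000000
```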
[GitHub] [spark] skonto edited a comment on issue #24796: [SPARK-27900][CORE] Add uncaught exception handler to the driver
skonto edited a comment on issue #24796: [SPARK-27900][CORE] Add uncaught exception handler to the driver URL: https://github.com/apache/spark/pull/24796#issuecomment-499791740 @zsxwing I describe how this happened in the jira ticket. I just run Spark on K8s SparkPi with 1M as the input parameter. This creates 1M tasks (an array holds them) which creates an OOM error for the DAGScheduler eventLooop thread since this is the one that will eventually try to submit the actual job, of course my jvm mem settings are enough to reproduce it, for the values pls have a look at the jira ticket. Of course this could happen in other cases where jvm is running out of memory and at some point this thread needs to allocate more memory. Btw I can reproduce it on K8s in a consistent manner, it fails every time. On other thing is that in the code base there are other places where there is a join on a thread that will be stopped via the shutdown hook like contextCleaner and as I mentioned above shutdownHook does a lot of work eg. the SparkContext stop() method does stop a lot of stuff (not to mention there is one for Streaming as well). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] skonto edited a comment on issue #24796: [SPARK-27900][CORE] Add uncaught exception handler to the driver
skonto edited a comment on issue #24796: [SPARK-27900][CORE] Add uncaught exception handler to the driver URL: https://github.com/apache/spark/pull/24796#issuecomment-499791740 @zsxwing I describe how this happened in the jira ticket. I just run Spark on K8s SparkPi with 1M as the input parameter. This creates 1M tasks (an array holds them) which creates an OOM error for the DAGScheduler eventLooop thread since this is the one that will eventually try to submit the actual job, of course my jvm mem settings are enough to reproduce it, for the values pls have a look at the jira ticket. Of course this could happen in other cases where jvm is running out of memory and at some point this thread needs to allocate more memory. Btw I can reproduce it on K8s in a consistent manner, it fails every time. On other thing is that in the code base there are other places where there is a join on a thread that will be stopped via the shutdown hook like contextCleaner and as I said above shutdownHook does a lot of work eg. the SparkContext stop() method does stop a lot of stuff (not to mention there is one for Streaming as well). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] skonto edited a comment on issue #24796: [SPARK-27900][CORE] Add uncaught exception handler to the driver
skonto edited a comment on issue #24796: [SPARK-27900][CORE] Add uncaught exception handler to the driver URL: https://github.com/apache/spark/pull/24796#issuecomment-499791740 @zsxwing I describe how this happened in the jira ticket. I just run Spark on K8s SparkPi with 1M as the input parameter. This creates 1M tasks (an array holds them) which creates an OOM error for the DAGScheduler eventLooop thread since this is the one that will eventually try to submit the actual job, of course my jvm mem settings are enough to reproduce it, for the values pls have a look at the jira ticket. Of course this could happen in other cases where jvm is running out of memory and at some point this thread needs to allocate more memory. Btw I can reproduce it on K8s in a consistent manner, it fails every time. On other thing is that in the code base there are many places where there is a join on a thread that will be stopped via the shutdown hook like contextCleaner. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] skonto edited a comment on issue #24796: [SPARK-27900][CORE] Add uncaught exception handler to the driver
skonto edited a comment on issue #24796: [SPARK-27900][CORE] Add uncaught exception handler to the driver URL: https://github.com/apache/spark/pull/24796#issuecomment-499791740 @zsxwing I describe how this happened in the jira ticket. I just run Spark on K8s SparkPi with 1M as the input parameter. This creates 1M tasks (an array holds them) which creates an OOM error for the DAGScheduler eventLooop thread since this is the one that will eventually try to submit the actual job, of course my jvm mem settings are enough to reproduce it, for the values pls have a look at the jira ticket. Of course this could happen in other cases where jvm is running out of memory and at some point this thread needs to allocate more memory. Btw I can reproduce it on K8s in a consistent manner, it fails every time. On other thing is that in the code base there are other places where there is a join on a thread that will be stopped via the shutdown hook like contextCleaner and as I said above shutdownHook does a lot of work eg. the SparkContext stop() method does stop a lot of stuff. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] skonto edited a comment on issue #24796: [SPARK-27900][CORE] Add uncaught exception handler to the driver
skonto edited a comment on issue #24796: [SPARK-27900][CORE] Add uncaught exception handler to the driver URL: https://github.com/apache/spark/pull/24796#issuecomment-499119354 @srowen Ideally, yes, we want the graceful shutdown without this deadlock if possible. My concern is: can we actually be sure things will not lead to a deadlock elsewhere? We probably need to check the threads allocated in general and those involved in the shutdown.
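The deadlock pattern being worried about here is a `stop()` that joins a thread while being invoked (directly, or via a shutdown hook) from that very thread. A minimal, hypothetical guard against the self-join case (names are illustrative, not Spark's `EventLoop`):

```java
public class StopGuard {
    private final Thread eventThread;

    StopGuard(Runnable body) {
        this.eventThread = new Thread(body, "event-loop");
    }

    void start() { eventThread.start(); }

    // Safe stop: joining the event thread from itself would block forever,
    // so detect that case and bail out instead of calling join().
    void stop() throws InterruptedException {
        if (Thread.currentThread() == eventThread) {
            System.out.println("stop() from event thread itself: skipping join");
            return;
        }
        eventThread.join();
        System.out.println("event thread joined cleanly");
    }

    public static void main(String[] args) throws InterruptedException {
        StopGuard[] holder = new StopGuard[1];
        holder[0] = new StopGuard(() -> {
            try {
                holder[0].stop(); // failure path: the thread stops itself
            } catch (InterruptedException ignored) { }
        });
        holder[0].start();
        holder[0].stop(); // normal path, called from the main thread
    }
}
```

Without such a check, an uncaught OOM on the event thread that triggers a shutdown hook joining that same thread hangs the driver, which matches the symptom described in the thread.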
[GitHub] [spark] skonto edited a comment on issue #24796: [SPARK-27900][CORE] Add uncaught exception handler to the driver
skonto edited a comment on issue #24796: [SPARK-27900][CORE] Add uncaught exception handler to the driver URL: https://github.com/apache/spark/pull/24796#issuecomment-498866549 I don't see how these unit test failures relate to this PR; weird.
[GitHub] [spark] skonto edited a comment on issue #24796: [SPARK-27900][CORE] Add uncaught exception handler to the driver
skonto edited a comment on issue #24796: [SPARK-27900][CORE] Add uncaught exception handler to the driver URL: https://github.com/apache/spark/pull/24796#issuecomment-498840730 @srowen @vanzin @squito @erikerlandson please review.