[jira] [Commented] (SPARK-7736) Exception not failing Python applications (in yarn cluster mode)
[ https://issues.apache.org/jira/browse/SPARK-7736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16282764#comment-16282764 ]

Marcelo Vanzin commented on SPARK-7736:
---------------------------------------

Make sure you are all running apps in cluster mode if you want to see the proper status. I just ran a failing pyspark app in cluster mode to double-check, and all seems fine.

> Exception not failing Python applications (in yarn cluster mode)
> ----------------------------------------------------------------
>
>                 Key: SPARK-7736
>                 URL: https://issues.apache.org/jira/browse/SPARK-7736
>             Project: Spark
>          Issue Type: Bug
>          Components: YARN
>       Environment: Spark 1.3.1, Yarn 2.7.0, Ubuntu 14.04
>            Reporter: Shay Rojansky
>            Assignee: Marcelo Vanzin
>             Fix For: 1.5.1, 1.6.0
>
> It seems that exceptions thrown in Python spark apps after the SparkContext
> is instantiated don't cause the application to fail, at least in Yarn: the
> application is marked as SUCCEEDED.
> Note that any exception right before the SparkContext correctly places the
> application in FAILED state.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-7736) Exception not failing Python applications (in yarn cluster mode)
[ https://issues.apache.org/jira/browse/SPARK-7736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16275066#comment-16275066 ]

Dmitriy Reshetnikov commented on SPARK-7736:
--------------------------------------------

Spark 2.2 is still facing this issue. In my case, Azkaban executes the Spark job, and the job's finalStatus in the Resource Manager is SUCCESS in any case.
[jira] [Commented] (SPARK-7736) Exception not failing Python applications (in yarn cluster mode)
[ https://issues.apache.org/jira/browse/SPARK-7736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15927240#comment-15927240 ]

Yash Sharma commented on SPARK-7736:
------------------------------------

This does not seem fixed. The application still completes with SUCCESS status even when an exception is thrown from the application. Spark version: 2.0.2.
[jira] [Commented] (SPARK-7736) Exception not failing Python applications (in yarn cluster mode)
[ https://issues.apache.org/jira/browse/SPARK-7736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14952263#comment-14952263 ]

Shay Rojansky commented on SPARK-7736:
--------------------------------------

I have just tested this with Spark 1.5.1 on Yarn 2.7.1, and the problem is still there: an exception thrown after the SparkContext has been created terminates the application, but Yarn reports it as succeeded.
[jira] [Commented] (SPARK-7736) Exception not failing Python applications (in yarn cluster mode)
[ https://issues.apache.org/jira/browse/SPARK-7736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14910186#comment-14910186 ]

Zsolt Tóth commented on SPARK-7736:
-----------------------------------

Created SPARK-10851.
[jira] [Commented] (SPARK-7736) Exception not failing Python applications (in yarn cluster mode)
[ https://issues.apache.org/jira/browse/SPARK-7736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14908389#comment-14908389 ]

Shivaram Venkataraman commented on SPARK-7736:
----------------------------------------------

[~ztoth] Could you open a new JIRA for the SparkR problem?
[jira] [Commented] (SPARK-7736) Exception not failing Python applications (in yarn cluster mode)
[ https://issues.apache.org/jira/browse/SPARK-7736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14908001#comment-14908001 ]

Zsolt Tóth commented on SPARK-7736:
-----------------------------------

As I see it, this is also a problem for SparkR applications in yarn-cluster mode. Is there an open JIRA for that?
[jira] [Commented] (SPARK-7736) Exception not failing Python applications (in yarn cluster mode)
[ https://issues.apache.org/jira/browse/SPARK-7736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14700451#comment-14700451 ]

Apache Spark commented on SPARK-7736:
-------------------------------------

User 'vanzin' has created a pull request for this issue:
https://github.com/apache/spark/pull/8258
[jira] [Commented] (SPARK-7736) Exception not failing Python applications (in yarn cluster mode)
[ https://issues.apache.org/jira/browse/SPARK-7736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14652429#comment-14652429 ]

Apache Spark commented on SPARK-7736:
-------------------------------------

User 'vanzin' has created a pull request for this issue:
https://github.com/apache/spark/pull/7751
[jira] [Commented] (SPARK-7736) Exception not failing Python applications (in yarn cluster mode)
[ https://issues.apache.org/jira/browse/SPARK-7736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14621832#comment-14621832 ]

Shay Rojansky commented on SPARK-7736:
--------------------------------------

Neelesh, I'm not sure I understood what you're saying exactly... I agree with Esben that, at the end of the day, if a Spark application fails (by throwing an exception), and does so on all Yarn application attempts, then the Yarn status of that application should definitely be FAILED.
[jira] [Commented] (SPARK-7736) Exception not failing Python applications (in yarn cluster mode)
[ https://issues.apache.org/jira/browse/SPARK-7736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14620021#comment-14620021 ]

Esben S. Nielsen commented on SPARK-7736:
-----------------------------------------

Thanks for the comment. I don't understand how it applies here, however, as both listed pyspark programs should (in my understanding) result in step 2) of your scenario:

p1) Unhandled exception raised before SparkContext initialization:

---
from pyspark import SparkContext

raise Exception('Fail')

sc = SparkContext(appName='raise_seen_by_yarn')
---

This results in an AM retry (2 AM tries in total, as per the YARN default) and subsequent marking of the application's YARN status as FAILED. This is what I expect for an AM designed to fail.

p2) Unhandled exception raised after SparkContext initialization:

---
from pyspark import SparkContext

sc = SparkContext(appName='raise_not_seen_by_yarn')
raise Exception('Fail')
---

This results in the application being marked as SUCCEEDED (1 AM try in total), which is not what I expect for an AM designed to fail.

I've looked in the Spark documentation for whether special action should be taken to signal failure to YARN, but I haven't found anything. And looking at src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala:L118, where all sys.exit calls are considered successful termination regardless of exit code, I can't see a way to signal failure to YARN after SparkContext initialization.

Both p1 and p2 return with a non-zero exit code when run with spark-submit --master yarn-client, which is what I would expect.
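The exit-code behavior described for yarn-client mode can be checked without a cluster at all. A minimal standalone sketch (no Spark involved; this only demonstrates the plain-Python convention that spark-submit surfaces in client mode):

```python
import subprocess
import sys

# A Python process that terminates cleanly exits with status 0, while one
# that dies with an unhandled exception exits with status 1. This is the
# non-zero exit code that spark-submit --master yarn-client passes on to
# the caller when the driver script raises.
clean = subprocess.run([sys.executable, "-c", "pass"]).returncode
raised = subprocess.run([sys.executable, "-c", "raise Exception('Fail')"]).returncode

print(clean, raised)  # 0 1
```

The point of the comment above is that in yarn-cluster mode this exit code is not propagated to the YARN final status in the affected versions.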
[jira] [Commented] (SPARK-7736) Exception not failing Python applications (in yarn cluster mode)
[ https://issues.apache.org/jira/browse/SPARK-7736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14618885#comment-14618885 ]

Neelesh Srinivas Salian commented on SPARK-7736:
------------------------------------------------

My 2 cents: for a YARN job to be marked as failed, the ApplicationMaster running the driver needs to fail.

Scenario:
1) It fails once, YARN retries, and it succeeds if the exception has been handled correctly. This results in a successful YARN job (assuming the child tasks (executors) succeeded).
2) The retries fail and the YARN job fails completely.

You need the Spark application to cause a failure in YARN for it to be marked as a failure. Moreover, the ApplicationMaster.java code from /hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/main/java/org/apache/hadoop/yarn/applications/distributedshell/ApplicationMaster.java in the Hadoop project should help.

Reference:
[1] http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/WritingYarnApplications.html

So, I would say this is expected behavior. Hope that helps. Please add/correct me if needed.
[jira] [Commented] (SPARK-7736) Exception not failing Python applications (in yarn cluster mode)
[ https://issues.apache.org/jira/browse/SPARK-7736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14618110#comment-14618110 ]

Esben S. Nielsen commented on SPARK-7736:
-----------------------------------------

Platform: Spark 1.3.0, CDH 5.4.1

To reproduce with pyspark:

---
from pyspark import SparkContext

with SparkContext(appName='raise_uncaught') as sc:
    raise Exception('Fail')
---

$ spark-submit --master yarn-cluster /path/to/my/pythonscript.py

This ends up with the following YARN status:

State: FINISHED
FinalStatus: SUCCEEDED
Diagnostics: Shutdown hook called before final status was reported.

If the exception is thrown before the SparkContext is initialized, the YARN status displays as expected:

---
from pyspark import SparkContext

raise Exception('Fail')

with SparkContext(appName='raise_caught') as sc:
    pass
---

This ends up with the following YARN status:

State: FAILED
FinalStatus: FAILED
Diagnostics: trace

It seems (from the Diagnostics message) that https://github.com/apache/spark/blob/19834fa9184f0365a160bcb54bcd33eaa87c70dc/yarn/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala:L118 is hit when exceptions are raised after initializing the SparkContext. This also means applications are not retried when failures happen after SparkContext initialization.
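The misleading status combination above (FINISHED/SUCCEEDED plus the shutdown-hook diagnostics message) can be detected programmatically, e.g. by a scheduler that polls application status. A minimal sketch with hypothetical helpers; the `Key : Value` layout and field names assume the plain-text report printed by `yarn application -status <appId>` in Hadoop 2.x and may differ across versions:

```python
def parse_report(text):
    """Parse a 'Key : Value' plain-text application report into a dict."""
    fields = {}
    for line in text.splitlines():
        if ':' in line:
            key, _, value = line.partition(':')
            fields[key.strip()] = value.strip()
    return fields


def is_suspicious_success(fields):
    """Flag the bug signature described above: the app is reported as
    FINISHED with final status SUCCEEDED, but the diagnostics show the
    AM shut down before ever reporting a final status."""
    return (fields.get('State') == 'FINISHED'
            and fields.get('Final-State') == 'SUCCEEDED'
            and 'Shutdown hook called before final status'
                in fields.get('Diagnostics', ''))


sample = """State : FINISHED
Final-State : SUCCEEDED
Diagnostics : Shutdown hook called before final status was reported."""

print(is_suspicious_success(parse_report(sample)))  # True
```

A genuinely failed run (State/Final-State both FAILED, as in the second repro above) would not be flagged by this check.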
[jira] [Commented] (SPARK-7736) Exception not failing Python applications (in yarn cluster mode)
[ https://issues.apache.org/jira/browse/SPARK-7736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14601923#comment-14601923 ]

Shay Rojansky commented on SPARK-7736:
--------------------------------------

The problem is simply with the YARN status for the application. If a Spark application throws an exception after having instantiated the SparkContext, the application obviously terminates, but YARN lists the job as SUCCEEDED. This makes it hard for users to see what happened to their jobs in the YARN UI. Let me know if this is still unclear.
[jira] [Commented] (SPARK-7736) Exception not failing Python applications (in yarn cluster mode)
[ https://issues.apache.org/jira/browse/SPARK-7736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14601914#comment-14601914 ]

Neelesh Srinivas Salian commented on SPARK-7736:
------------------------------------------------

Could you add more context to the issue? What is the expected return value / output of the applications?