[ https://issues.apache.org/jira/browse/AIRFLOW-5456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sebastian Arzt updated AIRFLOW-5456:
------------------------------------
    Summary: Mark spark submit operator task as 'failed' when kubernetes pod never ran  (was: Mark spark submit operator task as 'failed' when kubernetes pod phase 'Running' did not occur on spark-submit logs)

> Mark spark submit operator task as 'failed' when kubernetes pod never ran
> -------------------------------------------------------------------------
>
>                 Key: AIRFLOW-5456
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-5456
>             Project: Apache Airflow
>          Issue Type: Bug
>          Components: operators
>    Affects Versions: 1.10.0, 1.10.1, 1.10.2, 1.10.3, 1.10.4, 1.10.5
>            Reporter: Sebastian Arzt
>            Priority: Minor
>              Labels: failure-handling, operator, spark, spark-submit
>
> Currently spark submit operator task will not fail if the corresponding pod never entered phase 'Running'.
> Background: we observed spark submit operator tasks marked as "success" although the spark job was never running on kubernetes.
> Logs (truncated):
> {code:java}
> [2019-09-11 09:21:02,732] {logging_mixin.py:95} INFO - [2019-09-11 09:21:02,732] {spark_submit_hook.py:427} INFO - 2019-09-11 09:21:02 INFO LoggingPodStatusWatcherImpl:54 - State changed, new state:
> [2019-09-11 09:21:02,732] {logging_mixin.py:95} INFO - [2019-09-11 09:21:02,732] {spark_submit_hook.py:410} INFO - Identified spark driver pod: pod-name
> [2019-09-11 09:21:02,733] {logging_mixin.py:95} INFO - [2019-09-11 09:21:02,732] {spark_submit_hook.py:427} INFO - pod name: pod-name
> [2019-09-11 09:21:02,733] {logging_mixin.py:95} INFO - [2019-09-11 09:21:02,733] {spark_submit_hook.py:427} INFO - namespace: default
> [2019-09-11 09:21:02,733] {logging_mixin.py:95} INFO - [2019-09-11 09:21:02,733] {spark_submit_hook.py:427} INFO - pod uid: 797f3157-d475-11e9-9758-1209ef52ae5e
> [2019-09-11 09:21:02,733] {logging_mixin.py:95} INFO - [2019-09-11 09:21:02,733] {spark_submit_hook.py:427} INFO - creation time: 2019-09-11T09:21:02Z
> [2019-09-11 09:21:02,733] {logging_mixin.py:95} INFO - [2019-09-11 09:21:02,733] {spark_submit_hook.py:427} INFO - service account name: account
> [2019-09-11 09:21:02,734] {logging_mixin.py:95} INFO - [2019-09-11 09:21:02,733] {spark_submit_hook.py:427} INFO - volumes: vol1, vol2
> [2019-09-11 09:21:02,734] {logging_mixin.py:95} INFO - [2019-09-11 09:21:02,734] {spark_submit_hook.py:427} INFO - node name: node name
> [2019-09-11 09:21:02,734] {logging_mixin.py:95} INFO - [2019-09-11 09:21:02,734] {spark_submit_hook.py:427} INFO - start time: 2019-09-11T09:21:02Z
> [2019-09-11 09:21:02,734] {logging_mixin.py:95} INFO - [2019-09-11 09:21:02,734] {spark_submit_hook.py:427} INFO - container images: some-image:tag
> [2019-09-11 09:21:02,734] {logging_mixin.py:95} INFO - [2019-09-11 09:21:02,734] {spark_submit_hook.py:427} INFO - phase: Pending
> [2019-09-11 09:27:56,813] {logging_mixin.py:95} INFO - [2019-09-11 09:27:56,813] {spark_submit_hook.py:427} INFO - 2019-09-11 09:27:56 INFO LoggingPodStatusWatcherImpl:54 - Container final statuses:
> [2019-09-11 09:27:56,813] {logging_mixin.py:95} INFO - [2019-09-11 09:27:56,813] {spark_submit_hook.py:427} INFO - Container name: spark-kubernetes-driver
> [2019-09-11 09:27:56,813] {logging_mixin.py:95} INFO - [2019-09-11 09:27:56,813] {spark_submit_hook.py:427} INFO - Container state: Terminated
> {code}
> Solution: Do not mark job as 'success' if phase 'Running' was never observed in the spark-submit logs.
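
The proposed solution could be sketched roughly as follows. This is only an illustration and not the actual Airflow patch: the names `observed_phases` and `should_mark_success` are hypothetical, the regular expression assumes the `phase: <Phase>` line format seen in the logs of this ticket, and treating a zero spark-submit exit code as the other success condition is an assumption.

```python
import re
from typing import Iterable, List

# Assumption: pod phases appear in the spark-submit output as
# "phase: <Phase>", as in the LoggingPodStatusWatcherImpl lines of this ticket.
_PHASE_RE = re.compile(r"\bphase:\s*(\w+)")


def observed_phases(log_lines: Iterable[str]) -> List[str]:
    """Return the pod phases in the order they appear in the log (hypothetical helper)."""
    phases = []
    for line in log_lines:
        match = _PHASE_RE.search(line)
        if match:
            phases.append(match.group(1))
    return phases


def should_mark_success(log_lines: Iterable[str], returncode: int) -> bool:
    """Report success only if spark-submit exited with 0 AND the pod was seen in phase 'Running'."""
    return returncode == 0 and "Running" in observed_phases(log_lines)


# Abbreviated lines from the ticket: the pod goes from Pending straight to a
# terminated container, so 'Running' is never observed.
ticket_log = [
    "{spark_submit_hook.py:427} INFO - phase: Pending",
    "{spark_submit_hook.py:427} INFO - Container state: Terminated",
]
never_ran = should_mark_success(ticket_log, returncode=0)
```

With the log excerpt from this ticket, `never_ran` is `False`, so the task would be marked as failed rather than "success" even if spark-submit itself returned 0.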