[ 
https://issues.apache.org/jira/browse/AIRFLOW-5456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sebastian Arzt updated AIRFLOW-5456:
------------------------------------
    Summary: Mark spark submit operator task as 'failed' when kubernetes pod 
never ran  (was: Mark spark submit operator task as 'failed' when kubernetes 
pod phase 'Running' did not occur on spark-submit logs)

> Mark spark submit operator task as 'failed' when kubernetes pod never ran
> -------------------------------------------------------------------------
>
>                 Key: AIRFLOW-5456
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-5456
>             Project: Apache Airflow
>          Issue Type: Bug
>          Components: operators
>    Affects Versions: 1.10.0, 1.10.1, 1.10.2, 1.10.3, 1.10.4, 1.10.5
>            Reporter: Sebastian Arzt
>            Priority: Minor
>              Labels: failure-handling, operator, spark, spark-submit
>
> Currently, a spark submit operator task does not fail if the corresponding 
> pod never enters phase 'Running'.
> Background: we observed spark submit operator tasks marked as "success" 
> although the spark job never ran on kubernetes.
> Logs (truncated):
> {code:java}
> [2019-09-11 09:21:02,732] {logging_mixin.py:95} INFO - [2019-09-11 09:21:02,732] {spark_submit_hook.py:427} INFO - 2019-09-11 09:21:02 INFO  LoggingPodStatusWatcherImpl:54 - State changed, new state:
> [2019-09-11 09:21:02,732] {logging_mixin.py:95} INFO - [2019-09-11 09:21:02,732] {spark_submit_hook.py:410} INFO - Identified spark driver pod: pod-name
> [2019-09-11 09:21:02,733] {logging_mixin.py:95} INFO - [2019-09-11 09:21:02,732] {spark_submit_hook.py:427} INFO - pod name: pod-name
> [2019-09-11 09:21:02,733] {logging_mixin.py:95} INFO - [2019-09-11 09:21:02,733] {spark_submit_hook.py:427} INFO - namespace: default
> [2019-09-11 09:21:02,733] {logging_mixin.py:95} INFO - [2019-09-11 09:21:02,733] {spark_submit_hook.py:427} INFO - pod uid: 797f3157-d475-11e9-9758-1209ef52ae5e
> [2019-09-11 09:21:02,733] {logging_mixin.py:95} INFO - [2019-09-11 09:21:02,733] {spark_submit_hook.py:427} INFO - creation time: 2019-09-11T09:21:02Z
> [2019-09-11 09:21:02,733] {logging_mixin.py:95} INFO - [2019-09-11 09:21:02,733] {spark_submit_hook.py:427} INFO - service account name: account
> [2019-09-11 09:21:02,734] {logging_mixin.py:95} INFO - [2019-09-11 09:21:02,733] {spark_submit_hook.py:427} INFO - volumes: vol1, vol2
> [2019-09-11 09:21:02,734] {logging_mixin.py:95} INFO - [2019-09-11 09:21:02,734] {spark_submit_hook.py:427} INFO - node name: node name
> [2019-09-11 09:21:02,734] {logging_mixin.py:95} INFO - [2019-09-11 09:21:02,734] {spark_submit_hook.py:427} INFO - start time: 2019-09-11T09:21:02Z
> [2019-09-11 09:21:02,734] {logging_mixin.py:95} INFO - [2019-09-11 09:21:02,734] {spark_submit_hook.py:427} INFO - container images: some-image:tag
> [2019-09-11 09:21:02,734] {logging_mixin.py:95} INFO - [2019-09-11 09:21:02,734] {spark_submit_hook.py:427} INFO - phase: Pending
> [2019-09-11 09:27:56,813] {logging_mixin.py:95} INFO - [2019-09-11 09:27:56,813] {spark_submit_hook.py:427} INFO - 2019-09-11 09:27:56 INFO  LoggingPodStatusWatcherImpl:54 - Container final statuses:
> [2019-09-11 09:27:56,813] {logging_mixin.py:95} INFO - [2019-09-11 09:27:56,813] {spark_submit_hook.py:427} INFO - Container name: spark-kubernetes-driver
> [2019-09-11 09:27:56,813] {logging_mixin.py:95} INFO - [2019-09-11 09:27:56,813] {spark_submit_hook.py:427} INFO - Container state: Terminated
> {code}
> Proposed solution: do not mark the job as 'success' if phase 'Running' was 
> never observed in the spark-submit logs.
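
The check proposed in the quoted issue could be sketched roughly as follows. This is only an illustration, not the actual Airflow change: the helper names `saw_running_phase` and `check_driver_ran` are hypothetical, and the sketch assumes that a running driver pod would be reported as a `phase: Running` line in the same format as the `phase: Pending` line in the logs above.

```python
import re

# Hypothetical helpers for illustration; not part of Airflow's SparkSubmitHook.
# Assumes the pod phase shows up in spark-submit output as "phase: <Name>",
# as in the "phase: Pending" line from the quoted logs.
PHASE_RE = re.compile(r"\bphase:\s*(\w+)")


def saw_running_phase(log_lines):
    """Return True if any log line reports the pod phase 'Running'."""
    for line in log_lines:
        match = PHASE_RE.search(line)
        if match and match.group(1) == "Running":
            return True
    return False


def check_driver_ran(log_lines):
    """Raise if the driver pod was never observed in phase 'Running'."""
    if not saw_running_phase(log_lines):
        raise RuntimeError("Spark driver pod never entered phase 'Running'")


# Shortened lines modelled on the logs in the issue: Pending, then Terminated.
sample = [
    "{spark_submit_hook.py:427} INFO - phase: Pending",
    "{spark_submit_hook.py:427} INFO - Container state: Terminated",
]
```

With the sample lines above, `check_driver_ran(sample)` raises, which is the point at which the task would be marked as 'failed' instead of 'success'.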



--
This message was sent by Atlassian Jira
(v8.3.2#803003)
