ywan2017 opened a new issue #8963:
URL: https://github.com/apache/airflow/issues/8963


   # SparkSubmitOperator cannot get the exit code after the log stream is 
interrupted by the k8s "too old resource version" exception
   
   ## Description
   
   I use Airflow to schedule Spark jobs on k8s with SparkSubmitOperator.
   When a Spark job runs on k8s for a long time (>30 min), Airflow often marks 
the task as failed while the job is still running, even though the job 
eventually finishes successfully.
   
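   For context on why the task state goes wrong: as far as I can tell, 
`SparkSubmitHook` only learns the exit code by scanning the spark-submit 
stdout for the `Exit code:` line that Spark's `LoggingPodStatusWatcherImpl` 
prints at the very end. A condensed, paraphrased sketch of that scan (not 
verbatim 1.10.x source):
   
   ```python
   import re
   from typing import Iterable, Optional

   def scan_spark_submit_log(lines: Iterable[str]) -> Optional[int]:
       """Return the driver exit code if the stream contained Spark's
       'Container final statuses' block, else None."""
       exit_code = None
       for raw in lines:
           line = raw.strip()
           # The exit code only ever appears in the final-status block that
           # LoggingPodStatusWatcherImpl prints when the watch survives to
           # the end of the job.
           match = re.search(r'\s*Exit code: (\d+)', line)
           if match:
               exit_code = int(match.group(1))
       return exit_code
   ```
   
   So if the watch dies mid-run, the final-status block is never printed, the 
scan comes back empty, and (as far as I can tell from the 1.10.x code) the 
hook treats a missing exit code as a failure even though the driver pod keeps 
running.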
   ### When it happens, this exception often (but not always) appears at the 
same time
   ```java
   20/05/20 10:49:58 INFO TaskSetManager: Finished task 7.0 in stage 91.0 (TID 
13313) in 34319 ms on 172.30.238.243 (executor 5) (10/20)
   20/05/20 10:49:59 WARN ExecutorPodsWatchSnapshotSource: Kubernetes client 
has been closed (this is expected if the application is shutting down.)
   io.fabric8.kubernetes.client.KubernetesClientException: too old resource 
version: 140189600 (140238574)
        at 
io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager$1.onMessage(WatchConnectionManager.java:259)
        at 
okhttp3.internal.ws.RealWebSocket.onReadMessage(RealWebSocket.java:307)
        at 
okhttp3.internal.ws.WebSocketReader.readMessageFrame(WebSocketReader.java:222)
        at 
okhttp3.internal.ws.WebSocketReader.processNextFrame(WebSocketReader.java:101)
        at okhttp3.internal.ws.RealWebSocket.loopReader(RealWebSocket.java:262)
        at 
okhttp3.internal.ws.RealWebSocket$2.onResponse(RealWebSocket.java:201)
        at okhttp3.RealCall$AsyncCall.execute(RealCall.java:141)
        at okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32)
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
   20/05/20 10:50:19 INFO TaskSetManager: Finished task 10.0 in stage 91.0 (TID 
13316) in 27478 ms on 172.30.244.131 (executor 1) (11/20)
   20/05/20 10:50:19 INFO TaskSetManager: Finished task 12.0 in stage 91.0 (TID 
13318) in 27052 ms on 172.30.244.131 (executor 1) (12/20)
   
   ```
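   My read of this error: `too old resource version` is the API server telling 
the fabric8 watch that it tried to resume from a resourceVersion etcd has 
already compacted, and the Spark 2.4.4 client reacts by tearing down the whole 
watch (and with it the log stream) instead of recovering. This looks like the 
behavior tracked upstream as SPARK-24266 ("Spark client terminates while 
driver is still running"). The usual recovery is to re-list (which returns a 
fresh resourceVersion) and re-watch. A minimal sketch of that pattern with the 
official `kubernetes` Python client (namespace and label selector are 
illustrative; exactly how a 410 surfaces varies by client version):
   
   ```python
   from kubernetes import client, config, watch
   from kubernetes.client.rest import ApiException

   config.load_kube_config()  # or load_incluster_config() when running in-cluster
   v1 = client.CoreV1Api()
   NAMESPACE = "batch-pipeline-qa"   # illustrative, from the logs below
   SELECTOR = "spark-role=driver"    # illustrative

   def watch_driver_pods():
       # Re-list first: the list response carries a fresh resourceVersion
       # from which a watch can safely resume.
       rv = v1.list_namespaced_pod(NAMESPACE,
                                   label_selector=SELECTOR).metadata.resource_version
       w = watch.Watch()
       while True:
           try:
               for event in w.stream(v1.list_namespaced_pod, NAMESPACE,
                                     label_selector=SELECTOR, resource_version=rv):
                   pod = event["object"]
                   rv = pod.metadata.resource_version  # track progress for resume
                   print(event["type"], pod.metadata.name, pod.status.phase)
           except ApiException as exc:
               if exc.status == 410:  # HTTP 410 Gone == "too old resource version"
                   rv = v1.list_namespaced_pod(
                       NAMESPACE, label_selector=SELECTOR).metadata.resource_version
                   continue  # fresh resourceVersion obtained, restart the watch
               raise
   ```
   
   As far as I can tell, the Spark 2.4.4 client does not do this re-list step, 
so once the resourceVersion goes stale the driver status updates simply stop.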
   ## Environment
   
   - environment: UAT
   - Airflow: 1.10.10
   - Python: 3.6.8
   - k8s server: 1.5
   - k8s client: ?
   - Spark: 2.4.4-h2.7
   
   ## Submitting jobs manually and reading the logs from k8s directly
   
   ### Scenario 1: job succeeds, log returned from k8s
   
   ```log
   2020-05-11 11:58:21 WARN  NativeCodeLoader:62 - Unable to load native-hadoop 
library for your platform... using builtin-java classes where applicable
   2020-05-11 11:58:27 INFO  LoggingPodStatusWatcherImpl:54 - State changed, 
new state:
         pod name: empreporting-1589212705263-driver
         namespace: batch-pipeline-qa
         labels: spark-app-selector -> spark-8f4a0340943a4df79d0e2f16eb24a751, 
spark-role -> driver
         pod uid: 3c1d4adf-b18a-4204-a95f-91a579eea5c4
         creation time: 2020-05-11T15:58:27Z
         service account name: spark
         volumes: spark-local-dir-1, dswsecret-volume, spark-conf-volume, 
spark-token-fpqpz
         node name: N/A
         start time: N/A
         container images: N/A
         phase: Pending
         status: []
   
   2020-05-11 11:58:27 INFO  LoggingPodStatusWatcherImpl:54 - State changed, 
new state:
         pod name: empreporting-1589212705263-driver
         namespace: batch-pipeline-qa
         labels: spark-app-selector -> spark-8f4a0340943a4df79d0e2f16eb24a751, 
spark-role -> driver
         pod uid: 3c1d4adf-b18a-4204-a95f-91a579eea5c4
         creation time: 2020-05-11T15:58:27Z
         service account name: spark
         volumes: spark-local-dir-1, dswsecret-volume, spark-conf-volume, 
spark-token-fpqpz
         node name: 10.74.200.157
         start time: N/A
         container images: N/A
         phase: Pending
         status: []
   
   2020-05-11 11:58:28 INFO  Client:54 - Waiting for application emp_reporting 
to finish...
   2020-05-11 11:58:48 INFO  LoggingPodStatusWatcherImpl:54 - State changed, 
new state:
         pod name: empreporting-1589212705263-driver
         namespace: batch-pipeline-qa
         labels: spark-app-selector -> spark-8f4a0340943a4df79d0e2f16eb24a751, 
spark-role -> driver
         pod uid: 3c1d4adf-b18a-4204-a95f-91a579eea5c4
         creation time: 2020-05-11T15:58:27Z
         service account name: spark
         volumes: spark-local-dir-1, dswsecret-volume, spark-conf-volume, 
spark-token-fpqpz
         node name: 10.74.200.157
         start time: 2020-05-11T15:58:48Z
         container images: 
txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark:s2.4.4-h2.7
         phase: Pending
         status: [ContainerStatus(containerID=null, 
image=txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark:s2.4.4-h2.7, 
imageID=, lastState=ContainerState(running=null, terminated=null, waiting=null, 
additionalProperties={}), name=spark-kubernetes-driver, ready=false, 
restartCount=0, state=ContainerState(running=null, terminated=null, 
waiting=ContainerStateWaiting(message=null, reason=ContainerCreating, 
additionalProperties={}), additionalProperties={}), additionalProperties={})]
   
   2020-05-11 11:59:28 INFO  LoggingPodStatusWatcherImpl:54 - State changed, 
new state:
         pod name: empreporting-1589212705263-driver
         namespace: batch-pipeline-qa
         labels: spark-app-selector -> spark-8f4a0340943a4df79d0e2f16eb24a751, 
spark-role -> driver
         pod uid: 3c1d4adf-b18a-4204-a95f-91a579eea5c4
         creation time: 2020-05-11T15:58:27Z
         service account name: spark
         volumes: spark-local-dir-1, dswsecret-volume, spark-conf-volume, 
spark-token-fpqpz
         node name: 10.74.200.157
         start time: 2020-05-11T15:58:48Z
         container images: 
txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark:s2.4.4-h2.7
         phase: Running
         status: 
[ContainerStatus(containerID=containerd://71d51131984cb06468453f75cda9a3c10a78f3da336e731d11bb2cc3955de32b,
 image=txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark:s2.4.4-h2.7, 
imageID=txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark@sha256:3682354e49a55503ef906ce8aeff8601274fa426204ec989a91a72912d31ed7e,
 lastState=ContainerState(running=null, terminated=null, waiting=null, 
additionalProperties={}), name=spark-kubernetes-driver, ready=true, 
restartCount=0, 
state=ContainerState(running=ContainerStateRunning(startedAt=Time(time=2020-05-11T15:59:26Z,
 additionalProperties={}), additionalProperties={}), terminated=null, 
waiting=null, additionalProperties={}), additionalProperties={})]
   
   2020-05-11 12:01:15 INFO  LoggingPodStatusWatcherImpl:54 - State changed, 
new state:
         pod name: empreporting-1589212705263-driver
         namespace: batch-pipeline-qa
         labels: spark-app-selector -> spark-8f4a0340943a4df79d0e2f16eb24a751, 
spark-role -> driver
         pod uid: 3c1d4adf-b18a-4204-a95f-91a579eea5c4
         creation time: 2020-05-11T15:58:27Z
         service account name: spark
         volumes: spark-local-dir-1, dswsecret-volume, spark-conf-volume, 
spark-token-fpqpz
         node name: 10.74.200.157
         start time: 2020-05-11T15:58:48Z
         container images: 
txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark:s2.4.4-h2.7
         phase: Succeeded
         status: 
[ContainerStatus(containerID=containerd://71d51131984cb06468453f75cda9a3c10a78f3da336e731d11bb2cc3955de32b,
 image=txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark:s2.4.4-h2.7, 
imageID=txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark@sha256:3682354e49a55503ef906ce8aeff8601274fa426204ec989a91a72912d31ed7e,
 lastState=ContainerState(running=null, terminated=null, waiting=null, 
additionalProperties={}), name=spark-kubernetes-driver, ready=false, 
restartCount=0, state=ContainerState(running=null, 
terminated=ContainerStateTerminated(containerID=containerd://71d51131984cb06468453f75cda9a3c10a78f3da336e731d11bb2cc3955de32b,
 exitCode=0, finishedAt=Time(time=2020-05-11T16:01:08Z, 
additionalProperties={}), message=null, reason=Completed, signal=null, 
startedAt=Time(time=2020-05-11T15:59:26Z, additionalProperties={}), 
additionalProperties={}), waiting=null, additionalProperties={}), 
additionalProperties={})]
   2020-05-11 12:01:15 INFO  LoggingPodStatusWatcherImpl:54 - Container final 
statuses:
   
   
         Container name: spark-kubernetes-driver
         Container image: 
txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark:s2.4.4-h2.7
         Container state: Terminated
         Exit code: 0
   2020-05-11 12:01:15 INFO  Client:54 - Application emp_reporting finished.
   2020-05-11 12:01:15 INFO  ShutdownHookManager:54 - Shutdown hook called
   2020-05-11 12:01:15 INFO  ShutdownHookManager:54 - Deleting directory 
/tmp/spark-f3b0955d-fe7c-4858-a1c6-f70597d42104
   
   ```
   
   ### Scenario 2: job fails, log returned from k8s
   
   ```log
   2020-05-11 11:54:10 WARN  NativeCodeLoader:62 - Unable to load native-hadoop 
library for your platform... using builtin-java classes where applicable
   2020-05-11 11:54:16 INFO  LoggingPodStatusWatcherImpl:54 - State changed, 
new state:
         pod name: empreporting-1589212454526-driver
         namespace: batch-pipeline-qa
         labels: spark-app-selector -> spark-a3a0efa1a3804b759eda9e9aa52629cc, 
spark-role -> driver
         pod uid: 6cc27a5d-efff-401b-a120-15e8a3ccb2f2
         creation time: 2020-05-11T15:54:16Z
         service account name: spark
         volumes: spark-local-dir-1, dswsecret-volume, spark-conf-volume, 
spark-token-fpqpz
         node name: N/A
         start time: N/A
         container images: N/A
         phase: Pending
         status: []
   
   2020-05-11 11:54:16 INFO  LoggingPodStatusWatcherImpl:54 - State changed, 
new state:
         pod name: empreporting-1589212454526-driver
         namespace: batch-pipeline-qa
         labels: spark-app-selector -> spark-a3a0efa1a3804b759eda9e9aa52629cc, 
spark-role -> driver
         pod uid: 6cc27a5d-efff-401b-a120-15e8a3ccb2f2
         creation time: 2020-05-11T15:54:16Z
         service account name: spark
         volumes: spark-local-dir-1, dswsecret-volume, spark-conf-volume, 
spark-token-fpqpz
         node name: 10.74.200.186
         start time: N/A
         container images: N/A
         phase: Pending
         status: []
   
   2020-05-11 11:54:17 INFO  Client:54 - Waiting for application emp_reporting 
to finish...
   2020-05-11 11:54:17 INFO  LoggingPodStatusWatcherImpl:54 - State changed, 
new state:
         pod name: empreporting-1589212454526-driver
         namespace: batch-pipeline-qa
         labels: spark-app-selector -> spark-a3a0efa1a3804b759eda9e9aa52629cc, 
spark-role -> driver
         pod uid: 6cc27a5d-efff-401b-a120-15e8a3ccb2f2
         creation time: 2020-05-11T15:54:16Z
         service account name: spark
         volumes: spark-local-dir-1, dswsecret-volume, spark-conf-volume, 
spark-token-fpqpz
         node name: 10.74.200.186
         start time: 2020-05-11T15:54:17Z
         container images: 
txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark:s2.4.4-h2.7
         phase: Pending
         status: [ContainerStatus(containerID=null, 
image=txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark:s2.4.4-h2.7, 
imageID=, lastState=ContainerState(running=null, terminated=null, waiting=null, 
additionalProperties={}), name=spark-kubernetes-driver, ready=false, 
restartCount=0, state=ContainerState(running=null, terminated=null, 
waiting=ContainerStateWaiting(message=null, reason=ContainerCreating, 
additionalProperties={}), additionalProperties={}), additionalProperties={})]
   
   2020-05-11 11:55:04 INFO  LoggingPodStatusWatcherImpl:54 - State changed, 
new state:
         pod name: empreporting-1589212454526-driver
         namespace: batch-pipeline-qa
         labels: spark-app-selector -> spark-a3a0efa1a3804b759eda9e9aa52629cc, 
spark-role -> driver
         pod uid: 6cc27a5d-efff-401b-a120-15e8a3ccb2f2
         creation time: 2020-05-11T15:54:16Z
         service account name: spark
         volumes: spark-local-dir-1, dswsecret-volume, spark-conf-volume, 
spark-token-fpqpz
         node name: 10.74.200.186
         start time: 2020-05-11T15:54:17Z
         container images: 
txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark:s2.4.4-h2.7
         phase: Running
         status: 
[ContainerStatus(containerID=containerd://ed26eb3ba628744040af1d2ee610fb29f00564b7fb5e693f0959b8999479f8a9,
 image=txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark:s2.4.4-h2.7, 
imageID=txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark@sha256:3682354e49a55503ef906ce8aeff8601274fa426204ec989a91a72912d31ed7e,
 lastState=ContainerState(running=null, terminated=null, waiting=null, 
additionalProperties={}), name=spark-kubernetes-driver, ready=true, 
restartCount=0, 
state=ContainerState(running=ContainerStateRunning(startedAt=Time(time=2020-05-11T15:55:03Z,
 additionalProperties={}), additionalProperties={}), terminated=null, 
waiting=null, additionalProperties={}), additionalProperties={})]
   
   2020-05-11 11:55:29 INFO  LoggingPodStatusWatcherImpl:54 - State changed, 
new state:
         pod name: empreporting-1589212454526-driver
         namespace: batch-pipeline-qa
         labels: spark-app-selector -> spark-a3a0efa1a3804b759eda9e9aa52629cc, 
spark-role -> driver
         pod uid: 6cc27a5d-efff-401b-a120-15e8a3ccb2f2
         creation time: 2020-05-11T15:54:16Z
         service account name: spark
         volumes: spark-local-dir-1, dswsecret-volume, spark-conf-volume, 
spark-token-fpqpz
         node name: 10.74.200.186
         start time: 2020-05-11T15:54:17Z
         container images: 
txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark:s2.4.4-h2.7
         phase: Failed
         status: 
[ContainerStatus(containerID=containerd://ed26eb3ba628744040af1d2ee610fb29f00564b7fb5e693f0959b8999479f8a9,
 image=txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark:s2.4.4-h2.7, 
imageID=txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark@sha256:3682354e49a55503ef906ce8aeff8601274fa426204ec989a91a72912d31ed7e,
 lastState=ContainerState(running=null, terminated=null, waiting=null, 
additionalProperties={}), name=spark-kubernetes-driver, ready=false, 
restartCount=0, state=ContainerState(running=null, 
terminated=ContainerStateTerminated(containerID=containerd://ed26eb3ba628744040af1d2ee610fb29f00564b7fb5e693f0959b8999479f8a9,
 exitCode=1, finishedAt=Time(time=2020-05-11T15:55:27Z, 
additionalProperties={}), message=null, reason=Error, signal=null, 
startedAt=Time(time=2020-05-11T15:55:03Z, additionalProperties={}), 
additionalProperties={}), waiting=null, additionalProperties={}), 
additionalProperties={})]
   2020-05-11 11:55:29 INFO  LoggingPodStatusWatcherImpl:54 - Container final 
statuses:
   
   
         Container name: spark-kubernetes-driver
         Container image: 
txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark:s2.4.4-h2.7
         Container state: Terminated
         Exit code: 1
   2020-05-11 11:55:29 INFO  Client:54 - Application emp_reporting finished.
   2020-05-11 11:55:29 INFO  ShutdownHookManager:54 - Shutdown hook called
   2020-05-11 11:55:29 INFO  ShutdownHookManager:54 - Deleting directory 
/tmp/spark-8ba3c338-3eae-4bc7-9c25-3146642b9207
   ```
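   One thing the two manual runs above make clear: the terminal phase and exit 
code are always present on the driver pod object itself, even after the 
spark-submit log stream is gone (the `kubectl describe pod` outputs below show 
the same thing). So a possible fallback when spark-submit exits without ever 
printing `Exit code:` would be to read the pod directly. A sketch with the 
official `kubernetes` Python client (pod name and namespace taken from the 
logs above):
   
   ```python
   from kubernetes import client, config

   config.load_kube_config()
   v1 = client.CoreV1Api()

   def driver_exit_code(pod_name: str, namespace: str) -> int:
       """Read the terminal exit code straight off the driver pod,
       independently of the spark-submit log stream."""
       pod = v1.read_namespaced_pod(name=pod_name, namespace=namespace)
       for cs in pod.status.container_statuses or []:
           if cs.name == "spark-kubernetes-driver" and cs.state.terminated:
               return cs.state.terminated.exit_code
       raise RuntimeError(
           f"pod {pod_name} has not terminated yet (phase={pod.status.phase})")

   # e.g. driver_exit_code("empreporting-1589212705263-driver", "batch-pipeline-qa")
   ```
   
   The caveat is that the pod has to still exist when queried, which it does 
here: nothing deletes the driver pod on completion.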
   
   ## Airflow log examples
   
   ### Scenario 1: job succeeds, Airflow log shows the correct status:
   
   ```log
   [dawany@dawany-inf env_qa]$ kubectl describe pod 
rmtextract-1589270449587-driver -n batch-pipeline-qa
   Name:         rmtextract-1589270449587-driver
   Namespace:    batch-pipeline-qa
   Priority:     0
   Node:         10.93.122.236/10.93.122.236
   Start Time:   Tue, 12 May 2020 04:00:51 -0400
   Labels:       spark-app-selector=spark-2f7caeac92c94710ab54cf40b5ff29e5
                 spark-role=driver
   Annotations:  kubernetes.io/psp: db2oltp-dev-psp
   Status:       Succeeded
   IP:           172.30.244.25
   IPs:          <none>
   Containers:
     spark-kubernetes-driver:
       Container ID:  
containerd://3f364306c94ac5739ab154a0037fc5e17a2981cf51669be916304be1d8698eb6
       Image:         
txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark:s2.4.4-h2.7
       Image ID:      
txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark@sha256:3682354e49a55503ef906ce8aeff8601274fa426204ec989a91a72912d31ed7e
       Ports:         7078/TCP, 7079/TCP, 4040/TCP
       Host Ports:    0/TCP, 0/TCP, 0/TCP
       Args:
         driver
         --properties-file
         /opt/spark/conf/spark.properties
         --class
         com.ibm.cio.dswim.trans.rmt.stage.RmtExtractStageJob
         spark-internal
         0
         3
         8
       State:          Terminated
         Reason:       Completed
         Exit Code:    0
         Started:      Tue, 12 May 2020 04:01:07 -0400
         Finished:     Tue, 12 May 2020 04:52:03 -0400
       Ready:          False
       Restart Count:  0
       Limits:
         memory:  11Gi
       Requests:
         cpu:     1
         memory:  11Gi
       Environment:
         SPARK_DRIVER_BIND_ADDRESS:   (v1:status.podIP)
         SPARK_LOCAL_DIRS:           
/var/data/spark-02fa9b25-af0b-430d-ab75-aee12dd217a7
         SPARK_CONF_DIR:             /opt/spark/conf
       Mounts:
         /opt/spark/conf from spark-conf-volume (rw)
         /opt/spark/secrets from dswsecret-volume (rw)
         /var/data/spark-02fa9b25-af0b-430d-ab75-aee12dd217a7 from 
spark-local-dir-1 (rw)
         /var/run/secrets/kubernetes.io/serviceaccount from spark-token-fpqpz 
(ro)
   Conditions:
     Type              Status
     Initialized       True
     Ready             False
     ContainersReady   False
     PodScheduled      True
   Volumes:
     spark-local-dir-1:
       Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
       Medium:
       SizeLimit:  <unset>
     dswsecret-volume:
       Type:        Secret (a volume populated by a Secret)
       SecretName:  dswsecret
       Optional:    false
     spark-conf-volume:
       Type:      ConfigMap (a volume populated by a ConfigMap)
       Name:      rmtextract-1589270449587-driver-conf-map
       Optional:  false
     spark-token-fpqpz:
       Type:        Secret (a volume populated by a Secret)
       SecretName:  spark-token-fpqpz
       Optional:    false
   QoS Class:       Burstable
   Node-Selectors:  <none>
   Tolerations:     node.kubernetes.io/not-ready:NoExecute for 600s
                    node.kubernetes.io/unreachable:NoExecute for 600s
   Events:          <none>
   
   ```
   
   ```log
   *** Reading local file: 
/home/airflow/airflow/logs/Transformation_Rmt_Master_Load_Sequence_Adhoc_Main/Rmt_Extract/2020-05-12T04:00:00+00:00/1.log
   [2020-05-12 03:00:43,970] {taskinstance.py:669} INFO - Dependencies all met 
for <TaskInstance: 
Transformation_Rmt_Master_Load_Sequence_Adhoc_Main.Rmt_Extract 
2020-05-12T04:00:00+00:00 [queued]>
   [2020-05-12 03:00:44,074] {taskinstance.py:669} INFO - Dependencies all met 
for <TaskInstance: 
Transformation_Rmt_Master_Load_Sequence_Adhoc_Main.Rmt_Extract 
2020-05-12T04:00:00+00:00 [queued]>
   [2020-05-12 03:00:44,074] {taskinstance.py:879} INFO -
   
--------------------------------------------------------------------------------
   [2020-05-12 03:00:44,074] {taskinstance.py:880} INFO - Starting attempt 1 of 
2
   [2020-05-12 03:00:44,074] {taskinstance.py:881} INFO -
   
--------------------------------------------------------------------------------
   [2020-05-12 03:00:44,173] {taskinstance.py:900} INFO - Executing 
<Task(SparkSubmitOperator): Rmt_Extract> on 2020-05-12T04:00:00+00:00
   [2020-05-12 03:00:44,176] {standard_task_runner.py:53} INFO - Started 
process 11051 to run task
   [2020-05-12 03:00:44,619] {logging_mixin.py:112} INFO - Running %s on host 
%s <TaskInstance: 
Transformation_Rmt_Master_Load_Sequence_Adhoc_Main.Rmt_Extract 
2020-05-12T04:00:00+00:00 [running]> kafka02.cloud.ibm.com
   [2020-05-12 03:00:45,043] {logging_mixin.py:112} INFO - [2020-05-12 
03:00:45,043] {base_hook.py:87} INFO - Using connection to: id: spark_default. 
Host: k8s://https://c2.private.us-south.containers.cloud.ibm.com:26165, Port: 
None, Schema: None, Login: admin, Password: XXXXXXXX, extra: XXXXXXXX
   [2020-05-12 03:00:45,071] {logging_mixin.py:112} INFO - [2020-05-12 
03:00:45,071] {spark_submit_hook.py:325} INFO - Spark-Submit cmd: spark-submit 
--master k8s://https://c2.private.us-south.containers.cloud.ibm.com:26165 
--conf spark.kubernetes.authenticate.driver.serviceAccountName=spark --conf 
spark.kubernetes.container.image=txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark:s2.4.4-h2.7
 --conf 
spark.kubernetes.container.image.pullSecrets=artifactory-container-registry 
--conf spark.submit.deployMode=cluster --conf 
spark.kubernetes.report.interval=2 --conf 
spark.kubernetes.driver.secrets.dswsecret=/opt/spark/secrets --conf 
spark.executor.userClassPathFirst=true --conf 
spark.driver.userClassPathFirst=true --conf 
spark.sql.parquet.compression.codec=gzip --conf 
spark.sql.session.timeZone=America/New_York --conf 
spark.sql.broadcastTimeout=1800 --conf spark.sql.shuffle.partitions=600 --conf 
spark.shuffle.consolidateFiles=true --conf spark.default.parallelism=108 --conf 
spark.driver.cores=1 --conf spark.executor.cores=2 --conf 
spark.kubernetes.executor.request.cores=0.6 --conf 
spark.kubernetes.executor.memoryOverhead=1G --conf spark.driver.memory=10G 
--conf spark.executor.memory=5G --conf spark.executor.instances=9 --conf 
spark.sql.codegen=true --conf spark.sql.cbo.enabled=true --conf 
spark.sql.optimizer.maxIterations=1000 --conf 
spark.kubernetes.namespace=batch-pipeline-qa --files 
cos://dsw-data-project-qa.service/config/dsw_config.conf --jars 
cos://dsw-data-project-qa.service/job-jars/common-1.0-SNAPSHOT.jar,cos://dsw-data-project-qa.service/job-jars/rmtjob-1.0-SNAPSHOT.jar,cos://dsw-data-project-qa.service/job-jars/meta-1.0-SNAPSHOT.jar,cos://dsw-data-project-qa.service/3rdparty-jars/druid-1.1.12.jar,cos://dsw-data-project-qa.service/3rdparty-jars/mybatis-3.5.3.jar,cos://dsw-data-project-qa.service/3rdparty-jars/db2jcc4.jar,cos://dsw-data-project-qa.service/3rdparty-jars/logback-core-1.2.3.jar,cos://dsw-data-project-qa.service/3rdparty-jars/logback-classic-1.2.3.jar,cos://dsw-data-project-qa.service/3rdparty-jars/dom4j-2.1.1.jar,cos://dsw-data-project-qa.service/3rdparty-jars/guava-28.0-jre.jar,cos://dsw-data-project-qa.service/3rdparty-jars/commons-lang3-3.9.jar,cos://dsw-data-project-qa.service/3rdparty-jars/fastjson-1.2.59.jar
 --name Rmt_Extract --class 
com.ibm.cio.dswim.trans.rmt.stage.RmtExtractStageJob 
cos://dsw-data-project-qa.service/job-jars/rmt_extract_stage-1.0-SNAPSHOT.jar 0 
3 8
   [2020-05-12 03:00:47,353] {logging_mixin.py:112} INFO - [2020-05-12 
03:00:47,352] {spark_submit_hook.py:479} INFO - 20/05/12 03:00:47 WARN 
NativeCodeLoader: Unable to load native-hadoop library for your platform... 
using builtin-java classes where applicable
   [2020-05-12 03:00:49,643] {logging_mixin.py:112} INFO - [2020-05-12 
03:00:49,643] {spark_submit_hook.py:479} INFO - log4j:WARN No appenders could 
be found for logger (io.fabric8.kubernetes.client.Config).
   [2020-05-12 03:00:49,643] {logging_mixin.py:112} INFO - [2020-05-12 
03:00:49,643] {spark_submit_hook.py:479} INFO - log4j:WARN Please initialize 
the log4j system properly.
   [2020-05-12 03:00:49,644] {logging_mixin.py:112} INFO - [2020-05-12 
03:00:49,644] {spark_submit_hook.py:479} INFO - log4j:WARN See 
http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
   [2020-05-12 03:00:50,889] {logging_mixin.py:112} INFO - [2020-05-12 
03:00:50,888] {spark_submit_hook.py:479} INFO - Using Spark's default log4j 
profile: org/apache/spark/log4j-defaults.properties
   
   [2020-05-12 03:00:50,908] {logging_mixin.py:112} INFO - [2020-05-12 
03:00:50,908] {spark_submit_hook.py:479} INFO - 20/05/12 03:00:50 INFO 
LoggingPodStatusWatcherImpl: State changed, new state:
   [2020-05-12 03:00:50,908] {logging_mixin.py:112} INFO - [2020-05-12 
03:00:50,908] {spark_submit_hook.py:462} INFO - Identified spark driver pod: 
rmtextract-1589270449587-driver
   [2020-05-12 03:00:50,908] {logging_mixin.py:112} INFO - [2020-05-12 
03:00:50,908] {spark_submit_hook.py:479} INFO - pod name: 
rmtextract-1589270449587-driver
   [2020-05-12 03:00:50,908] {logging_mixin.py:112} INFO - [2020-05-12 
03:00:50,908] {spark_submit_hook.py:479} INFO - namespace: batch-pipeline-qa
   [2020-05-12 03:00:50,909] {logging_mixin.py:112} INFO - [2020-05-12 
03:00:50,908] {spark_submit_hook.py:479} INFO - labels: spark-app-selector -> 
spark-2f7caeac92c94710ab54cf40b5ff29e5, spark-role -> driver
   [2020-05-12 03:00:50,909] {logging_mixin.py:112} INFO - [2020-05-12 
03:00:50,909] {spark_submit_hook.py:479} INFO - pod uid: 
57056ebb-771a-4894-b1c3-398073967a6b
   [2020-05-12 03:00:50,909] {logging_mixin.py:112} INFO - [2020-05-12 
03:00:50,909] {spark_submit_hook.py:479} INFO - creation time: 
2020-05-12T08:00:50Z
   [2020-05-12 03:00:50,909] {logging_mixin.py:112} INFO - [2020-05-12 
03:00:50,909] {spark_submit_hook.py:479} INFO - service account name: spark
   [2020-05-12 03:00:50,909] {logging_mixin.py:112} INFO - [2020-05-12 
03:00:50,909] {spark_submit_hook.py:479} INFO - volumes: spark-local-dir-1, 
dswsecret-volume, spark-conf-volume, spark-token-fpqpz
   [2020-05-12 03:00:50,909] {logging_mixin.py:112} INFO - [2020-05-12 
03:00:50,909] {spark_submit_hook.py:479} INFO - node name: N/A
   [2020-05-12 03:00:50,909] {logging_mixin.py:112} INFO - [2020-05-12 
03:00:50,909] {spark_submit_hook.py:479} INFO - start time: N/A
   [2020-05-12 03:00:50,910] {logging_mixin.py:112} INFO - [2020-05-12 
03:00:50,909] {spark_submit_hook.py:479} INFO - container images: N/A
   [2020-05-12 03:00:50,910] {logging_mixin.py:112} INFO - [2020-05-12 
03:00:50,910] {spark_submit_hook.py:479} INFO - phase: Pending
   [2020-05-12 03:00:50,910] {logging_mixin.py:112} INFO - [2020-05-12 
03:00:50,910] {spark_submit_hook.py:479} INFO - status: []
   
   [2020-05-12 03:00:50,912] {logging_mixin.py:112} INFO - [2020-05-12 
03:00:50,912] {spark_submit_hook.py:479} INFO - 20/05/12 03:00:50 INFO 
LoggingPodStatusWatcherImpl: State changed, new state:
   [2020-05-12 03:00:50,913] {logging_mixin.py:112} INFO - [2020-05-12 
03:00:50,912] {spark_submit_hook.py:462} INFO - Identified spark driver pod: 
rmtextract-1589270449587-driver
   [2020-05-12 03:00:50,913] {logging_mixin.py:112} INFO - [2020-05-12 
03:00:50,913] {spark_submit_hook.py:479} INFO - pod name: 
rmtextract-1589270449587-driver
   [2020-05-12 03:00:50,913] {logging_mixin.py:112} INFO - [2020-05-12 
03:00:50,913] {spark_submit_hook.py:479} INFO - namespace: batch-pipeline-qa
   [2020-05-12 03:00:50,913] {logging_mixin.py:112} INFO - [2020-05-12 
03:00:50,913] {spark_submit_hook.py:479} INFO - labels: spark-app-selector -> 
spark-2f7caeac92c94710ab54cf40b5ff29e5, spark-role -> driver
   [2020-05-12 03:00:50,913] {logging_mixin.py:112} INFO - [2020-05-12 
03:00:50,913] {spark_submit_hook.py:479} INFO - pod uid: 
57056ebb-771a-4894-b1c3-398073967a6b
   [2020-05-12 03:00:50,913] {logging_mixin.py:112} INFO - [2020-05-12 
03:00:50,913] {spark_submit_hook.py:479} INFO - creation time: 
2020-05-12T08:00:50Z
   [2020-05-12 03:00:50,913] {logging_mixin.py:112} INFO - [2020-05-12 
03:00:50,913] {spark_submit_hook.py:479} INFO - service account name: spark
   [2020-05-12 03:00:50,914] {logging_mixin.py:112} INFO - [2020-05-12 
03:00:50,914] {spark_submit_hook.py:479} INFO - volumes: spark-local-dir-1, 
dswsecret-volume, spark-conf-volume, spark-token-fpqpz
   [2020-05-12 03:00:50,914] {logging_mixin.py:112} INFO - [2020-05-12 
03:00:50,914] {spark_submit_hook.py:479} INFO - node name: 10.93.122.236
   [2020-05-12 03:00:50,914] {logging_mixin.py:112} INFO - [2020-05-12 
03:00:50,914] {spark_submit_hook.py:479} INFO - start time: N/A
   [2020-05-12 03:00:50,914] {logging_mixin.py:112} INFO - [2020-05-12 
03:00:50,914] {spark_submit_hook.py:479} INFO - container images: N/A
   [2020-05-12 03:00:50,914] {logging_mixin.py:112} INFO - [2020-05-12 
03:00:50,914] {spark_submit_hook.py:479} INFO - phase: Pending
   [2020-05-12 03:00:50,914] {logging_mixin.py:112} INFO - [2020-05-12 
03:00:50,914] {spark_submit_hook.py:479} INFO - status: []
   
   [2020-05-12 03:00:51,163] {logging_mixin.py:112} INFO - [2020-05-12 
03:00:51,163] {spark_submit_hook.py:479} INFO - 20/05/12 03:00:51 INFO Client: 
Waiting for application Rmt_Extract to finish...
   [2020-05-12 03:00:51,406] {logging_mixin.py:112} INFO - [2020-05-12 
03:00:51,406] {spark_submit_hook.py:479} INFO - 20/05/12 03:00:51 INFO 
LoggingPodStatusWatcherImpl: State changed, new state:
   [2020-05-12 03:00:51,407] {logging_mixin.py:112} INFO - [2020-05-12 
03:00:51,407] {spark_submit_hook.py:462} INFO - Identified spark driver pod: 
rmtextract-1589270449587-driver
   [2020-05-12 03:00:51,407] {logging_mixin.py:112} INFO - [2020-05-12 
03:00:51,407] {spark_submit_hook.py:479} INFO - pod name: 
rmtextract-1589270449587-driver
   [2020-05-12 03:00:51,407] {logging_mixin.py:112} INFO - [2020-05-12 
03:00:51,407] {spark_submit_hook.py:479} INFO - namespace: batch-pipeline-qa
   [2020-05-12 03:00:51,407] {logging_mixin.py:112} INFO - [2020-05-12 
03:00:51,407] {spark_submit_hook.py:479} INFO - labels: spark-app-selector -> 
spark-2f7caeac92c94710ab54cf40b5ff29e5, spark-role -> driver
   [2020-05-12 03:00:51,407] {logging_mixin.py:112} INFO - [2020-05-12 
03:00:51,407] {spark_submit_hook.py:479} INFO - pod uid: 
57056ebb-771a-4894-b1c3-398073967a6b
   [2020-05-12 03:00:51,407] {logging_mixin.py:112} INFO - [2020-05-12 
03:00:51,407] {spark_submit_hook.py:479} INFO - creation time: 
2020-05-12T08:00:50Z
   [2020-05-12 03:00:51,408] {logging_mixin.py:112} INFO - [2020-05-12 
03:00:51,407] {spark_submit_hook.py:479} INFO - service account name: spark
   [2020-05-12 03:00:51,408] {logging_mixin.py:112} INFO - [2020-05-12 
03:00:51,408] {spark_submit_hook.py:479} INFO - volumes: spark-local-dir-1, 
dswsecret-volume, spark-conf-volume, spark-token-fpqpz
   [2020-05-12 03:00:51,408] {logging_mixin.py:112} INFO - [2020-05-12 
03:00:51,408] {spark_submit_hook.py:479} INFO - node name: 10.93.122.236
   [2020-05-12 03:00:51,408] {logging_mixin.py:112} INFO - [2020-05-12 
03:00:51,408] {spark_submit_hook.py:479} INFO - start time: 2020-05-12T08:00:51Z
   [2020-05-12 03:00:51,408] {logging_mixin.py:112} INFO - [2020-05-12 
03:00:51,408] {spark_submit_hook.py:479} INFO - container images: 
txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark:s2.4.4-h2.7
   [2020-05-12 03:00:51,408] {logging_mixin.py:112} INFO - [2020-05-12 
03:00:51,408] {spark_submit_hook.py:479} INFO - phase: Pending
   [2020-05-12 03:00:51,408] {logging_mixin.py:112} INFO - [2020-05-12 
03:00:51,408] {spark_submit_hook.py:479} INFO - status: 
[ContainerStatus(containerID=null, 
image=txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark:s2.4.4-h2.7, 
imageID=, lastState=ContainerState(running=null, terminated=null, waiting=null, 
additionalProperties={}), name=spark-kubernetes-driver, ready=false, 
restartCount=0, state=ContainerState(running=null, terminated=null, 
waiting=ContainerStateWaiting(message=null, reason=ContainerCreating, 
additionalProperties={}), additionalProperties={}), additionalProperties={})]
   
   [2020-05-12 03:01:07,811] {logging_mixin.py:112} INFO - [2020-05-12 
03:01:07,811] {spark_submit_hook.py:479} INFO - 20/05/12 03:01:07 INFO 
LoggingPodStatusWatcherImpl: State changed, new state:
   [2020-05-12 03:01:07,811] {logging_mixin.py:112} INFO - [2020-05-12 
03:01:07,811] {spark_submit_hook.py:462} INFO - Identified spark driver pod: 
rmtextract-1589270449587-driver
   [2020-05-12 03:01:07,811] {logging_mixin.py:112} INFO - [2020-05-12 
03:01:07,811] {spark_submit_hook.py:479} INFO - pod name: 
rmtextract-1589270449587-driver
   [2020-05-12 03:01:07,812] {logging_mixin.py:112} INFO - [2020-05-12 
03:01:07,811] {spark_submit_hook.py:479} INFO - namespace: batch-pipeline-qa
   [2020-05-12 03:01:07,812] {logging_mixin.py:112} INFO - [2020-05-12 
03:01:07,812] {spark_submit_hook.py:479} INFO - labels: spark-app-selector -> 
spark-2f7caeac92c94710ab54cf40b5ff29e5, spark-role -> driver
   [2020-05-12 03:01:07,812] {logging_mixin.py:112} INFO - [2020-05-12 
03:01:07,812] {spark_submit_hook.py:479} INFO - pod uid: 
57056ebb-771a-4894-b1c3-398073967a6b
   [2020-05-12 03:01:07,812] {logging_mixin.py:112} INFO - [2020-05-12 
03:01:07,812] {spark_submit_hook.py:479} INFO - creation time: 
2020-05-12T08:00:50Z
   [2020-05-12 03:01:07,812] {logging_mixin.py:112} INFO - [2020-05-12 
03:01:07,812] {spark_submit_hook.py:479} INFO - service account name: spark
   [2020-05-12 03:01:07,812] {logging_mixin.py:112} INFO - [2020-05-12 
03:01:07,812] {spark_submit_hook.py:479} INFO - volumes: spark-local-dir-1, 
dswsecret-volume, spark-conf-volume, spark-token-fpqpz
   [2020-05-12 03:01:07,812] {logging_mixin.py:112} INFO - [2020-05-12 
03:01:07,812] {spark_submit_hook.py:479} INFO - node name: 10.93.122.236
   [2020-05-12 03:01:07,813] {logging_mixin.py:112} INFO - [2020-05-12 
03:01:07,812] {spark_submit_hook.py:479} INFO - start time: 2020-05-12T08:00:51Z
   [2020-05-12 03:01:07,813] {logging_mixin.py:112} INFO - [2020-05-12 
03:01:07,813] {spark_submit_hook.py:479} INFO - container images: 
txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark:s2.4.4-h2.7
   [2020-05-12 03:01:07,813] {logging_mixin.py:112} INFO - [2020-05-12 
03:01:07,813] {spark_submit_hook.py:479} INFO - phase: Running
   [2020-05-12 03:01:07,813] {logging_mixin.py:112} INFO - [2020-05-12 
03:01:07,813] {spark_submit_hook.py:479} INFO - status: 
[ContainerStatus(containerID=containerd://3f364306c94ac5739ab154a0037fc5e17a2981cf51669be916304be1d8698eb6,
 image=txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark:s2.4.4-h2.7, 
imageID=txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark@sha256:3682354e49a55503ef906ce8aeff8601274fa426204ec989a91a72912d31ed7e,
 lastState=ContainerState(running=null, terminated=null, waiting=null, 
additionalProperties={}), name=spark-kubernetes-driver, ready=true, 
restartCount=0, 
state=ContainerState(running=ContainerStateRunning(startedAt=2020-05-12T08:01:07Z,
 additionalProperties={}), terminated=null, waiting=null, 
additionalProperties={}), additionalProperties={})]
   
   [2020-05-12 03:52:13,876] {logging_mixin.py:112} INFO - [2020-05-12 
03:52:13,876] {spark_submit_hook.py:479} INFO - 20/05/12 03:52:13 INFO 
LoggingPodStatusWatcherImpl: State changed, new state:
   [2020-05-12 03:52:13,876] {logging_mixin.py:112} INFO - [2020-05-12 
03:52:13,876] {spark_submit_hook.py:462} INFO - Identified spark driver pod: 
rmtextract-1589270449587-driver
   [2020-05-12 03:52:13,876] {logging_mixin.py:112} INFO - [2020-05-12 
03:52:13,876] {spark_submit_hook.py:479} INFO - pod name: 
rmtextract-1589270449587-driver
   [2020-05-12 03:52:13,877] {logging_mixin.py:112} INFO - [2020-05-12 
03:52:13,876] {spark_submit_hook.py:479} INFO - namespace: batch-pipeline-qa
   [2020-05-12 03:52:13,877] {logging_mixin.py:112} INFO - [2020-05-12 
03:52:13,877] {spark_submit_hook.py:479} INFO - labels: spark-app-selector -> 
spark-2f7caeac92c94710ab54cf40b5ff29e5, spark-role -> driver
   [2020-05-12 03:52:13,877] {logging_mixin.py:112} INFO - [2020-05-12 
03:52:13,877] {spark_submit_hook.py:479} INFO - pod uid: 
57056ebb-771a-4894-b1c3-398073967a6b
   [2020-05-12 03:52:13,877] {logging_mixin.py:112} INFO - [2020-05-12 
03:52:13,877] {spark_submit_hook.py:479} INFO - creation time: 
2020-05-12T08:00:50Z
   [2020-05-12 03:52:13,877] {logging_mixin.py:112} INFO - [2020-05-12 
03:52:13,877] {spark_submit_hook.py:479} INFO - service account name: spark
   [2020-05-12 03:52:13,877] {logging_mixin.py:112} INFO - [2020-05-12 
03:52:13,877] {spark_submit_hook.py:479} INFO - volumes: spark-local-dir-1, 
dswsecret-volume, spark-conf-volume, spark-token-fpqpz
   [2020-05-12 03:52:13,877] {logging_mixin.py:112} INFO - [2020-05-12 
03:52:13,877] {spark_submit_hook.py:479} INFO - node name: 10.93.122.236
   [2020-05-12 03:52:13,878] {logging_mixin.py:112} INFO - [2020-05-12 
03:52:13,878] {spark_submit_hook.py:479} INFO - start time: 2020-05-12T08:00:51Z
   [2020-05-12 03:52:13,878] {logging_mixin.py:112} INFO - [2020-05-12 
03:52:13,878] {spark_submit_hook.py:479} INFO - container images: 
txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark:s2.4.4-h2.7
   [2020-05-12 03:52:13,878] {logging_mixin.py:112} INFO - [2020-05-12 
03:52:13,878] {spark_submit_hook.py:479} INFO - phase: Succeeded
   [2020-05-12 03:52:13,878] {logging_mixin.py:112} INFO - [2020-05-12 
03:52:13,878] {spark_submit_hook.py:479} INFO - status: 
[ContainerStatus(containerID=containerd://3f364306c94ac5739ab154a0037fc5e17a2981cf51669be916304be1d8698eb6,
 image=txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark:s2.4.4-h2.7, 
imageID=txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark@sha256:3682354e49a55503ef906ce8aeff8601274fa426204ec989a91a72912d31ed7e,
 lastState=ContainerState(running=null, terminated=null, waiting=null, 
additionalProperties={}), name=spark-kubernetes-driver, ready=false, 
restartCount=0, state=ContainerState(running=null, 
terminated=ContainerStateTerminated(containerID=containerd://3f364306c94ac5739ab154a0037fc5e17a2981cf51669be916304be1d8698eb6,
 exitCode=0, finishedAt=2020-05-12T08:52:03Z, message=null, reason=Completed, 
signal=null, startedAt=2020-05-12T08:01:07Z, additionalProperties={}), 
waiting=null, additionalProperties={}), additionalProperties={})]
   [2020-05-12 03:52:13,883] {logging_mixin.py:112} INFO - [2020-05-12 
03:52:13,882] {spark_submit_hook.py:479} INFO - 20/05/12 03:52:13 INFO 
LoggingPodStatusWatcherImpl: Container final statuses:
   [2020-05-12 03:52:13,883] {logging_mixin.py:112} INFO - [2020-05-12 
03:52:13,883] {spark_submit_hook.py:479} INFO -
   [2020-05-12 03:52:13,883] {logging_mixin.py:112} INFO - [2020-05-12 
03:52:13,883] {spark_submit_hook.py:479} INFO -
   [2020-05-12 03:52:13,883] {logging_mixin.py:112} INFO - [2020-05-12 
03:52:13,883] {spark_submit_hook.py:479} INFO - Container name: 
spark-kubernetes-driver
   [2020-05-12 03:52:13,883] {logging_mixin.py:112} INFO - [2020-05-12 
03:52:13,883] {spark_submit_hook.py:479} INFO - Container image: 
txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark:s2.4.4-h2.7
   [2020-05-12 03:52:13,883] {logging_mixin.py:112} INFO - [2020-05-12 
03:52:13,883] {spark_submit_hook.py:479} INFO - Container state: Terminated
   [2020-05-12 03:52:13,884] {logging_mixin.py:112} INFO - [2020-05-12 
03:52:13,883] {spark_submit_hook.py:479} INFO - Exit code: 0
   [2020-05-12 03:52:13,884] {logging_mixin.py:112} INFO - [2020-05-12 
03:52:13,884] {spark_submit_hook.py:479} INFO - 20/05/12 03:52:13 INFO Client: 
Application Rmt_Extract finished.
   [2020-05-12 03:52:13,889] {logging_mixin.py:112} INFO - [2020-05-12 
03:52:13,889] {spark_submit_hook.py:479} INFO - 20/05/12 03:52:13 INFO 
ShutdownHookManager: Shutdown hook called
   [2020-05-12 03:52:13,890] {logging_mixin.py:112} INFO - [2020-05-12 
03:52:13,890] {spark_submit_hook.py:479} INFO - 20/05/12 03:52:13 INFO 
ShutdownHookManager: Deleting directory 
/tmp/spark-8c6f511c-4c0f-451e-b686-7d0f444d8f59
   [2020-05-12 03:52:14,006] {taskinstance.py:1065} INFO - Marking task as 
SUCCESS.dag_id=Transformation_Rmt_Master_Load_Sequence_Adhoc_Main, 
task_id=Rmt_Extract, execution_date=20200512T040000, 
start_date=20200512T080043, end_date=20200512T085214
   [2020-05-12 03:52:19,162] {logging_mixin.py:112} INFO - [2020-05-12 
03:52:19,161] {local_task_job.py:103} INFO - Task exited with return code 0
   
   ```
   
   ### Scenario 2: job fails, Airflow log shows the correct status:
   
   ```log
   [dawany@dawany-inf env_qa]$ kubectl describe pod 
merge-rptg-appl-dscr-1589248907865-driver -n batch-pipeline-qa
   Name:         merge-rptg-appl-dscr-1589248907865-driver
   Namespace:    batch-pipeline-qa
   Priority:     0
   Node:         10.74.200.186/10.74.200.186
   Start Time:   Mon, 11 May 2020 22:01:52 -0400
   Labels:       spark-app-selector=spark-1e9e33df2d5c488dbc0555a618577f40
                 spark-role=driver
   Annotations:  kubernetes.io/psp: db2oltp-dev-psp
   Status:       Failed
   IP:           172.30.21.6
   IPs:          <none>
   Containers:
     spark-kubernetes-driver:
       Container ID:  
containerd://cf5ac589838bb75608cb1903967fcaad31f850a6cd4b4f63eb1e7417cc867c31
       Image:         
txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark:s2.4.0-h2.6
       Image ID:      
txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark@sha256:276417a1bafb5aca28c78585f55df8a8c684b20c87b1814c3cc02065d7faa885
       Ports:         7078/TCP, 7079/TCP, 4040/TCP
       Host Ports:    0/TCP, 0/TCP, 0/TCP
       Args:
         driver
         --properties-file
         /opt/spark/conf/spark.properties
         --class
         com.ibm.cio.dswim.ingest.CommonMerge
         spark-internal
         -t
         odsqa.shar1.rptg_appl_dscr
         -b
         dsw-data-drop-qa
         -d
         dashdb
         -k
         XREF
       State:          Terminated
         Reason:       Error
         Exit Code:    1
         Started:      Mon, 11 May 2020 22:02:19 -0400
         Finished:     Mon, 11 May 2020 22:05:03 -0400
       Ready:          False
       Restart Count:  0
       Limits:
         memory:  896Mi
       Requests:
         cpu:     1
         memory:  896Mi
       Environment:
         SPARK_DRIVER_BIND_ADDRESS:   (v1:status.podIP)
         SPARK_LOCAL_DIRS:           
/var/data/spark-8688577b-07b4-4cf5-b362-1a15003420ae
         SPARK_CONF_DIR:             /opt/spark/conf
       Mounts:
         /opt/spark/conf from spark-conf-volume (rw)
         /opt/spark/secrets from dswsecret-volume (rw)
         /var/data/spark-8688577b-07b4-4cf5-b362-1a15003420ae from 
spark-local-dir-1 (rw)
         /var/run/secrets/kubernetes.io/serviceaccount from spark-token-fpqpz 
(ro)
   Conditions:
     Type              Status
     Initialized       True
     Ready             False
     ContainersReady   False
     PodScheduled      True
   Volumes:
     spark-local-dir-1:
       Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
       Medium:
       SizeLimit:  <unset>
     dswsecret-volume:
       Type:        Secret (a volume populated by a Secret)
       SecretName:  dswsecret
       Optional:    false
     spark-conf-volume:
       Type:      ConfigMap (a volume populated by a ConfigMap)
       Name:      merge-rptg-appl-dscr-1589248907865-driver-conf-map
       Optional:  false
     spark-token-fpqpz:
       Type:        Secret (a volume populated by a Secret)
       SecretName:  spark-token-fpqpz
       Optional:    false
   QoS Class:       Burstable
   Node-Selectors:  <none>
   Tolerations:     node.kubernetes.io/not-ready:NoExecute for 600s
                    node.kubernetes.io/unreachable:NoExecute for 600s
   Events:          <none>
   [dawany@dawany-inf env_qa]$
   
   ```
   
   ```log
   *** Reading local file: 
/home/airflow/airflow/logs/Batch_xref.rptg_appl_dscr/merge_rptg_appl_dscr/2020-05-11T02:01:00+00:00/1.log
   [2020-05-11 21:01:41,692] {taskinstance.py:669} INFO - Dependencies all met 
for <TaskInstance: Batch_xref.rptg_appl_dscr.merge_rptg_appl_dscr 
2020-05-11T02:01:00+00:00 [queued]>
   [2020-05-11 21:01:41,801] {taskinstance.py:669} INFO - Dependencies all met 
for <TaskInstance: Batch_xref.rptg_appl_dscr.merge_rptg_appl_dscr 
2020-05-11T02:01:00+00:00 [queued]>
   [2020-05-11 21:01:41,801] {taskinstance.py:879} INFO -
   
--------------------------------------------------------------------------------
   [2020-05-11 21:01:41,801] {taskinstance.py:880} INFO - Starting attempt 1 of 
2
   [2020-05-11 21:01:41,802] {taskinstance.py:881} INFO -
   
--------------------------------------------------------------------------------
   [2020-05-11 21:01:41,911] {taskinstance.py:900} INFO - Executing 
<Task(SparkSubmitOperator): merge_rptg_appl_dscr> on 2020-05-11T02:01:00+00:00
   [2020-05-11 21:01:41,919] {standard_task_runner.py:53} INFO - Started 
process 37536 to run task
   [2020-05-11 21:01:42,508] {logging_mixin.py:112} INFO - Running %s on host 
%s <TaskInstance: Batch_xref.rptg_appl_dscr.merge_rptg_appl_dscr 
2020-05-11T02:01:00+00:00 [running]> kafka02.cloud.ibm.com
   [2020-05-11 21:01:43,042] {logging_mixin.py:112} INFO - [2020-05-11 
21:01:43,042] {base_hook.py:87} INFO - Using connection to: id: spark_default. 
Host: k8s://https://c2.private.us-south.containers.cloud.ibm.com:26165, Port: 
None, Schema: None, Login: admin, Password: XXXXXXXX, extra: XXXXXXXX
   [2020-05-11 21:01:43,055] {logging_mixin.py:112} INFO - [2020-05-11 
21:01:43,055] {spark_submit_hook.py:325} INFO - Spark-Submit cmd: spark-submit 
--master k8s://https://c2.private.us-south.containers.cloud.ibm.com:26165 
--conf spark.kubernetes.authenticate.driver.serviceAccountName=spark --conf 
spark.kubernetes.container.image=txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark:s2.4.0-h2.6
 --conf 
spark.kubernetes.container.image.pullSecrets=artifactory-container-registry 
--conf spark.submit.deployMode=cluster --conf spark.executor.instances=1 --conf 
spark.kubernetes.executor.request.cores=0.5 --conf 
spark.kubernetes.driver.secrets.dswsecret=/opt/spark/secrets --conf 
spark.kubernetes.namespace=batch-pipeline-qa --files 
cos://dsw-data-project-qa.service/config/dsw_config.conf,cos://dsw-data-project-qa.service/config/tables_pk.conf
 --jars 
cos://dsw-data-project-qa.service/3rdparty-jars/org.apache.spark_spark-avro_2.11-2.4.0.jar,cos://dsw-data-project-qa.service/3rdparty-jars/ibm-cos-java-sdk-2.4.5.jar,cos://dsw-data-project-qa.service/3rdparty-jars/ibm-cos-java-sdk-core-2.4.5.jar,cos://dsw-data-project-qa.service/3rdparty-jars/ibm-cos-java-sdk-kms-2.4.5.jar,cos://dsw-data-project-qa.service/3rdparty-jars/ibm-cos-java-sdk-s3-2.4.5.jar,cos://dsw-data-project-qa.service/3rdparty-jars/db2jcc4.jar
 --executor-memory 1G --driver-memory 512M --name merge-rptg-appl-dscr --class 
com.ibm.cio.dswim.ingest.CommonMerge 
cos://dsw-data-project-qa.service/job-jars/ingestion-1.0.jar -t 
odsqa.shar1.rptg_appl_dscr -b dsw-data-drop-qa -d dashdb -k XREF
   [2020-05-11 21:01:45,650] {logging_mixin.py:112} INFO - [2020-05-11 
21:01:45,649] {spark_submit_hook.py:479} INFO - 20/05/11 21:01:45 WARN 
NativeCodeLoader: Unable to load native-hadoop library for your platform... 
using builtin-java classes where applicable
   [2020-05-11 21:01:47,988] {logging_mixin.py:112} INFO - [2020-05-11 
21:01:47,988] {spark_submit_hook.py:479} INFO - log4j:WARN No appenders could 
be found for logger (io.fabric8.kubernetes.client.Config).
   [2020-05-11 21:01:47,989] {logging_mixin.py:112} INFO - [2020-05-11 
21:01:47,989] {spark_submit_hook.py:479} INFO - log4j:WARN Please initialize 
the log4j system properly.
   [2020-05-11 21:01:47,989] {logging_mixin.py:112} INFO - [2020-05-11 
21:01:47,989] {spark_submit_hook.py:479} INFO - log4j:WARN See 
http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
   [2020-05-11 21:01:49,267] {logging_mixin.py:112} INFO - [2020-05-11 
21:01:49,266] {spark_submit_hook.py:479} INFO - Using Spark's default log4j 
profile: org/apache/spark/log4j-defaults.properties
   
   [2020-05-11 21:01:49,283] {logging_mixin.py:112} INFO - [2020-05-11 
21:01:49,282] {spark_submit_hook.py:479} INFO - 20/05/11 21:01:49 INFO 
LoggingPodStatusWatcherImpl: State changed, new state:
   [2020-05-11 21:01:49,283] {logging_mixin.py:112} INFO - [2020-05-11 
21:01:49,283] {spark_submit_hook.py:462} INFO - Identified spark driver pod: 
merge-rptg-appl-dscr-1589248907865-driver
   [2020-05-11 21:01:49,283] {logging_mixin.py:112} INFO - [2020-05-11 
21:01:49,283] {spark_submit_hook.py:479} INFO - pod name: 
merge-rptg-appl-dscr-1589248907865-driver
   [2020-05-11 21:01:49,283] {logging_mixin.py:112} INFO - [2020-05-11 
21:01:49,283] {spark_submit_hook.py:479} INFO - namespace: batch-pipeline-qa
   [2020-05-11 21:01:49,283] {logging_mixin.py:112} INFO - [2020-05-11 
21:01:49,283] {spark_submit_hook.py:479} INFO - labels: spark-app-selector -> 
spark-1e9e33df2d5c488dbc0555a618577f40, spark-role -> driver
   [2020-05-11 21:01:49,283] {logging_mixin.py:112} INFO - [2020-05-11 
21:01:49,283] {spark_submit_hook.py:479} INFO - pod uid: 
f01dcbd9-fe6d-421a-99ea-7749f6b79345
   [2020-05-11 21:01:49,284] {logging_mixin.py:112} INFO - [2020-05-11 
21:01:49,284] {spark_submit_hook.py:479} INFO - creation time: 
2020-05-12T02:01:48Z
   [2020-05-11 21:01:49,284] {logging_mixin.py:112} INFO - [2020-05-11 
21:01:49,284] {spark_submit_hook.py:479} INFO - service account name: spark
   [2020-05-11 21:01:49,284] {logging_mixin.py:112} INFO - [2020-05-11 
21:01:49,284] {spark_submit_hook.py:479} INFO - volumes: spark-local-dir-1, 
dswsecret-volume, spark-conf-volume, spark-token-fpqpz
   [2020-05-11 21:01:49,284] {logging_mixin.py:112} INFO - [2020-05-11 
21:01:49,284] {spark_submit_hook.py:479} INFO - node name: N/A
   [2020-05-11 21:01:49,284] {logging_mixin.py:112} INFO - [2020-05-11 
21:01:49,284] {spark_submit_hook.py:479} INFO - start time: N/A
   [2020-05-11 21:01:49,284] {logging_mixin.py:112} INFO - [2020-05-11 
21:01:49,284] {spark_submit_hook.py:479} INFO - container images: N/A
   [2020-05-11 21:01:49,284] {logging_mixin.py:112} INFO - [2020-05-11 
21:01:49,284] {spark_submit_hook.py:479} INFO - phase: Pending
   [2020-05-11 21:01:49,285] {logging_mixin.py:112} INFO - [2020-05-11 
21:01:49,285] {spark_submit_hook.py:479} INFO - status: []
   
   [2020-05-11 21:01:49,292] {logging_mixin.py:112} INFO - [2020-05-11 
21:01:49,291] {spark_submit_hook.py:479} INFO - 20/05/11 21:01:49 INFO 
LoggingPodStatusWatcherImpl: State changed, new state:
   [2020-05-11 21:01:49,292] {logging_mixin.py:112} INFO - [2020-05-11 
21:01:49,292] {spark_submit_hook.py:462} INFO - Identified spark driver pod: 
merge-rptg-appl-dscr-1589248907865-driver
   [2020-05-11 21:01:49,292] {logging_mixin.py:112} INFO - [2020-05-11 
21:01:49,292] {spark_submit_hook.py:479} INFO - pod name: 
merge-rptg-appl-dscr-1589248907865-driver
   [2020-05-11 21:01:49,292] {logging_mixin.py:112} INFO - [2020-05-11 
21:01:49,292] {spark_submit_hook.py:479} INFO - namespace: batch-pipeline-qa
   [2020-05-11 21:01:49,292] {logging_mixin.py:112} INFO - [2020-05-11 
21:01:49,292] {spark_submit_hook.py:479} INFO - labels: spark-app-selector -> 
spark-1e9e33df2d5c488dbc0555a618577f40, spark-role -> driver
   [2020-05-11 21:01:49,292] {logging_mixin.py:112} INFO - [2020-05-11 
21:01:49,292] {spark_submit_hook.py:479} INFO - pod uid: 
f01dcbd9-fe6d-421a-99ea-7749f6b79345
   [2020-05-11 21:01:49,293] {logging_mixin.py:112} INFO - [2020-05-11 
21:01:49,292] {spark_submit_hook.py:479} INFO - creation time: 
2020-05-12T02:01:48Z
   [2020-05-11 21:01:49,293] {logging_mixin.py:112} INFO - [2020-05-11 
21:01:49,293] {spark_submit_hook.py:479} INFO - service account name: spark
   [2020-05-11 21:01:49,293] {logging_mixin.py:112} INFO - [2020-05-11 
21:01:49,293] {spark_submit_hook.py:479} INFO - volumes: spark-local-dir-1, 
dswsecret-volume, spark-conf-volume, spark-token-fpqpz
   [2020-05-11 21:01:49,293] {logging_mixin.py:112} INFO - [2020-05-11 
21:01:49,293] {spark_submit_hook.py:479} INFO - node name: 10.74.200.186
   [2020-05-11 21:01:49,293] {logging_mixin.py:112} INFO - [2020-05-11 
21:01:49,293] {spark_submit_hook.py:479} INFO - start time: N/A
   [2020-05-11 21:01:49,293] {logging_mixin.py:112} INFO - [2020-05-11 
21:01:49,293] {spark_submit_hook.py:479} INFO - container images: N/A
   [2020-05-11 21:01:49,293] {logging_mixin.py:112} INFO - [2020-05-11 
21:01:49,293] {spark_submit_hook.py:479} INFO - phase: Pending
   [2020-05-11 21:01:49,294] {logging_mixin.py:112} INFO - [2020-05-11 
21:01:49,293] {spark_submit_hook.py:479} INFO - status: []
   
   [2020-05-11 21:01:49,550] {logging_mixin.py:112} INFO - [2020-05-11 
21:01:49,550] {spark_submit_hook.py:479} INFO - 20/05/11 21:01:49 INFO Client: 
Waiting for application merge-rptg-appl-dscr to finish...
   [2020-05-11 21:01:53,413] {logging_mixin.py:112} INFO - [2020-05-11 
21:01:53,413] {spark_submit_hook.py:479} INFO - 20/05/11 21:01:53 INFO 
LoggingPodStatusWatcherImpl: State changed, new state:
   [2020-05-11 21:01:53,413] {logging_mixin.py:112} INFO - [2020-05-11 
21:01:53,413] {spark_submit_hook.py:462} INFO - Identified spark driver pod: 
merge-rptg-appl-dscr-1589248907865-driver
   [2020-05-11 21:01:53,413] {logging_mixin.py:112} INFO - [2020-05-11 
21:01:53,413] {spark_submit_hook.py:479} INFO - pod name: 
merge-rptg-appl-dscr-1589248907865-driver
   [2020-05-11 21:01:53,414] {logging_mixin.py:112} INFO - [2020-05-11 
21:01:53,413] {spark_submit_hook.py:479} INFO - namespace: batch-pipeline-qa
   [2020-05-11 21:01:53,414] {logging_mixin.py:112} INFO - [2020-05-11 
21:01:53,414] {spark_submit_hook.py:479} INFO - labels: spark-app-selector -> 
spark-1e9e33df2d5c488dbc0555a618577f40, spark-role -> driver
   [2020-05-11 21:01:53,414] {logging_mixin.py:112} INFO - [2020-05-11 
21:01:53,414] {spark_submit_hook.py:479} INFO - pod uid: 
f01dcbd9-fe6d-421a-99ea-7749f6b79345
   [2020-05-11 21:01:53,414] {logging_mixin.py:112} INFO - [2020-05-11 
21:01:53,414] {spark_submit_hook.py:479} INFO - creation time: 
2020-05-12T02:01:48Z
   [2020-05-11 21:01:53,414] {logging_mixin.py:112} INFO - [2020-05-11 
21:01:53,414] {spark_submit_hook.py:479} INFO - service account name: spark
   [2020-05-11 21:01:53,414] {logging_mixin.py:112} INFO - [2020-05-11 
21:01:53,414] {spark_submit_hook.py:479} INFO - volumes: spark-local-dir-1, 
dswsecret-volume, spark-conf-volume, spark-token-fpqpz
   [2020-05-11 21:01:53,414] {logging_mixin.py:112} INFO - [2020-05-11 
21:01:53,414] {spark_submit_hook.py:479} INFO - node name: 10.74.200.186
   [2020-05-11 21:01:53,415] {logging_mixin.py:112} INFO - [2020-05-11 
21:01:53,414] {spark_submit_hook.py:479} INFO - start time: 2020-05-12T02:01:52Z
   [2020-05-11 21:01:53,415] {logging_mixin.py:112} INFO - [2020-05-11 
21:01:53,415] {spark_submit_hook.py:479} INFO - container images: 
txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark:s2.4.0-h2.6
   [2020-05-11 21:01:53,415] {logging_mixin.py:112} INFO - [2020-05-11 
21:01:53,415] {spark_submit_hook.py:479} INFO - phase: Pending
   [2020-05-11 21:01:53,415] {logging_mixin.py:112} INFO - [2020-05-11 
21:01:53,415] {spark_submit_hook.py:479} INFO - status: 
[ContainerStatus(containerID=null, 
image=txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark:s2.4.0-h2.6, 
imageID=, lastState=ContainerState(running=null, terminated=null, waiting=null, 
additionalProperties={}), name=spark-kubernetes-driver, ready=false, 
restartCount=0, state=ContainerState(running=null, terminated=null, 
waiting=ContainerStateWaiting(message=null, reason=ContainerCreating, 
additionalProperties={}), additionalProperties={}), additionalProperties={})]
   
   [2020-05-11 21:02:20,393] {logging_mixin.py:112} INFO - [2020-05-11 
21:02:20,393] {spark_submit_hook.py:479} INFO - 20/05/11 21:02:20 INFO 
LoggingPodStatusWatcherImpl: State changed, new state:
   [2020-05-11 21:02:20,394] {logging_mixin.py:112} INFO - [2020-05-11 
21:02:20,393] {spark_submit_hook.py:462} INFO - Identified spark driver pod: 
merge-rptg-appl-dscr-1589248907865-driver
   [2020-05-11 21:02:20,394] {logging_mixin.py:112} INFO - [2020-05-11 
21:02:20,394] {spark_submit_hook.py:479} INFO - pod name: 
merge-rptg-appl-dscr-1589248907865-driver
   [2020-05-11 21:02:20,394] {logging_mixin.py:112} INFO - [2020-05-11 
21:02:20,394] {spark_submit_hook.py:479} INFO - namespace: batch-pipeline-qa
   [2020-05-11 21:02:20,394] {logging_mixin.py:112} INFO - [2020-05-11 
21:02:20,394] {spark_submit_hook.py:479} INFO - labels: spark-app-selector -> 
spark-1e9e33df2d5c488dbc0555a618577f40, spark-role -> driver
   [2020-05-11 21:02:20,394] {logging_mixin.py:112} INFO - [2020-05-11 
21:02:20,394] {spark_submit_hook.py:479} INFO - pod uid: 
f01dcbd9-fe6d-421a-99ea-7749f6b79345
   [2020-05-11 21:02:20,394] {logging_mixin.py:112} INFO - [2020-05-11 
21:02:20,394] {spark_submit_hook.py:479} INFO - creation time: 
2020-05-12T02:01:48Z
   [2020-05-11 21:02:20,395] {logging_mixin.py:112} INFO - [2020-05-11 
21:02:20,394] {spark_submit_hook.py:479} INFO - service account name: spark
   [2020-05-11 21:02:20,395] {logging_mixin.py:112} INFO - [2020-05-11 
21:02:20,395] {spark_submit_hook.py:479} INFO - volumes: spark-local-dir-1, 
dswsecret-volume, spark-conf-volume, spark-token-fpqpz
   [2020-05-11 21:02:20,395] {logging_mixin.py:112} INFO - [2020-05-11 
21:02:20,395] {spark_submit_hook.py:479} INFO - node name: 10.74.200.186
   [2020-05-11 21:02:20,395] {logging_mixin.py:112} INFO - [2020-05-11 
21:02:20,395] {spark_submit_hook.py:479} INFO - start time: 2020-05-12T02:01:52Z
   [2020-05-11 21:02:20,395] {logging_mixin.py:112} INFO - [2020-05-11 
21:02:20,395] {spark_submit_hook.py:479} INFO - container images: 
txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark:s2.4.0-h2.6
   [2020-05-11 21:02:20,395] {logging_mixin.py:112} INFO - [2020-05-11 
21:02:20,395] {spark_submit_hook.py:479} INFO - phase: Running
   [2020-05-11 21:02:20,395] {logging_mixin.py:112} INFO - [2020-05-11 
21:02:20,395] {spark_submit_hook.py:479} INFO - status: 
[ContainerStatus(containerID=containerd://cf5ac589838bb75608cb1903967fcaad31f850a6cd4b4f63eb1e7417cc867c31,
 image=txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark:s2.4.0-h2.6, 
imageID=txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark@sha256:276417a1bafb5aca28c78585f55df8a8c684b20c87b1814c3cc02065d7faa885,
 lastState=ContainerState(running=null, terminated=null, waiting=null, 
additionalProperties={}), name=spark-kubernetes-driver, ready=true, 
restartCount=0, 
state=ContainerState(running=ContainerStateRunning(startedAt=2020-05-12T02:02:19Z,
 additionalProperties={}), terminated=null, waiting=null, 
additionalProperties={}), additionalProperties={})]
   
   [2020-05-11 21:06:00,953] {logging_mixin.py:112} INFO - [2020-05-11 
21:06:00,953] {spark_submit_hook.py:479} INFO - 20/05/11 21:06:00 INFO 
LoggingPodStatusWatcherImpl: State changed, new state:
   [2020-05-11 21:06:00,953] {logging_mixin.py:112} INFO - [2020-05-11 
21:06:00,953] {spark_submit_hook.py:462} INFO - Identified spark driver pod: 
merge-rptg-appl-dscr-1589248907865-driver
   [2020-05-11 21:06:00,953] {logging_mixin.py:112} INFO - [2020-05-11 
21:06:00,953] {spark_submit_hook.py:479} INFO - pod name: 
merge-rptg-appl-dscr-1589248907865-driver
   [2020-05-11 21:06:00,953] {logging_mixin.py:112} INFO - [2020-05-11 
21:06:00,953] {spark_submit_hook.py:479} INFO - namespace: batch-pipeline-qa
   [2020-05-11 21:06:00,954] {logging_mixin.py:112} INFO - [2020-05-11 
21:06:00,954] {spark_submit_hook.py:479} INFO - labels: spark-app-selector -> 
spark-1e9e33df2d5c488dbc0555a618577f40, spark-role -> driver
   [2020-05-11 21:06:00,954] {logging_mixin.py:112} INFO - [2020-05-11 
21:06:00,954] {spark_submit_hook.py:479} INFO - pod uid: 
f01dcbd9-fe6d-421a-99ea-7749f6b79345
   [2020-05-11 21:06:00,954] {logging_mixin.py:112} INFO - [2020-05-11 
21:06:00,954] {spark_submit_hook.py:479} INFO - creation time: 
2020-05-12T02:01:48Z
   [2020-05-11 21:06:00,954] {logging_mixin.py:112} INFO - [2020-05-11 
21:06:00,954] {spark_submit_hook.py:479} INFO - service account name: spark
   [2020-05-11 21:06:00,954] {logging_mixin.py:112} INFO - [2020-05-11 
21:06:00,954] {spark_submit_hook.py:479} INFO - volumes: spark-local-dir-1, 
dswsecret-volume, spark-conf-volume, spark-token-fpqpz
   [2020-05-11 21:06:00,954] {logging_mixin.py:112} INFO - [2020-05-11 
21:06:00,954] {spark_submit_hook.py:479} INFO - node name: 10.74.200.186
   [2020-05-11 21:06:00,954] {logging_mixin.py:112} INFO - [2020-05-11 
21:06:00,954] {spark_submit_hook.py:479} INFO - start time: 2020-05-12T02:01:52Z
   [2020-05-11 21:06:00,955] {logging_mixin.py:112} INFO - [2020-05-11 
21:06:00,955] {spark_submit_hook.py:479} INFO - container images: 
txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark:s2.4.0-h2.6
   [2020-05-11 21:06:00,955] {logging_mixin.py:112} INFO - [2020-05-11 
21:06:00,955] {spark_submit_hook.py:479} INFO - phase: Failed
   [2020-05-11 21:06:00,955] {logging_mixin.py:112} INFO - [2020-05-11 
21:06:00,955] {spark_submit_hook.py:479} INFO - status: 
[ContainerStatus(containerID=containerd://cf5ac589838bb75608cb1903967fcaad31f850a6cd4b4f63eb1e7417cc867c31,
 image=txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark:s2.4.0-h2.6, 
imageID=txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark@sha256:276417a1bafb5aca28c78585f55df8a8c684b20c87b1814c3cc02065d7faa885,
 lastState=ContainerState(running=null, terminated=null, waiting=null, 
additionalProperties={}), name=spark-kubernetes-driver, ready=false, 
restartCount=0, state=ContainerState(running=null, 
terminated=ContainerStateTerminated(containerID=containerd://cf5ac589838bb75608cb1903967fcaad31f850a6cd4b4f63eb1e7417cc867c31,
 exitCode=1, finishedAt=2020-05-12T02:05:03Z, message=null, reason=Error, 
signal=null, startedAt=2020-05-12T02:02:19Z, additionalProperties={}), 
waiting=null, additionalProperties={}), additionalProperties={})]
   [2020-05-11 21:06:00,960] {logging_mixin.py:112} INFO - [2020-05-11 
21:06:00,960] {spark_submit_hook.py:479} INFO - 20/05/11 21:06:00 INFO 
LoggingPodStatusWatcherImpl: Container final statuses:
   [2020-05-11 21:06:00,960] {logging_mixin.py:112} INFO - [2020-05-11 
21:06:00,960] {spark_submit_hook.py:479} INFO -
   [2020-05-11 21:06:00,960] {logging_mixin.py:112} INFO - [2020-05-11 
21:06:00,960] {spark_submit_hook.py:479} INFO -
   [2020-05-11 21:06:00,960] {logging_mixin.py:112} INFO - [2020-05-11 
21:06:00,960] {spark_submit_hook.py:479} INFO - Container name: 
spark-kubernetes-driver
   [2020-05-11 21:06:00,960] {logging_mixin.py:112} INFO - [2020-05-11 
21:06:00,960] {spark_submit_hook.py:479} INFO - Container image: 
txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark:s2.4.0-h2.6
   [2020-05-11 21:06:00,961] {logging_mixin.py:112} INFO - [2020-05-11 
21:06:00,961] {spark_submit_hook.py:479} INFO - Container state: Terminated
   [2020-05-11 21:06:00,961] {logging_mixin.py:112} INFO - [2020-05-11 
21:06:00,961] {spark_submit_hook.py:479} INFO - Exit code: 1
   [2020-05-11 21:06:00,961] {logging_mixin.py:112} INFO - [2020-05-11 
21:06:00,961] {spark_submit_hook.py:479} INFO - 20/05/11 21:06:00 INFO Client: 
Application merge-rptg-appl-dscr finished.
   [2020-05-11 21:06:00,968] {logging_mixin.py:112} INFO - [2020-05-11 
21:06:00,968] {spark_submit_hook.py:479} INFO - 20/05/11 21:06:00 INFO 
ShutdownHookManager: Shutdown hook called
   [2020-05-11 21:06:00,969] {logging_mixin.py:112} INFO - [2020-05-11 
21:06:00,969] {spark_submit_hook.py:479} INFO - 20/05/11 21:06:00 INFO 
ShutdownHookManager: Deleting directory 
/tmp/spark-88cb9622-28cd-42e4-aaea-48a2206ae942
   [2020-05-11 21:06:01,098] {taskinstance.py:1145} ERROR - Cannot execute: 
spark-submit --master 
k8s://https://c2.private.us-south.containers.cloud.ibm.com:26165 --conf 
spark.kubernetes.authenticate.driver.serviceAccountName=spark --conf 
spark.kubernetes.container.image=txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark:s2.4.0-h2.6
 --conf 
spark.kubernetes.container.image.pullSecrets=artifactory-container-registry 
--conf spark.submit.deployMode=cluster --conf spark.executor.instances=1 --conf 
spark.kubernetes.executor.request.cores=0.5 --conf 
spark.kubernetes.driver.secrets.dswsecret=/opt/spark/secrets --conf 
spark.kubernetes.namespace=batch-pipeline-qa --files 
cos://dsw-data-project-qa.service/config/dsw_config.conf,cos://dsw-data-project-qa.service/config/tables_pk.conf
 --jars 
cos://dsw-data-project-qa.service/3rdparty-jars/org.apache.spark_spark-avro_2.11-2.4.0.jar,cos://dsw-data-project-qa.service/3rdparty-jars/ibm-cos-java-sdk-2.4.5.jar,cos://dsw-data-project-qa.service/3rdparty-jars/ibm-cos-java-sdk-core-2.4.5.jar,cos://dsw-data-project-qa.service/3rdparty-jars/ibm-cos-java-sdk-kms-2.4.5.jar,cos://dsw-data-project-qa.service/3rdparty-jars/ibm-cos-java-sdk-s3-2.4.5.jar,cos://dsw-data-project-qa.service/3rdparty-jars/db2jcc4.jar
 --executor-memory 1G --driver-memory 512M --name merge-rptg-appl-dscr --class 
com.ibm.cio.dswim.ingest.CommonMerge 
cos://dsw-data-project-qa.service/job-jars/ingestion-1.0.jar -t 
odsqa.shar1.rptg_appl_dscr -b dsw-data-drop-qa -d dashdb -k XREF. Error code 
is: 0.
   
   Traceback (most recent call last):
     File 
"/home/airflow/.local/lib/python3.6/site-packages/airflow/models/taskinstance.py",
 line 983, in _run_raw_task
       result = task_copy.execute(context=context)
     File 
"/home/airflow/.local/lib/python3.6/site-packages/airflow/contrib/operators/spark_submit_operator.py",
 line 187, in execute
       self._hook.submit(self._application)
     File 
"/home/airflow/.local/lib/python3.6/site-packages/airflow/contrib/hooks/spark_submit_hook.py",
 line 405, in submit
       self._mask_cmd(spark_submit_cmd), returncode
   airflow.exceptions.AirflowException: Cannot execute: spark-submit --master 
k8s://https://c2.private.us-south.containers.cloud.ibm.com:26165 --conf 
spark.kubernetes.authenticate.driver.serviceAccountName=spark --conf 
spark.kubernetes.container.image=txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark:s2.4.0-h2.6
 --conf 
spark.kubernetes.container.image.pullSecrets=artifactory-container-registry 
--conf spark.submit.deployMode=cluster --conf spark.executor.instances=1 --conf 
spark.kubernetes.executor.request.cores=0.5 --conf 
spark.kubernetes.driver.secrets.dswsecret=/opt/spark/secrets --conf 
spark.kubernetes.namespace=batch-pipeline-qa --files 
cos://dsw-data-project-qa.service/config/dsw_config.conf,cos://dsw-data-project-qa.service/config/tables_pk.conf
 --jars 
cos://dsw-data-project-qa.service/3rdparty-jars/org.apache.spark_spark-avro_2.11-2.4.0.jar,cos://dsw-data-project-qa.service/3rdparty-jars/ibm-cos-java-sdk-2.4.5.jar,cos://dsw-data-project-qa.service/3rdparty-jars/ibm-cos-java-sdk-core-2.4.5.jar,cos://dsw-data-project-qa.service/3rdparty-jars/ibm-cos-java-sdk-kms-2.4.5.jar,cos://dsw-data-project-qa.service/3rdparty-jars/ibm-cos-java-sdk-s3-2.4.5.jar,cos://dsw-data-project-qa.service/3rdparty-jars/db2jcc4.jar
 --executor-memory 1G --driver-memory 512M --name merge-rptg-appl-dscr --class 
com.ibm.cio.dswim.ingest.CommonMerge 
cos://dsw-data-project-qa.service/job-jars/ingestion-1.0.jar -t 
odsqa.shar1.rptg_appl_dscr -b dsw-data-drop-qa -d dashdb -k XREF. Error code 
is: 0.
   [2020-05-11 21:06:01,100] {taskinstance.py:1168} INFO - Marking task as 
UP_FOR_RETRY
   [2020-05-11 21:06:03,316] {logging_mixin.py:112} INFO - [2020-05-11 
21:06:03,313] {local_task_job.py:103} INFO - Task exited with return code 1
   
   ```
   
   ### Scenario 3: job succeeds, log returned from airflow, wrong status reported (this scenario is what we want to analyze)
   
   ```log
   [dawany@dawany-inf env_qa]$ kubectl describe pod 
rmtextract-1589227268554-driver -n batch-pipeline-qa
   Name:         rmtextract-1589227268554-driver
   Namespace:    batch-pipeline-qa
   Priority:     0
   Node:         10.93.122.236/10.93.122.236
   Start Time:   Mon, 11 May 2020 16:01:10 -0400
   Labels:       spark-app-selector=spark-836f53b29a274eabbeba208d27e242de
                 spark-role=driver
   Annotations:  kubernetes.io/psp: db2oltp-dev-psp
   Status:       Succeeded
   IP:           172.30.244.35
   IPs:          <none>
   Containers:
     spark-kubernetes-driver:
       Container ID:  
containerd://9608ccf92d1067645b6cbfcd289e6c99c76cd57463ed7c1fb71352a38a27a58c
       Image:         
txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark:s2.4.4-h2.7
       Image ID:      
txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark@sha256:3682354e49a55503ef906ce8aeff8601274fa426204ec989a91a72912d31ed7e
       Ports:         7078/TCP, 7079/TCP, 4040/TCP
       Host Ports:    0/TCP, 0/TCP, 0/TCP
       Args:
         driver
         --properties-file
         /opt/spark/conf/spark.properties
         --class
         com.ibm.cio.dswim.trans.rmt.stage.RmtExtractStageJob
         spark-internal
         0
         3
         8
       State:          Terminated
         Reason:       Completed
         Exit Code:    0
         Started:      Mon, 11 May 2020 16:01:31 -0400
         Finished:     Mon, 11 May 2020 16:50:14 -0400
       Ready:          False
       Restart Count:  0
       Limits:
         memory:  11Gi
       Requests:
         cpu:     1
         memory:  11Gi
       Environment:
         SPARK_DRIVER_BIND_ADDRESS:   (v1:status.podIP)
         SPARK_LOCAL_DIRS:           
/var/data/spark-2bb8b417-ca61-4e7a-a4b2-fa0a695a1109
         SPARK_CONF_DIR:             /opt/spark/conf
       Mounts:
         /opt/spark/conf from spark-conf-volume (rw)
         /opt/spark/secrets from dswsecret-volume (rw)
         /var/data/spark-2bb8b417-ca61-4e7a-a4b2-fa0a695a1109 from 
spark-local-dir-1 (rw)
         /var/run/secrets/kubernetes.io/serviceaccount from spark-token-fpqpz 
(ro)
   Conditions:
     Type              Status
     Initialized       True
     Ready             False
     ContainersReady   False
     PodScheduled      True
   Volumes:
     spark-local-dir-1:
       Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
       Medium:
       SizeLimit:  <unset>
     dswsecret-volume:
       Type:        Secret (a volume populated by a Secret)
       SecretName:  dswsecret
       Optional:    false
     spark-conf-volume:
       Type:      ConfigMap (a volume populated by a ConfigMap)
       Name:      rmtextract-1589227268554-driver-conf-map
       Optional:  false
     spark-token-fpqpz:
       Type:        Secret (a volume populated by a Secret)
       SecretName:  spark-token-fpqpz
       Optional:    false
   QoS Class:       Burstable
   Node-Selectors:  <none>
   Tolerations:     node.kubernetes.io/not-ready:NoExecute for 600s
                    node.kubernetes.io/unreachable:NoExecute for 600s
   Events:          <none>
   
   ```
   
   ```log
   *** Reading local file: 
/home/airflow/airflow/logs/Transformation_Rmt_Master_Load_Sequence_Adhoc_Main/Rmt_Extract/2020-05-11T16:00:00+00:00/1.log
   [2020-05-11 15:01:02,589] {taskinstance.py:669} INFO - Dependencies all met 
for <TaskInstance: 
Transformation_Rmt_Master_Load_Sequence_Adhoc_Main.Rmt_Extract 
2020-05-11T16:00:00+00:00 [queued]>
   [2020-05-11 15:01:02,722] {taskinstance.py:669} INFO - Dependencies all met 
for <TaskInstance: 
Transformation_Rmt_Master_Load_Sequence_Adhoc_Main.Rmt_Extract 
2020-05-11T16:00:00+00:00 [queued]>
   [2020-05-11 15:01:02,722] {taskinstance.py:879} INFO -
   
--------------------------------------------------------------------------------
   [2020-05-11 15:01:02,722] {taskinstance.py:880} INFO - Starting attempt 1 of 
2
   [2020-05-11 15:01:02,722] {taskinstance.py:881} INFO -
   
--------------------------------------------------------------------------------
   [2020-05-11 15:01:02,826] {taskinstance.py:900} INFO - Executing 
<Task(SparkSubmitOperator): Rmt_Extract> on 2020-05-11T16:00:00+00:00
   [2020-05-11 15:01:02,831] {standard_task_runner.py:53} INFO - Started 
process 39984 to run task
   [2020-05-11 15:01:03,415] {logging_mixin.py:112} INFO - Running %s on host 
%s <TaskInstance: 
Transformation_Rmt_Master_Load_Sequence_Adhoc_Main.Rmt_Extract 
2020-05-11T16:00:00+00:00 [running]> kafka02.cloud.ibm.com
   [2020-05-11 15:01:03,970] {logging_mixin.py:112} INFO - [2020-05-11 
15:01:03,969] {base_hook.py:87} INFO - Using connection to: id: spark_default. 
Host: k8s://https://c2.private.us-south.containers.cloud.ibm.com:26165, Port: 
None, Schema: None, Login: admin, Password: XXXXXXXX, extra: XXXXXXXX
   [2020-05-11 15:01:04,000] {logging_mixin.py:112} INFO - [2020-05-11 
15:01:04,000] {spark_submit_hook.py:325} INFO - Spark-Submit cmd: spark-submit 
--master k8s://https://c2.private.us-south.containers.cloud.ibm.com:26165 
--conf spark.kubernetes.authenticate.driver.serviceAccountName=spark --conf 
spark.kubernetes.container.image=txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark:s2.4.4-h2.7
 --conf 
spark.kubernetes.container.image.pullSecrets=artifactory-container-registry 
--conf spark.submit.deployMode=cluster --conf 
spark.kubernetes.report.interval=2 --conf 
spark.kubernetes.driver.secrets.dswsecret=/opt/spark/secrets --conf 
spark.executor.userClassPathFirst=true --conf 
spark.driver.userClassPathFirst=true --conf 
spark.sql.parquet.compression.codec=gzip --conf 
spark.sql.session.timeZone=America/New_York --conf 
spark.sql.broadcastTimeout=1800 --conf spark.sql.shuffle.partitions=600 --conf 
spark.shuffle.consolidateFiles=true --conf spark.default.parallelism=108 --conf 
spark.driver.cores=1 --conf spark.executor.cores=2 --conf 
spark.kubernetes.executor.request.cores=0.6 --conf 
spark.kubernetes.executor.memoryOverhead=1G --conf spark.driver.memory=10G 
--conf spark.executor.memory=5G --conf spark.executor.instances=9 --conf 
spark.sql.codegen=true --conf spark.sql.cbo.enabled=true --conf 
spark.sql.optimizer.maxIterations=1000 --conf 
spark.kubernetes.namespace=batch-pipeline-qa --files 
cos://dsw-data-project-qa.service/config/dsw_config.conf --jars 
cos://dsw-data-project-qa.service/job-jars/common-1.0-SNAPSHOT.jar,cos://dsw-data-project-qa.service/job-jars/rmtjob-1.0-SNAPSHOT.jar,cos://dsw-data-project-qa.service/job-jars/meta-1.0-SNAPSHOT.jar,cos://dsw-data-project-qa.service/3rdparty-jars/druid-1.1.12.jar,cos://dsw-data-project-qa.service/3rdparty-jars/mybatis-3.5.3.jar,cos://dsw-data-project-qa.service/3rdparty-jars/db2jcc4.jar,cos://dsw-data-project-qa.service/3rdparty-jars/logback-core-1.2.3.jar,cos://dsw-data-project-qa.service/3rdparty-jars/logback-classic-1.2.3.jar,cos://dsw-data-project-qa.service/3rdparty-jars/dom4j-2.1.1.jar,cos://dsw-data-project-qa.service/3rdparty-jars/guava-28.0-jre.jar,cos://dsw-data-project-qa.service/3rdparty-jars/commons-lang3-3.9.jar,cos://dsw-data-project-qa.service/3rdparty-jars/fastjson-1.2.59.jar
 --name Rmt_Extract --class 
com.ibm.cio.dswim.trans.rmt.stage.RmtExtractStageJob 
cos://dsw-data-project-qa.service/job-jars/rmt_extract_stage-1.0-SNAPSHOT.jar 0 
3 8
   [2020-05-11 15:01:06,349] {logging_mixin.py:112} INFO - [2020-05-11 
15:01:06,348] {spark_submit_hook.py:479} INFO - 20/05/11 15:01:06 WARN 
NativeCodeLoader: Unable to load native-hadoop library for your platform... 
using builtin-java classes where applicable
   [2020-05-11 15:01:08,607] {logging_mixin.py:112} INFO - [2020-05-11 
15:01:08,607] {spark_submit_hook.py:479} INFO - log4j:WARN No appenders could 
be found for logger (io.fabric8.kubernetes.client.Config).
   [2020-05-11 15:01:08,608] {logging_mixin.py:112} INFO - [2020-05-11 
15:01:08,607] {spark_submit_hook.py:479} INFO - log4j:WARN Please initialize 
the log4j system properly.
   [2020-05-11 15:01:08,608] {logging_mixin.py:112} INFO - [2020-05-11 
15:01:08,608] {spark_submit_hook.py:479} INFO - log4j:WARN See 
http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
   [2020-05-11 15:01:09,773] {logging_mixin.py:112} INFO - [2020-05-11 
15:01:09,773] {spark_submit_hook.py:479} INFO - Using Spark's default log4j 
profile: org/apache/spark/log4j-defaults.properties
   
   [2020-05-11 15:01:09,785] {logging_mixin.py:112} INFO - [2020-05-11 
15:01:09,784] {spark_submit_hook.py:479} INFO - 20/05/11 15:01:09 INFO 
LoggingPodStatusWatcherImpl: State changed, new state:
   [2020-05-11 15:01:09,785] {logging_mixin.py:112} INFO - [2020-05-11 
15:01:09,785] {spark_submit_hook.py:462} INFO - Identified spark driver pod: 
rmtextract-1589227268554-driver
   [2020-05-11 15:01:09,785] {logging_mixin.py:112} INFO - [2020-05-11 
15:01:09,785] {spark_submit_hook.py:479} INFO - pod name: 
rmtextract-1589227268554-driver
   [2020-05-11 15:01:09,785] {logging_mixin.py:112} INFO - [2020-05-11 
15:01:09,785] {spark_submit_hook.py:479} INFO - namespace: batch-pipeline-qa
   [2020-05-11 15:01:09,785] {logging_mixin.py:112} INFO - [2020-05-11 
15:01:09,785] {spark_submit_hook.py:479} INFO - labels: spark-app-selector -> 
spark-836f53b29a274eabbeba208d27e242de, spark-role -> driver
   [2020-05-11 15:01:09,785] {logging_mixin.py:112} INFO - [2020-05-11 
15:01:09,785] {spark_submit_hook.py:479} INFO - pod uid: 
3684e5db-8f69-4b78-be4b-6669e42806e6
   [2020-05-11 15:01:09,786] {logging_mixin.py:112} INFO - [2020-05-11 
15:01:09,786] {spark_submit_hook.py:479} INFO - creation time: 
2020-05-11T20:01:09Z
   [2020-05-11 15:01:09,786] {logging_mixin.py:112} INFO - [2020-05-11 
15:01:09,786] {spark_submit_hook.py:479} INFO - service account name: spark
   [2020-05-11 15:01:09,786] {logging_mixin.py:112} INFO - [2020-05-11 
15:01:09,786] {spark_submit_hook.py:479} INFO - volumes: spark-local-dir-1, 
dswsecret-volume, spark-conf-volume, spark-token-fpqpz
   [2020-05-11 15:01:09,786] {logging_mixin.py:112} INFO - [2020-05-11 
15:01:09,786] {spark_submit_hook.py:479} INFO - node name: N/A
   [2020-05-11 15:01:09,786] {logging_mixin.py:112} INFO - [2020-05-11 
15:01:09,786] {spark_submit_hook.py:479} INFO - start time: N/A
   [2020-05-11 15:01:09,786] {logging_mixin.py:112} INFO - [2020-05-11 
15:01:09,786] {spark_submit_hook.py:479} INFO - container images: N/A
   [2020-05-11 15:01:09,786] {logging_mixin.py:112} INFO - [2020-05-11 
15:01:09,786] {spark_submit_hook.py:479} INFO - phase: Pending
   [2020-05-11 15:01:09,787] {logging_mixin.py:112} INFO - [2020-05-11 
15:01:09,787] {spark_submit_hook.py:479} INFO - status: []
   
   [2020-05-11 15:01:09,797] {logging_mixin.py:112} INFO - [2020-05-11 
15:01:09,796] {spark_submit_hook.py:479} INFO - 20/05/11 15:01:09 INFO 
LoggingPodStatusWatcherImpl: State changed, new state:
   [2020-05-11 15:01:09,797] {logging_mixin.py:112} INFO - [2020-05-11 
15:01:09,797] {spark_submit_hook.py:462} INFO - Identified spark driver pod: 
rmtextract-1589227268554-driver
   [2020-05-11 15:01:09,797] {logging_mixin.py:112} INFO - [2020-05-11 
15:01:09,797] {spark_submit_hook.py:479} INFO - pod name: 
rmtextract-1589227268554-driver
   [2020-05-11 15:01:09,797] {logging_mixin.py:112} INFO - [2020-05-11 
15:01:09,797] {spark_submit_hook.py:479} INFO - namespace: batch-pipeline-qa
   [2020-05-11 15:01:09,797] {logging_mixin.py:112} INFO - [2020-05-11 
15:01:09,797] {spark_submit_hook.py:479} INFO - labels: spark-app-selector -> 
spark-836f53b29a274eabbeba208d27e242de, spark-role -> driver
   [2020-05-11 15:01:09,797] {logging_mixin.py:112} INFO - [2020-05-11 
15:01:09,797] {spark_submit_hook.py:479} INFO - pod uid: 
3684e5db-8f69-4b78-be4b-6669e42806e6
   [2020-05-11 15:01:09,798] {logging_mixin.py:112} INFO - [2020-05-11 
15:01:09,797] {spark_submit_hook.py:479} INFO - creation time: 
2020-05-11T20:01:09Z
   [2020-05-11 15:01:09,798] {logging_mixin.py:112} INFO - [2020-05-11 
15:01:09,798] {spark_submit_hook.py:479} INFO - service account name: spark
   [2020-05-11 15:01:09,798] {logging_mixin.py:112} INFO - [2020-05-11 
15:01:09,798] {spark_submit_hook.py:479} INFO - volumes: spark-local-dir-1, 
dswsecret-volume, spark-conf-volume, spark-token-fpqpz
   [2020-05-11 15:01:09,798] {logging_mixin.py:112} INFO - [2020-05-11 
15:01:09,798] {spark_submit_hook.py:479} INFO - node name: 10.93.122.236
   [2020-05-11 15:01:09,798] {logging_mixin.py:112} INFO - [2020-05-11 
15:01:09,798] {spark_submit_hook.py:479} INFO - start time: N/A
   [2020-05-11 15:01:09,798] {logging_mixin.py:112} INFO - [2020-05-11 
15:01:09,798] {spark_submit_hook.py:479} INFO - container images: N/A
   [2020-05-11 15:01:09,798] {logging_mixin.py:112} INFO - [2020-05-11 
15:01:09,798] {spark_submit_hook.py:479} INFO - phase: Pending
   [2020-05-11 15:01:09,799] {logging_mixin.py:112} INFO - [2020-05-11 
15:01:09,798] {spark_submit_hook.py:479} INFO - status: []
   
   [2020-05-11 15:01:10,080] {logging_mixin.py:112} INFO - [2020-05-11 
15:01:10,080] {spark_submit_hook.py:479} INFO - 20/05/11 15:01:10 INFO Client: 
Waiting for application Rmt_Extract to finish...
   [2020-05-11 15:01:10,307] {logging_mixin.py:112} INFO - [2020-05-11 
15:01:10,307] {spark_submit_hook.py:479} INFO - 20/05/11 15:01:10 INFO 
LoggingPodStatusWatcherImpl: State changed, new state:
   [2020-05-11 15:01:10,308] {logging_mixin.py:112} INFO - [2020-05-11 
15:01:10,307] {spark_submit_hook.py:462} INFO - Identified spark driver pod: 
rmtextract-1589227268554-driver
   [2020-05-11 15:01:10,308] {logging_mixin.py:112} INFO - [2020-05-11 
15:01:10,308] {spark_submit_hook.py:479} INFO - pod name: 
rmtextract-1589227268554-driver
   [2020-05-11 15:01:10,308] {logging_mixin.py:112} INFO - [2020-05-11 
15:01:10,308] {spark_submit_hook.py:479} INFO - namespace: batch-pipeline-qa
   [2020-05-11 15:01:10,308] {logging_mixin.py:112} INFO - [2020-05-11 
15:01:10,308] {spark_submit_hook.py:479} INFO - labels: spark-app-selector -> 
spark-836f53b29a274eabbeba208d27e242de, spark-role -> driver
   [2020-05-11 15:01:10,308] {logging_mixin.py:112} INFO - [2020-05-11 
15:01:10,308] {spark_submit_hook.py:479} INFO - pod uid: 
3684e5db-8f69-4b78-be4b-6669e42806e6
   [2020-05-11 15:01:10,308] {logging_mixin.py:112} INFO - [2020-05-11 
15:01:10,308] {spark_submit_hook.py:479} INFO - creation time: 
2020-05-11T20:01:09Z
   [2020-05-11 15:01:10,308] {logging_mixin.py:112} INFO - [2020-05-11 
15:01:10,308] {spark_submit_hook.py:479} INFO - service account name: spark
   [2020-05-11 15:01:10,309] {logging_mixin.py:112} INFO - [2020-05-11 
15:01:10,309] {spark_submit_hook.py:479} INFO - volumes: spark-local-dir-1, 
dswsecret-volume, spark-conf-volume, spark-token-fpqpz
   [2020-05-11 15:01:10,309] {logging_mixin.py:112} INFO - [2020-05-11 
15:01:10,309] {spark_submit_hook.py:479} INFO - node name: 10.93.122.236
   [2020-05-11 15:01:10,309] {logging_mixin.py:112} INFO - [2020-05-11 
15:01:10,309] {spark_submit_hook.py:479} INFO - start time: 2020-05-11T20:01:10Z
   [2020-05-11 15:01:10,309] {logging_mixin.py:112} INFO - [2020-05-11 
15:01:10,309] {spark_submit_hook.py:479} INFO - container images: 
txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark:s2.4.4-h2.7
   [2020-05-11 15:01:10,309] {logging_mixin.py:112} INFO - [2020-05-11 
15:01:10,309] {spark_submit_hook.py:479} INFO - phase: Pending
   [2020-05-11 15:01:10,309] {logging_mixin.py:112} INFO - [2020-05-11 
15:01:10,309] {spark_submit_hook.py:479} INFO - status: 
[ContainerStatus(containerID=null, 
image=txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark:s2.4.4-h2.7, 
imageID=, lastState=ContainerState(running=null, terminated=null, waiting=null, 
additionalProperties={}), name=spark-kubernetes-driver, ready=false, 
restartCount=0, state=ContainerState(running=null, terminated=null, 
waiting=ContainerStateWaiting(message=null, reason=ContainerCreating, 
additionalProperties={}), additionalProperties={}), additionalProperties={})]
   
   [2020-05-11 15:01:32,411] {logging_mixin.py:112} INFO - [2020-05-11 
15:01:32,411] {spark_submit_hook.py:479} INFO - 20/05/11 15:01:32 INFO 
LoggingPodStatusWatcherImpl: State changed, new state:
   [2020-05-11 15:01:32,411] {logging_mixin.py:112} INFO - [2020-05-11 
15:01:32,411] {spark_submit_hook.py:462} INFO - Identified spark driver pod: 
rmtextract-1589227268554-driver
   [2020-05-11 15:01:32,411] {logging_mixin.py:112} INFO - [2020-05-11 
15:01:32,411] {spark_submit_hook.py:479} INFO - pod name: 
rmtextract-1589227268554-driver
   [2020-05-11 15:01:32,411] {logging_mixin.py:112} INFO - [2020-05-11 
15:01:32,411] {spark_submit_hook.py:479} INFO - namespace: batch-pipeline-qa
   [2020-05-11 15:01:32,412] {logging_mixin.py:112} INFO - [2020-05-11 
15:01:32,412] {spark_submit_hook.py:479} INFO - labels: spark-app-selector -> 
spark-836f53b29a274eabbeba208d27e242de, spark-role -> driver
   [2020-05-11 15:01:32,412] {logging_mixin.py:112} INFO - [2020-05-11 
15:01:32,412] {spark_submit_hook.py:479} INFO - pod uid: 
3684e5db-8f69-4b78-be4b-6669e42806e6
   [2020-05-11 15:01:32,412] {logging_mixin.py:112} INFO - [2020-05-11 
15:01:32,412] {spark_submit_hook.py:479} INFO - creation time: 
2020-05-11T20:01:09Z
   [2020-05-11 15:01:32,412] {logging_mixin.py:112} INFO - [2020-05-11 
15:01:32,412] {spark_submit_hook.py:479} INFO - service account name: spark
   [2020-05-11 15:01:32,412] {logging_mixin.py:112} INFO - [2020-05-11 
15:01:32,412] {spark_submit_hook.py:479} INFO - volumes: spark-local-dir-1, 
dswsecret-volume, spark-conf-volume, spark-token-fpqpz
   [2020-05-11 15:01:32,412] {logging_mixin.py:112} INFO - [2020-05-11 
15:01:32,412] {spark_submit_hook.py:479} INFO - node name: 10.93.122.236
   [2020-05-11 15:01:32,413] {logging_mixin.py:112} INFO - [2020-05-11 
15:01:32,412] {spark_submit_hook.py:479} INFO - start time: 2020-05-11T20:01:10Z
   [2020-05-11 15:01:32,413] {logging_mixin.py:112} INFO - [2020-05-11 
15:01:32,413] {spark_submit_hook.py:479} INFO - container images: 
txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark:s2.4.4-h2.7
   [2020-05-11 15:01:32,413] {logging_mixin.py:112} INFO - [2020-05-11 
15:01:32,413] {spark_submit_hook.py:479} INFO - phase: Running
   [2020-05-11 15:01:32,413] {logging_mixin.py:112} INFO - [2020-05-11 
15:01:32,413] {spark_submit_hook.py:479} INFO - status: 
[ContainerStatus(containerID=containerd://9608ccf92d1067645b6cbfcd289e6c99c76cd57463ed7c1fb71352a38a27a58c,
 image=txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark:s2.4.4-h2.7, 
imageID=txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark@sha256:3682354e49a55503ef906ce8aeff8601274fa426204ec989a91a72912d31ed7e,
 lastState=ContainerState(running=null, terminated=null, waiting=null, 
additionalProperties={}), name=spark-kubernetes-driver, ready=true, 
restartCount=0, 
state=ContainerState(running=ContainerStateRunning(startedAt=2020-05-11T20:01:31Z,
 additionalProperties={}), terminated=null, waiting=null, 
additionalProperties={}), additionalProperties={})]
   
   [2020-05-11 15:49:57,933] {logging_mixin.py:112} INFO - [2020-05-11 
15:49:57,933] {spark_submit_hook.py:479} INFO - 20/05/11 15:49:57 INFO 
LoggingPodStatusWatcherImpl: Container final statuses:
   [2020-05-11 15:49:57,934] {logging_mixin.py:112} INFO - [2020-05-11 
15:49:57,934] {spark_submit_hook.py:479} INFO -
   [2020-05-11 15:49:57,934] {logging_mixin.py:112} INFO - [2020-05-11 
15:49:57,934] {spark_submit_hook.py:479} INFO -
   [2020-05-11 15:49:57,934] {logging_mixin.py:112} INFO - [2020-05-11 
15:49:57,934] {spark_submit_hook.py:479} INFO - Container name: 
spark-kubernetes-driver
   [2020-05-11 15:49:57,934] {logging_mixin.py:112} INFO - [2020-05-11 
15:49:57,934] {spark_submit_hook.py:479} INFO - Container image: 
txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark:s2.4.4-h2.7
   [2020-05-11 15:49:57,934] {logging_mixin.py:112} INFO - [2020-05-11 
15:49:57,934] {spark_submit_hook.py:479} INFO - Container state: Running
   [2020-05-11 15:49:57,934] {logging_mixin.py:112} INFO - [2020-05-11 
15:49:57,934] {spark_submit_hook.py:479} INFO - Container started at: 
2020-05-11T20:01:31Z
   [2020-05-11 15:49:57,935] {logging_mixin.py:112} INFO - [2020-05-11 
15:49:57,934] {spark_submit_hook.py:479} INFO - 20/05/11 15:49:57 INFO Client: 
Application Rmt_Extract finished.
   [2020-05-11 15:49:57,936] {logging_mixin.py:112} INFO - [2020-05-11 
15:49:57,936] {spark_submit_hook.py:479} INFO - 20/05/11 15:49:57 INFO 
ShutdownHookManager: Shutdown hook called
   [2020-05-11 15:49:57,937] {logging_mixin.py:112} INFO - [2020-05-11 
15:49:57,937] {spark_submit_hook.py:479} INFO - 20/05/11 15:49:57 INFO 
ShutdownHookManager: Deleting directory 
/tmp/spark-5bfba41b-8cab-429b-b64f-dcb4f52c4d3a
   [2020-05-11 15:49:58,101] {taskinstance.py:1145} ERROR - Cannot execute: 
spark-submit --master 
k8s://https://c2.private.us-south.containers.cloud.ibm.com:26165 --conf 
spark.kubernetes.authenticate.driver.serviceAccountName=spark --conf 
spark.kubernetes.container.image=txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark:s2.4.4-h2.7
 --conf 
spark.kubernetes.container.image.pullSecrets=artifactory-container-registry 
--conf spark.submit.deployMode=cluster --conf 
spark.kubernetes.report.interval=2 --conf 
spark.kubernetes.driver.secrets.dswsecret=/opt/spark/secrets --conf 
spark.executor.userClassPathFirst=true --conf 
spark.driver.userClassPathFirst=true --conf 
spark.sql.parquet.compression.codec=gzip --conf 
spark.sql.session.timeZone=America/New_York --conf 
spark.sql.broadcastTimeout=1800 --conf spark.sql.shuffle.partitions=600 --conf 
spark.shuffle.consolidateFiles=true --conf spark.default.parallelism=108 --conf 
spark.driver.cores=1 --conf spark.executor.cores=2 --conf 
spark.kubernetes.executor.request.cores=0.6 --conf 
spark.kubernetes.executor.memoryOverhead=1G --conf spark.driver.memory=10G 
--conf spark.executor.memory=5G --conf spark.executor.instances=9 --conf 
spark.sql.codegen=true --conf spark.sql.cbo.enabled=true --conf 
spark.sql.optimizer.maxIterations=1000 --conf 
spark.kubernetes.namespace=batch-pipeline-qa --files 
cos://dsw-data-project-qa.service/config/dsw_config.conf --jars 
cos://dsw-data-project-qa.service/job-jars/common-1.0-SNAPSHOT.jar,cos://dsw-data-project-qa.service/job-jars/rmtjob-1.0-SNAPSHOT.jar,cos://dsw-data-project-qa.service/job-jars/meta-1.0-SNAPSHOT.jar,cos://dsw-data-project-qa.service/3rdparty-jars/druid-1.1.12.jar,cos://dsw-data-project-qa.service/3rdparty-jars/mybatis-3.5.3.jar,cos://dsw-data-project-qa.service/3rdparty-jars/db2jcc4.jar,cos://dsw-data-project-qa.service/3rdparty-jars/logback-core-1.2.3.jar,cos://dsw-data-project-qa.service/3rdparty-jars/logback-classic-1.2.3.jar,cos://dsw-data-project-qa.service/3rdparty-jars/dom4j-2.1.1.jar,cos://dsw-data-project-qa.service/3rdparty-jars/guava-28.0-jre.jar,cos://dsw-data-project-qa.service/3rdparty-jars/commons-lang3-3.9.jar,cos://dsw-data-project-qa.service/3rdparty-jars/fastjson-1.2.59.jar
 --name Rmt_Extract --class 
com.ibm.cio.dswim.trans.rmt.stage.RmtExtractStageJob 
cos://dsw-data-project-qa.service/job-jars/rmt_extract_stage-1.0-SNAPSHOT.jar 0 
3 8. Error code is: 0.
   Traceback (most recent call last):
     File 
"/home/airflow/.local/lib/python3.6/site-packages/airflow/models/taskinstance.py",
 line 983, in _run_raw_task
       result = task_copy.execute(context=context)
     File 
"/home/airflow/.local/lib/python3.6/site-packages/airflow/contrib/operators/spark_submit_operator.py",
 line 187, in execute
       self._hook.submit(self._application)
     File 
"/home/airflow/.local/lib/python3.6/site-packages/airflow/contrib/hooks/spark_submit_hook.py",
 line 405, in submit
       self._mask_cmd(spark_submit_cmd), returncode
   airflow.exceptions.AirflowException: Cannot execute: spark-submit --master 
k8s://https://c2.private.us-south.containers.cloud.ibm.com:26165 --conf 
spark.kubernetes.authenticate.driver.serviceAccountName=spark --conf 
spark.kubernetes.container.image=txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark:s2.4.4-h2.7
 --conf 
spark.kubernetes.container.image.pullSecrets=artifactory-container-registry 
--conf spark.submit.deployMode=cluster --conf 
spark.kubernetes.report.interval=2 --conf 
spark.kubernetes.driver.secrets.dswsecret=/opt/spark/secrets --conf 
spark.executor.userClassPathFirst=true --conf 
spark.driver.userClassPathFirst=true --conf 
spark.sql.parquet.compression.codec=gzip --conf 
spark.sql.session.timeZone=America/New_York --conf 
spark.sql.broadcastTimeout=1800 --conf spark.sql.shuffle.partitions=600 --conf 
spark.shuffle.consolidateFiles=true --conf spark.default.parallelism=108 --conf 
spark.driver.cores=1 --conf spark.executor.cores=2 --conf 
spark.kubernetes.executor.request.cores=0.6 --conf 
spark.kubernetes.executor.memoryOverhead=1G --conf spark.driver.memory=10G 
--conf spark.executor.memory=5G --conf spark.executor.instances=9 --conf 
spark.sql.codegen=true --conf spark.sql.cbo.enabled=true --conf 
spark.sql.optimizer.maxIterations=1000 --conf 
spark.kubernetes.namespace=batch-pipeline-qa --files 
cos://dsw-data-project-qa.service/config/dsw_config.conf --jars 
cos://dsw-data-project-qa.service/job-jars/common-1.0-SNAPSHOT.jar,cos://dsw-data-project-qa.service/job-jars/rmtjob-1.0-SNAPSHOT.jar,cos://dsw-data-project-qa.service/job-jars/meta-1.0-SNAPSHOT.jar,cos://dsw-data-project-qa.service/3rdparty-jars/druid-1.1.12.jar,cos://dsw-data-project-qa.service/3rdparty-jars/mybatis-3.5.3.jar,cos://dsw-data-project-qa.service/3rdparty-jars/db2jcc4.jar,cos://dsw-data-project-qa.service/3rdparty-jars/logback-core-1.2.3.jar,cos://dsw-data-project-qa.service/3rdparty-jars/logback-classic-1.2.3.jar,cos://dsw-data-project-qa.service/3rdparty-jars/dom4j-2.1.1.jar,cos://dsw-data-project-qa.service/3rdparty-jars/guava-28.0-jre.jar,cos://dsw-data-project-qa.service/3rdparty-jars/commons-lang3-3.9.jar,cos://dsw-data-project-qa.service/3rdparty-jars/fastjson-1.2.59.jar
 --name Rmt_Extract --class 
com.ibm.cio.dswim.trans.rmt.stage.RmtExtractStageJob 
cos://dsw-data-project-qa.service/job-jars/rmt_extract_stage-1.0-SNAPSHOT.jar 0 
3 8. Error code is: 0.
   [2020-05-11 15:49:58,102] {taskinstance.py:1168} INFO - Marking task as 
UP_FOR_RETRY
   [2020-05-11 15:50:00,600] {logging_mixin.py:112} INFO - [2020-05-11 
15:50:00,599] {local_task_job.py:103} INFO - Task exited with return code 1
   
   ```
   
   ## submit the Scenario 3 job manually, get k8s logs
   
   ### Scenario 3: log returned from k8s
   
   ```log
   [dawany@dawany-inf env_qa]$ kubectl describe pod -n batch-pipeline-qa  
testquoterptfact-1589301844228-driver
   Name:         testquoterptfact-1589301844228-driver
   Namespace:    batch-pipeline-qa
   Priority:     0
   Node:         10.74.200.157/10.74.200.157
   Start Time:   Tue, 12 May 2020 12:44:09 -0400
   Labels:       spark-app-selector=spark-6ddf413798da4c5d83645a7bc760a925
                 spark-role=driver
   Annotations:  kubernetes.io/psp: db2oltp-dev-psp
   Status:       Succeeded
   IP:           172.30.0.6
   IPs:          <none>
   Containers:
     spark-kubernetes-driver:
       Container ID:  
containerd://d5479e5ff5c582db541e8b545953981d705024f73861a0fddd506a3b11999e4b
       Image:         
txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark:s2.4.4-h2.7
       Image ID:      
txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark@sha256:3682354e49a55503ef906ce8aeff8601274fa426204ec989a91a72912d31ed7e
       Ports:         7078/TCP, 7079/TCP, 4040/TCP
       Host Ports:    0/TCP, 0/TCP, 0/TCP
       Args:
         driver
         --properties-file
         /opt/spark/conf/spark.properties
         --class
         com.ibm.cio.dswim.qrf.job.QuoteRptFactJob
         spark-internal
         1
       State:          Terminated
         Reason:       Completed
         Exit Code:    0
         Started:      Tue, 12 May 2020 12:44:37 -0400
         Finished:     Tue, 12 May 2020 14:52:57 -0400
       Ready:          False
       Restart Count:  0
       Limits:
         cpu:     4
         memory:  16896Mi
       Requests:
         cpu:     1
         memory:  16896Mi
       Environment:
         SPARK_DRIVER_BIND_ADDRESS:   (v1:status.podIP)
         SPARK_LOCAL_DIRS:           
/var/data/spark-e4c8fd1e-8b33-449c-b460-c842df658705
         SPARK_CONF_DIR:             /opt/spark/conf
       Mounts:
         /opt/spark/conf from spark-conf-volume (rw)
         /opt/spark/secrets/ from dswsecret-volume (rw)
         /var/data/spark-e4c8fd1e-8b33-449c-b460-c842df658705 from 
spark-local-dir-1 (rw)
         /var/run/secrets/kubernetes.io/serviceaccount from spark-token-fpqpz 
(ro)
   Conditions:
     Type              Status
     Initialized       True
     Ready             False
     ContainersReady   False
     PodScheduled      True
   Volumes:
     spark-local-dir-1:
       Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
       Medium:
       SizeLimit:  <unset>
     dswsecret-volume:
       Type:        Secret (a volume populated by a Secret)
       SecretName:  dswsecret
       Optional:    false
     spark-conf-volume:
       Type:      ConfigMap (a volume populated by a ConfigMap)
       Name:      testquoterptfact-1589301844228-driver-conf-map
       Optional:  false
     spark-token-fpqpz:
       Type:        Secret (a volume populated by a Secret)
       SecretName:  spark-token-fpqpz
       Optional:    false
   QoS Class:       Burstable
   Node-Selectors:  <none>
   Tolerations:     node.kubernetes.io/not-ready:NoExecute for 600s
                    node.kubernetes.io/unreachable:NoExecute for 600s
   Events:          <none>
   
   ```
   
   ```log
   2020-05-12 12:43:59 WARN  NativeCodeLoader:62 - Unable to load native-hadoop 
library for your platform... using builtin-java classes where applicable
   2020-05-12 12:44:06 INFO  LoggingPodStatusWatcherImpl:54 - State changed, 
new state:
         pod name: testquoterptfact-1589301844228-driver
         namespace: batch-pipeline-qa
         labels: spark-app-selector -> spark-6ddf413798da4c5d83645a7bc760a925, 
spark-role -> driver
         pod uid: ff918848-2819-4336-a4a3-654f01dd756c
         creation time: 2020-05-12T16:44:06Z
         service account name: spark
         volumes: spark-local-dir-1, dswsecret-volume, spark-conf-volume, 
spark-token-fpqpz
         node name: N/A
         start time: N/A
         container images: N/A
         phase: Pending
         status: []
   2020-05-12 12:44:06 INFO  LoggingPodStatusWatcherImpl:54 - State changed, 
new state:
         pod name: testquoterptfact-1589301844228-driver
         namespace: batch-pipeline-qa
         labels: spark-app-selector -> spark-6ddf413798da4c5d83645a7bc760a925, 
spark-role -> driver
         pod uid: ff918848-2819-4336-a4a3-654f01dd756c
         creation time: 2020-05-12T16:44:06Z
         service account name: spark
         volumes: spark-local-dir-1, dswsecret-volume, spark-conf-volume, 
spark-token-fpqpz
         node name: 10.74.200.157
         start time: N/A
         container images: N/A
         phase: Pending
         status: []
   2020-05-12 12:44:07 INFO  Client:54 - Waiting for application 
test_quote_rpt_fact to finish...
   2020-05-12 12:44:09 INFO  LoggingPodStatusWatcherImpl:54 - State changed, 
new state:
         pod name: testquoterptfact-1589301844228-driver
         namespace: batch-pipeline-qa
         labels: spark-app-selector -> spark-6ddf413798da4c5d83645a7bc760a925, 
spark-role -> driver
         pod uid: ff918848-2819-4336-a4a3-654f01dd756c
         creation time: 2020-05-12T16:44:06Z
         service account name: spark
         volumes: spark-local-dir-1, dswsecret-volume, spark-conf-volume, 
spark-token-fpqpz
         node name: 10.74.200.157
         start time: 2020-05-12T16:44:09Z
         container images: 
txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark:s2.4.4-h2.7
         phase: Pending
         status: [ContainerStatus(containerID=null, 
image=txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark:s2.4.4-h2.7, 
imageID=, lastState=ContainerState(running=null, terminated=null, waiting=null, 
additionalProperties={}), name=spark-kubernetes-driver, ready=false, 
restartCount=0, state=ContainerState(running=null, terminated=null, 
waiting=ContainerStateWaiting(message=null, reason=ContainerCreating, 
additionalProperties={}), additionalProperties={}), additionalProperties={})]
   2020-05-12 12:44:37 INFO  LoggingPodStatusWatcherImpl:54 - State changed, 
new state:
         pod name: testquoterptfact-1589301844228-driver
         namespace: batch-pipeline-qa
         labels: spark-app-selector -> spark-6ddf413798da4c5d83645a7bc760a925, 
spark-role -> driver
         pod uid: ff918848-2819-4336-a4a3-654f01dd756c
         creation time: 2020-05-12T16:44:06Z
         service account name: spark
         volumes: spark-local-dir-1, dswsecret-volume, spark-conf-volume, 
spark-token-fpqpz
         node name: 10.74.200.157
         start time: 2020-05-12T16:44:09Z
         container images: 
txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark:s2.4.4-h2.7
         phase: Running
         status: 
[ContainerStatus(containerID=containerd://d5479e5ff5c582db541e8b545953981d705024f73861a0fddd506a3b11999e4b,
 image=txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark:s2.4.4-h2.7, 
imageID=txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark@sha256:3682354e49a55503ef906ce8aeff8601274fa426204ec989a91a72912d31ed7e,
 lastState=ContainerState(running=null, terminated=null, waiting=null, 
additionalProperties={}), name=spark-kubernetes-driver, ready=true, 
restartCount=0, 
state=ContainerState(running=ContainerStateRunning(startedAt=Time(time=2020-05-12T16:44:37Z,
 additionalProperties={}), additionalProperties={}), terminated=null, 
waiting=null, additionalProperties={}), additionalProperties={})]
   2020-05-12 13:43:51 INFO  WatchConnectionManager:379 - Current reconnect 
backoff is 1000 milliseconds (T0)
   2020-05-12 13:43:53 INFO  LoggingPodStatusWatcherImpl:54 - Container final 
statuses:
   
   
         Container name: spark-kubernetes-driver
         Container image: 
txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark:s2.4.4-h2.7
         Container state: Running
         Container started at: 2020-05-12T16:44:37Z
   2020-05-12 13:43:53 INFO  Client:54 - Application test_quote_rpt_fact 
finished.
   2020-05-12 13:43:53 INFO  ShutdownHookManager:54 - Shutdown hook called
   2020-05-12 13:43:53 INFO  ShutdownHookManager:54 - Deleting directory 
/tmp/spark-09a9bf16-11ee-43ea-95f0-e970b0ea7578
   ```
   
   ## The difference is here
   
   ```log
   2020-05-12 13:43:51 INFO  WatchConnectionManager:379 - Current reconnect 
backoff is 1000 milliseconds (T0)
   2020-05-12 13:43:53 INFO  LoggingPodStatusWatcherImpl:54 - Container final 
statuses:
   
   
         Container name: spark-kubernetes-driver
         Container image: 
txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark:s2.4.4-h2.7
         Container state: Running
         Container started at: 2020-05-12T16:44:37Z
   2020-05-12 13:43:53 INFO  Client:54 - Application test_quote_rpt_fact 
finished.
   2020-05-12 13:43:53 INFO  ShutdownHookManager:54 - Shutdown hook called
   2020-05-12 13:43:53 INFO  ShutdownHookManager:54 - Deleting directory 
/tmp/spark-09a9bf16-11ee-43ea-95f0-e970b0ea7578
   ```
   
   ## conclusion:
   
   Comparing the log details, the difference lies in how the log stream terminates:
   
   1.  airflow side & source code
   
            The log stream is interrupted and no 'Exit code' line is ever returned, so 'self._spark_exit_code' keeps its initial value 'None'; airflow then marks the job failed even though the job is actually still running in k8s. A sketch of this failure mode follows the code excerpt below.
   
           Here is a similar issue: https://issues.apache.org/jira/browse/AIRFLOW-6244
   
   ```python
   # Excerpted from airflow/contrib/hooks/spark_submit_hook.py;
   # unrelated parts are elided with "...".
   class SparkSubmitHook:  # base classes elided
       def submit(self, application="", **kwargs):
           """
           Remote Popen to execute the spark-submit job
   
           :param application: Submitted application, jar or py file
           :type application: str
           :param kwargs: extra arguments to Popen (see subprocess.Popen)
           """
           spark_submit_cmd = self._build_spark_submit_command(application)
   
           if hasattr(self, '_env'):
               env = os.environ.copy()
               env.update(self._env)
               kwargs["env"] = env
   
           self._submit_sp = subprocess.Popen(spark_submit_cmd,
                                              stdout=subprocess.PIPE,
                                              stderr=subprocess.STDOUT,
                                              bufsize=-1,
                                              universal_newlines=True,
                                              **kwargs)
   
           self._process_spark_submit_log(iter(self._submit_sp.stdout.readline, ''))
           returncode = self._submit_sp.wait()
   
           # Check spark-submit return code. In Kubernetes mode, also check the value
           # of exit code in the log, as it may differ.
           if returncode or (self._is_kubernetes and self._spark_exit_code != 0):
               raise AirflowException(
                   "Cannot execute: {}. Error code is: {}.".format(
                       self._mask_cmd(spark_submit_cmd), returncode
                   )
               )
           ...
   
       def _process_spark_submit_log(self, itr):
           # Consume the iterator
           for line in itr:
               line = line.strip()
   
               # If we run Kubernetes cluster mode, we want to extract the driver pod id
               # from the logs so we can kill the application when we stop it unexpectedly
               if self._is_kubernetes:  # 'elif' in the original; the preceding (YARN) branch is elided
                   match = re.search(r'\s*pod name: ((.+?)-([a-z0-9]+)-driver)', line)
                   if match:
                       self._kubernetes_driver_pod = match.groups()[0]
                       self.log.info("Identified spark driver pod: %s",
                                     self._kubernetes_driver_pod)
   
                   # Store the Spark Exit code
                   match_exit_code = re.search(r'\s*Exit code: (\d+)', line)
                   if match_exit_code:
                       self._spark_exit_code = int(match_exit_code.groups()[0])
   
               ...
   
               self.log.info(line)
   ```
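   
   To make the failure mode concrete, here is a minimal sketch (my own illustration, not the actual Airflow implementation): when the log stream dies before the "Exit code: N" line appears, `self._spark_exit_code` stays `None`, and in Python `None != 0` is `True`, so the Kubernetes branch of the check above fires even when `spark-submit` itself returned 0.
   
   ```python
   # Minimal sketch of the failure mode and one possible guard; this is an
   # assumption for illustration, not merged Airflow code.
   is_kubernetes = True
   returncode = 0          # spark-submit itself exited cleanly
   spark_exit_code = None  # log stream cut off before any "Exit code: N" line
   
   # Current behaviour: None != 0 evaluates to True, so a lost log stream is
   # indistinguishable from a genuine non-zero driver exit code.
   if returncode or (is_kubernetes and spark_exit_code != 0):
       print("task marked FAILED, even though the driver pod may have Succeeded")
   
   # A possible guard: fail only on an explicit non-zero code, and resolve a
   # missing code from the driver pod status instead of assuming failure.
   if returncode or (is_kubernetes
                     and spark_exit_code is not None
                     and spark_exit_code != 0):
       print("genuine failure")
   elif is_kubernetes and spark_exit_code is None:
       print("exit code unknown: query the driver pod status before deciding")
   ```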
   
   2.  k8s side (k8s client)
   
           2020-05-12 12:51:40 INFO WatchConnectionManager:379 - Current reconnect backoff is 1000 milliseconds (T0)
   
           These messages are nothing to worry about by themselves; this is a known occurrence in Kubernetes, not a bug. The API server ends watch requests when they are very old, and a well-behaved client (e.g. a client-go informer) takes care of automatically re-listing the resource and then restarting the watch from the latest resource version. A Python sketch of that re-list-and-rewatch pattern follows the fabric8 excerpt below.
   
           fabric8: https://github.com/fabric8io/kubernetes-client/issues/1075
   
           https://stackoverflow.com/questions/52910322/kubernetes-resource-versioning/52925973#52925973
   
   ```java
   private long nextReconnectInterval() {
       int exponentOfTwo = currentReconnectAttempt.getAndIncrement();
       if (exponentOfTwo > maxIntervalExponent)
           exponentOfTwo = maxIntervalExponent;
       long ret = reconnectInterval * (1 << exponentOfTwo);
       logger.debug("Current reconnect backoff is " + ret + " milliseconds (T" + exponentOfTwo + ")");
       return ret;
   }
   ```
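   
   For comparison, here is a minimal re-list-and-rewatch sketch using the official kubernetes Python client (my own illustration; the namespace and label selector come from the logs above, everything else is an assumption):
   
   ```python
   # Hedged sketch of the re-list-and-rewatch pattern; not Airflow or Spark code.
   from kubernetes import client, config, watch
   from kubernetes.client.rest import ApiException
   
   config.load_kube_config()
   v1 = client.CoreV1Api()
   NAMESPACE = "batch-pipeline-qa"  # taken from the logs above
   
   def watch_driver_pods():
       while True:
           # Re-list to obtain a fresh resourceVersion, then watch from there.
           pods = v1.list_namespaced_pod(NAMESPACE, label_selector="spark-role=driver")
           resource_version = pods.metadata.resource_version
           w = watch.Watch()
           try:
               for event in w.stream(v1.list_namespaced_pod, NAMESPACE,
                                     label_selector="spark-role=driver",
                                     resource_version=resource_version):
                   pod = event["object"]
                   print(event["type"], pod.metadata.name, pod.status.phase)
           except ApiException as e:
               if e.status == 410:  # "too old resource version" -> re-list
                   continue
               raise
   ```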
   
   3.  spark side
   
            http://mail-archives.apache.org/mod_mbox/spark-issues/201805.mbox/%3cjira.13158986.1526264708000.69213.1526330460...@atlassian.jira%3E
   
            https://issues.apache.org/jira/browse/SPARK-24266?page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel&focusedCommentId=16474770#comment-16474770
   
            pull request: https://github.com/apache/spark/pull/28423
   
   ## ask for suggestions
   
   So, are there any suggestions for avoiding this issue?
   
   ## actions that may be considered
   
   ### [airflow][source_code_change]
   
       Modify the Airflow source code to poll the driver pod status (the equivalent of 'kubectl describe pod xxxx -n xxxx') every few seconds, instead of relying on the log stream; a rough sketch follows.
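   
   A rough sketch of what that polling could look like with the official kubernetes Python client (an assumption for discussion, not a patch; the pod and namespace names come from the logs above, while `wait_for_driver_exit_code` and the interval are made up):
   
   ```python
   # Hypothetical polling loop: read the driver pod's terminal exit code from
   # the Kubernetes API instead of scraping it out of the spark-submit logs.
   import time
   
   from kubernetes import client, config
   
   def wait_for_driver_exit_code(pod_name, namespace, interval=10):
       config.load_kube_config()  # or load_incluster_config() in-cluster
       v1 = client.CoreV1Api()
       while True:
           pod = v1.read_namespaced_pod(pod_name, namespace)
           if pod.status.phase in ("Succeeded", "Failed"):
               # This is the same field `kubectl describe pod` shows as
               # "Exit Code" under the container's Terminated state.
               for cs in pod.status.container_statuses or []:
                   if cs.name == "spark-kubernetes-driver" and cs.state.terminated:
                       return cs.state.terminated.exit_code
               return 0 if pod.status.phase == "Succeeded" else 1
           time.sleep(interval)
   
   # e.g.:
   # wait_for_driver_exit_code("rmtextract-1589227268554-driver", "batch-pipeline-qa")
   ```
   
   This would be robust to watch/log-stream interruptions, because the pod status, including the terminated exit code, stays queryable until the pod object is deleted.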

