ywan2017 opened a new issue #8963: URL: https://github.com/apache/airflow/issues/8963
# SparkSubmitOperator could not get Exit Code after log stream interrupted by k8s old resource version exception

## description

I use Airflow to schedule Spark jobs on k8s with SparkSubmitOperator. When a Spark job runs on k8s for a long time (>30 minutes), Airflow often marks the task as failed while the job is still running, even though the job eventually finishes successfully.

### When it happens, this exception often (but not always) appears at the same time

```java
20/05/20 10:49:58 INFO TaskSetManager: Finished task 7.0 in stage 91.0 (TID 13313) in 34319 ms on 172.30.238.243 (executor 5) (10/20)
20/05/20 10:49:59 WARN ExecutorPodsWatchSnapshotSource: Kubernetes client has been closed (this is expected if the application is shutting down.)
io.fabric8.kubernetes.client.KubernetesClientException: too old resource version: 140189600 (140238574)
	at io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager$1.onMessage(WatchConnectionManager.java:259)
	at okhttp3.internal.ws.RealWebSocket.onReadMessage(RealWebSocket.java:307)
	at okhttp3.internal.ws.WebSocketReader.readMessageFrame(WebSocketReader.java:222)
	at okhttp3.internal.ws.WebSocketReader.processNextFrame(WebSocketReader.java:101)
	at okhttp3.internal.ws.RealWebSocket.loopReader(RealWebSocket.java:262)
	at okhttp3.internal.ws.RealWebSocket$2.onResponse(RealWebSocket.java:201)
	at okhttp3.RealCall$AsyncCall.execute(RealCall.java:141)
	at okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
20/05/20 10:50:19 INFO TaskSetManager: Finished task 10.0 in stage 91.0 (TID 13316) in 27478 ms on 172.30.244.131 (executor 1) (11/20)
20/05/20 10:50:19 INFO TaskSetManager: Finished task 12.0 in stage 91.0 (TID 13318) in 27052 ms on 172.30.244.131 (executor 1) (12/20)
```

## env

- environment: UAT
- Airflow version: 1.10.10
- Python: 3.6.8
- k8s server: 1.5
- k8s client: ?
- Spark: 2.4.4-h2.7
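For context on why the interrupted watch matters: in k8s cluster deploy mode, SparkSubmitHook does not rely on the spark-submit return code alone; it also scrapes the `Exit code: N` line that `LoggingPodStatusWatcherImpl` prints with the container's final statuses. A simplified, paraphrased sketch of the relevant parsing in `airflow/contrib/hooks/spark_submit_hook.py` (1.10.x; not the verbatim source):

```python
import re

# Paraphrased sketch of SparkSubmitHook._process_spark_submit_log for the
# Kubernetes case (Airflow 1.10.x); simplified, not the verbatim source.
class SparkSubmitHookSketch:
    def __init__(self):
        self._kubernetes_driver_pod = None
        self._spark_exit_code = None  # only set if an "Exit code: N" line is seen

    def _process_spark_submit_log(self, itr):
        for line in itr:
            line = line.strip()
            # Remember the driver pod name printed by LoggingPodStatusWatcherImpl.
            match = re.search(r'\s*pod name: ((.+?)-([a-z0-9]+)-driver)', line)
            if match:
                self._kubernetes_driver_pod = match.groups()[0]
            # Scrape the driver exit code from the "Container final statuses" dump.
            match_exit_code = re.search(r'\s*Exit code: (\d+)', line)
            if match_exit_code:
                self._spark_exit_code = int(match_exit_code.groups()[0])
```

When the `too old resource version` exception kills the fabric8 watch mid-run, `LoggingPodStatusWatcherImpl` stops emitting state updates, the `Container final statuses` / `Exit code:` lines never arrive, and `_spark_exit_code` is left as `None` no matter how the job actually ends.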
## submit jobs manually, get k8s logs directly

### Scenario 1: job succeeded, log returned from k8s

```log
2020-05-11 11:58:21 WARN NativeCodeLoader:62 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2020-05-11 11:58:27 INFO LoggingPodStatusWatcherImpl:54 - State changed, new state:
   pod name: empreporting-1589212705263-driver
   namespace: batch-pipeline-qa
   labels: spark-app-selector -> spark-8f4a0340943a4df79d0e2f16eb24a751, spark-role -> driver
   pod uid: 3c1d4adf-b18a-4204-a95f-91a579eea5c4
   creation time: 2020-05-11T15:58:27Z
   service account name: spark
   volumes: spark-local-dir-1, dswsecret-volume, spark-conf-volume, spark-token-fpqpz
   node name: N/A
   start time: N/A
   container images: N/A
   phase: Pending
   status: []
2020-05-11 11:58:27 INFO LoggingPodStatusWatcherImpl:54 - State changed, new state:
   pod name: empreporting-1589212705263-driver
   namespace: batch-pipeline-qa
   labels: spark-app-selector -> spark-8f4a0340943a4df79d0e2f16eb24a751, spark-role -> driver
   pod uid: 3c1d4adf-b18a-4204-a95f-91a579eea5c4
   creation time: 2020-05-11T15:58:27Z
   service account name: spark
   volumes: spark-local-dir-1, dswsecret-volume, spark-conf-volume, spark-token-fpqpz
   node name: 10.74.200.157
   start time: N/A
   container images: N/A
   phase: Pending
   status: []
2020-05-11 11:58:28 INFO Client:54 - Waiting for application emp_reporting to finish...
2020-05-11 11:58:48 INFO LoggingPodStatusWatcherImpl:54 - State changed, new state:
   pod name: empreporting-1589212705263-driver
   namespace: batch-pipeline-qa
   labels: spark-app-selector -> spark-8f4a0340943a4df79d0e2f16eb24a751, spark-role -> driver
   pod uid: 3c1d4adf-b18a-4204-a95f-91a579eea5c4
   creation time: 2020-05-11T15:58:27Z
   service account name: spark
   volumes: spark-local-dir-1, dswsecret-volume, spark-conf-volume, spark-token-fpqpz
   node name: 10.74.200.157
   start time: 2020-05-11T15:58:48Z
   container images: txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark:s2.4.4-h2.7
   phase: Pending
   status: [ContainerStatus(containerID=null, image=txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark:s2.4.4-h2.7, imageID=, lastState=ContainerState(running=null, terminated=null, waiting=null, additionalProperties={}), name=spark-kubernetes-driver, ready=false, restartCount=0, state=ContainerState(running=null, terminated=null, waiting=ContainerStateWaiting(message=null, reason=ContainerCreating, additionalProperties={}), additionalProperties={}), additionalProperties={})]
2020-05-11 11:59:28 INFO LoggingPodStatusWatcherImpl:54 - State changed, new state:
   pod name: empreporting-1589212705263-driver
   namespace: batch-pipeline-qa
   labels: spark-app-selector -> spark-8f4a0340943a4df79d0e2f16eb24a751, spark-role -> driver
   pod uid: 3c1d4adf-b18a-4204-a95f-91a579eea5c4
   creation time: 2020-05-11T15:58:27Z
   service account name: spark
   volumes: spark-local-dir-1, dswsecret-volume, spark-conf-volume, spark-token-fpqpz
   node name: 10.74.200.157
   start time: 2020-05-11T15:58:48Z
   container images: txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark:s2.4.4-h2.7
   phase: Running
   status: [ContainerStatus(containerID=containerd://71d51131984cb06468453f75cda9a3c10a78f3da336e731d11bb2cc3955de32b, image=txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark:s2.4.4-h2.7, imageID=txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark@sha256:3682354e49a55503ef906ce8aeff8601274fa426204ec989a91a72912d31ed7e, lastState=ContainerState(running=null, terminated=null, waiting=null, additionalProperties={}), name=spark-kubernetes-driver, ready=true, restartCount=0, state=ContainerState(running=ContainerStateRunning(startedAt=Time(time=2020-05-11T15:59:26Z, additionalProperties={}), additionalProperties={}), terminated=null, waiting=null, additionalProperties={}), additionalProperties={})]
2020-05-11 12:01:15 INFO LoggingPodStatusWatcherImpl:54 - State changed, new state:
   pod name: empreporting-1589212705263-driver
   namespace: batch-pipeline-qa
   labels: spark-app-selector -> spark-8f4a0340943a4df79d0e2f16eb24a751, spark-role -> driver
   pod uid: 3c1d4adf-b18a-4204-a95f-91a579eea5c4
   creation time: 2020-05-11T15:58:27Z
   service account name: spark
   volumes: spark-local-dir-1, dswsecret-volume, spark-conf-volume, spark-token-fpqpz
   node name: 10.74.200.157
   start time: 2020-05-11T15:58:48Z
   container images: txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark:s2.4.4-h2.7
   phase: Succeeded
   status: [ContainerStatus(containerID=containerd://71d51131984cb06468453f75cda9a3c10a78f3da336e731d11bb2cc3955de32b, image=txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark:s2.4.4-h2.7, imageID=txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark@sha256:3682354e49a55503ef906ce8aeff8601274fa426204ec989a91a72912d31ed7e, lastState=ContainerState(running=null, terminated=null, waiting=null, additionalProperties={}), name=spark-kubernetes-driver, ready=false, restartCount=0, state=ContainerState(running=null, terminated=ContainerStateTerminated(containerID=containerd://71d51131984cb06468453f75cda9a3c10a78f3da336e731d11bb2cc3955de32b, exitCode=0, finishedAt=Time(time=2020-05-11T16:01:08Z, additionalProperties={}), message=null, reason=Completed, signal=null, startedAt=Time(time=2020-05-11T15:59:26Z, additionalProperties={}), additionalProperties={}), waiting=null, additionalProperties={}), additionalProperties={})]
2020-05-11 12:01:15 INFO LoggingPodStatusWatcherImpl:54 - Container final statuses:
   Container name: spark-kubernetes-driver
   Container image: txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark:s2.4.4-h2.7
   Container state: Terminated
   Exit code: 0
2020-05-11 12:01:15 INFO Client:54 - Application emp_reporting finished.
2020-05-11 12:01:15 INFO ShutdownHookManager:54 - Shutdown hook called
2020-05-11 12:01:15 INFO ShutdownHookManager:54 - Deleting directory /tmp/spark-f3b0955d-fe7c-4858-a1c6-f70597d42104
```
### Scenario 2: job failed, log returned from k8s

```log
2020-05-11 11:54:10 WARN NativeCodeLoader:62 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2020-05-11 11:54:16 INFO LoggingPodStatusWatcherImpl:54 - State changed, new state:
   pod name: empreporting-1589212454526-driver
   namespace: batch-pipeline-qa
   labels: spark-app-selector -> spark-a3a0efa1a3804b759eda9e9aa52629cc, spark-role -> driver
   pod uid: 6cc27a5d-efff-401b-a120-15e8a3ccb2f2
   creation time: 2020-05-11T15:54:16Z
   service account name: spark
   volumes: spark-local-dir-1, dswsecret-volume, spark-conf-volume, spark-token-fpqpz
   node name: N/A
   start time: N/A
   container images: N/A
   phase: Pending
   status: []
2020-05-11 11:54:16 INFO LoggingPodStatusWatcherImpl:54 - State changed, new state:
   pod name: empreporting-1589212454526-driver
   namespace: batch-pipeline-qa
   labels: spark-app-selector -> spark-a3a0efa1a3804b759eda9e9aa52629cc, spark-role -> driver
   pod uid: 6cc27a5d-efff-401b-a120-15e8a3ccb2f2
   creation time: 2020-05-11T15:54:16Z
   service account name: spark
   volumes: spark-local-dir-1, dswsecret-volume, spark-conf-volume, spark-token-fpqpz
   node name: 10.74.200.186
   start time: N/A
   container images: N/A
   phase: Pending
   status: []
2020-05-11 11:54:17 INFO Client:54 - Waiting for application emp_reporting to finish...
2020-05-11 11:54:17 INFO LoggingPodStatusWatcherImpl:54 - State changed, new state:
   pod name: empreporting-1589212454526-driver
   namespace: batch-pipeline-qa
   labels: spark-app-selector -> spark-a3a0efa1a3804b759eda9e9aa52629cc, spark-role -> driver
   pod uid: 6cc27a5d-efff-401b-a120-15e8a3ccb2f2
   creation time: 2020-05-11T15:54:16Z
   service account name: spark
   volumes: spark-local-dir-1, dswsecret-volume, spark-conf-volume, spark-token-fpqpz
   node name: 10.74.200.186
   start time: 2020-05-11T15:54:17Z
   container images: txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark:s2.4.4-h2.7
   phase: Pending
   status: [ContainerStatus(containerID=null, image=txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark:s2.4.4-h2.7, imageID=, lastState=ContainerState(running=null, terminated=null, waiting=null, additionalProperties={}), name=spark-kubernetes-driver, ready=false, restartCount=0, state=ContainerState(running=null, terminated=null, waiting=ContainerStateWaiting(message=null, reason=ContainerCreating, additionalProperties={}), additionalProperties={}), additionalProperties={})]
2020-05-11 11:55:04 INFO LoggingPodStatusWatcherImpl:54 - State changed, new state:
   pod name: empreporting-1589212454526-driver
   namespace: batch-pipeline-qa
   labels: spark-app-selector -> spark-a3a0efa1a3804b759eda9e9aa52629cc, spark-role -> driver
   pod uid: 6cc27a5d-efff-401b-a120-15e8a3ccb2f2
   creation time: 2020-05-11T15:54:16Z
   service account name: spark
   volumes: spark-local-dir-1, dswsecret-volume, spark-conf-volume, spark-token-fpqpz
   node name: 10.74.200.186
   start time: 2020-05-11T15:54:17Z
   container images: txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark:s2.4.4-h2.7
   phase: Running
   status: [ContainerStatus(containerID=containerd://ed26eb3ba628744040af1d2ee610fb29f00564b7fb5e693f0959b8999479f8a9, image=txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark:s2.4.4-h2.7, imageID=txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark@sha256:3682354e49a55503ef906ce8aeff8601274fa426204ec989a91a72912d31ed7e, lastState=ContainerState(running=null, terminated=null, waiting=null, additionalProperties={}), name=spark-kubernetes-driver, ready=true, restartCount=0, state=ContainerState(running=ContainerStateRunning(startedAt=Time(time=2020-05-11T15:55:03Z, additionalProperties={}), additionalProperties={}), terminated=null, waiting=null, additionalProperties={}), additionalProperties={})]
2020-05-11 11:55:29 INFO LoggingPodStatusWatcherImpl:54 - State changed, new state:
   pod name: empreporting-1589212454526-driver
   namespace: batch-pipeline-qa
   labels: spark-app-selector -> spark-a3a0efa1a3804b759eda9e9aa52629cc, spark-role -> driver
   pod uid: 6cc27a5d-efff-401b-a120-15e8a3ccb2f2
   creation time: 2020-05-11T15:54:16Z
   service account name: spark
   volumes: spark-local-dir-1, dswsecret-volume, spark-conf-volume, spark-token-fpqpz
   node name: 10.74.200.186
   start time: 2020-05-11T15:54:17Z
   container images: txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark:s2.4.4-h2.7
   phase: Failed
   status: [ContainerStatus(containerID=containerd://ed26eb3ba628744040af1d2ee610fb29f00564b7fb5e693f0959b8999479f8a9, image=txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark:s2.4.4-h2.7, imageID=txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark@sha256:3682354e49a55503ef906ce8aeff8601274fa426204ec989a91a72912d31ed7e, lastState=ContainerState(running=null, terminated=null, waiting=null, additionalProperties={}), name=spark-kubernetes-driver, ready=false, restartCount=0, state=ContainerState(running=null, terminated=ContainerStateTerminated(containerID=containerd://ed26eb3ba628744040af1d2ee610fb29f00564b7fb5e693f0959b8999479f8a9, exitCode=1, finishedAt=Time(time=2020-05-11T15:55:27Z, additionalProperties={}), message=null, reason=Error, signal=null, startedAt=Time(time=2020-05-11T15:55:03Z, additionalProperties={}), additionalProperties={}), waiting=null, additionalProperties={}), additionalProperties={})]
2020-05-11 11:55:29 INFO LoggingPodStatusWatcherImpl:54 - Container final statuses:
   Container name: spark-kubernetes-driver
   Container image: txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark:s2.4.4-h2.7
   Container state: Terminated
   Exit code: 1
2020-05-11 11:55:29 INFO Client:54 - Application emp_reporting finished.
2020-05-11 11:55:29 INFO ShutdownHookManager:54 - Shutdown hook called
2020-05-11 11:55:29 INFO ShutdownHookManager:54 - Deleting directory /tmp/spark-8ba3c338-3eae-4bc7-9c25-3146642b9207
```
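In both manual runs above, the final `LoggingPodStatusWatcherImpl` dump carries the authoritative driver exit code (0 and 1). When the watch is interrupted, the same information is still available from the API server. A hypothetical workaround sketch (mine, not something SparkSubmitHook does today) that reads the driver pod's terminated state directly, assuming the `kubernetes` Python client and a reachable kubeconfig/in-cluster context; pod and namespace names are taken from the logs above:

```python
from kubernetes import client, config

def driver_exit_code(pod_name: str, namespace: str = "batch-pipeline-qa"):
    """Return the driver container's exit code from the API server, or None."""
    config.load_kube_config()  # or config.load_incluster_config() inside a pod
    v1 = client.CoreV1Api()
    pod = v1.read_namespaced_pod(name=pod_name, namespace=namespace)
    for cs in pod.status.container_statuses or []:
        if cs.name == "spark-kubernetes-driver" and cs.state.terminated:
            return cs.state.terminated.exit_code
    return None  # still running, or status not yet available

print(driver_exit_code("empreporting-1589212454526-driver"))
```

This sidesteps the watch entirely: even if the log stream died minutes earlier, the pod object keeps its terminal `exitCode` until the pod is deleted.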
## airflow log examples

### Scenario 1: job succeeded, log returned from airflow, got the right status:

```log
[dawany@dawany-inf env_qa]$ kubectl describe pod rmtextract-1589270449587-driver -n batch-pipeline-qa
Name:         rmtextract-1589270449587-driver
Namespace:    batch-pipeline-qa
Priority:     0
Node:         10.93.122.236/10.93.122.236
Start Time:   Tue, 12 May 2020 04:00:51 -0400
Labels:       spark-app-selector=spark-2f7caeac92c94710ab54cf40b5ff29e5
              spark-role=driver
Annotations:  kubernetes.io/psp: db2oltp-dev-psp
Status:       Succeeded
IP:           172.30.244.25
IPs:          <none>
Containers:
  spark-kubernetes-driver:
    Container ID:  containerd://3f364306c94ac5739ab154a0037fc5e17a2981cf51669be916304be1d8698eb6
    Image:         txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark:s2.4.4-h2.7
    Image ID:      txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark@sha256:3682354e49a55503ef906ce8aeff8601274fa426204ec989a91a72912d31ed7e
    Ports:         7078/TCP, 7079/TCP, 4040/TCP
    Host Ports:    0/TCP, 0/TCP, 0/TCP
    Args:          driver --properties-file /opt/spark/conf/spark.properties --class com.ibm.cio.dswim.trans.rmt.stage.RmtExtractStageJob spark-internal 0 3 8
    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Tue, 12 May 2020 04:01:07 -0400
      Finished:     Tue, 12 May 2020 04:52:03 -0400
    Ready:          False
    Restart Count:  0
    Limits:
      memory:  11Gi
    Requests:
      cpu:     1
      memory:  11Gi
    Environment:
      SPARK_DRIVER_BIND_ADDRESS:   (v1:status.podIP)
      SPARK_LOCAL_DIRS:            /var/data/spark-02fa9b25-af0b-430d-ab75-aee12dd217a7
      SPARK_CONF_DIR:              /opt/spark/conf
    Mounts:
      /opt/spark/conf from spark-conf-volume (rw)
      /opt/spark/secrets from dswsecret-volume (rw)
      /var/data/spark-02fa9b25-af0b-430d-ab75-aee12dd217a7 from spark-local-dir-1 (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from spark-token-fpqpz (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  spark-local-dir-1:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit:  <unset>
  dswsecret-volume:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  dswsecret
    Optional:    false
  spark-conf-volume:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      rmtextract-1589270449587-driver-conf-map
    Optional:  false
  spark-token-fpqpz:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  spark-token-fpqpz
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 600s
                 node.kubernetes.io/unreachable:NoExecute for 600s
Events:          <none>
```

```log
*** Reading local file: /home/airflow/airflow/logs/Transformation_Rmt_Master_Load_Sequence_Adhoc_Main/Rmt_Extract/2020-05-12T04:00:00+00:00/1.log
[2020-05-12 03:00:43,970] {taskinstance.py:669} INFO - Dependencies all met for <TaskInstance: Transformation_Rmt_Master_Load_Sequence_Adhoc_Main.Rmt_Extract 2020-05-12T04:00:00+00:00 [queued]>
[2020-05-12 03:00:44,074] {taskinstance.py:669} INFO - Dependencies all met for <TaskInstance: Transformation_Rmt_Master_Load_Sequence_Adhoc_Main.Rmt_Extract 2020-05-12T04:00:00+00:00 [queued]>
[2020-05-12 03:00:44,074] {taskinstance.py:879} INFO - --------------------------------------------------------------------------------
[2020-05-12 03:00:44,074] {taskinstance.py:880} INFO - Starting attempt 1 of 2
[2020-05-12 03:00:44,074] {taskinstance.py:881} INFO - --------------------------------------------------------------------------------
[2020-05-12 03:00:44,173] {taskinstance.py:900} INFO - Executing <Task(SparkSubmitOperator): Rmt_Extract> on 2020-05-12T04:00:00+00:00
[2020-05-12 03:00:44,176] {standard_task_runner.py:53} INFO - Started process 11051 to run task
[2020-05-12 03:00:44,619] {logging_mixin.py:112} INFO - Running %s on host %s <TaskInstance: Transformation_Rmt_Master_Load_Sequence_Adhoc_Main.Rmt_Extract 2020-05-12T04:00:00+00:00 [running]> kafka02.cloud.ibm.com
[2020-05-12 03:00:45,043] {logging_mixin.py:112} INFO - [2020-05-12 03:00:45,043] {base_hook.py:87} INFO - Using connection to: id: spark_default. Host: k8s://https://c2.private.us-south.containers.cloud.ibm.com:26165, Port: None, Schema: None, Login: admin, Password: XXXXXXXX, extra: XXXXXXXX
[2020-05-12 03:00:45,071] {logging_mixin.py:112} INFO - [2020-05-12 03:00:45,071] {spark_submit_hook.py:325} INFO - Spark-Submit cmd: spark-submit --master k8s://https://c2.private.us-south.containers.cloud.ibm.com:26165 --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark --conf spark.kubernetes.container.image=txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark:s2.4.4-h2.7 --conf spark.kubernetes.container.image.pullSecrets=artifactory-container-registry --conf spark.submit.deployMode=cluster --conf spark.kubernetes.report.interval=2 --conf spark.kubernetes.driver.secrets.dswsecret=/opt/spark/secrets --conf spark.executor.userClassPathFirst=true --conf spark.driver.userClassPathFirst=true --conf spark.sql.parquet.compression.codec=gzip --conf spark.sql.session.timeZone=America/New_York --conf spark.sql.broadcastTimeout=1800 --conf spark.sql.shuffle.partitions=600 --conf spark.shuffle.consolidateFiles=true --conf spark.default.parallelism=108 --conf spark.driver.cores=1 --conf spark.executor.cores=2 --conf spark.kubernetes.executor.request.cores=0.6 --conf spark.kubernetes.executor.memoryOverhead=1G --conf spark.driver.memory=10G --conf spark.executor.memory=5G --conf spark.executor.instances=9 --conf spark.sql.codegen=true --conf spark.sql.cbo.enabled=true --conf spark.sql.optimizer.maxIterations=1000 --conf spark.kubernetes.namespace=batch-pipeline-qa --files cos://dsw-data-project-qa.service/config/dsw_config.conf --jars cos://dsw-data-project-qa.service/job-jars/common-1.0-SNAPSHOT.jar,cos://dsw-data-project-qa.service/job-jars/rmtjob-1.0-SNAPSHOT.jar,cos://dsw-data-project-qa.service/job-jars/meta-1.0-SNAPSHOT.jar,cos://dsw-data-project-qa.service/3rdparty-jars/druid-1.1.12.jar,cos://dsw-data-project-qa.service/3rdparty-jars/mybatis-3.5.3.jar,cos://dsw-data-project-qa.service/3rdparty-jars/db2jcc4.jar,cos://dsw-data-project-qa.service/3rdparty-jars/logback-core-1.2.3.jar,cos://dsw-data-project-qa.service/3rdparty-jars/logback-classic-1.2.3.jar,cos://dsw-data-project-qa.service/3rdparty-jars/dom4j-2.1.1.jar,cos://dsw-data-project-qa.service/3rdparty-jars/guava-28.0-jre.jar,cos://dsw-data-project-qa.service/3rdparty-jars/commons-lang3-3.9.jar,cos://dsw-data-project-qa.service/3rdparty-jars/fastjson-1.2.59.jar --name Rmt_Extract --class com.ibm.cio.dswim.trans.rmt.stage.RmtExtractStageJob cos://dsw-data-project-qa.service/job-jars/rmt_extract_stage-1.0-SNAPSHOT.jar 0 3 8
[2020-05-12 03:00:47,353] {logging_mixin.py:112} INFO - [2020-05-12 03:00:47,352] {spark_submit_hook.py:479} INFO - 20/05/12 03:00:47 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
[2020-05-12 03:00:49,643] {logging_mixin.py:112} INFO - [2020-05-12 03:00:49,643] {spark_submit_hook.py:479} INFO - log4j:WARN No appenders could be found for logger (io.fabric8.kubernetes.client.Config).
[2020-05-12 03:00:49,643] {logging_mixin.py:112} INFO - [2020-05-12 03:00:49,643] {spark_submit_hook.py:479} INFO - log4j:WARN Please initialize the log4j system properly.
[2020-05-12 03:00:49,644] {logging_mixin.py:112} INFO - [2020-05-12 03:00:49,644] {spark_submit_hook.py:479} INFO - log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
[2020-05-12 03:00:50,889] {logging_mixin.py:112} INFO - [2020-05-12 03:00:50,888] {spark_submit_hook.py:479} INFO - Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
[2020-05-12 03:00:50,908] {logging_mixin.py:112} INFO - [2020-05-12 03:00:50,908] {spark_submit_hook.py:479} INFO - 20/05/12 03:00:50 INFO LoggingPodStatusWatcherImpl: State changed, new state:
[2020-05-12 03:00:50,908] {logging_mixin.py:112} INFO - [2020-05-12 03:00:50,908] {spark_submit_hook.py:462} INFO - Identified spark driver pod: rmtextract-1589270449587-driver
[2020-05-12 03:00:50,908] {logging_mixin.py:112} INFO - [2020-05-12 03:00:50,908] {spark_submit_hook.py:479} INFO - pod name: rmtextract-1589270449587-driver
[2020-05-12 03:00:50,908] {logging_mixin.py:112} INFO - [2020-05-12 03:00:50,908] {spark_submit_hook.py:479} INFO - namespace: batch-pipeline-qa
[2020-05-12 03:00:50,909] {logging_mixin.py:112} INFO - [2020-05-12 03:00:50,908] {spark_submit_hook.py:479} INFO - labels: spark-app-selector -> spark-2f7caeac92c94710ab54cf40b5ff29e5, spark-role -> driver
[2020-05-12 03:00:50,909] {logging_mixin.py:112} INFO - [2020-05-12 03:00:50,909] {spark_submit_hook.py:479} INFO - pod uid: 57056ebb-771a-4894-b1c3-398073967a6b
[2020-05-12 03:00:50,909] {logging_mixin.py:112} INFO - [2020-05-12 03:00:50,909] {spark_submit_hook.py:479} INFO - creation time: 2020-05-12T08:00:50Z
[2020-05-12 03:00:50,909] {logging_mixin.py:112} INFO - [2020-05-12 03:00:50,909] {spark_submit_hook.py:479} INFO - service account name: spark
[2020-05-12 03:00:50,909] {logging_mixin.py:112} INFO - [2020-05-12 03:00:50,909] {spark_submit_hook.py:479} INFO - volumes: spark-local-dir-1, dswsecret-volume, spark-conf-volume, spark-token-fpqpz
[2020-05-12 03:00:50,909] {logging_mixin.py:112} INFO - [2020-05-12 03:00:50,909] {spark_submit_hook.py:479} INFO - node name: N/A
[2020-05-12 03:00:50,909] {logging_mixin.py:112} INFO - [2020-05-12 03:00:50,909] {spark_submit_hook.py:479} INFO - start time: N/A
[2020-05-12 03:00:50,910] {logging_mixin.py:112} INFO - [2020-05-12 03:00:50,909] {spark_submit_hook.py:479} INFO - container images: N/A
[2020-05-12 03:00:50,910] {logging_mixin.py:112} INFO - [2020-05-12 03:00:50,910] {spark_submit_hook.py:479} INFO - phase: Pending
[2020-05-12 03:00:50,910] {logging_mixin.py:112} INFO - [2020-05-12 03:00:50,910] {spark_submit_hook.py:479} INFO - status: []
[2020-05-12 03:00:50,912] {logging_mixin.py:112} INFO - [2020-05-12 03:00:50,912] {spark_submit_hook.py:479} INFO - 20/05/12 03:00:50 INFO LoggingPodStatusWatcherImpl: State changed, new state:
[2020-05-12 03:00:50,913] {logging_mixin.py:112} INFO - [2020-05-12 03:00:50,912] {spark_submit_hook.py:462} INFO - Identified spark driver pod: rmtextract-1589270449587-driver
[2020-05-12 03:00:50,913] {logging_mixin.py:112} INFO - [2020-05-12 03:00:50,913] {spark_submit_hook.py:479} INFO - pod name: rmtextract-1589270449587-driver
[2020-05-12 03:00:50,913] {logging_mixin.py:112} INFO - [2020-05-12 03:00:50,913] {spark_submit_hook.py:479} INFO - namespace: batch-pipeline-qa
[2020-05-12 03:00:50,913] {logging_mixin.py:112} INFO - [2020-05-12 03:00:50,913] {spark_submit_hook.py:479} INFO - labels: spark-app-selector -> spark-2f7caeac92c94710ab54cf40b5ff29e5, spark-role -> driver
[2020-05-12 03:00:50,913] {logging_mixin.py:112} INFO - [2020-05-12 03:00:50,913] {spark_submit_hook.py:479} INFO - pod uid: 57056ebb-771a-4894-b1c3-398073967a6b
[2020-05-12 03:00:50,913] {logging_mixin.py:112} INFO - [2020-05-12 03:00:50,913] {spark_submit_hook.py:479} INFO - creation time: 2020-05-12T08:00:50Z
[2020-05-12 03:00:50,913] {logging_mixin.py:112} INFO - [2020-05-12 03:00:50,913] {spark_submit_hook.py:479} INFO - service account name: spark
[2020-05-12 03:00:50,914] {logging_mixin.py:112} INFO - [2020-05-12 03:00:50,914] {spark_submit_hook.py:479} INFO - volumes: spark-local-dir-1, dswsecret-volume, spark-conf-volume, spark-token-fpqpz
[2020-05-12 03:00:50,914] {logging_mixin.py:112} INFO - [2020-05-12 03:00:50,914] {spark_submit_hook.py:479} INFO - node name: 10.93.122.236
[2020-05-12 03:00:50,914] {logging_mixin.py:112} INFO - [2020-05-12 03:00:50,914] {spark_submit_hook.py:479} INFO - start time: N/A
[2020-05-12 03:00:50,914] {logging_mixin.py:112} INFO - [2020-05-12 03:00:50,914] {spark_submit_hook.py:479} INFO - container images: N/A
[2020-05-12 03:00:50,914] {logging_mixin.py:112} INFO - [2020-05-12 03:00:50,914] {spark_submit_hook.py:479} INFO - phase: Pending
[2020-05-12 03:00:50,914] {logging_mixin.py:112} INFO - [2020-05-12 03:00:50,914] {spark_submit_hook.py:479} INFO - status: []
[2020-05-12 03:00:51,163] {logging_mixin.py:112} INFO - [2020-05-12 03:00:51,163] {spark_submit_hook.py:479} INFO - 20/05/12 03:00:51 INFO Client: Waiting for application Rmt_Extract to finish...
[2020-05-12 03:00:51,406] {logging_mixin.py:112} INFO - [2020-05-12 03:00:51,406] {spark_submit_hook.py:479} INFO - 20/05/12 03:00:51 INFO LoggingPodStatusWatcherImpl: State changed, new state:
[2020-05-12 03:00:51,407] {logging_mixin.py:112} INFO - [2020-05-12 03:00:51,407] {spark_submit_hook.py:462} INFO - Identified spark driver pod: rmtextract-1589270449587-driver
[2020-05-12 03:00:51,407] {logging_mixin.py:112} INFO - [2020-05-12 03:00:51,407] {spark_submit_hook.py:479} INFO - pod name: rmtextract-1589270449587-driver
[2020-05-12 03:00:51,407] {logging_mixin.py:112} INFO - [2020-05-12 03:00:51,407] {spark_submit_hook.py:479} INFO - namespace: batch-pipeline-qa
[2020-05-12 03:00:51,407] {logging_mixin.py:112} INFO - [2020-05-12 03:00:51,407] {spark_submit_hook.py:479} INFO - labels: spark-app-selector -> spark-2f7caeac92c94710ab54cf40b5ff29e5, spark-role -> driver
[2020-05-12 03:00:51,407] {logging_mixin.py:112} INFO - [2020-05-12 03:00:51,407] {spark_submit_hook.py:479} INFO - pod uid: 57056ebb-771a-4894-b1c3-398073967a6b
[2020-05-12 03:00:51,407] {logging_mixin.py:112} INFO - [2020-05-12 03:00:51,407] {spark_submit_hook.py:479} INFO - creation time: 2020-05-12T08:00:50Z
[2020-05-12 03:00:51,408] {logging_mixin.py:112} INFO - [2020-05-12 03:00:51,407] {spark_submit_hook.py:479} INFO - service account name: spark
[2020-05-12 03:00:51,408] {logging_mixin.py:112} INFO - [2020-05-12 03:00:51,408] {spark_submit_hook.py:479} INFO - volumes: spark-local-dir-1, dswsecret-volume, spark-conf-volume, spark-token-fpqpz
[2020-05-12 03:00:51,408] {logging_mixin.py:112} INFO - [2020-05-12 03:00:51,408] {spark_submit_hook.py:479} INFO - node name: 10.93.122.236
[2020-05-12 03:00:51,408] {logging_mixin.py:112} INFO - [2020-05-12 03:00:51,408] {spark_submit_hook.py:479} INFO - start time: 2020-05-12T08:00:51Z
[2020-05-12 03:00:51,408] {logging_mixin.py:112} INFO - [2020-05-12 03:00:51,408] {spark_submit_hook.py:479} INFO - container images: txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark:s2.4.4-h2.7
[2020-05-12 03:00:51,408] {logging_mixin.py:112} INFO - [2020-05-12 03:00:51,408] {spark_submit_hook.py:479} INFO - phase: Pending
[2020-05-12 03:00:51,408] {logging_mixin.py:112} INFO - [2020-05-12 03:00:51,408] {spark_submit_hook.py:479} INFO - status: [ContainerStatus(containerID=null, image=txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark:s2.4.4-h2.7, imageID=, lastState=ContainerState(running=null, terminated=null, waiting=null, additionalProperties={}), name=spark-kubernetes-driver, ready=false, restartCount=0, state=ContainerState(running=null, terminated=null, waiting=ContainerStateWaiting(message=null, reason=ContainerCreating, additionalProperties={}), additionalProperties={}), additionalProperties={})]
[2020-05-12 03:01:07,811] {logging_mixin.py:112} INFO - [2020-05-12 03:01:07,811] {spark_submit_hook.py:479} INFO - 20/05/12 03:01:07 INFO LoggingPodStatusWatcherImpl: State changed, new state:
[2020-05-12 03:01:07,811] {logging_mixin.py:112} INFO - [2020-05-12 03:01:07,811] {spark_submit_hook.py:462} INFO - Identified spark driver pod: rmtextract-1589270449587-driver
[2020-05-12 03:01:07,811] {logging_mixin.py:112} INFO - [2020-05-12 03:01:07,811] {spark_submit_hook.py:479} INFO - pod name: rmtextract-1589270449587-driver
[2020-05-12 03:01:07,812] {logging_mixin.py:112} INFO - [2020-05-12 03:01:07,811] {spark_submit_hook.py:479} INFO - namespace: batch-pipeline-qa
[2020-05-12 03:01:07,812] {logging_mixin.py:112} INFO - [2020-05-12 03:01:07,812] {spark_submit_hook.py:479} INFO - labels: spark-app-selector -> spark-2f7caeac92c94710ab54cf40b5ff29e5, spark-role -> driver
[2020-05-12 03:01:07,812] {logging_mixin.py:112} INFO - [2020-05-12 03:01:07,812] {spark_submit_hook.py:479} INFO - pod uid: 57056ebb-771a-4894-b1c3-398073967a6b
[2020-05-12 03:01:07,812] {logging_mixin.py:112} INFO - [2020-05-12 03:01:07,812] {spark_submit_hook.py:479} INFO - creation time: 2020-05-12T08:00:50Z
[2020-05-12 03:01:07,812] {logging_mixin.py:112} INFO - [2020-05-12 03:01:07,812] {spark_submit_hook.py:479} INFO - service account name: spark
[2020-05-12 03:01:07,812] {logging_mixin.py:112} INFO - [2020-05-12 03:01:07,812] {spark_submit_hook.py:479} INFO - volumes: spark-local-dir-1, dswsecret-volume, spark-conf-volume, spark-token-fpqpz
[2020-05-12 03:01:07,812] {logging_mixin.py:112} INFO - [2020-05-12 03:01:07,812] {spark_submit_hook.py:479} INFO - node name: 10.93.122.236
[2020-05-12 03:01:07,813] {logging_mixin.py:112} INFO - [2020-05-12 03:01:07,812] {spark_submit_hook.py:479} INFO - start time: 2020-05-12T08:00:51Z
[2020-05-12 03:01:07,813] {logging_mixin.py:112} INFO - [2020-05-12 03:01:07,813] {spark_submit_hook.py:479} INFO - container images: txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark:s2.4.4-h2.7
[2020-05-12 03:01:07,813] {logging_mixin.py:112} INFO - [2020-05-12 03:01:07,813] {spark_submit_hook.py:479} INFO - phase: Running
[2020-05-12 03:01:07,813] {logging_mixin.py:112} INFO - [2020-05-12 03:01:07,813] {spark_submit_hook.py:479} INFO - status: [ContainerStatus(containerID=containerd://3f364306c94ac5739ab154a0037fc5e17a2981cf51669be916304be1d8698eb6, image=txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark:s2.4.4-h2.7, imageID=txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark@sha256:3682354e49a55503ef906ce8aeff8601274fa426204ec989a91a72912d31ed7e, lastState=ContainerState(running=null, terminated=null, waiting=null, additionalProperties={}), name=spark-kubernetes-driver, ready=true, restartCount=0, state=ContainerState(running=ContainerStateRunning(startedAt=2020-05-12T08:01:07Z, additionalProperties={}), terminated=null, waiting=null, additionalProperties={}), additionalProperties={})]
[2020-05-12 03:52:13,876] {logging_mixin.py:112} INFO - [2020-05-12 03:52:13,876] {spark_submit_hook.py:479} INFO - 20/05/12 03:52:13 INFO LoggingPodStatusWatcherImpl: State changed, new state:
[2020-05-12 03:52:13,876] {logging_mixin.py:112} INFO - [2020-05-12 03:52:13,876] {spark_submit_hook.py:462} INFO - Identified spark driver pod: rmtextract-1589270449587-driver
[2020-05-12 03:52:13,876] {logging_mixin.py:112} INFO - [2020-05-12 03:52:13,876] {spark_submit_hook.py:479} INFO - pod name: rmtextract-1589270449587-driver
[2020-05-12 03:52:13,877] {logging_mixin.py:112} INFO - [2020-05-12 03:52:13,876] {spark_submit_hook.py:479} INFO - namespace: batch-pipeline-qa
[2020-05-12 03:52:13,877] {logging_mixin.py:112} INFO - [2020-05-12 03:52:13,877] {spark_submit_hook.py:479} INFO - labels: spark-app-selector -> spark-2f7caeac92c94710ab54cf40b5ff29e5, spark-role -> driver
[2020-05-12 03:52:13,877] {logging_mixin.py:112} INFO - [2020-05-12 03:52:13,877] {spark_submit_hook.py:479} INFO - pod uid: 57056ebb-771a-4894-b1c3-398073967a6b
[2020-05-12 03:52:13,877] {logging_mixin.py:112} INFO - [2020-05-12 03:52:13,877] {spark_submit_hook.py:479} INFO - creation time: 2020-05-12T08:00:50Z
[2020-05-12 03:52:13,877] {logging_mixin.py:112} INFO - [2020-05-12 03:52:13,877] {spark_submit_hook.py:479} INFO - service account name: spark
[2020-05-12 03:52:13,877] {logging_mixin.py:112} INFO - [2020-05-12 03:52:13,877] {spark_submit_hook.py:479} INFO - volumes: spark-local-dir-1, dswsecret-volume, spark-conf-volume, spark-token-fpqpz
[2020-05-12 03:52:13,877] {logging_mixin.py:112} INFO - [2020-05-12 03:52:13,877] {spark_submit_hook.py:479} INFO - node name: 10.93.122.236
[2020-05-12 03:52:13,878] {logging_mixin.py:112} INFO - [2020-05-12 03:52:13,878] {spark_submit_hook.py:479} INFO - start time: 2020-05-12T08:00:51Z
[2020-05-12 03:52:13,878] {logging_mixin.py:112} INFO - [2020-05-12 03:52:13,878] {spark_submit_hook.py:479} INFO - container images: txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark:s2.4.4-h2.7
[2020-05-12 03:52:13,878] {logging_mixin.py:112} INFO - [2020-05-12 03:52:13,878] {spark_submit_hook.py:479} INFO - phase: Succeeded
[2020-05-12 03:52:13,878] {logging_mixin.py:112} INFO - [2020-05-12 03:52:13,878] {spark_submit_hook.py:479} INFO - status: [ContainerStatus(containerID=containerd://3f364306c94ac5739ab154a0037fc5e17a2981cf51669be916304be1d8698eb6, image=txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark:s2.4.4-h2.7, imageID=txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark@sha256:3682354e49a55503ef906ce8aeff8601274fa426204ec989a91a72912d31ed7e, lastState=ContainerState(running=null, terminated=null, waiting=null, additionalProperties={}), name=spark-kubernetes-driver, ready=false, restartCount=0, state=ContainerState(running=null, terminated=ContainerStateTerminated(containerID=containerd://3f364306c94ac5739ab154a0037fc5e17a2981cf51669be916304be1d8698eb6, exitCode=0, finishedAt=2020-05-12T08:52:03Z, message=null, reason=Completed, signal=null, startedAt=2020-05-12T08:01:07Z, additionalProperties={}), waiting=null, additionalProperties={}), additionalProperties={})]
[2020-05-12 03:52:13,883] {logging_mixin.py:112} INFO - [2020-05-12 03:52:13,882] {spark_submit_hook.py:479} INFO - 20/05/12 03:52:13 INFO LoggingPodStatusWatcherImpl: Container final statuses:
[2020-05-12 03:52:13,883] {logging_mixin.py:112} INFO - [2020-05-12 03:52:13,883] {spark_submit_hook.py:479} INFO -
[2020-05-12 03:52:13,883] {logging_mixin.py:112} INFO - [2020-05-12 03:52:13,883] {spark_submit_hook.py:479} INFO -
[2020-05-12 03:52:13,883] {logging_mixin.py:112} INFO - [2020-05-12 03:52:13,883] {spark_submit_hook.py:479} INFO - Container name: spark-kubernetes-driver
[2020-05-12 03:52:13,883] {logging_mixin.py:112} INFO - [2020-05-12 03:52:13,883] {spark_submit_hook.py:479} INFO - Container image: txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark:s2.4.4-h2.7
[2020-05-12 03:52:13,883] {logging_mixin.py:112} INFO - [2020-05-12 03:52:13,883] {spark_submit_hook.py:479} INFO - Container state: Terminated
[2020-05-12 03:52:13,884] {logging_mixin.py:112} INFO - [2020-05-12 03:52:13,883] {spark_submit_hook.py:479} INFO - Exit code: 0
[2020-05-12 03:52:13,884] {logging_mixin.py:112} INFO - [2020-05-12 03:52:13,884] {spark_submit_hook.py:479} INFO - 20/05/12 03:52:13 INFO Client: Application Rmt_Extract finished.
[2020-05-12 03:52:13,889] {logging_mixin.py:112} INFO - [2020-05-12 03:52:13,889] {spark_submit_hook.py:479} INFO - 20/05/12 03:52:13 INFO ShutdownHookManager: Shutdown hook called
[2020-05-12 03:52:13,890] {logging_mixin.py:112} INFO - [2020-05-12 03:52:13,890] {spark_submit_hook.py:479} INFO - 20/05/12 03:52:13 INFO ShutdownHookManager: Deleting directory /tmp/spark-8c6f511c-4c0f-451e-b686-7d0f444d8f59
[2020-05-12 03:52:14,006] {taskinstance.py:1065} INFO - Marking task as SUCCESS.dag_id=Transformation_Rmt_Master_Load_Sequence_Adhoc_Main, task_id=Rmt_Extract, execution_date=20200512T040000, start_date=20200512T080043, end_date=20200512T085214
[2020-05-12 03:52:19,162] {logging_mixin.py:112} INFO - [2020-05-12 03:52:19,161] {local_task_job.py:103} INFO - Task exited with return code 0
```
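In this run the watch survived to the end, so the `spark_submit_hook.py:479` echo lines above include the final status dump and the hook could scrape the exit code. A minimal illustration (my own, using the status line from the log above) of the match the hook performs; `re.search` finds the pattern anywhere in the line, so surrounding context does not matter:

```python
import re

# The driver status line as spark-submit printed it (taken from the log above).
line = "         Exit code: 0"
match = re.search(r'\s*Exit code: (\d+)', line)
print(int(match.group(1)))  # -> 0, which is why this run was marked SUCCESS
```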
### Scenario 2: job failed, log returned from airflow, got the right status:

```log
[dawany@dawany-inf env_qa]$ kubectl describe pod merge-rptg-appl-dscr-1589248907865-driver -n batch-pipeline-qa
Name:         merge-rptg-appl-dscr-1589248907865-driver
Namespace:    batch-pipeline-qa
Priority:     0
Node:         10.74.200.186/10.74.200.186
Start Time:   Mon, 11 May 2020 22:01:52 -0400
Labels:       spark-app-selector=spark-1e9e33df2d5c488dbc0555a618577f40
              spark-role=driver
Annotations:  kubernetes.io/psp: db2oltp-dev-psp
Status:       Failed
IP:           172.30.21.6
IPs:          <none>
Containers:
  spark-kubernetes-driver:
    Container ID:  containerd://cf5ac589838bb75608cb1903967fcaad31f850a6cd4b4f63eb1e7417cc867c31
    Image:         txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark:s2.4.0-h2.6
    Image ID:      txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark@sha256:276417a1bafb5aca28c78585f55df8a8c684b20c87b1814c3cc02065d7faa885
    Ports:         7078/TCP, 7079/TCP, 4040/TCP
    Host Ports:    0/TCP, 0/TCP, 0/TCP
    Args:          driver --properties-file /opt/spark/conf/spark.properties --class com.ibm.cio.dswim.ingest.CommonMerge spark-internal -t odsqa.shar1.rptg_appl_dscr -b dsw-data-drop-qa -d dashdb -k XREF
    State:          Terminated
      Reason:       Error
      Exit Code:    1
      Started:      Mon, 11 May 2020 22:02:19 -0400
      Finished:     Mon, 11 May 2020 22:05:03 -0400
    Ready:          False
    Restart Count:  0
    Limits:
      memory:  896Mi
    Requests:
      cpu:     1
      memory:  896Mi
    Environment:
      SPARK_DRIVER_BIND_ADDRESS:   (v1:status.podIP)
      SPARK_LOCAL_DIRS:            /var/data/spark-8688577b-07b4-4cf5-b362-1a15003420ae
      SPARK_CONF_DIR:              /opt/spark/conf
    Mounts:
      /opt/spark/conf from spark-conf-volume (rw)
      /opt/spark/secrets from dswsecret-volume (rw)
      /var/data/spark-8688577b-07b4-4cf5-b362-1a15003420ae from spark-local-dir-1 (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from spark-token-fpqpz (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  spark-local-dir-1:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit:  <unset>
  dswsecret-volume:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  dswsecret
    Optional:    false
  spark-conf-volume:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      merge-rptg-appl-dscr-1589248907865-driver-conf-map
    Optional:  false
  spark-token-fpqpz:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  spark-token-fpqpz
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 600s
                 node.kubernetes.io/unreachable:NoExecute for 600s
Events:          <none>
[dawany@dawany-inf env_qa]$
```

```log
*** Reading local file: /home/airflow/airflow/logs/Batch_xref.rptg_appl_dscr/merge_rptg_appl_dscr/2020-05-11T02:01:00+00:00/1.log
[2020-05-11 21:01:41,692] {taskinstance.py:669} INFO - Dependencies all met for <TaskInstance: Batch_xref.rptg_appl_dscr.merge_rptg_appl_dscr 2020-05-11T02:01:00+00:00 [queued]>
[2020-05-11 21:01:41,801] {taskinstance.py:669} INFO - Dependencies all met for <TaskInstance: Batch_xref.rptg_appl_dscr.merge_rptg_appl_dscr 2020-05-11T02:01:00+00:00 [queued]>
[2020-05-11 21:01:41,801] {taskinstance.py:879} INFO - --------------------------------------------------------------------------------
[2020-05-11 21:01:41,801] {taskinstance.py:880} INFO - Starting attempt 1 of 2
[2020-05-11 21:01:41,802] {taskinstance.py:881} INFO - --------------------------------------------------------------------------------
[2020-05-11 21:01:41,911] {taskinstance.py:900} INFO - Executing <Task(SparkSubmitOperator): merge_rptg_appl_dscr> on 2020-05-11T02:01:00+00:00
[2020-05-11 21:01:41,919] {standard_task_runner.py:53} INFO - Started process 37536 to run task
[2020-05-11 21:01:42,508] {logging_mixin.py:112} INFO - Running %s on host %s <TaskInstance: Batch_xref.rptg_appl_dscr.merge_rptg_appl_dscr 2020-05-11T02:01:00+00:00 [running]> kafka02.cloud.ibm.com
[2020-05-11 21:01:43,042] {logging_mixin.py:112} INFO - [2020-05-11 21:01:43,042] {base_hook.py:87} INFO - Using connection to: id: spark_default. Host: k8s://https://c2.private.us-south.containers.cloud.ibm.com:26165, Port: None, Schema: None, Login: admin, Password: XXXXXXXX, extra: XXXXXXXX
[2020-05-11 21:01:43,055] {logging_mixin.py:112} INFO - [2020-05-11 21:01:43,055] {spark_submit_hook.py:325} INFO - Spark-Submit cmd: spark-submit --master k8s://https://c2.private.us-south.containers.cloud.ibm.com:26165 --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark --conf spark.kubernetes.container.image=txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark:s2.4.0-h2.6 --conf spark.kubernetes.container.image.pullSecrets=artifactory-container-registry --conf spark.submit.deployMode=cluster --conf spark.executor.instances=1 --conf spark.kubernetes.executor.request.cores=0.5 --conf spark.kubernetes.driver.secrets.dswsecret=/opt/spark/secrets --conf spark.kubernetes.namespace=batch-pipeline-qa --files cos://dsw-data-project-qa.service/config/dsw_config.conf,cos://dsw-data-project-qa.service/config/tables_pk.conf --jars cos://dsw-data-project-qa.service/3rdparty-jars/org.apache.spark_spark-avro_2.11-2.4.0.jar,cos://dsw-data-project-qa.service/3rdparty-jars/ibm-cos-java-sdk-2.4.5.jar,cos://dsw-data-project-qa.service/3rdparty-jars/ibm-cos-java-sdk-core-2.4.5.jar,cos://dsw-data-project-qa.service/3rdparty-jars/ibm-cos-java-sdk-kms-2.4.5.jar,cos://dsw-data-project-qa.service/3rdparty-jars/ibm-cos-java-sdk-s3-2.4.5.jar,cos://dsw-data-project-qa.service/3rdparty-jars/db2jcc4.jar --executor-memory 1G --driver-memory 512M --name merge-rptg-appl-dscr --class com.ibm.cio.dswim.ingest.CommonMerge cos://dsw-data-project-qa.service/job-jars/ingestion-1.0.jar -t odsqa.shar1.rptg_appl_dscr -b dsw-data-drop-qa -d dashdb -k XREF
[2020-05-11 21:01:45,650] {logging_mixin.py:112} INFO - [2020-05-11 21:01:45,649] {spark_submit_hook.py:479} INFO - 20/05/11 21:01:45 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
[2020-05-11 21:01:47,988] {logging_mixin.py:112} INFO - [2020-05-11 21:01:47,988] {spark_submit_hook.py:479} INFO - log4j:WARN No appenders could be found for logger (io.fabric8.kubernetes.client.Config).
[2020-05-11 21:01:47,989] {logging_mixin.py:112} INFO - [2020-05-11 21:01:47,989] {spark_submit_hook.py:479} INFO - log4j:WARN Please initialize the log4j system properly.
[2020-05-11 21:01:47,989] {logging_mixin.py:112} INFO - [2020-05-11 21:01:47,989] {spark_submit_hook.py:479} INFO - log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
[2020-05-11 21:01:49,267] {logging_mixin.py:112} INFO - [2020-05-11 21:01:49,266] {spark_submit_hook.py:479} INFO - Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
[2020-05-11 21:01:49,283] {logging_mixin.py:112} INFO - [2020-05-11 21:01:49,282] {spark_submit_hook.py:479} INFO - 20/05/11 21:01:49 INFO LoggingPodStatusWatcherImpl: State changed, new state:
[2020-05-11 21:01:49,283] {logging_mixin.py:112} INFO - [2020-05-11 21:01:49,283] {spark_submit_hook.py:462} INFO - Identified spark driver pod: merge-rptg-appl-dscr-1589248907865-driver
[2020-05-11 21:01:49,283] {logging_mixin.py:112} INFO - [2020-05-11 21:01:49,283] {spark_submit_hook.py:479} INFO - pod name: merge-rptg-appl-dscr-1589248907865-driver
[2020-05-11 21:01:49,283] {logging_mixin.py:112} INFO - [2020-05-11 21:01:49,283] {spark_submit_hook.py:479} INFO - namespace: batch-pipeline-qa
[2020-05-11 21:01:49,283] {logging_mixin.py:112} INFO - [2020-05-11 21:01:49,283] {spark_submit_hook.py:479} INFO - labels: spark-app-selector -> spark-1e9e33df2d5c488dbc0555a618577f40, spark-role -> driver
[2020-05-11 21:01:49,283] {logging_mixin.py:112} INFO - [2020-05-11 21:01:49,283] {spark_submit_hook.py:479} INFO - pod uid: f01dcbd9-fe6d-421a-99ea-7749f6b79345
[2020-05-11 21:01:49,284] {logging_mixin.py:112} INFO - [2020-05-11 21:01:49,284] {spark_submit_hook.py:479} INFO - creation time: 2020-05-12T02:01:48Z
[2020-05-11 21:01:49,284] {logging_mixin.py:112} INFO - [2020-05-11 21:01:49,284] {spark_submit_hook.py:479} INFO - service account name: spark
[2020-05-11 21:01:49,284] {logging_mixin.py:112} INFO - [2020-05-11 21:01:49,284] {spark_submit_hook.py:479} INFO - volumes: spark-local-dir-1, dswsecret-volume, spark-conf-volume, spark-token-fpqpz
[2020-05-11 21:01:49,284] {logging_mixin.py:112} INFO - [2020-05-11 21:01:49,284] {spark_submit_hook.py:479} INFO - node name: N/A
[2020-05-11 21:01:49,284] {logging_mixin.py:112} INFO - [2020-05-11 21:01:49,284] {spark_submit_hook.py:479} INFO - start time: N/A
[2020-05-11 21:01:49,284] {logging_mixin.py:112} INFO - [2020-05-11 21:01:49,284] {spark_submit_hook.py:479} INFO - container images: N/A
[2020-05-11 21:01:49,284] {logging_mixin.py:112} INFO - [2020-05-11 21:01:49,284] {spark_submit_hook.py:479} INFO - phase: Pending
[2020-05-11 21:01:49,285] {logging_mixin.py:112} INFO - [2020-05-11 21:01:49,285] {spark_submit_hook.py:479} INFO - status: []
[2020-05-11 21:01:49,292] {logging_mixin.py:112} INFO - [2020-05-11 21:01:49,291] {spark_submit_hook.py:479} INFO - 20/05/11 21:01:49 INFO LoggingPodStatusWatcherImpl: State changed, new state:
[2020-05-11 21:01:49,292] {logging_mixin.py:112} INFO - [2020-05-11 21:01:49,292] {spark_submit_hook.py:462} INFO - Identified spark driver pod: merge-rptg-appl-dscr-1589248907865-driver
[2020-05-11 21:01:49,292] {logging_mixin.py:112} INFO - [2020-05-11 21:01:49,292] {spark_submit_hook.py:479} INFO - pod name: merge-rptg-appl-dscr-1589248907865-driver
[2020-05-11 21:01:49,292] {logging_mixin.py:112} INFO - [2020-05-11 21:01:49,292] {spark_submit_hook.py:479} INFO - namespace: batch-pipeline-qa
[2020-05-11 21:01:49,292] {logging_mixin.py:112} INFO - [2020-05-11 21:01:49,292] {spark_submit_hook.py:479} INFO - labels: spark-app-selector -> spark-1e9e33df2d5c488dbc0555a618577f40, spark-role -> driver
[2020-05-11 21:01:49,292] {logging_mixin.py:112} INFO - [2020-05-11 21:01:49,292] {spark_submit_hook.py:479} INFO - pod uid: f01dcbd9-fe6d-421a-99ea-7749f6b79345
[2020-05-11 21:01:49,293] {logging_mixin.py:112} INFO - [2020-05-11 21:01:49,292] {spark_submit_hook.py:479} INFO - creation time: 2020-05-12T02:01:48Z
[2020-05-11 21:01:49,293] {logging_mixin.py:112} INFO - [2020-05-11 21:01:49,293] {spark_submit_hook.py:479} INFO - service account name: spark
[2020-05-11 21:01:49,293] {logging_mixin.py:112} INFO - [2020-05-11 21:01:49,293] {spark_submit_hook.py:479} INFO - volumes: spark-local-dir-1, dswsecret-volume, spark-conf-volume, spark-token-fpqpz
[2020-05-11 21:01:49,293] {logging_mixin.py:112} INFO - [2020-05-11 21:01:49,293] {spark_submit_hook.py:479} INFO - node name: 10.74.200.186
[2020-05-11 21:01:49,293] {logging_mixin.py:112} INFO - [2020-05-11 21:01:49,293] {spark_submit_hook.py:479} INFO - start time: N/A
[2020-05-11 21:01:49,293] {logging_mixin.py:112} INFO - [2020-05-11 21:01:49,293] {spark_submit_hook.py:479} INFO - container images: N/A
[2020-05-11 21:01:49,293] {logging_mixin.py:112} INFO - [2020-05-11 21:01:49,293] {spark_submit_hook.py:479} INFO - phase: Pending
[2020-05-11 21:01:49,294] {logging_mixin.py:112} INFO - [2020-05-11 21:01:49,293] {spark_submit_hook.py:479} INFO - status: []
[2020-05-11 21:01:49,550] {logging_mixin.py:112} INFO - [2020-05-11 21:01:49,550] {spark_submit_hook.py:479} INFO - 20/05/11 21:01:49 INFO Client: Waiting for application merge-rptg-appl-dscr to finish...
[2020-05-11 21:01:53,413] {logging_mixin.py:112} INFO - [2020-05-11 21:01:53,413] {spark_submit_hook.py:479} INFO - 20/05/11 21:01:53 INFO LoggingPodStatusWatcherImpl: State changed, new state:
[2020-05-11 21:01:53,413] {logging_mixin.py:112} INFO - [2020-05-11 21:01:53,413] {spark_submit_hook.py:462} INFO - Identified spark driver pod: merge-rptg-appl-dscr-1589248907865-driver
[2020-05-11 21:01:53,413] {logging_mixin.py:112} INFO - [2020-05-11 21:01:53,413] {spark_submit_hook.py:479} INFO - pod name: merge-rptg-appl-dscr-1589248907865-driver
[2020-05-11 21:01:53,414] {logging_mixin.py:112} INFO - [2020-05-11 21:01:53,413] {spark_submit_hook.py:479} INFO - namespace: batch-pipeline-qa
[2020-05-11 21:01:53,414] {logging_mixin.py:112} INFO - [2020-05-11 21:01:53,414] {spark_submit_hook.py:479} INFO - labels: spark-app-selector -> spark-1e9e33df2d5c488dbc0555a618577f40, spark-role -> driver
[2020-05-11 21:01:53,414] {logging_mixin.py:112} INFO - [2020-05-11 21:01:53,414] {spark_submit_hook.py:479} INFO - pod uid: f01dcbd9-fe6d-421a-99ea-7749f6b79345
[2020-05-11 21:01:53,414] {logging_mixin.py:112} INFO - [2020-05-11 21:01:53,414] {spark_submit_hook.py:479} INFO - creation time: 2020-05-12T02:01:48Z
[2020-05-11 21:01:53,414] {logging_mixin.py:112} INFO - [2020-05-11 21:01:53,414] {spark_submit_hook.py:479} INFO - service account name: spark
[2020-05-11 21:01:53,414] {logging_mixin.py:112} INFO - [2020-05-11 21:01:53,414] {spark_submit_hook.py:479} INFO - volumes: spark-local-dir-1, dswsecret-volume, spark-conf-volume, spark-token-fpqpz
[2020-05-11 21:01:53,414] {logging_mixin.py:112} INFO - [2020-05-11 21:01:53,414] {spark_submit_hook.py:479} INFO - node name: 10.74.200.186
[2020-05-11 21:01:53,415] {logging_mixin.py:112} INFO - [2020-05-11 21:01:53,414] {spark_submit_hook.py:479} INFO - start time: 2020-05-12T02:01:52Z
[2020-05-11 21:01:53,415] {logging_mixin.py:112} INFO - [2020-05-11 21:01:53,415] {spark_submit_hook.py:479} INFO - container images: txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark:s2.4.0-h2.6
[2020-05-11 21:01:53,415] {logging_mixin.py:112} INFO - [2020-05-11 21:01:53,415] {spark_submit_hook.py:479} INFO - phase: Pending
[2020-05-11 21:01:53,415] {logging_mixin.py:112} INFO - [2020-05-11 21:01:53,415] {spark_submit_hook.py:479} INFO - status: [ContainerStatus(containerID=null, image=txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark:s2.4.0-h2.6, imageID=, lastState=ContainerState(running=null, terminated=null, waiting=null, additionalProperties={}), name=spark-kubernetes-driver, ready=false, restartCount=0, state=ContainerState(running=null, terminated=null, waiting=ContainerStateWaiting(message=null, reason=ContainerCreating, additionalProperties={}), additionalProperties={}), additionalProperties={})]
[2020-05-11 21:02:20,393] {logging_mixin.py:112} INFO - [2020-05-11 21:02:20,393] {spark_submit_hook.py:479} INFO - 20/05/11 21:02:20 INFO LoggingPodStatusWatcherImpl: State changed, new state:
[2020-05-11 21:02:20,394] {logging_mixin.py:112} INFO - [2020-05-11 21:02:20,393] {spark_submit_hook.py:462} INFO - Identified spark driver pod: merge-rptg-appl-dscr-1589248907865-driver
[2020-05-11 21:02:20,394] {logging_mixin.py:112} INFO - [2020-05-11 21:02:20,394] {spark_submit_hook.py:479} INFO - pod name: merge-rptg-appl-dscr-1589248907865-driver
[2020-05-11 21:02:20,394] {logging_mixin.py:112} INFO - [2020-05-11 21:02:20,394] {spark_submit_hook.py:479} INFO - namespace: batch-pipeline-qa
[2020-05-11 21:02:20,394] {logging_mixin.py:112} INFO - [2020-05-11 21:02:20,394] {spark_submit_hook.py:479} INFO - labels: spark-app-selector -> spark-1e9e33df2d5c488dbc0555a618577f40, spark-role -> driver
[2020-05-11 21:02:20,394] {logging_mixin.py:112} INFO - [2020-05-11 21:02:20,394] {spark_submit_hook.py:479} INFO - pod uid: f01dcbd9-fe6d-421a-99ea-7749f6b79345
[2020-05-11 21:02:20,394] {logging_mixin.py:112} INFO - [2020-05-11 21:02:20,394] {spark_submit_hook.py:479} INFO - creation time: 2020-05-12T02:01:48Z
[2020-05-11 21:02:20,395] {logging_mixin.py:112} INFO - [2020-05-11 21:02:20,394] {spark_submit_hook.py:479} INFO - service account name: spark
[2020-05-11 21:02:20,395] {logging_mixin.py:112} INFO - [2020-05-11 21:02:20,395] {spark_submit_hook.py:479} INFO - volumes: spark-local-dir-1, dswsecret-volume, spark-conf-volume, spark-token-fpqpz
[2020-05-11 21:02:20,395] {logging_mixin.py:112} INFO - [2020-05-11 21:02:20,395] {spark_submit_hook.py:479} INFO - node name: 10.74.200.186
[2020-05-11 21:02:20,395] {logging_mixin.py:112} INFO - [2020-05-11 21:02:20,395] {spark_submit_hook.py:479} INFO - start time: 2020-05-12T02:01:52Z
[2020-05-11 21:02:20,395] {logging_mixin.py:112} INFO - [2020-05-11 21:02:20,395] {spark_submit_hook.py:479} INFO - container images: txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark:s2.4.0-h2.6
[2020-05-11 21:02:20,395] {logging_mixin.py:112} INFO - [2020-05-11 21:02:20,395] {spark_submit_hook.py:479} INFO - phase: Running
[2020-05-11 21:02:20,395] {logging_mixin.py:112} INFO - [2020-05-11 21:02:20,395] {spark_submit_hook.py:479} INFO - status: [ContainerStatus(containerID=containerd://cf5ac589838bb75608cb1903967fcaad31f850a6cd4b4f63eb1e7417cc867c31, image=txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark:s2.4.0-h2.6, imageID=txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark@sha256:276417a1bafb5aca28c78585f55df8a8c684b20c87b1814c3cc02065d7faa885, lastState=ContainerState(running=null, terminated=null, waiting=null, additionalProperties={}), name=spark-kubernetes-driver, ready=true, restartCount=0, state=ContainerState(running=ContainerStateRunning(startedAt=2020-05-12T02:02:19Z, additionalProperties={}), terminated=null, waiting=null, additionalProperties={}), additionalProperties={})]
[2020-05-11 21:06:00,953] {logging_mixin.py:112} INFO - [2020-05-11 21:06:00,953] {spark_submit_hook.py:479} INFO - 20/05/11 21:06:00 INFO LoggingPodStatusWatcherImpl: State changed, new state:
[2020-05-11 21:06:00,953] {logging_mixin.py:112} INFO - [2020-05-11 21:06:00,953] {spark_submit_hook.py:462} INFO - Identified spark driver pod: merge-rptg-appl-dscr-1589248907865-driver
[2020-05-11 21:06:00,953] {logging_mixin.py:112} INFO - [2020-05-11 21:06:00,953] {spark_submit_hook.py:479} INFO - pod name: merge-rptg-appl-dscr-1589248907865-driver
[2020-05-11 21:06:00,953] {logging_mixin.py:112} INFO - [2020-05-11 21:06:00,953] {spark_submit_hook.py:479} INFO - namespace: batch-pipeline-qa
[2020-05-11 21:06:00,954] {logging_mixin.py:112} INFO - [2020-05-11 21:06:00,954] {spark_submit_hook.py:479} INFO - labels: spark-app-selector -> spark-1e9e33df2d5c488dbc0555a618577f40, spark-role -> driver
[2020-05-11 21:06:00,954] {logging_mixin.py:112} INFO - [2020-05-11 21:06:00,954] {spark_submit_hook.py:479} INFO - pod uid: f01dcbd9-fe6d-421a-99ea-7749f6b79345
[2020-05-11 21:06:00,954] {logging_mixin.py:112} INFO - [2020-05-11 21:06:00,954] {spark_submit_hook.py:479} INFO - creation time: 2020-05-12T02:01:48Z
[2020-05-11 21:06:00,954] {logging_mixin.py:112} INFO - [2020-05-11 21:06:00,954] {spark_submit_hook.py:479} INFO - service account name: spark
[2020-05-11 21:06:00,954] {logging_mixin.py:112} INFO - [2020-05-11 21:06:00,954] {spark_submit_hook.py:479} INFO - volumes: spark-local-dir-1, dswsecret-volume, spark-conf-volume, spark-token-fpqpz
[2020-05-11 21:06:00,954] {logging_mixin.py:112} INFO - [2020-05-11 21:06:00,954] {spark_submit_hook.py:479} INFO - node name: 10.74.200.186
[2020-05-11 21:06:00,954] {logging_mixin.py:112} INFO - [2020-05-11 21:06:00,954] {spark_submit_hook.py:479} INFO - start time: 2020-05-12T02:01:52Z
[2020-05-11 21:06:00,955] {logging_mixin.py:112} INFO - [2020-05-11 21:06:00,955] {spark_submit_hook.py:479} INFO - container images: txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark:s2.4.0-h2.6
[2020-05-11 21:06:00,955] {logging_mixin.py:112} INFO - [2020-05-11 21:06:00,955] {spark_submit_hook.py:479} INFO - phase: Failed
[2020-05-11 21:06:00,955] {logging_mixin.py:112} INFO - [2020-05-11 21:06:00,955] {spark_submit_hook.py:479} INFO - status: [ContainerStatus(containerID=containerd://cf5ac589838bb75608cb1903967fcaad31f850a6cd4b4f63eb1e7417cc867c31, image=txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark:s2.4.0-h2.6, imageID=txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark@sha256:276417a1bafb5aca28c78585f55df8a8c684b20c87b1814c3cc02065d7faa885, lastState=ContainerState(running=null, terminated=null, waiting=null, additionalProperties={}), name=spark-kubernetes-driver, ready=false, restartCount=0, state=ContainerState(running=null, terminated=ContainerStateTerminated(containerID=containerd://cf5ac589838bb75608cb1903967fcaad31f850a6cd4b4f63eb1e7417cc867c31, exitCode=1, finishedAt=2020-05-12T02:05:03Z, message=null, reason=Error, signal=null, startedAt=2020-05-12T02:02:19Z, additionalProperties={}), waiting=null, additionalProperties={}), additionalProperties={})]
[2020-05-11 21:06:00,960] {logging_mixin.py:112} INFO - [2020-05-11 21:06:00,960] {spark_submit_hook.py:479} INFO - 20/05/11 21:06:00 INFO LoggingPodStatusWatcherImpl: Container final statuses:
[2020-05-11 21:06:00,960] {logging_mixin.py:112} INFO - [2020-05-11 21:06:00,960] {spark_submit_hook.py:479} INFO -
[2020-05-11 21:06:00,960] {logging_mixin.py:112} INFO - [2020-05-11 21:06:00,960] {spark_submit_hook.py:479} INFO -
[2020-05-11 21:06:00,960] {logging_mixin.py:112} INFO - [2020-05-11 21:06:00,960] {spark_submit_hook.py:479} INFO - Container name: spark-kubernetes-driver
[2020-05-11 21:06:00,960] {logging_mixin.py:112} INFO - [2020-05-11 21:06:00,960] {spark_submit_hook.py:479} INFO - Container image: txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark:s2.4.0-h2.6
[2020-05-11 21:06:00,961] {logging_mixin.py:112} INFO - [2020-05-11 21:06:00,961] {spark_submit_hook.py:479} INFO - Container state: Terminated
[2020-05-11 21:06:00,961] {logging_mixin.py:112} INFO - [2020-05-11 21:06:00,961] {spark_submit_hook.py:479} INFO - Exit code: 1
[2020-05-11 21:06:00,961] {logging_mixin.py:112} INFO - [2020-05-11 21:06:00,961] {spark_submit_hook.py:479} INFO - 20/05/11 21:06:00 INFO Client: Application merge-rptg-appl-dscr finished.
[2020-05-11 21:06:00,968] {logging_mixin.py:112} INFO - [2020-05-11 21:06:00,968] {spark_submit_hook.py:479} INFO - 20/05/11 21:06:00 INFO ShutdownHookManager: Shutdown hook called
[2020-05-11 21:06:00,969] {logging_mixin.py:112} INFO - [2020-05-11 21:06:00,969] {spark_submit_hook.py:479} INFO - 20/05/11 21:06:00 INFO ShutdownHookManager: Deleting directory /tmp/spark-88cb9622-28cd-42e4-aaea-48a2206ae942
[2020-05-11 21:06:01,098] {taskinstance.py:1145} ERROR - Cannot execute: spark-submit --master k8s://https://c2.private.us-south.containers.cloud.ibm.com:26165 --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark --conf spark.kubernetes.container.image=txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark:s2.4.0-h2.6 --conf spark.kubernetes.container.image.pullSecrets=artifactory-container-registry --conf spark.submit.deployMode=cluster --conf spark.executor.instances=1 --conf spark.kubernetes.executor.request.cores=0.5 --conf spark.kubernetes.driver.secrets.dswsecret=/opt/spark/secrets --conf spark.kubernetes.namespace=batch-pipeline-qa --files cos://dsw-data-project-qa.service/config/dsw_config.conf,cos://dsw-data-project-qa.service/config/tables_pk.conf --jars cos://dsw-data-project-qa.service/3rdparty-jars/org.apache.spark_spark-avro_2.11-2.4.0.jar,cos://dsw-data-project-qa.service/3rdparty-jars/ibm-cos-java-sdk-2.4.5.jar,cos://dsw-data-project-qa.service/3rdparty-jars/ibm-cos-java-sdk-core-2.4.5.jar,cos://dsw-data-project-qa.service/3rdparty-jars/ibm-cos-java-sdk-kms-2.4.5.jar,cos://dsw-data-project-qa.service/3rdparty-jars/ibm-cos-java-sdk-s3-2.4.5.jar,cos://dsw-data-project-qa.service/3rdparty-jars/db2jcc4.jar --executor-memory 1G --driver-memory 512M --name merge-rptg-appl-dscr --class com.ibm.cio.dswim.ingest.CommonMerge cos://dsw-data-project-qa.service/job-jars/ingestion-1.0.jar -t odsqa.shar1.rptg_appl_dscr -b dsw-data-drop-qa -d dashdb -k XREF. Error code is: 0.
Traceback (most recent call last): File "/home/airflow/.local/lib/python3.6/site-packages/airflow/models/taskinstance.py", line 983, in _run_raw_task result = task_copy.execute(context=context) File "/home/airflow/.local/lib/python3.6/site-packages/airflow/contrib/operators/spark_submit_operator.py", line 187, in execute self._hook.submit(self._application) File "/home/airflow/.local/lib/python3.6/site-packages/airflow/contrib/hooks/spark_submit_hook.py", line 405, in submit self._mask_cmd(spark_submit_cmd), returncode airflow.exceptions.AirflowException: Cannot execute: spark-submit --master k8s://https://c2.private.us-south.containers.cloud.ibm.com:26165 --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark --conf spark.kubernetes.container.image=txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark:s2.4.0-h2.6 --conf spark.kubernetes.container.image.pullSecrets=artifactory-container-registry --conf spark.submit.deployMode=cluster --conf spark.executor.instances=1 --conf spark.kubernetes.executor.request.cores=0.5 --conf spark.kubernetes.driver.secrets.dswsecret=/opt/spark/secrets --conf spark.kubernetes.namespace=batch-pipeline-qa --files cos://dsw-data-project-qa.service/config/dsw_config.conf,cos://dsw-data-project-qa.service/config/tables_pk.conf --jars cos://dsw-data-project-qa.service/3rdparty-jars/org.apache.spark_spark-avro_2.11-2.4.0.jar,cos://dsw-data-project-qa.service/3rdparty-jars/ibm-cos-java-sdk-2.4.5.jar,cos://dsw-data-project-qa.service/3rdparty-jars/ibm-cos-java-sdk-core-2.4.5.jar,cos://dsw-data-project-qa.service/3rdparty-jars/ibm-cos-java-sdk-kms-2.4.5.jar,cos://dsw-data-project-qa.service/3rdparty-jars/ibm-cos-java-sdk-s3-2.4.5.jar,cos://dsw-data-project-qa.service/3rdparty-jars/db2jcc4.jar --executor-memory 1G --driver-memory 512M --name merge-rptg-appl-dscr --class com.ibm.cio.dswim.ingest.CommonMerge cos://dsw-data-project-qa.service/job-jars/ingestion-1.0.jar -t odsqa.shar1.rptg_appl_dscr -b dsw-data-drop-qa -d dashdb -k XREF. Error code is: 0. 
[2020-05-11 21:06:01,100] {taskinstance.py:1168} INFO - Marking task as UP_FOR_RETRY [2020-05-11 21:06:03,316] {logging_mixin.py:112} INFO - [2020-05-11 21:06:03,313] {local_task_job.py:103} INFO - Task exited with return code 1 ``` ### Scenario 3:job success, log return from airflow ,get wrong status(This Scenario is what we want to analysis) ```log [dawany@dawany-inf env_qa]$ kubectl describe pod rmtextract-1589227268554-driver -n batch-pipeline-qa Name: rmtextract-1589227268554-driver Namespace: batch-pipeline-qa Priority: 0 Node: 10.93.122.236/10.93.122.236 Start Time: Mon, 11 May 2020 16:01:10 -0400 Labels: spark-app-selector=spark-836f53b29a274eabbeba208d27e242de spark-role=driver Annotations: kubernetes.io/psp: db2oltp-dev-psp Status: Succeeded IP: 172.30.244.35 IPs: <none> Containers: spark-kubernetes-driver: Container ID: containerd://9608ccf92d1067645b6cbfcd289e6c99c76cd57463ed7c1fb71352a38a27a58c Image: txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark:s2.4.4-h2.7 Image ID: txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark@sha256:3682354e49a55503ef906ce8aeff8601274fa426204ec989a91a72912d31ed7e Ports: 7078/TCP, 7079/TCP, 4040/TCP Host Ports: 0/TCP, 0/TCP, 0/TCP Args: driver --properties-file /opt/spark/conf/spark.properties --class com.ibm.cio.dswim.trans.rmt.stage.RmtExtractStageJob spark-internal 0 3 8 State: Terminated Reason: Completed Exit Code: 0 Started: Mon, 11 May 2020 16:01:31 -0400 Finished: Mon, 11 May 2020 16:50:14 -0400 Ready: False Restart Count: 0 Limits: memory: 11Gi Requests: cpu: 1 memory: 11Gi Environment: SPARK_DRIVER_BIND_ADDRESS: (v1:status.podIP) SPARK_LOCAL_DIRS: /var/data/spark-2bb8b417-ca61-4e7a-a4b2-fa0a695a1109 SPARK_CONF_DIR: /opt/spark/conf Mounts: /opt/spark/conf from spark-conf-volume (rw) /opt/spark/secrets from dswsecret-volume (rw) /var/data/spark-2bb8b417-ca61-4e7a-a4b2-fa0a695a1109 from spark-local-dir-1 (rw) /var/run/secrets/kubernetes.io/serviceaccount from spark-token-fpqpz (ro) Conditions: Type Status Initialized True Ready False ContainersReady False PodScheduled True Volumes: spark-local-dir-1: Type: EmptyDir (a temporary directory that shares a pod's lifetime) Medium: SizeLimit: <unset> dswsecret-volume: Type: Secret (a volume populated by a Secret) SecretName: dswsecret Optional: false spark-conf-volume: Type: ConfigMap (a volume populated by a ConfigMap) Name: rmtextract-1589227268554-driver-conf-map Optional: false spark-token-fpqpz: Type: Secret (a volume populated by a Secret) SecretName: spark-token-fpqpz Optional: false QoS Class: Burstable Node-Selectors: <none> Tolerations: node.kubernetes.io/not-ready:NoExecute for 600s node.kubernetes.io/unreachable:NoExecute for 600s Events: <none> ``` ```log *** Reading local file: /home/airflow/airflow/logs/Transformation_Rmt_Master_Load_Sequence_Adhoc_Main/Rmt_Extract/2020-05-11T16:00:00+00:00/1.log [2020-05-11 15:01:02,589] {taskinstance.py:669} INFO - Dependencies all met for <TaskInstance: Transformation_Rmt_Master_Load_Sequence_Adhoc_Main.Rmt_Extract 2020-05-11T16:00:00+00:00 [queued]> [2020-05-11 15:01:02,722] {taskinstance.py:669} INFO - Dependencies all met for <TaskInstance: Transformation_Rmt_Master_Load_Sequence_Adhoc_Main.Rmt_Extract 2020-05-11T16:00:00+00:00 [queued]> [2020-05-11 15:01:02,722] {taskinstance.py:879} INFO - -------------------------------------------------------------------------------- [2020-05-11 15:01:02,722] {taskinstance.py:880} INFO - Starting attempt 1 of 2 [2020-05-11 15:01:02,722] {taskinstance.py:881} INFO - 
-------------------------------------------------------------------------------- [2020-05-11 15:01:02,826] {taskinstance.py:900} INFO - Executing <Task(SparkSubmitOperator): Rmt_Extract> on 2020-05-11T16:00:00+00:00 [2020-05-11 15:01:02,831] {standard_task_runner.py:53} INFO - Started process 39984 to run task [2020-05-11 15:01:03,415] {logging_mixin.py:112} INFO - Running %s on host %s <TaskInstance: Transformation_Rmt_Master_Load_Sequence_Adhoc_Main.Rmt_Extract 2020-05-11T16:00:00+00:00 [running]> kafka02.cloud.ibm.com [2020-05-11 15:01:03,970] {logging_mixin.py:112} INFO - [2020-05-11 15:01:03,969] {base_hook.py:87} INFO - Using connection to: id: spark_default. Host: k8s://https://c2.private.us-south.containers.cloud.ibm.com:26165, Port: None, Schema: None, Login: admin, Password: XXXXXXXX, extra: XXXXXXXX [2020-05-11 15:01:04,000] {logging_mixin.py:112} INFO - [2020-05-11 15:01:04,000] {spark_submit_hook.py:325} INFO - Spark-Submit cmd: spark-submit --master k8s://https://c2.private.us-south.containers.cloud.ibm.com:26165 --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark --conf spark.kubernetes.container.image=txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark:s2.4.4-h2.7 --conf spark.kubernetes.container.image.pullSecrets=artifactory-container-registry --conf spark.submit.deployMode=cluster --conf spark.kubernetes.report.interval=2 --conf spark.kubernetes.driver.secrets.dswsecret=/opt/spark/secrets --conf spark.executor.userClassPathFirst=true --conf spark.driver.userClassPathFirst=true --conf spark.sql.parquet.compression.codec=gzip --conf spark.sql.session.timeZone=America/New_York --conf spark.sql.broadcastTimeout=1800 --conf spark.sql.shuffle.partitions=600 --conf spark.shuffle.consolidateFiles=true --conf spark.default.parallelism=108 --conf spark.driver.cores=1 --conf spark.executor.cores=2 --conf spark.kubernetes.executor.request.cores=0.6 --conf spark.kubernetes.executor.memoryOverhead=1G --conf spark.driver.memory=10G --conf spark.executor.memory=5G --conf spark.executor.instances=9 --conf spark.sql.codegen=true --conf spark.sql.cbo.enabled=true --conf spark.sql.optimizer.maxIterations=1000 --conf spark.kubernetes.namespace=batch-pipeline-qa --files cos://dsw-data-project-qa.service/config/dsw_config.conf --jars cos://dsw-data-project-qa.service/job-jars/common-1.0-SNAPSHOT.jar,cos://dsw-data-project-qa.service/job-jars/rmtjob-1.0-SNAPSHOT.jar,cos://dsw-data-project-qa.service/job-jars/meta-1.0-SNAPSHOT.jar,cos://dsw-data-project-qa.service/3rdparty-jars/druid-1.1.12.jar,cos://dsw-data-project-qa.service/3rdparty-jars/mybatis-3.5.3.jar,cos://dsw-data-project-qa.service/3rdparty-jars/db2jcc4.jar,cos://dsw-data-project-qa.service/3rdparty-jars/logback-core-1.2.3.jar,cos://dsw-data-project-qa.service/3rdparty-jars/logback-classic-1.2.3.jar,cos://dsw-data-project-qa.service/3rdparty-jars/dom4j-2.1.1.jar,cos://dsw-data-project-qa.service/3rdparty-jars/guava-28.0-jre.jar,cos://dsw-data-project-qa.service/3rdparty-jars/commons-lang3-3.9.jar,cos://dsw-data-project-qa.service/3rdparty-jars/fastjson-1.2.59.jar --name Rmt_Extract --class com.ibm.cio.dswim.trans.rmt.stage.RmtExtractStageJob cos://dsw-data-project-qa.service/job-jars/rmt_extract_stage-1.0-SNAPSHOT.jar 0 3 8 [2020-05-11 15:01:06,349] {logging_mixin.py:112} INFO - [2020-05-11 15:01:06,348] {spark_submit_hook.py:479} INFO - 20/05/11 15:01:06 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... 
using builtin-java classes where applicable [2020-05-11 15:01:08,607] {logging_mixin.py:112} INFO - [2020-05-11 15:01:08,607] {spark_submit_hook.py:479} INFO - log4j:WARN No appenders could be found for logger (io.fabric8.kubernetes.client.Config). [2020-05-11 15:01:08,608] {logging_mixin.py:112} INFO - [2020-05-11 15:01:08,607] {spark_submit_hook.py:479} INFO - log4j:WARN Please initialize the log4j system properly. [2020-05-11 15:01:08,608] {logging_mixin.py:112} INFO - [2020-05-11 15:01:08,608] {spark_submit_hook.py:479} INFO - log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info. [2020-05-11 15:01:09,773] {logging_mixin.py:112} INFO - [2020-05-11 15:01:09,773] {spark_submit_hook.py:479} INFO - Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties [2020-05-11 15:01:09,785] {logging_mixin.py:112} INFO - [2020-05-11 15:01:09,784] {spark_submit_hook.py:479} INFO - 20/05/11 15:01:09 INFO LoggingPodStatusWatcherImpl: State changed, new state: [2020-05-11 15:01:09,785] {logging_mixin.py:112} INFO - [2020-05-11 15:01:09,785] {spark_submit_hook.py:462} INFO - Identified spark driver pod: rmtextract-1589227268554-driver [2020-05-11 15:01:09,785] {logging_mixin.py:112} INFO - [2020-05-11 15:01:09,785] {spark_submit_hook.py:479} INFO - pod name: rmtextract-1589227268554-driver [2020-05-11 15:01:09,785] {logging_mixin.py:112} INFO - [2020-05-11 15:01:09,785] {spark_submit_hook.py:479} INFO - namespace: batch-pipeline-qa [2020-05-11 15:01:09,785] {logging_mixin.py:112} INFO - [2020-05-11 15:01:09,785] {spark_submit_hook.py:479} INFO - labels: spark-app-selector -> spark-836f53b29a274eabbeba208d27e242de, spark-role -> driver [2020-05-11 15:01:09,785] {logging_mixin.py:112} INFO - [2020-05-11 15:01:09,785] {spark_submit_hook.py:479} INFO - pod uid: 3684e5db-8f69-4b78-be4b-6669e42806e6 [2020-05-11 15:01:09,786] {logging_mixin.py:112} INFO - [2020-05-11 15:01:09,786] {spark_submit_hook.py:479} INFO - creation time: 2020-05-11T20:01:09Z [2020-05-11 15:01:09,786] {logging_mixin.py:112} INFO - [2020-05-11 15:01:09,786] {spark_submit_hook.py:479} INFO - service account name: spark [2020-05-11 15:01:09,786] {logging_mixin.py:112} INFO - [2020-05-11 15:01:09,786] {spark_submit_hook.py:479} INFO - volumes: spark-local-dir-1, dswsecret-volume, spark-conf-volume, spark-token-fpqpz [2020-05-11 15:01:09,786] {logging_mixin.py:112} INFO - [2020-05-11 15:01:09,786] {spark_submit_hook.py:479} INFO - node name: N/A [2020-05-11 15:01:09,786] {logging_mixin.py:112} INFO - [2020-05-11 15:01:09,786] {spark_submit_hook.py:479} INFO - start time: N/A [2020-05-11 15:01:09,786] {logging_mixin.py:112} INFO - [2020-05-11 15:01:09,786] {spark_submit_hook.py:479} INFO - container images: N/A [2020-05-11 15:01:09,786] {logging_mixin.py:112} INFO - [2020-05-11 15:01:09,786] {spark_submit_hook.py:479} INFO - phase: Pending [2020-05-11 15:01:09,787] {logging_mixin.py:112} INFO - [2020-05-11 15:01:09,787] {spark_submit_hook.py:479} INFO - status: [] [2020-05-11 15:01:09,797] {logging_mixin.py:112} INFO - [2020-05-11 15:01:09,796] {spark_submit_hook.py:479} INFO - 20/05/11 15:01:09 INFO LoggingPodStatusWatcherImpl: State changed, new state: [2020-05-11 15:01:09,797] {logging_mixin.py:112} INFO - [2020-05-11 15:01:09,797] {spark_submit_hook.py:462} INFO - Identified spark driver pod: rmtextract-1589227268554-driver [2020-05-11 15:01:09,797] {logging_mixin.py:112} INFO - [2020-05-11 15:01:09,797] {spark_submit_hook.py:479} INFO - pod name: rmtextract-1589227268554-driver [2020-05-11 
15:01:09,797] {logging_mixin.py:112} INFO - [2020-05-11 15:01:09,797] {spark_submit_hook.py:479} INFO - namespace: batch-pipeline-qa [2020-05-11 15:01:09,797] {logging_mixin.py:112} INFO - [2020-05-11 15:01:09,797] {spark_submit_hook.py:479} INFO - labels: spark-app-selector -> spark-836f53b29a274eabbeba208d27e242de, spark-role -> driver [2020-05-11 15:01:09,797] {logging_mixin.py:112} INFO - [2020-05-11 15:01:09,797] {spark_submit_hook.py:479} INFO - pod uid: 3684e5db-8f69-4b78-be4b-6669e42806e6 [2020-05-11 15:01:09,798] {logging_mixin.py:112} INFO - [2020-05-11 15:01:09,797] {spark_submit_hook.py:479} INFO - creation time: 2020-05-11T20:01:09Z [2020-05-11 15:01:09,798] {logging_mixin.py:112} INFO - [2020-05-11 15:01:09,798] {spark_submit_hook.py:479} INFO - service account name: spark [2020-05-11 15:01:09,798] {logging_mixin.py:112} INFO - [2020-05-11 15:01:09,798] {spark_submit_hook.py:479} INFO - volumes: spark-local-dir-1, dswsecret-volume, spark-conf-volume, spark-token-fpqpz [2020-05-11 15:01:09,798] {logging_mixin.py:112} INFO - [2020-05-11 15:01:09,798] {spark_submit_hook.py:479} INFO - node name: 10.93.122.236 [2020-05-11 15:01:09,798] {logging_mixin.py:112} INFO - [2020-05-11 15:01:09,798] {spark_submit_hook.py:479} INFO - start time: N/A [2020-05-11 15:01:09,798] {logging_mixin.py:112} INFO - [2020-05-11 15:01:09,798] {spark_submit_hook.py:479} INFO - container images: N/A [2020-05-11 15:01:09,798] {logging_mixin.py:112} INFO - [2020-05-11 15:01:09,798] {spark_submit_hook.py:479} INFO - phase: Pending [2020-05-11 15:01:09,799] {logging_mixin.py:112} INFO - [2020-05-11 15:01:09,798] {spark_submit_hook.py:479} INFO - status: [] [2020-05-11 15:01:10,080] {logging_mixin.py:112} INFO - [2020-05-11 15:01:10,080] {spark_submit_hook.py:479} INFO - 20/05/11 15:01:10 INFO Client: Waiting for application Rmt_Extract to finish... 
[2020-05-11 15:01:10,307] {logging_mixin.py:112} INFO - [2020-05-11 15:01:10,307] {spark_submit_hook.py:479} INFO - 20/05/11 15:01:10 INFO LoggingPodStatusWatcherImpl: State changed, new state: [2020-05-11 15:01:10,308] {logging_mixin.py:112} INFO - [2020-05-11 15:01:10,307] {spark_submit_hook.py:462} INFO - Identified spark driver pod: rmtextract-1589227268554-driver [2020-05-11 15:01:10,308] {logging_mixin.py:112} INFO - [2020-05-11 15:01:10,308] {spark_submit_hook.py:479} INFO - pod name: rmtextract-1589227268554-driver [2020-05-11 15:01:10,308] {logging_mixin.py:112} INFO - [2020-05-11 15:01:10,308] {spark_submit_hook.py:479} INFO - namespace: batch-pipeline-qa [2020-05-11 15:01:10,308] {logging_mixin.py:112} INFO - [2020-05-11 15:01:10,308] {spark_submit_hook.py:479} INFO - labels: spark-app-selector -> spark-836f53b29a274eabbeba208d27e242de, spark-role -> driver [2020-05-11 15:01:10,308] {logging_mixin.py:112} INFO - [2020-05-11 15:01:10,308] {spark_submit_hook.py:479} INFO - pod uid: 3684e5db-8f69-4b78-be4b-6669e42806e6 [2020-05-11 15:01:10,308] {logging_mixin.py:112} INFO - [2020-05-11 15:01:10,308] {spark_submit_hook.py:479} INFO - creation time: 2020-05-11T20:01:09Z [2020-05-11 15:01:10,308] {logging_mixin.py:112} INFO - [2020-05-11 15:01:10,308] {spark_submit_hook.py:479} INFO - service account name: spark [2020-05-11 15:01:10,309] {logging_mixin.py:112} INFO - [2020-05-11 15:01:10,309] {spark_submit_hook.py:479} INFO - volumes: spark-local-dir-1, dswsecret-volume, spark-conf-volume, spark-token-fpqpz [2020-05-11 15:01:10,309] {logging_mixin.py:112} INFO - [2020-05-11 15:01:10,309] {spark_submit_hook.py:479} INFO - node name: 10.93.122.236 [2020-05-11 15:01:10,309] {logging_mixin.py:112} INFO - [2020-05-11 15:01:10,309] {spark_submit_hook.py:479} INFO - start time: 2020-05-11T20:01:10Z [2020-05-11 15:01:10,309] {logging_mixin.py:112} INFO - [2020-05-11 15:01:10,309] {spark_submit_hook.py:479} INFO - container images: txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark:s2.4.4-h2.7 [2020-05-11 15:01:10,309] {logging_mixin.py:112} INFO - [2020-05-11 15:01:10,309] {spark_submit_hook.py:479} INFO - phase: Pending [2020-05-11 15:01:10,309] {logging_mixin.py:112} INFO - [2020-05-11 15:01:10,309] {spark_submit_hook.py:479} INFO - status: [ContainerStatus(containerID=null, image=txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark:s2.4.4-h2.7, imageID=, lastState=ContainerState(running=null, terminated=null, waiting=null, additionalProperties={}), name=spark-kubernetes-driver, ready=false, restartCount=0, state=ContainerState(running=null, terminated=null, waiting=ContainerStateWaiting(message=null, reason=ContainerCreating, additionalProperties={}), additionalProperties={}), additionalProperties={})] [2020-05-11 15:01:32,411] {logging_mixin.py:112} INFO - [2020-05-11 15:01:32,411] {spark_submit_hook.py:479} INFO - 20/05/11 15:01:32 INFO LoggingPodStatusWatcherImpl: State changed, new state: [2020-05-11 15:01:32,411] {logging_mixin.py:112} INFO - [2020-05-11 15:01:32,411] {spark_submit_hook.py:462} INFO - Identified spark driver pod: rmtextract-1589227268554-driver [2020-05-11 15:01:32,411] {logging_mixin.py:112} INFO - [2020-05-11 15:01:32,411] {spark_submit_hook.py:479} INFO - pod name: rmtextract-1589227268554-driver [2020-05-11 15:01:32,411] {logging_mixin.py:112} INFO - [2020-05-11 15:01:32,411] {spark_submit_hook.py:479} INFO - namespace: batch-pipeline-qa [2020-05-11 15:01:32,412] {logging_mixin.py:112} INFO - [2020-05-11 15:01:32,412] {spark_submit_hook.py:479} 
INFO - labels: spark-app-selector -> spark-836f53b29a274eabbeba208d27e242de, spark-role -> driver [2020-05-11 15:01:32,412] {logging_mixin.py:112} INFO - [2020-05-11 15:01:32,412] {spark_submit_hook.py:479} INFO - pod uid: 3684e5db-8f69-4b78-be4b-6669e42806e6 [2020-05-11 15:01:32,412] {logging_mixin.py:112} INFO - [2020-05-11 15:01:32,412] {spark_submit_hook.py:479} INFO - creation time: 2020-05-11T20:01:09Z [2020-05-11 15:01:32,412] {logging_mixin.py:112} INFO - [2020-05-11 15:01:32,412] {spark_submit_hook.py:479} INFO - service account name: spark [2020-05-11 15:01:32,412] {logging_mixin.py:112} INFO - [2020-05-11 15:01:32,412] {spark_submit_hook.py:479} INFO - volumes: spark-local-dir-1, dswsecret-volume, spark-conf-volume, spark-token-fpqpz [2020-05-11 15:01:32,412] {logging_mixin.py:112} INFO - [2020-05-11 15:01:32,412] {spark_submit_hook.py:479} INFO - node name: 10.93.122.236 [2020-05-11 15:01:32,413] {logging_mixin.py:112} INFO - [2020-05-11 15:01:32,412] {spark_submit_hook.py:479} INFO - start time: 2020-05-11T20:01:10Z [2020-05-11 15:01:32,413] {logging_mixin.py:112} INFO - [2020-05-11 15:01:32,413] {spark_submit_hook.py:479} INFO - container images: txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark:s2.4.4-h2.7 [2020-05-11 15:01:32,413] {logging_mixin.py:112} INFO - [2020-05-11 15:01:32,413] {spark_submit_hook.py:479} INFO - phase: Running [2020-05-11 15:01:32,413] {logging_mixin.py:112} INFO - [2020-05-11 15:01:32,413] {spark_submit_hook.py:479} INFO - status: [ContainerStatus(containerID=containerd://9608ccf92d1067645b6cbfcd289e6c99c76cd57463ed7c1fb71352a38a27a58c, image=txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark:s2.4.4-h2.7, imageID=txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark@sha256:3682354e49a55503ef906ce8aeff8601274fa426204ec989a91a72912d31ed7e, lastState=ContainerState(running=null, terminated=null, waiting=null, additionalProperties={}), name=spark-kubernetes-driver, ready=true, restartCount=0, state=ContainerState(running=ContainerStateRunning(startedAt=2020-05-11T20:01:31Z, additionalProperties={}), terminated=null, waiting=null, additionalProperties={}), additionalProperties={})] [2020-05-11 15:49:57,933] {logging_mixin.py:112} INFO - [2020-05-11 15:49:57,933] {spark_submit_hook.py:479} INFO - 20/05/11 15:49:57 INFO LoggingPodStatusWatcherImpl: Container final statuses: [2020-05-11 15:49:57,934] {logging_mixin.py:112} INFO - [2020-05-11 15:49:57,934] {spark_submit_hook.py:479} INFO - [2020-05-11 15:49:57,934] {logging_mixin.py:112} INFO - [2020-05-11 15:49:57,934] {spark_submit_hook.py:479} INFO - [2020-05-11 15:49:57,934] {logging_mixin.py:112} INFO - [2020-05-11 15:49:57,934] {spark_submit_hook.py:479} INFO - Container name: spark-kubernetes-driver [2020-05-11 15:49:57,934] {logging_mixin.py:112} INFO - [2020-05-11 15:49:57,934] {spark_submit_hook.py:479} INFO - Container image: txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark:s2.4.4-h2.7 [2020-05-11 15:49:57,934] {logging_mixin.py:112} INFO - [2020-05-11 15:49:57,934] {spark_submit_hook.py:479} INFO - Container state: Running [2020-05-11 15:49:57,934] {logging_mixin.py:112} INFO - [2020-05-11 15:49:57,934] {spark_submit_hook.py:479} INFO - Container started at: 2020-05-11T20:01:31Z [2020-05-11 15:49:57,935] {logging_mixin.py:112} INFO - [2020-05-11 15:49:57,934] {spark_submit_hook.py:479} INFO - 20/05/11 15:49:57 INFO Client: Application Rmt_Extract finished. 
[2020-05-11 15:49:57,936] {logging_mixin.py:112} INFO - [2020-05-11 15:49:57,936] {spark_submit_hook.py:479} INFO - 20/05/11 15:49:57 INFO ShutdownHookManager: Shutdown hook called [2020-05-11 15:49:57,937] {logging_mixin.py:112} INFO - [2020-05-11 15:49:57,937] {spark_submit_hook.py:479} INFO - 20/05/11 15:49:57 INFO ShutdownHookManager: Deleting directory /tmp/spark-5bfba41b-8cab-429b-b64f-dcb4f52c4d3a [2020-05-11 15:49:58,101] {taskinstance.py:1145} ERROR - Cannot execute: spark-submit --master k8s://https://c2.private.us-south.containers.cloud.ibm.com:26165 --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark --conf spark.kubernetes.container.image=txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark:s2.4.4-h2.7 --conf spark.kubernetes.container.image.pullSecrets=artifactory-container-registry --conf spark.submit.deployMode=cluster --conf spark.kubernetes.report.interval=2 --conf spark.kubernetes.driver.secrets.dswsecret=/opt/spark/secrets --conf spark.executor.userClassPathFirst=true --conf spark.driver.userClassPathFirst=true --conf spark.sql.parquet.compression.codec=gzip --conf spark.sql.session.timeZone=America/New_York --conf spark.sql.broadcastTimeout=1800 --conf spark.sql.shuffle.partitions=600 --conf spark.shuffle.consolidateFiles=true --conf spark.default.parallelism=108 --conf spark.driver.cores=1 --conf spark.executor.cores=2 --conf spark.kubernetes.executor.request.cores=0.6 --conf spark.kubernetes.executor.memoryOverhead=1G --conf spark.driver.memory=10G --conf spark.executor.memory=5G --conf spark.executor.instances=9 --conf spark.sql.codegen=true --conf spark.sql.cbo.enabled=true --conf spark.sql.optimizer.maxIterations=1000 --conf spark.kubernetes.namespace=batch-pipeline-qa --files cos://dsw-data-project-qa.service/config/dsw_config.conf --jars cos://dsw-data-project-qa.service/job-jars/common-1.0-SNAPSHOT.jar,cos://dsw-data-project-qa.service/job-jars/rmtjob-1.0-SNAPSHOT.jar,cos://dsw-data-project-qa.service/job-jars/meta-1.0-SNAPSHOT.jar,cos://dsw-data-project-qa.service/3rdparty-jars/druid-1.1.12.jar,cos://dsw-data-project-qa.service/3rdparty-jars/mybatis-3.5.3.jar,cos://dsw-data-project-qa.service/3rdparty-jars/db2jcc4.jar,cos://dsw-data-project-qa.service/3rdparty-jars/logback-core-1.2.3.jar,cos://dsw-data-project-qa.service/3rdparty-jars/logback-classic-1.2.3.jar,cos://dsw-data-project-qa.service/3rdparty-jars/dom4j-2.1.1.jar,cos://dsw-data-project-qa.service/3rdparty-jars/guava-28.0-jre.jar,cos://dsw-data-project-qa.service/3rdparty-jars/commons-lang3-3.9.jar,cos://dsw-data-project-qa.service/3rdparty-jars/fastjson-1.2.59.jar --name Rmt_Extract --class com.ibm.cio.dswim.trans.rmt.stage.RmtExtractStageJob cos://dsw-data-project-qa.service/job-jars/rmt_extract_stage-1.0-SNAPSHOT.jar 0 3 8. Error code is: 0. 
Traceback (most recent call last): File "/home/airflow/.local/lib/python3.6/site-packages/airflow/models/taskinstance.py", line 983, in _run_raw_task result = task_copy.execute(context=context) File "/home/airflow/.local/lib/python3.6/site-packages/airflow/contrib/operators/spark_submit_operator.py", line 187, in execute self._hook.submit(self._application) File "/home/airflow/.local/lib/python3.6/site-packages/airflow/contrib/hooks/spark_submit_hook.py", line 405, in submit self._mask_cmd(spark_submit_cmd), returncode airflow.exceptions.AirflowException: Cannot execute: spark-submit --master k8s://https://c2.private.us-south.containers.cloud.ibm.com:26165 --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark --conf spark.kubernetes.container.image=txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark:s2.4.4-h2.7 --conf spark.kubernetes.container.image.pullSecrets=artifactory-container-registry --conf spark.submit.deployMode=cluster --conf spark.kubernetes.report.interval=2 --conf spark.kubernetes.driver.secrets.dswsecret=/opt/spark/secrets --conf spark.executor.userClassPathFirst=true --conf spark.driver.userClassPathFirst=true --conf spark.sql.parquet.compression.codec=gzip --conf spark.sql.session.timeZone=America/New_York --conf spark.sql.broadcastTimeout=1800 --conf spark.sql.shuffle.partitions=600 --conf spark.shuffle.consolidateFiles=true --conf spark.default.parallelism=108 --conf spark.driver.cores=1 --conf spark.executor.cores=2 --conf spark.kubernetes.executor.request.cores=0.6 --conf spark.kubernetes.executor.memoryOverhead=1G --conf spark.driver.memory=10G --conf spark.executor.memory=5G --conf spark.executor.instances=9 --conf spark.sql.codegen=true --conf spark.sql.cbo.enabled=true --conf spark.sql.optimizer.maxIterations=1000 --conf spark.kubernetes.namespace=batch-pipeline-qa --files cos://dsw-data-project-qa.service/config/dsw_config.conf --jars cos://dsw-data-project-qa.service/job-jars/common-1.0-SNAPSHOT.jar,cos://dsw-data-project-qa.service/job-jars/rmtjob-1.0-SNAPSHOT.jar,cos://dsw-data-project-qa.service/job-jars/meta-1.0-SNAPSHOT.jar,cos://dsw-data-project-qa.service/3rdparty-jars/druid-1.1.12.jar,cos://dsw-data-project-qa.service/3rdparty-jars/mybatis-3.5.3.jar,cos://dsw-data-project-qa.service/3rdparty-jars/db2jcc4.jar,cos://dsw-data-project-qa.service/3rdparty-jars/logback-core-1.2.3.jar,cos://dsw-data-project-qa.service/3rdparty-jars/logback-classic-1.2.3.jar,cos://dsw-data-project-qa.service/3rdparty-jars/dom4j-2.1.1.jar,cos://dsw-data-project-qa.service/3rdparty-jars/guava-28.0-jre.jar,cos://dsw-data-project-qa.service/3rdparty-jars/commons-lang3-3.9.jar,cos://dsw-data-project-qa.service/3rdparty-jars/fastjson-1.2.59.jar --name Rmt_Extract --class com.ibm.cio.dswim.trans.rmt.stage.RmtExtractStageJob cos://dsw-data-project-qa.service/job-jars/rmt_extract_stage-1.0-SNAPSHOT.jar 0 3 8. Error code is: 0. 
[2020-05-11 15:49:58,102] {taskinstance.py:1168} INFO - Marking task as UP_FOR_RETRY [2020-05-11 15:50:00,600] {logging_mixin.py:112} INFO - [2020-05-11 15:50:00,599] {local_task_job.py:103} INFO - Task exited with return code 1 ``` ## submit job in Scenario 3, by manual, get k8s logs ### Scenario 3:log return from k8s ```log [dawany@dawany-inf env_qa]$ kubectl describe pod -n batch-pipeline-qa testquoterptfact-1589301844228-driver Name: testquoterptfact-1589301844228-driver Namespace: batch-pipeline-qa Priority: 0 Node: 10.74.200.157/10.74.200.157 Start Time: Tue, 12 May 2020 12:44:09 -0400 Labels: spark-app-selector=spark-6ddf413798da4c5d83645a7bc760a925 spark-role=driver Annotations: kubernetes.io/psp: db2oltp-dev-psp Status: Succeeded IP: 172.30.0.6 IPs: <none> Containers: spark-kubernetes-driver: Container ID: containerd://d5479e5ff5c582db541e8b545953981d705024f73861a0fddd506a3b11999e4b Image: txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark:s2.4.4-h2.7 Image ID: txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark@sha256:3682354e49a55503ef906ce8aeff8601274fa426204ec989a91a72912d31ed7e Ports: 7078/TCP, 7079/TCP, 4040/TCP Host Ports: 0/TCP, 0/TCP, 0/TCP Args: driver --properties-file /opt/spark/conf/spark.properties --class com.ibm.cio.dswim.qrf.job.QuoteRptFactJob spark-internal 1 State: Terminated Reason: Completed Exit Code: 0 Started: Tue, 12 May 2020 12:44:37 -0400 Finished: Tue, 12 May 2020 14:52:57 -0400 Ready: False Restart Count: 0 Limits: cpu: 4 memory: 16896Mi Requests: cpu: 1 memory: 16896Mi Environment: SPARK_DRIVER_BIND_ADDRESS: (v1:status.podIP) SPARK_LOCAL_DIRS: /var/data/spark-e4c8fd1e-8b33-449c-b460-c842df658705 SPARK_CONF_DIR: /opt/spark/conf Mounts: /opt/spark/conf from spark-conf-volume (rw) /opt/spark/secrets/ from dswsecret-volume (rw) /var/data/spark-e4c8fd1e-8b33-449c-b460-c842df658705 from spark-local-dir-1 (rw) /var/run/secrets/kubernetes.io/serviceaccount from spark-token-fpqpz (ro) Conditions: Type Status Initialized True Ready False ContainersReady False PodScheduled True Volumes: spark-local-dir-1: Type: EmptyDir (a temporary directory that shares a pod's lifetime) Medium: SizeLimit: <unset> dswsecret-volume: Type: Secret (a volume populated by a Secret) SecretName: dswsecret Optional: false spark-conf-volume: Type: ConfigMap (a volume populated by a ConfigMap) Name: testquoterptfact-1589301844228-driver-conf-map Optional: false spark-token-fpqpz: Type: Secret (a volume populated by a Secret) SecretName: spark-token-fpqpz Optional: false QoS Class: Burstable Node-Selectors: <none> Tolerations: node.kubernetes.io/not-ready:NoExecute for 600s node.kubernetes.io/unreachable:NoExecute for 600s Events: <none> ``` ```log 2020-05-12 12:43:59 WARN NativeCodeLoader:62 - Unable to load native-hadoop library for your platform... 
using builtin-java classes where applicable 2020-05-12 12:44:06 INFO LoggingPodStatusWatcherImpl:54 - State changed, new state: pod name: testquoterptfact-1589301844228-driver namespace: batch-pipeline-qa labels: spark-app-selector -> spark-6ddf413798da4c5d83645a7bc760a925, spark-role -> driver pod uid: ff918848-2819-4336-a4a3-654f01dd756c creation time: 2020-05-12T16:44:06Z service account name: spark volumes: spark-local-dir-1, dswsecret-volume, spark-conf-volume, spark-token-fpqpz node name: N/A start time: N/A container images: N/A phase: Pending status: [] 2020-05-12 12:44:06 INFO LoggingPodStatusWatcherImpl:54 - State changed, new state: pod name: testquoterptfact-1589301844228-driver namespace: batch-pipeline-qa labels: spark-app-selector -> spark-6ddf413798da4c5d83645a7bc760a925, spark-role -> driver pod uid: ff918848-2819-4336-a4a3-654f01dd756c creation time: 2020-05-12T16:44:06Z service account name: spark volumes: spark-local-dir-1, dswsecret-volume, spark-conf-volume, spark-token-fpqpz node name: 10.74.200.157 start time: N/A container images: N/A phase: Pending status: [] 2020-05-12 12:44:07 INFO Client:54 - Waiting for application test_quote_rpt_fact to finish... 2020-05-12 12:44:09 INFO LoggingPodStatusWatcherImpl:54 - State changed, new state: pod name: testquoterptfact-1589301844228-driver namespace: batch-pipeline-qa labels: spark-app-selector -> spark-6ddf413798da4c5d83645a7bc760a925, spark-role -> driver pod uid: ff918848-2819-4336-a4a3-654f01dd756c creation time: 2020-05-12T16:44:06Z service account name: spark volumes: spark-local-dir-1, dswsecret-volume, spark-conf-volume, spark-token-fpqpz node name: 10.74.200.157 start time: 2020-05-12T16:44:09Z container images: txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark:s2.4.4-h2.7 phase: Pending status: [ContainerStatus(containerID=null, image=txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark:s2.4.4-h2.7, imageID=, lastState=ContainerState(running=null, terminated=null, waiting=null, additionalProperties={}), name=spark-kubernetes-driver, ready=false, restartCount=0, state=ContainerState(running=null, terminated=null, waiting=ContainerStateWaiting(message=null, reason=ContainerCreating, additionalProperties={}), additionalProperties={}), additionalProperties={})] 2020-05-12 12:44:37 INFO LoggingPodStatusWatcherImpl:54 - State changed, new state: pod name: testquoterptfact-1589301844228-driver namespace: batch-pipeline-qa labels: spark-app-selector -> spark-6ddf413798da4c5d83645a7bc760a925, spark-role -> driver pod uid: ff918848-2819-4336-a4a3-654f01dd756c creation time: 2020-05-12T16:44:06Z service account name: spark volumes: spark-local-dir-1, dswsecret-volume, spark-conf-volume, spark-token-fpqpz node name: 10.74.200.157 start time: 2020-05-12T16:44:09Z container images: txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark:s2.4.4-h2.7 phase: Running status: [ContainerStatus(containerID=containerd://d5479e5ff5c582db541e8b545953981d705024f73861a0fddd506a3b11999e4b, image=txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark:s2.4.4-h2.7, imageID=txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark@sha256:3682354e49a55503ef906ce8aeff8601274fa426204ec989a91a72912d31ed7e, lastState=ContainerState(running=null, terminated=null, waiting=null, additionalProperties={}), name=spark-kubernetes-driver, ready=true, restartCount=0, state=ContainerState(running=ContainerStateRunning(startedAt=Time(time=2020-05-12T16:44:37Z, additionalProperties={}), additionalProperties={}), terminated=null, 
waiting=null, additionalProperties={}), additionalProperties={})]
2020-05-12 13:43:51 INFO WatchConnectionManager:379 - Current reconnect backoff is 1000 milliseconds (T0)
2020-05-12 13:43:53 INFO LoggingPodStatusWatcherImpl:54 - Container final statuses: Container name: spark-kubernetes-driver Container image: txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark:s2.4.4-h2.7 Container state: Running Container started at: 2020-05-12T16:44:37Z
2020-05-12 13:43:53 INFO Client:54 - Application test_quote_rpt_fact finished.
2020-05-12 13:43:53 INFO ShutdownHookManager:54 - Shutdown hook called
2020-05-12 13:43:53 INFO ShutdownHookManager:54 - Deleting directory /tmp/spark-09a9bf16-11ee-43ea-95f0-e970b0ea7578
```

## Difference is here

```log
2020-05-12 13:43:51 INFO WatchConnectionManager:379 - Current reconnect backoff is 1000 milliseconds (T0)
2020-05-12 13:43:53 INFO LoggingPodStatusWatcherImpl:54 - Container final statuses: Container name: spark-kubernetes-driver Container image: txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark:s2.4.4-h2.7 Container state: Running Container started at: 2020-05-12T16:44:37Z
2020-05-12 13:43:53 INFO Client:54 - Application test_quote_rpt_fact finished.
2020-05-12 13:43:53 INFO ShutdownHookManager:54 - Shutdown hook called
2020-05-12 13:43:53 INFO ShutdownHookManager:54 - Deleting directory /tmp/spark-09a9bf16-11ee-43ea-95f0-e970b0ea7578
```

## conclusion

Comparing the log details, the difference lies in how the log stream is terminated:

1. airflow side (source code): the log stream is interrupted and no `Exit code:` line is ever returned, so `self._spark_exit_code` keeps its initial value `None`. Because `None != 0`, the Kubernetes check in `submit()` raises and Airflow marks the task failed, even though the job is actually still running in k8s (and later finishes successfully). Here is a similar issue: https://issues.apache.org/jira/browse/AIRFLOW-6244

```python
class SparkSubmitHook:
    def submit(self, application="", **kwargs):
        """
        Remote Popen to execute the spark-submit job

        :param application: Submitted application, jar or py file
        :type application: str
        :param kwargs: extra arguments to Popen (see subprocess.Popen)
        """
        spark_submit_cmd = self._build_spark_submit_command(application)

        if hasattr(self, '_env'):
            env = os.environ.copy()
            env.update(self._env)
            kwargs["env"] = env

        self._submit_sp = subprocess.Popen(spark_submit_cmd,
                                           stdout=subprocess.PIPE,
                                           stderr=subprocess.STDOUT,
                                           bufsize=-1,
                                           universal_newlines=True,
                                           **kwargs)

        self._process_spark_submit_log(iter(self._submit_sp.stdout.readline, ''))
        returncode = self._submit_sp.wait()

        # Check spark-submit return code. In Kubernetes mode, also check the value
        # of exit code in the log, as it may differ.
        if returncode or (self._is_kubernetes and self._spark_exit_code != 0):
            raise AirflowException(
                "Cannot execute: {}. Error code is: {}.".format(
                    self._mask_cmd(spark_submit_cmd), returncode
                )
            )
    ...
    def _process_spark_submit_log(self, itr):
        # Consume the iterator
        for line in itr:
            line = line.strip()
            ...  # (preceding branch elided)
            # If we run Kubernetes cluster mode, we want to extract the driver pod id
            # from the logs so we can kill the application when we stop it unexpectedly
            elif self._is_kubernetes:
                match = re.search(r'\s*pod name: ((.+?)-([a-z0-9]+)-driver)', line)
                if match:
                    self._kubernetes_driver_pod = match.groups()[0]
                    self.log.info("Identified spark driver pod: %s",
                                  self._kubernetes_driver_pod)

                # Store the Spark Exit code
                match_exit_code = re.search(r'\s*Exit code: (\d+)', line)
                if match_exit_code:
                    self._spark_exit_code = int(match_exit_code.groups()[0])
            ...
            self.log.info(line)
```
2. k8s side (k8s client):

```log
2020-05-12 12:51:40 INFO WatchConnectionManager:379 - Current reconnect backoff is 1000 milliseconds (T0)
```

These messages are nothing to worry about. This is a known occurrence in Kubernetes and is not an issue [0]. The API server ends watch requests when they are very old. The operator uses a client-go informer, which takes care of automatically re-listing the resource and then restarting the watch from the latest resource version.
fabric8: https://github.com/fabric8io/kubernetes-client/issues/1075
https://stackoverflow.com/questions/52910322/kubernetes-resource-versioning/52925973#52925973

```java
private long nextReconnectInterval() {
    int exponentOfTwo = currentReconnectAttempt.getAndIncrement();
    if (exponentOfTwo > maxIntervalExponent)
        exponentOfTwo = maxIntervalExponent;
    long ret = reconnectInterval * (1 << exponentOfTwo);
    logger.debug("Current reconnect backoff is " + ret + " milliseconds (T" + exponentOfTwo + ")");
    return ret;
}
```

3. spark side:

http://mail-archives.apache.org/mod_mbox/spark-issues/201805.mbox/%3cjira.13158986.1526264708000.69213.1526330460...@atlassian.jira%3E
https://issues.apache.org/jira/browse/SPARK-24266?page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel&focusedCommentId=16474770#comment-16474770
pull request: https://github.com/apache/spark/pull/28423

## ask for suggestions

Are there any suggestions for how to avoid this issue?

## actions that may be considered

### [airflow][source_code_change] modify the Airflow source code to poll the driver pod status (the equivalent of `kubectl describe pod xxxx -n xxxx`) every few seconds, instead of relying only on the log stream.
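For illustration, here is a minimal sketch of what such polling could look like, assuming the `kubernetes` Python client is available and that the driver pod name and namespace have already been identified (for example from the `pod name:` lines that `_process_spark_submit_log` already parses). `wait_for_driver_pod` and its parameters are hypothetical names, not part of Airflow's actual API:

```python
import time

from kubernetes import client, config


def wait_for_driver_pod(pod_name, namespace, poll_seconds=30):
    """Poll the Spark driver pod until it reaches a terminal phase.

    Returns the final phase ('Succeeded' or 'Failed') so the caller does not
    have to rely on the 'Exit code:' line from the spark-submit log stream,
    which can be lost when the watch connection is interrupted.
    """
    # Load credentials from the environment: the in-cluster service account
    # token when running inside k8s, otherwise the local kubeconfig.
    try:
        config.load_incluster_config()
    except config.ConfigException:
        config.load_kube_config()

    v1 = client.CoreV1Api()
    while True:
        pod = v1.read_namespaced_pod(name=pod_name, namespace=namespace)
        phase = pod.status.phase
        if phase in ("Succeeded", "Failed"):
            return phase
        time.sleep(poll_seconds)
```

With something along these lines (e.g. `wait_for_driver_pod('rmtextract-1589227268554-driver', 'batch-pipeline-qa')`), the hook could fall back to the driver pod's final phase whenever `self._spark_exit_code` is still `None` after `spark-submit` exits, instead of treating the missing exit code as a failure.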