attilapiros commented on pull request #31561:
URL: https://github.com/apache/spark/pull/31561#issuecomment-779774930
Our logging is correct. Please check this part from https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39755/:

```
===== TEST OUTPUT FOR o.a.s.deploy.k8s.integrationtest.KubernetesSuite: 'Test basic decommissioning' =====

21/02/16 02:55:01.511 ScalaTest-main-running-KubernetesSuite INFO SparkAppLauncher: Launching a spark app with arguments SparkAppArguments(local:///opt/spark/tests/decommissioning.py,,[Ljava.lang.String;@22a0d4ea) and conf Map(spark.storage.decommission.rddBlocks.enabled -> true, spark.testing -> false, spark.kubernetes.driver.pod.name -> spark-test-app-c0fe036152a04b9baff24fe8373f8086, spark.kubernetes.driver.label.spark-app-locator -> 5aac85aec8f7408aaed061c5e64a295e, spark.storage.decommission.enabled -> true, spark.authenticate -> true, spark.executor.instances -> 3, spark.storage.decommission.shuffleBlocks.enabled -> true, spark.kubernetes.submission.waitAppCompletion -> false, spark.kubernetes.executor.label.spark-app-locator -> 5aac85aec8f7408aaed061c5e64a295e, spark.kubernetes.namespace -> b9dea872ae834407b989c472db0b536d, spark.kubernetes.authenticate.driver.serviceAccountName -> default, spark.app.name -> spark-test-app, spark.ui.enabled -> true, spark.storage.decommission.replicationReattemptInterval -> 1, spark.kubernetes.container.image -> docker.io/kubespark/spark-py:3.2.0-SNAPSHOT_d60a9058-c4ef-4c9e-bdeb-ec14355772e8, spark.master -> k8s://https://192.168.39.195:8443/, spark.decommission.enabled -> true, spark.executor.cores -> 1)
21/02/16 02:55:01.512 ScalaTest-main-running-KubernetesSuite INFO SparkAppLauncher: Launching a spark app with command line: /home/jenkins/workspace/SparkPullRequestBuilder-K8s/resource-managers/kubernetes/integration-tests/target/spark-dist-unpacked/bin/spark-submit --deploy-mode cluster --master k8s://https://192.168.39.195:8443/ --conf spark.storage.decommission.rddBlocks.enabled=true --conf spark.testing=false --conf spark.kubernetes.driver.pod.name=spark-test-app-c0fe036152a04b9baff24fe8373f8086 --conf spark.kubernetes.driver.label.spark-app-locator=5aac85aec8f7408aaed061c5e64a295e --conf spark.storage.decommission.enabled=true --conf spark.authenticate=true --conf spark.executor.instances=3 --conf spark.storage.decommission.shuffleBlocks.enabled=true --conf spark.kubernetes.submission.waitAppCompletion=false --conf spark.kubernetes.executor.label.spark-app-locator=5aac85aec8f7408aaed061c5e64a295e --conf spark.kubernetes.namespace=b9dea872ae834407b989c472db0b536d --conf spark.kubernetes.authenticate.driver.serviceAccountName=default --conf spark.app.name=spark-test-app --conf spark.ui.enabled=true --conf spark.storage.decommission.replicationReattemptInterval=1 --conf spark.kubernetes.container.image=docker.io/kubespark/spark-py:3.2.0-SNAPSHOT_d60a9058-c4ef-4c9e-bdeb-ec14355772e8 --conf spark.master=k8s://https://192.168.39.195:8443/ --conf spark.decommission.enabled=true --conf spark.executor.cores=1 local:///opt/spark/tests/decommissioning.py
...
21/02/16 02:55:05.338 ScalaTest-main-running-KubernetesSuite INFO ProcessUtils: 21/02/16 02:55:05 INFO ShutdownHookManager: Deleting directory /tmp/spark-fbb3578e-2b89-45b9-934a-2ead64293b27
21/02/16 02:56:01.511 OkHttp WebSocket https://192.168.39.195:8443/... WARN WatchConnectionManager: Exec Failure
java.net.SocketTimeoutException: sent ping but didn't receive pong within 30000ms (after 0 successful ping/pongs)
	at okhttp3.internal.ws.RealWebSocket.writePingFrame(RealWebSocket.java:546)
	at okhttp3.internal.ws.RealWebSocket$PingRunnable.run(RealWebSocket.java:530)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
21/02/16 02:58:13.067 ScalaTest-main-running-KubernetesSuite INFO KubernetesSuite: ===== EXTRA LOGS FOR THE FAILED TEST
21/02/16 02:58:13.096 ScalaTest-main-running-KubernetesSuite INFO KubernetesSuite: BEGIN driver POD log
++ id -u
+ myuid=185
++ id -g
+ mygid=0
+ set +e
++ getent passwd 185
+ uidentry=
+ set -e
+ '[' -z '' ']'
+ '[' -w /etc/passwd ']'
+ echo '185:x:185:0:anonymous uid:/opt/spark:/bin/false'
+ SPARK_CLASSPATH=':/opt/spark/jars/*'
+ env
+ grep SPARK_JAVA_OPT_
+ sort -t_ -k4 -n
+ sed 's/[^=]*=\(.*\)/\1/g'
+ readarray -t SPARK_EXECUTOR_JAVA_OPTS
+ '[' -n '' ']'
+ '[' -z ']'
+ '[' -z ']'
+ '[' -n '' ']'
+ '[' -z ']'
+ '[' -z x ']'
+ SPARK_CLASSPATH='/opt/spark/conf::/opt/spark/jars/*'
+ case "$1" in
+ shift 1
+ CMD=("$SPARK_HOME/bin/spark-submit" --conf "spark.driver.bindAddress=$SPARK_DRIVER_BIND_ADDRESS" --deploy-mode client "$@")
+ exec /usr/bin/tini -s -- /opt/spark/bin/spark-submit --conf spark.driver.bindAddress=172.17.0.4 --deploy-mode client --properties-file /opt/spark/conf/spark.properties --class org.apache.spark.deploy.PythonRunner local:///opt/spark/tests/decommissioning.py
21/02/16 10:55:08 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Starting decom test
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
21/02/16 10:55:08 INFO SparkContext: Running Spark version 3.2.0-SNAPSHOT
21/02/16 10:55:08 INFO ResourceUtils: ==============================================================
21/02/16 10:55:08 INFO ResourceUtils: No custom resources configured for spark.driver.
21/02/16 10:55:08 INFO ResourceUtils: ==============================================================
21/02/16 10:55:08 INFO SparkContext: Submitted application: DecomTest
21/02/16 10:55:08 INFO ResourceProfile: Default ResourceProfile created, executor resources: Map(cores -> name: cores, amount: 1, script: , vendor: , memory -> name: memory, amount: 1024, script: , vendor: , offHeap -> name: offHeap, amount: 0, script: , vendor: ), task resources: Map(cpus -> name: cpus, amount: 1.0)
21/02/16 10:55:09 INFO ResourceProfile: Limiting resource is cpus at 1 tasks per executor
21/02/16 10:55:09 INFO ResourceProfileManager: Added ResourceProfile id: 0
21/02/16 10:55:09 INFO SecurityManager: Changing view acls to: 185,jenkins
21/02/16 10:55:09 INFO SecurityManager: Changing modify acls to: 185,jenkins
```
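For readers skimming the log: the part of the submission that matters here is the decommissioning feature set it turns on. As a minimal sketch, the same settings could be expressed as a `SparkConf` (the keys and values are copied verbatim from the logged `--conf` flags; nothing else in the snippet comes from the test harness):

```scala
import org.apache.spark.SparkConf

// Decommissioning-related settings enabled by the test run above,
// taken from the logged command line. The short replication
// reattempt interval (1 s) keeps block migration retries fast
// during the integration test.
val conf = new SparkConf()
  .set("spark.decommission.enabled", "true")
  .set("spark.storage.decommission.enabled", "true")
  .set("spark.storage.decommission.rddBlocks.enabled", "true")
  .set("spark.storage.decommission.shuffleBlocks.enabled", "true")
  .set("spark.storage.decommission.replicationReattemptInterval", "1")
```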
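The failure itself comes from OkHttp's keep-alive pings on the watch connection: the fabric8 kubernetes-client opens watches over websockets, OkHttp sends a ping frame every `websocketPingInterval` milliseconds (30000 ms in this run, per the exception message), and a missing pong tears the watch down with the `Exec Failure` warning above. A minimal sketch of where that interval is set, assuming the fabric8 `ConfigBuilder` API; the 60000 ms value is purely illustrative and not what Spark or the test uses:

```scala
import io.fabric8.kubernetes.client.{ConfigBuilder, DefaultKubernetesClient}

// Client configuration pointing at the test cluster's API server.
// OkHttp sends a websocket ping each interval and aborts the socket
// if no pong arrives in time, which surfaces as the
// SocketTimeoutException seen in the log.
val config = new ConfigBuilder()
  .withMasterUrl("https://192.168.39.195:8443/")
  .withWebsocketPingInterval(60000L) // illustrative; the log shows 30000 ms in effect
  .build()

val client = new DefaultKubernetesClient(config)
// ... watches opened through `client` inherit the ping interval;
// close() the client when done.
client.close()
```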
