Reamer commented on a change in pull request #34143:
URL: https://github.com/apache/spark/pull/34143#discussion_r721146856



##########
File path: 
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/SparkKubernetesClientFactory.scala
##########
@@ -88,7 +88,6 @@ private[spark] object SparkKubernetesClientFactory extends 
Logging {
     val config = new ConfigBuilder(autoConfigure(kubeContext.getOrElse(null)))
       .withApiVersion("v1")
       .withMasterUrl(master)
-      .withWebsocketPingInterval(0)

Review comment:
       > Are you using this code in your production?
   
Today I took the time to [build Spark with the fix](https://github.com/apache/spark/compare/v3.1.2...Reamer:k8s_api_websocket_ping_3_1) myself, and I can confirm that it works in my production environment.
   
   I was able to reproduce the erroneous behavior by setting the environment variable `KUBERNETES_WEBSOCKET_PING_INTERVAL=0`. Other users can adjust this variable to their needs, but I think the default value from the library is more than sufficient.
   
   ```
   pdallig@W-PDALLIG:~/Apps/spark-3.1.2-bin-hadoop3.2$ export 
KUBERNETES_WEBSOCKET_PING_INTERVAL=0
   pdallig@W-PDALLIG:~/Apps/spark-3.1.2-bin-hadoop3.2$ ./bin/pyspark
   Python 3.6.9 (default, Jan 26 2021, 15:33:00) 
   [GCC 8.4.0] on linux
   Type "help", "copyright", "credits" or "license" for more information.
   21/10/04 10:16:29 WARN Utils: Your hostname, W-PDALLIG resolves to a 
loopback address: 127.0.1.1; using 172.16.56.172 instead (on interface eno1)
   21/10/04 10:16:29 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to 
another address
   WARNING: An illegal reflective access operation has occurred
   WARNING: Illegal reflective access by org.apache.spark.unsafe.Platform 
(file:/home/pdallig/Apps/spark-3.1.2-bin-hadoop3.2/jars/spark-unsafe_2.12-3.1.2.jar)
 to constructor java.nio.DirectByteBuffer(long,int)
   WARNING: Please consider reporting this to the maintainers of 
org.apache.spark.unsafe.Platform
   WARNING: Use --illegal-access=warn to enable warnings of further illegal 
reflective access operations
   WARNING: All illegal access operations will be denied in a future release
   21/10/04 10:16:29 WARN NativeCodeLoader: Unable to load native-hadoop 
library for your platform... using builtin-java classes where applicable
   Setting default log level to "WARN".
   To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use 
setLogLevel(newLevel).
   21/10/04 10:16:31 INFO SparkKubernetesClientFactory: Auto-configuring K8S 
client using current context from users K8S config file
   21/10/04 10:16:31 INFO KubernetesClientUtils: Spark configuration files 
loaded from Some(/home/pdallig/Apps/spark-3.1.2-bin-hadoop3.2/conf) : 
log4j.properties,spark-env.sh
   21/10/04 10:16:31 INFO BasicExecutorFeatureStep: Decommissioning not 
enabled, skipping shutdown script
   21/10/04 10:16:31 INFO KubernetesClientUtils: Spark configuration files 
loaded from Some(/home/pdallig/Apps/spark-3.1.2-bin-hadoop3.2/conf) : 
log4j.properties,spark-env.sh
   21/10/04 10:16:32 INFO KubernetesClientUtils: Spark configuration files 
loaded from Some(/home/pdallig/Apps/spark-3.1.2-bin-hadoop3.2/conf) : 
log4j.properties,spark-env.sh
   21/10/04 10:16:32 INFO BasicExecutorFeatureStep: Decommissioning not 
enabled, skipping shutdown script
   Welcome to
         ____              __
        / __/__  ___ _____/ /__
       _\ \/ _ \/ _ `/ __/  '_/
      /__ / .__/\_,_/_/ /_/\_\   version 3.1.2
         /_/
   
   Using Python version 3.6.9 (default, Jan 26 2021 15:33:00)
   Spark context Web UI available at http://172.16.56.172:4040
   Spark context available as 'sc' (master = 
k8s://https://api.ocp.cloud.mycompany.com:443, app id = 
spark-application-1633335391515).
   SparkSession available as 'spark'.
   >>> 21/10/04 10:17:15 WARN WatchConnectionManager: Exec Failure
   java.io.EOFException
        at okio.RealBufferedSource.require(RealBufferedSource.java:61)
        at okio.RealBufferedSource.readByte(RealBufferedSource.java:74)
        at 
okhttp3.internal.ws.WebSocketReader.readHeader(WebSocketReader.java:117)
        at 
okhttp3.internal.ws.WebSocketReader.processNextFrame(WebSocketReader.java:101)
        at okhttp3.internal.ws.RealWebSocket.loopReader(RealWebSocket.java:274)
        at 
okhttp3.internal.ws.RealWebSocket$2.onResponse(RealWebSocket.java:214)
        at okhttp3.RealCall$AsyncCall.execute(RealCall.java:203)
        at okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32)
        at 
java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
        at 
java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
        at java.base/java.lang.Thread.run(Thread.java:829)
   21/10/04 10:17:56 WARN WatchConnectionManager: Exec Failure
   java.io.EOFException
        at okio.RealBufferedSource.require(RealBufferedSource.java:61)
        at okio.RealBufferedSource.readByte(RealBufferedSource.java:74)
        at 
okhttp3.internal.ws.WebSocketReader.readHeader(WebSocketReader.java:117)
        at 
okhttp3.internal.ws.WebSocketReader.processNextFrame(WebSocketReader.java:101)
        at okhttp3.internal.ws.RealWebSocket.loopReader(RealWebSocket.java:274)
        at 
okhttp3.internal.ws.RealWebSocket$2.onResponse(RealWebSocket.java:214)
        at okhttp3.RealCall$AsyncCall.execute(RealCall.java:203)
        at okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32)
        at 
java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
        at 
java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
        at java.base/java.lang.Thread.run(Thread.java:829)
   
   Traceback (most recent call last):
     File 
"/home/pdallig/Apps/spark-3.1.2-bin-hadoop3.2/python/pyspark/context.py", line 
285, in signal_handler
       raise KeyboardInterrupt()
   KeyboardInterrupt
   >>> 
   21/10/04 10:18:04 WARN ExecutorPodsWatchSnapshotSource: Kubernetes client 
has been closed (this is expected if the application is shutting down.)
   ```
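   For reference, fabric8's `Config.autoConfigure` already picks up `KUBERNETES_WEBSOCKET_PING_INTERVAL` from the environment, so once the hard-coded `.withWebsocketPingInterval(0)` is removed, users can still tune the interval without a code change. A minimal sketch (assuming the fabric8 `kubernetes-client` jar is on the classpath; the printed value depends on the environment and library version):
   
   ```scala
   import io.fabric8.kubernetes.client.{Config, ConfigBuilder}
   
   object PingIntervalCheck {
     def main(args: Array[String]): Unit = {
       // autoConfigure(null) reads the kubeconfig, system properties and
       // environment variables such as KUBERNETES_WEBSOCKET_PING_INTERVAL.
       val config = new ConfigBuilder(Config.autoConfigure(null))
         .withApiVersion("v1")
         .build()
       // Without the hard-coded override this reports the library default
       // rather than 0, so the client keeps sending websocket pings and the
       // executor-pod watch connection stays alive.
       println(s"websocket ping interval (ms): ${config.getWebsocketPingInterval}")
     }
   }
   ```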
   @dongjoon-hyun How can I help further so that the pull request is merged? 
Should I prepare pull requests for the stable branches?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


