Hi all,

I'm evaluating whether to enable certificate-based TLS for Spark RPC and shuffle traffic in our environments. Spark 4.0 added TLS/SSL support for RPC and block transfers (see SPARK-44937 and the 4.0 security docs), but we are running Spark 3.x in production.

Before I invest in a backport effort, I wanted to check whether anyone in the community has:

* Already backported SPARK-44937 (or parts of it) to a Spark 3.x branch, and can share patches/PRs or distribution packages; or
* Implemented an alternative production solution (e.g., proxies/Envoy/stunnel, cluster-level IPsec/VPN, distro-specific patches) to achieve TLS for executor <-> executor / shuffle / RPC traffic while staying on Spark 3.x.

Details about our environment (for context):

* Spark version: 3.1, 3.3
* External shuffle service: yes, Celeborn
* Deploy mode: YARN
* Java: Java 11 and Java 17

Questions:

1. Has anyone backported SPARK-44937 (or shipped the TLS changes on a 3.x branch)? If so, can you share links to patches or build artifacts?
2. If you used a proxy/sidecar approach (Envoy/stunnel/etc.), could you share architecture notes and perf tradeoffs?
3. Any compatibility gotchas we should watch for (external shuffle service mismatch, mixed-version clusters, etc.)?

Links:

* SPARK-44937: https://issues.apache.org/jira/browse/SPARK-44937
* Spark 4.0 security: https://spark.apache.org/docs/4.0.0/security.html
* Spark 3.x encryption docs: https://downloads.apache.org/spark/docs/3.3.4/security.html

Thanks, any pointers appreciated.

Guruprasad
