Hi all,

I’m evaluating certificate-based TLS for Spark RPC and shuffle traffic
in our environments. Spark 4.0 added TLS/SSL support for RPC and block
transfers (see SPARK-44937 and the 4.0 security docs), but we are
running Spark 3.x (3.1 and 3.3) in prod.

Before I invest in a backport effort, I wanted to check whether anyone in
the community has:

  * Already backported SPARK-44937 (or parts of it) to a Spark 3.x branch,
and can share patches/PRs or distribution packages; or
  * Implemented an alternative production solution (e.g.,
proxies/Envoy/stunnel, cluster-level IPsec/VPN, distro-specific patches) to
achieve TLS for executor <-> executor / shuffle / RPC traffic while staying
on Spark 3.x.

Details about our environment (for context):
  * Spark version: 3.1 and 3.3
  * External shuffle service: yes, Celeborn
  * Deploy mode: YARN
  * Java: Java 11 and Java 17

Questions:
  1. Has anyone backported SPARK-44937 (or shipped the TLS changes on a 3.x
branch)? If so, can you share links to patches or build artifacts?
  2. If you used a proxy/sidecar approach (Envoy/stunnel/etc.), could you
share architecture notes and perf tradeoffs?
  3. Any compatibility gotchas we should watch for (external shuffle
service mismatch, mixed-version clusters, etc.)?

Links:
  * SPARK-44937: https://issues.apache.org/jira/browse/SPARK-44937
  * Spark 4.0 security: https://spark.apache.org/docs/4.0.0/security.html
  * Spark 3.x encryption docs:
https://downloads.apache.org/spark/docs/3.3.4/security.html

Thanks; any pointers are appreciated.
Guruprasad
