This is a duplicate of my Stack Overflow question here: https://stackoverflow.com/questions/57881044/verifying-in-transit-encryption-for-spark-shuffle

I'm running Spark over YARN on AWS EMR 5.20. I followed this guide for setting up in-transit encryption for Spark shuffle:

https://docs.cloudera.com/HDPDocuments/HDP3/HDP-3.1.0/configuring-spark/content/configuring_spark_for_wire_encryption.html

First off, this doc *only* refers to self-signed certs, and we're using a CA-signed cert. No big deal, I put the CA cert in the truststore.

Unfortunately, I'm not in a position to use the built-in Amazon in-transit encryption, nor can I use Spark defaults, as we send our Spark assembly along with our jobs to allow multiple versions to be used.

The piece I'm tacking on to our Spark jobs looks like this:

    spark.shuffle.encryption.enabled=true
    spark.ssl.enabled=true
    spark.ssl.keyPassword=*****
    spark.ssl.keyStore="/opt/my-cluster/keystore.jks"
    spark.ssl.keyStorePassword=*****
    spark.ssl.protocol=TLS
    spark.ssl.trustStore="/opt/my-cluster/truststore.jks"
    spark.ssl.trustStorePassword=*****
    spark.authenticate=true
    spark.network.crypto.enabled=true
    spark.enableSaslEncryption=true
    spark.ui.https.enabled=true
    spark.io.encryption.enabled=true
    spark.network.sasl.serverAlwaysEncrypt=true

Jobs are running fine. I'm running a simple job that I assume will force a shuffle. Here's the code:

    import org.apache.spark.sql.SparkSession
    import scala.util.Random

    object SparkShuffleTest {
      def main(args: Array[String]): Unit = {
        val randomText = for (i <- Range(0, 100000)) yield Random.nextPrintableChar()
        val spark = SparkSession.builder.appName("Simple Application").getOrCreate()
        val logData = spark.sparkContext.parallelize(randomText)
        val pairs = logData.map(c => (c, 1))
        pairs.foreach(println(_))
        // reduceByKey groups values by key across partitions, which requires a shuffle
        val outputs = pairs.reduceByKey(_ + _).collect()
        outputs.foreach({ case (a, b) => println(s"$a:$b") })
        println("Outputs collected...")
        println(outputs)
        spark.stop()
      }
    }

*So, here's the tough part:* If I change the location of the keystore to a bogus name, my jobs fail, as they should, because they can't find a valid keystore.
However, if I do this to the *truststore*, there's no failure. It's like it's not even reading the truststore. How can I actually get this to encrypt, or what am I configuring wrong? Obviously, if I'm giving it a bogus truststore, it ought to fail at encrypting shuffle. Does that just not throw an error at all? Thanks!
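
One way to check whether the job is even seeing these settings would be to dump the security-related entries from the configuration the driver ends up with. The following is only a minimal sketch under stated assumptions: `SecurityConfDump` and `securitySettings` are hypothetical names (not part of Spark), the prefix list is simply taken from the settings listed above, and the sample input in `main` stands in for whatever `spark.sparkContext.getConf.getAll` returns in the real job.

```scala
// Hypothetical helper, not part of Spark: filters a list of (key, value)
// configuration pairs down to the security-related ones and masks any
// value whose key looks like a password.
object SecurityConfDump {
  // Prefixes taken from the settings listed in this post (an assumption,
  // not an authoritative list of Spark security properties).
  val prefixes: Seq[String] = Seq(
    "spark.ssl.",
    "spark.authenticate",
    "spark.network.",
    "spark.io.encryption.",
    "spark.shuffle.encryption.",
    "spark.enableSaslEncryption",
    "spark.ui.https."
  )

  def securitySettings(all: Seq[(String, String)]): Seq[(String, String)] =
    all
      .filter { case (key, _) => prefixes.exists(p => key.startsWith(p)) }
      .map { case (key, value) =>
        // Never print real passwords
        if (key.toLowerCase.contains("password")) (key, "*****") else (key, value)
      }
      .sortBy(_._1)

  def main(args: Array[String]): Unit = {
    // Sample input standing in for spark.sparkContext.getConf.getAll
    val sample = Seq(
      "spark.app.name" -> "Simple Application",
      "spark.ssl.trustStore" -> "/opt/my-cluster/truststore.jks",
      "spark.ssl.trustStorePassword" -> "secret",
      "spark.authenticate" -> "true"
    )
    securitySettings(sample).foreach { case (k, v) => println(s"$k=$v") }
  }
}
```

In the test job above, this could be called right after the session is created as `SecurityConfDump.securitySettings(spark.sparkContext.getConf.getAll.toSeq)`. It only shows which properties reached the driver's configuration; it does not by itself prove that shuffle traffic is encrypted.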