Github user redsanket commented on a diff in the pull request:
https://github.com/apache/spark/pull/23166#discussion_r237652232
--- Diff: core/src/main/scala/org/apache/spark/api/python/PythonRDD.scala
---
@@ -708,16 +709,36 @@ private[spark] class PythonBroadcast(@transient var path: String) extends Serial
       override def handleConnection(sock: Socket): Unit = {
         val env = SparkEnv.get
         val in = sock.getInputStream()
-        val dir = new File(Utils.getLocalDir(env.conf))
-        val file = File.createTempFile("broadcast", "", dir)
-        path = file.getAbsolutePath
-        val out = env.serializerManager.wrapForEncryption(new FileOutputStream(path))
+        val abspath = new File(path).getAbsolutePath
+        val out = env.serializerManager.wrapForEncryption(new FileOutputStream(abspath))
--- End diff --
In the old version, we generated a random path when encryption was turned
off, so with encryption off everything reads from and writes to that random
path. When the encryption-related code was written, we introduced a new
"broadcast" path. The problem is that when we try to decrypt on the driver
side, the code looks at the random path reference still lying around and tries
to decrypt from it, but the actual data is in the new "broadcast" path
location. So, by just passing the random path reference through, we make sure
all the places stay in sync, with and without encryption.
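
The mismatch described above can be illustrated with a simplified sketch. This is not Spark's code: `BroadcastPathSketch` and its methods are hypothetical names, and encryption is left out so that only the path handling is visible.

```scala
import java.io.{File, FileOutputStream}
import java.nio.file.Files

// Hypothetical stand-in for the broadcast write path; not Spark's actual class.
object BroadcastPathSketch {

  // Behaviour being replaced, as described in the comment: the handler creates
  // its own "broadcast" temp file and writes there, regardless of the path the
  // caller already holds.
  def writeToNewTempFile(dir: File, data: Array[Byte]): String = {
    val file = File.createTempFile("broadcast", "", dir)
    val out = new FileOutputStream(file)
    try out.write(data) finally out.close()
    file.getAbsolutePath
  }

  // Behaviour after the change: write to the path that was passed in, so the
  // writer and any later reader refer to the same file.
  def writeToGivenPath(path: String, data: Array[Byte]): String = {
    val abspath = new File(path).getAbsolutePath
    val out = new FileOutputStream(abspath)
    try out.write(data) finally out.close()
    abspath
  }

  // A reader that only knows the caller's original path.
  def read(path: String): Array[Byte] =
    Files.readAllBytes(new File(path).toPath)
}
```

With a caller-held path created up front, `writeToNewTempFile` leaves that file empty because the bytes land in a different file, whereas `writeToGivenPath` followed by `read` on the same path returns the bytes that were written.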
---