Github user redsanket commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23166#discussion_r237652232
  
    --- Diff: core/src/main/scala/org/apache/spark/api/python/PythonRDD.scala ---
    @@ -708,16 +709,36 @@ private[spark] class PythonBroadcast(@transient var path: String) extends Serial
           override def handleConnection(sock: Socket): Unit = {
             val env = SparkEnv.get
             val in = sock.getInputStream()
    -        val dir = new File(Utils.getLocalDir(env.conf))
    -        val file = File.createTempFile("broadcast", "", dir)
    -        path = file.getAbsolutePath
    -        val out = env.serializerManager.wrapForEncryption(new FileOutputStream(path))
    +        val abspath = new File(path).getAbsolutePath
    +        val out = env.serializerManager.wrapForEncryption(new FileOutputStream(abspath))
    --- End diff --
    
    In the old version, with encryption turned off, we generated a random path
    and both read from and wrote to it. When the encryption-related code was
    written, it introduced a new "broadcast" path. The problem is that when we
    decrypt on the driver side, the code looks at the random path reference that
    is still lying around and tries to decrypt from it, but the actual data is in
    the new "broadcast" path location. So, by just passing the random path
    reference through, we make sure all the places stay in sync, with and
    without encryption.
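
    The mismatch described above can be sketched outside Spark. This is only an illustration under stated assumptions: `PathSyncSketch`, `writeOld`, `writeNew`, and `read` are hypothetical names, not Spark APIs, and encryption is left out because only the choice of path matters here. `writeOld` mirrors the removed lines (a fresh "broadcast" temp file), and `writeNew` mirrors the added lines (the path that was passed in).

```scala
import java.io.File
import java.nio.file.Files

// Hypothetical sketch; none of these names exist in Spark.
object PathSyncSketch {
  // Mirrors the removed lines: ignores the caller's path and writes the
  // data to a newly created "broadcast" temp file.
  def writeOld(data: Array[Byte], dir: File): String = {
    val file = File.createTempFile("broadcast", "", dir)
    Files.write(file.toPath, data)
    file.getAbsolutePath
  }

  // Mirrors the added lines: writes the data to the path that was passed in.
  def writeNew(data: Array[Byte], path: String): String = {
    val abspath = new File(path).getAbsolutePath
    Files.write(new File(abspath).toPath, data)
    abspath
  }

  // Stand-in for a reader that only knows the original (random) path.
  def read(path: String): String =
    new String(Files.readAllBytes(new File(path).toPath), "UTF-8")

  def main(args: Array[String]): Unit = {
    val dir = Files.createTempDirectory("path-sync-sketch").toFile
    // Stand-in for the random path that was generated up front.
    val randomPath = File.createTempFile("random", "", dir).getAbsolutePath
    val data = "payload".getBytes("UTF-8")

    // Old behaviour: the data lands in a different file than the reader uses.
    val written = writeOld(data, dir)
    println(s"old: wrote to reader's path = ${written == randomPath}")
    println(s"old: reader sees '${read(randomPath)}'")

    // New behaviour: writer and reader agree on the same path.
    writeNew(data, randomPath)
    println(s"new: reader sees '${read(randomPath)}'")
  }
}
```

    In this sketch the reader finds nothing at the random path after `writeOld`, and finds the data after `writeNew`, which is the "in sync" property the change is after.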

