Github user kellyzly commented on the pull request:

    https://github.com/apache/spark/pull/4491#issuecomment-83903992
  
    @steveloughran: Thanks for your valuable suggestions.
     *  I will update to the latest crypto code in Hadoop trunk and rewrite it 
in Scala. I looked into HADOOP-11710 and found that it added "synchronized" to 
the "flush", "close", and "write" methods in CryptoOutputStream.java to 
guarantee thread safety. But I found that the output stream 
["TimeTrackingOutputStream"](https://github.com/kellyzly/spark/blob/8a74eea7d926507242c50b28c56962b1f1db256a/core/src/main/scala/org/apache/spark/storage/BlockObjectWriter.scala#L87)
 that Spark shuffle uses is not thread safe, which means the multi-threaded 
write situation will not happen there; a simplified sketch follows below.
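
     Roughly, such a time-tracking wrapper just delegates each call to the 
underlying stream and accumulates the elapsed write time with no locking, so it 
is only safe while a single thread owns the stream. A minimal illustrative 
sketch (class and field names are mine, not the exact Spark code):

```scala
import java.io.OutputStream

// Illustrative sketch of an unsynchronized write-time-tracking wrapper.
// Every call delegates to the wrapped stream and accumulates elapsed time;
// there is no locking, so it assumes a single thread owns the stream.
class TimeTrackingOutputStreamSketch(out: OutputStream) extends OutputStream {
  private var writeTimeNanos: Long = 0L

  // Time one delegated call and add the elapsed nanoseconds to the counter.
  private def timed[T](body: => T): T = {
    val start = System.nanoTime()
    try body finally { writeTimeNanos += System.nanoTime() - start }
  }

  override def write(b: Int): Unit = timed(out.write(b))
  override def write(b: Array[Byte], off: Int, len: Int): Unit =
    timed(out.write(b, off, len))
  override def flush(): Unit = timed(out.flush())
  override def close(): Unit = timed(out.close())

  def totalWriteTimeNanos: Long = writeTimeNanos
}
```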
    
     > or use the YARN NM via whatever extension points need to be added. The 
latter tactic will avoid any native library path setup issues, and will allow 
alternative deployments (standalone, mesos) to switch to an encrypted shuffle 
later
     
     I don't understand the latter tactic. Can extension points in the YARN NM 
resolve the native library path setup issues? If so, could you provide some 
links that describe this in more detail?
    


