Github user steveloughran commented on the pull request:

    https://github.com/apache/spark/pull/4491#issuecomment-83615286
  
    1. The `InterfaceAudience.Private` tags in Hadoop are a "please don't 
use" hint, although if you look at YARN AMs, they end up importing & using 
stuff which is tagged that way; you can't currently write an AM without it. What 
it does mean is: they may be unintentionally changed, including signatures and 
semantics, and if they break your code, it's your responsibility to find that 
out and complain before the next hadoop release ships. Summary: test against 
hadoop trunk or at least beta releases.
    
    2. The crypto code is still encountering a few stabilisation problems 
related to multithreading, stuff that doesn't show up in the unit tests. The 
code in 2.6 has already been supplanted by the code in branch-2/trunk. Forking 
off your own code means tracking those changes and keeping in sync; keeping 
the code in Java would aid diffing and cherry-picking there. Even without 
trying to handle the quirks of the extended Hadoop streams, concurrency issues 
like [HADOOP-11710](https://issues.apache.org/jira/browse/HADOOP-11710) may 
matter.
    
    3. There's also the problem that encryption performance comes from native 
binaries, which means for YARN deployments: either bind to the hadoop.so/.dll 
on the PATH, or push up a new version & extend PATH in container launch 
contexts. On other deployments you have to come up with new solutions. If you 
can stick to JCE routines (as this patch does), life may be simpler.
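
    To make the JCE-only option in point 3 concrete, here is a minimal sketch 
of wrapping streams with nothing but `javax.crypto`, so no native library has 
to be on the PATH. The class name `JceStreamSketch`, the choice of 
`AES/CTR/NoPadding`, and the caller-supplied key and IV are assumptions for 
illustration; this is not the code in the patch.

```java
import javax.crypto.Cipher;
import javax.crypto.CipherInputStream;
import javax.crypto.CipherOutputStream;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.security.GeneralSecurityException;

/** Hypothetical helper: pure-JCE stream encryption, no native bindings. */
public class JceStreamSketch {

    // A Cipher instance is stateful and not thread-safe, so build one per stream.
    private static Cipher cipher(int mode, byte[] key, byte[] iv)
            throws GeneralSecurityException {
        Cipher c = Cipher.getInstance("AES/CTR/NoPadding"); // assumed transformation
        c.init(mode, new SecretKeySpec(key, "AES"), new IvParameterSpec(iv));
        return c;
    }

    public static OutputStream encrypting(OutputStream out, byte[] key, byte[] iv)
            throws GeneralSecurityException {
        return new CipherOutputStream(out, cipher(Cipher.ENCRYPT_MODE, key, iv));
    }

    public static InputStream decrypting(InputStream in, byte[] key, byte[] iv)
            throws GeneralSecurityException {
        return new CipherInputStream(in, cipher(Cipher.DECRYPT_MODE, key, iv));
    }

    /** Encrypts the whole buffer through the wrapped stream. */
    public static byte[] encrypt(byte[] plain, byte[] key, byte[] iv) {
        try {
            ByteArrayOutputStream sink = new ByteArrayOutputStream();
            try (OutputStream out = encrypting(sink, key, iv)) {
                out.write(plain);
            }
            return sink.toByteArray();
        } catch (GeneralSecurityException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Decrypts the whole buffer through the wrapped stream. */
    public static byte[] decrypt(byte[] encrypted, byte[] key, byte[] iv) {
        try (InputStream in = decrypting(new ByteArrayInputStream(encrypted), key, iv)) {
            ByteArrayOutputStream result = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            int n;
            while ((n = in.read(buf)) != -1) {
                result.write(buf, 0, n);
            }
            return result.toByteArray();
        } catch (GeneralSecurityException | IOException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

    In real use the key would come from the job's credentials and the IV 
would be fresh for every stream; reusing a key/IV pair in CTR mode leaks 
plaintext.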
    
    A standalone security JAR + library would be better, with code shared by 
both Hadoop & other apps. You could talk to the Hadoop project about isolating 
it in Hadoop itself, though that will imply a separate native build & lib, etc.
    
    The other tactic is to make the shuffle mechanism more pluggable, and on 
YARN clusters switch to an encrypted shuffle provided by a separate library, or 
use the YARN NM via whatever extension points need to be added. The latter 
tactic will avoid any native library path setup issues, and will allow 
alternative deployments (standalone, mesos) to switch to an encrypted shuffle 
later.
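
    As a rough sketch of what "more pluggable" could look like: an extension 
point that wraps the raw shuffle streams, with implementations discovered from 
whatever separate library is on the classpath. Every name here 
(`PluggableShuffleSketch`, `ShuffleStreamWrapper`, `select`, the `"none"` 
default) is hypothetical and is not an existing Spark or YARN API; only 
`java.util.ServiceLoader` is a real JDK mechanism.

```java
import java.io.InputStream;
import java.io.OutputStream;
import java.util.ServiceLoader;

public class PluggableShuffleSketch {

    /** Hypothetical extension point: decorates the raw shuffle streams. */
    public interface ShuffleStreamWrapper {
        String name();
        OutputStream wrapForWrite(OutputStream raw);
        InputStream wrapForRead(InputStream raw);
    }

    /** Default behaviour: no encryption, streams pass straight through. */
    public static final class PassThrough implements ShuffleStreamWrapper {
        public String name() { return "none"; }
        public OutputStream wrapForWrite(OutputStream raw) { return raw; }
        public InputStream wrapForRead(InputStream raw) { return raw; }
    }

    /**
     * Picks a wrapper by name. Implementations shipped in a separate JAR are
     * found through ServiceLoader. An unknown name fails loudly rather than
     * silently falling back to an unencrypted shuffle.
     */
    public static ShuffleStreamWrapper select(String requested) {
        if ("none".equals(requested)) {
            return new PassThrough();
        }
        for (ShuffleStreamWrapper w : ServiceLoader.load(ShuffleStreamWrapper.class)) {
            if (w.name().equals(requested)) {
                return w;
            }
        }
        throw new IllegalArgumentException(
            "no shuffle stream wrapper named " + requested);
    }
}
```

    With this shape, an encrypted implementation can arrive later for 
standalone or mesos deployments without touching the shuffle code itself.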
    


