Hi Larry,

Thanks for your thoughts on that. "How Knox fits into the traditional Hadoop programming model" is exactly the question.

Few of the Java client libraries have an equivalent HTTP-oriented library, and we certainly don't want the apps running inside YARN to call Knox's API (it would be a severe bottleneck). That's my personal assumption; correct me if I am wrong.

So the point here is: if we lock down a production cluster with Knox plus strict firewalling, we still must allow access to the ports of the core services on the development environments.

Consequences:

1. As you said, one will probably have to limit the dev environments to subsets of the data, or to anonymized data.
2. Once the tested app is deployed in production, it will be very, very hard to troubleshoot (through Knox again) if we encounter bugs at run time.
3. I guess that having different security policies and deployment methods across clusters would be very confusing for the devs and the scientists: "On dev you have an edge node, but not in production. Good luck, dear scientist."
4. Again, Hadoop administration, even with a proxied Ambari UI and some enhancements to KnoxShell, would be extremely challenging (impossible?) as well.

For these reasons, I am very confident that we will continue to encourage the use of Knox for its strengths:

- Single REST API access point
- Centralized authentication for Hadoop REST/HTTP services

but I am still wondering whether the following promises made by vendors are realistic, and if so, how:

- Eliminates SSH edge node risks
- Hides network topology

Another side effect is that integration with the Hadoop REST APIs becomes a restrictive requirement for most of our tools (ETL, Viz, data science, from vendors and community).

I hope I have explained my questions clearly, in not-too-bad English :-)
I am interested in any feedback on these concerns.

Thanks for reading.

Damien

2017-03-15 0:53 GMT+01:00 larry mccay <[email protected]>:

> Hi Damien -
>
> Interesting questions...
>
> I suspect that development environments are quite varying in configuration
> but for the most part that they are not typical production deployment
> configurations.
>
> With recent focus on the KnoxShell DSL and SDK classes it makes sense to
> try and determine what the programming model is for the use of those
> aspects of Knox. However, the question you ask is how Knox fits into the
> traditional hadoop programming model, environment and flow.
>
> If you have anything particular in mind, I would be interested in hearing
> what you think.
>
> Perimeter security is certainly achievable but I guess there are valid
> questions as to what sort of deployments are generally available for such
> development. If you need access to the actual data does it push you to
> development in production-like environments?
>
> Again, I'm not sure what you have in mind here but interested to hear more.
>
> thanks,
>
> --larry
>
>
> On Tue, Mar 14, 2017 at 5:54 PM, Damien Claveau <[email protected]>
> wrote:
>
>> Hi,
>>
>> First time emailing the user mailing list.
>>
>> We currently use Knox successfully on several Kerberized clusters in
>> production,
>>
>> and mainly use it to integrate with external client applications (such as
>> ETL and Viz tools),
>>
>> We would like to promote and generalize the concept of a single Rest access
>> point for all services,
>>
>> then, in an ideal world, ban access from the outside world to the RPC and
>> Thrift interfaces of the core hadoop services.
>>
>>
>> The question is ...
>>
>> Even if we can deploy binaries, scripts, workflows to hdfs and submit or
>> schedule them through Knox,
>>
>> At the very beginning, the developers of course have to code apps (say
>> Spark jobs)
>> that are designed to run natively inside the cluster (and will use Java
>> client libs to access the Thrift interfaces).
>>
>> How do you deal with that need?
>> Do they develop on sandboxed environments or their own laptop without Knox,
>> and so Knox only applies to the production/target clusters ?
>> Is the promise of a "Perimeter Level Security" really achievable ?
>>
>> Thank you for your feedback.
>>
>> Damien Claveau
>>
>> France
>>
>

--
*Damien Claveau*
*MOBILE* 06 60 31 47 84 • *E-MAIL* [email protected]
