Hi John,

Your use case is interesting. I’m certainly not an expert in the network aspects of what you are trying to do, but I can take a shot at the related Drill issues.
Drill’s primary use case is connecting via the Drill client (typically via JDBC or ODBC). The Drill client handles security. It also allows SQL sessions, and hence session options.

Your use case is based on the REST API. At present, the REST API is best described as a “prototype.” REST supports username/password login, and sessions associated with the login (on a single Drillbit). Sessions never time out (as far as I can tell). More importantly, the REST API returns all query results in a single message, encoded as JSON. This is great for small queries, but does not scale well when returning millions of large rows. (Hint: we are looking for contributions to improve the REST API!)

As Keys pointed out, the important question is this: does your app need session state other than security? If so, then you need to consider overall SQL session state, not just SSL connections. If your script does “ALTER SESSION” followed by a query, then the ALTER SESSION might be sent to node A, with the query going to node B. Node B does not know about the session on A, so the results may differ from what you expect. The same is true of temp tables.

Said another way, you’d like your scripts to do round-robin per *request*, but Drill is designed to do round-robin per *session*. (The Drill client, when using ZK, does random selection of nodes, which achieves roughly the same result.) In short, your use case is clear, but is not supported today in Drill.

Putting this together:

1. Sessions must be sticky to a single Drillbit so that session state, temp tables and so on are persisted (on that one Drillbit).
2. If a session on one Drillbit drops, the app must establish a new session on another Drillbit. That involves not just security tokens and cookies, but also resetting session options, rebuilding temp tables, etc.
3. Since the app has to handle session recreation when switching Drillbits, the security issue, while a nuisance, is a necessary result of switching sessions.
4. As Keys points out, changing sessions is a rare event (caused by timeouts, node failures, etc.), so session recovery should also be rare.

The only way to make sessions “portable” is to create a global session shared across Drillbits, which is what you are proposing. Doing so is non-trivial: it requires a global session registry (or a way of synchronizing session state). Such sharing is not supported in Drill’s distributed, shared-nothing architecture. Could we add it? Probably, but not in the short term. If we ever find the need for a “metastore” (or central work scheduler), then at that time Drill would have a mechanism to support session portability; but that is a ways off.

For the short term, can you perhaps rethink the use case given that sessions are local? How will your app handle failover? Is the security issue as much of a problem when seen as part of session recreation? (I’m not an expert here; I’m asking how this might work: are there things, short of persistent sessions, we can do to help?)

You mentioned Drill-on-YARN (DoY). DoY is an interesting question. On the surface, REST works the same on DoY as in “regular” Drill: the REST endpoint doesn’t care how the Drillbit was launched. Whatever works with regular Drill will work with DoY.

Under DoY, JDBC/ODBC clients work as usual: they maintain a session with one Drillbit, and use ZK to find a fall-back Drillbit if the first one fails (in which case the client must re-establish the SQL session state by resending session options, etc.). Can we improve this? Yes, if we did the work described earlier. (BTW: I’m still looking for volunteers to help with code reviews so we can contribute DoY to Apache Drill…)

We have not yet looked into the security setup for DoY. (We wanted to get the security fully working with Drill itself first.) You raise some good issues that we must wrestle with as we enhance DoY to use the security features that are becoming available in Drill itself.
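
For what it’s worth, here is a minimal sketch of how an app might handle points 1-4 above: pin every statement to one Drillbit, remember the statements that build session state, and replay them after logging in to a different Drillbit when the first one drops. This is only an illustration under assumptions, not Drill code: `StickySession` and `connect` are hypothetical names, and `connect(host)` stands in for whatever your app uses to log in to one Drillbit and return an object with a `run(sql)` method that raises `ConnectionError` when the session is lost.

```python
from typing import Callable, List


class StickySession:
    """Pins all statements to one Drillbit; rebuilds session state on failover.

    Hypothetical helper, not part of Drill.
    """

    def __init__(self, drillbits: List[str], connect: Callable[[str], object]):
        self.drillbits = list(drillbits)
        self.connect = connect          # assumption: logs in, returns obj with run(sql)
        self.setup: List[str] = []      # statements that define session state
        self.index = 0                  # Drillbit we are currently pinned to
        self.conn = None

    def _open(self) -> None:
        """Log in to the pinned Drillbit and replay recorded session state."""
        self.conn = self.connect(self.drillbits[self.index])
        for sql in self.setup:
            self.conn.run(sql)

    def query(self, sql: str) -> list:
        """Run sql on the pinned Drillbit, trying each Drillbit at most once."""
        for _ in range(len(self.drillbits)):
            try:
                if self.conn is None:
                    self._open()
                return self.conn.run(sql)
            except ConnectionError:
                # Session lost: pin to the next Drillbit; _open() replays state.
                self.conn = None
                self.index = (self.index + 1) % len(self.drillbits)
        raise ConnectionError("no Drillbit reachable")

    def add_setup(self, sql: str) -> list:
        """Run a state-changing statement now and record it for later replay."""
        result = self.query(sql)
        self.setup.append(sql)
        return result
```

The app would call `add_setup()` for anything that changes session state (an ALTER SESSION, creating a temp table) and `query()` for everything else. The statement is recorded only after it succeeds, so a failover in the middle of `add_setup()` does not run it twice on the new Drillbit. Since switching Drillbits should be rare, the cost of the replay should seldom be paid.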

Thanks,

- Paul

> On Jun 23, 2017, at 9:50 AM, John Omernik <[email protected]> wrote:
>
> That makes sense, ya, I would love to hear about the challenges of this in
> general from the Drill folks.
>
> Also, I wonder if Paul R at MapR has any thoughts in how something like
> this would be handled in the Drill on Yarn Setup.
>
>
> John
>
