I think we are on the same page regarding SSL.

Regarding (1), it's best that I defer to the Drill experts, but I will mention
that sharing session state can greatly complicate scalability. Since switching
drillbits should be a rare event, it is probably more scalable to send back to
the client a token which represents the authenticated identity (encrypted and
signed, of course). Then, should that token show up at another drillbit, the
user authentication state can be reestablished. Other state, such as caches,
would be lost. I don't know enough about Drill internals, though; there may be
other state issues beyond just authentication.
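
A minimal sketch of the token idea above, in Python with only the standard
library. It signs the identity with an HMAC so that any drillbit holding the
shared key can verify it; encryption is left out for brevity, and the names
(issue_token, verify_token, SHARED_KEY) are hypothetical, not part of Drill:

```python
import base64
import hashlib
import hmac
import json
import time

# Hypothetical secret shared by every drillbit (for example, via cluster config).
SHARED_KEY = b"replace-with-a-real-secret"


def issue_token(user, ttl_seconds=3600, key=SHARED_KEY, now=None):
    """Return a signed token carrying the authenticated user and an expiry."""
    issued = time.time() if now is None else now
    payload = json.dumps({"user": user, "exp": issued + ttl_seconds}).encode()
    body = base64.urlsafe_b64encode(payload)
    sig = hmac.new(key, body, hashlib.sha256).hexdigest().encode()
    return (body + b"." + sig).decode()


def verify_token(token, key=SHARED_KEY, now=None):
    """Return the user name if the signature is valid and unexpired, else None."""
    try:
        body, sig = token.encode().rsplit(b".", 1)
    except ValueError:
        return None  # no signature part at all
    expected = hmac.new(key, body, hashlib.sha256).hexdigest().encode()
    if not hmac.compare_digest(sig, expected):
        return None  # tampered with, or signed with a different key
    claims = json.loads(base64.urlsafe_b64decode(body))
    current = time.time() if now is None else now
    return claims["user"] if current < claims["exp"] else None
```

A drillbit that receives an unknown session could call verify_token on the
presented token and rebuild the authentication state without any shared
session registry.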

Keys
_______________________________
Keys Botzum
Distinguished Engineer, Field Engineering
kbot...@maprtech.com
443-718-0098
MapR Technologies
http://www.mapr.com



On Jun 23, 2017, at 11:33 AM, John Omernik <j...@omernik.com> wrote:

So a few things

1. The issue is that, as is, the SSL stuff works fine, but when the IP address
that DNS returns for the hostname changes, the session is invalidated and I
am forced to log on again... this is annoying and loses session context
information. If I try to lay out my cluster differently, i.e. using the
wildcard certs and the different Marathon layout, I then have different
issues. I can connect by IP, but then I lose the SSL validity. That's
where the context for SSL comes into play. My main issue is with the IP
returned for a DNS request changing during the course of a session,
invalidating it. I think what it comes down to for me is this statement:

As a user I connect to a drill cluster

A simple statement, but what it means is that as a user or an admin, my
users/code accessing the cluster shouldn't have to care which individual
node they connect to; they are connecting to a cluster. This is
oversimplifying things, but session IDs managed by the cluster via ZooKeeper
would solve this.


2. I am looking at doing the SSL handling overrides in my Python code.
requests has some handlers for SSL and I was looking to address this;
however, there is a bug in how it works because it drops my custom port
value... I am working on this now with the Python requests folks (i.e.
the custom handlers would work, but only if I was connecting on port 443).





On Fri, Jun 23, 2017 at 9:52 AM, Keys Botzum <kbot...@mapr.com> wrote:

There is something here I'm not understanding. In the below the hostname
is always the same so there should be no problem as long as all drillbits
share a common signer.

I'm also just not following how certificate authentication issues are even
linked to the Drill session issues. Whether or not there is a Drill
session, the SSL handshake rules still apply. Or there is something here I
just don't understand - quite possibly of course. I'm just focused on the
SSL issue as this I understand very well.

Incidentally, regarding hostname verification, I'm not familiar with what
controls you have but many libraries (including Java) give you the ability
to write your own SSL verifier which is called only when the default
hostname verification fails. In that code you can implement different
rules. Perhaps you can find a rule that meets your needs (such as a common
signer for all Drillbits). Remember that certificate hostname validation is
just a convention. There is nothing about SSL that makes this necessary.
Here's the Java version:
https://docs.oracle.com/javase/7/docs/api/javax/net/ssl/HostnameVerifier.html
In case you are curious, this is how MapR's maprlogin works with HTTPS even
though we use IP address by default.
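
For a Python client, a rough analogue of the "common signer" rule is an ssl
context that still verifies the certificate chain but skips the hostname
comparison. This is a sketch only: the function name is hypothetical, and
ca_file is an assumed path to the signer's certificate that all drillbits
share.

```python
import ssl


def common_signer_context(ca_file=None):
    """Build a client TLS context that verifies the signer but not the hostname."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    # Skip the hostname-to-certificate comparison only.
    ctx.check_hostname = False
    # Still require a certificate that chains to a trusted signer.
    ctx.verify_mode = ssl.CERT_REQUIRED
    if ca_file is not None:
        # Trust only the cluster's common signer, not the system CA bundle.
        ctx.load_verify_locations(cafile=ca_file)
    return ctx
```

With no ca_file loaded the context trusts nothing, so every handshake fails;
that is deliberate, since skipping hostname checks is only reasonable when the
trusted signer is limited to the cluster's own.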

Keys



On Jun 23, 2017, at 10:22 AM, John Omernik <j...@omernik.com> wrote:

The wildcard certificate isn't a problem on its own; the problem is using it
in a manner that allows me to maintain all of the various features I want. Let
me lay this out.

In Marathon I have a task that runs a drillbit. Since that task is located
at the node prod/drillprod (for my env it's role/instanceid), the domain
name is set up to be

drillprod-prod.marathon.mesos.

I can run X number of instances of that task. I tell Marathon to make them
"host unique" so no two drill bits end up on the same node.  This gives me
a few things

1. If I choose to have 3 drillbits running, they go and run, and I don't
have to worry about them. If I have to reboot the node one of them is on,
Marathon says "oh look, I am only running 2, let's spin up another," and then
I get my required 3 bits running automatically.

2. They use a common config directory located in MapR-FS. This is really
nice because I don't have to maintain separate configurations for each
drillbit.

3. The name above, drillprod-prod.marathon.mesos, using nslookup returns

Name: drillprod-prod.marathon.mesos
Address: 192.168.0.105

Name: drillprod-prod.marathon.mesos
Address: 192.168.0.103

Name: drillprod-prod.marathon.mesos
Address: 192.168.0.104


Which is desired. When I have a client connect, I can program a single
name (drillprod-prod.marathon.mesos) into my script and never have to worry
about where the bits run on the cluster. It looks it up and works great.
This has been my standard MO for scripts that do short-lived things... I
haven't had an issue until this new use case came up (long-running
sessions for use in analytic notebooks is the use case, BTW, just not super
relevant to go into details on that here).


Because of the DNS naming, my scripts get tossed around to different bits
depending on how the DNS round robin provides the IP, which is desired for
various scripts. The issue comes into play when I make a session
connection and, for some reason (maybe after a cache timeout or
something), Python's requests object makes the next request but does a DNS
lookup first, causing the IP to change and the session to be invalidated. Not
awesome when working in a notebook.
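
One client-side workaround for this can be sketched as follows: resolve the
round-robin name once, keep using that address for the life of the session,
and still hand the original hostname to TLS so certificate validation is
unaffected. The class name and the injectable resolver are assumptions for
illustration; this uses plain socket/ssl, not the requests API.

```python
import socket
import ssl


class PinnedEndpoint:
    """Resolve a round-robin DNS name once and reuse that address afterwards."""

    def __init__(self, hostname, port, resolver=socket.getaddrinfo):
        self.hostname = hostname
        self.port = port
        self._resolver = resolver
        self._address = None  # filled in on first use, then never re-resolved

    def address(self):
        """Return the pinned IP, resolving the hostname only the first time."""
        if self._address is None:
            info = self._resolver(self.hostname, self.port, type=socket.SOCK_STREAM)
            # Each entry is (family, type, proto, canonname, sockaddr).
            self._address = info[0][4][0]
        return self._address

    def connect(self, context=None):
        """Open a TLS socket to the pinned IP, validating against the hostname."""
        context = context or ssl.create_default_context()
        raw = socket.create_connection((self.address(), self.port))
        return context.wrap_socket(raw, server_hostname=self.hostname)
```

Because server_hostname is the DNS name rather than the IP, the certificate is
still checked against drillprod-prod.marathon.mesos even though the connection
goes to one fixed address.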

The wildcard DNS "could" work, but there are some gotchas... I could create
an application folder in marathon with the same name, prod/drillprod, and
then in there I could create a task with the hostname for each host.

However, this would then make me lose the HA of my setup. If I am
trying to run 3 instances of bits, on nodes node104, node103, and node105,
and I need to reboot node105, in my setup node102 could get the new bit,
the DNS name auto-updates, and I maintain HA with simplicity. With
a wildcard cert, however, I would still need to manually spin up a new
instance to maintain three instances.

In addition, I would have to get a list of the three nodes running to pick
one to connect to.  Lots of complex orchestration to use wild card certs to
maintain HA.

The reverse proxy will work for me; I can program nginx to pin
connections. That is, I can have it choose the backend based on
the JSESSIONID. That should work, but I don't like it because it requires
another component running in my network. That's not bad for me (I can easily
run it on Zeta, so that won't be an issue at all), but as a whole, it's not
ideal for Drill users.
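
The pinning logic itself is small. As an illustration only (this is not nginx
configuration, and StickyRouter is a hypothetical name), a proxy can remember
which backend issued each JSESSIONID and round-robin only the requests that
carry no known session:

```python
import itertools


class StickyRouter:
    """Route requests to the backend that first served their session ID."""

    def __init__(self, backends):
        self._next_backend = itertools.cycle(backends)
        self._by_session = {}  # JSESSIONID -> backend address

    def route(self, session_id=None):
        """Pick a backend: the pinned one if known, otherwise the next in turn."""
        if session_id in self._by_session:
            return self._by_session[session_id]
        return next(self._next_backend)

    def remember(self, session_id, backend):
        """Record the backend that issued this session ID in its response."""
        self._by_session[session_id] = backend
```

The cost is the one noted above: the table lives in an extra component that
has to be run and kept available alongside the drillbits.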

Thus I am back to the idea of Drill somehow maintaining a global state.
This is also important for Drill on Yarn setups (unless there is some sort
of application container proxy back to the bits). If you want to have
security (SSL) with hostnames, the session maintenance must be addressed.

So that's why I toss it out here... this is a desirable feature, I would
imagine, even if people are not asking for it now. That may be because they
don't need it yet: in their testing of Drill, and in how they are using it
now, it may not come up... but when they have multiple people and services
hitting Drill endpoints, pointing them at individual nodes for SSL management
etc. becomes a nightmare... thus, as a thought exercise, could we securely
maintain valid session IDs in ZooKeeper for nodes to check on? What would
an ideal setup for something like that be?




On Fri, Jun 23, 2017 at 7:07 AM, Keys Botzum <kbot...@mapr.com> wrote:

Why is a wildcard certificate a problem? They are quite common. One just
needs all of the Drillbits to share a common domain for the wildcard to be
easy and thus avoid having to list individual hosts.

Are you saying that you can't use hostnames and must use IPs?

In case I'm not clear, here's an example of what I'm saying.

this is good with wildcards: drill1.mydrill.corp.com,
drill2.mydrill.corp.com, drill3.mydrill.corp.com, drill4.mydrill.corp.com
this is bad with wildcards: drill1, drill2, drill3, drill4
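
The distinction comes from how wildcard names match: the "*" stands in for
exactly one leftmost DNS label under a shared domain. A simplified sketch of
that rule in Python (the function name is hypothetical, and real TLS libraries
apply further checks):

```python
def wildcard_matches(pattern, hostname):
    """Return True if a certificate name such as '*.example.com' covers hostname."""
    p_labels = pattern.lower().split(".")
    h_labels = hostname.lower().split(".")
    if len(p_labels) != len(h_labels):
        return False  # '*' covers one label only, never zero or several
    if p_labels[0] != "*":
        return p_labels == h_labels  # no wildcard: exact match required
    return h_labels[0] != "" and p_labels[1:] == h_labels[1:]
```

So a certificate for *.mydrill.corp.com covers drill1.mydrill.corp.com, while
the bare name drill1 has no domain part for the wildcard to match against.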


Keys
_______________________________
Keys Botzum
MapR Technologies



On Jun 22, 2017, at 8:24 PM, John Omernik <j...@omernik.com> wrote:

Would there be interest in finding a way to globalize this? This is
challenging for me and others that may run Drill with multi-tenant
orchestrators. In my particular setup, each node running Drill gets added
to an A record automatically, giving me HA and distribution of REST API
queries. It also allows me to have a single certificate for my cluster
rather than managing certificates on an individual basis. I set things up
to connect via IP, but then I had certificate mismatch warnings. My goal is
to find a way to connect to the REST API while maintaining a session to a
single node, without sacrificing HA and balancing and without compromising
SSL security. I know it's a tall order, but if there are ideas outside of a
global state management I am all ears.

Note some ideas I've also considered:

1. Using a load balancer that would allow me to pin connections. Not
ideal because it's another service to manage, but it would work.

2. There may be a way to hack things with a wildcard cert, but it seems
complicated and fragile.

On Jun 22, 2017 5:47 PM, "Sorabh Hamirwasia" <shamirwa...@mapr.com> wrote:

Hi John,
As Paul mentioned, session IDs are not global. Each session is part of the
BitToUserConnection instance created for a connection between Drillbit and
client. Hence it's local to that Drillbit only, and the lifetime of the
session is tied to the lifetime of the connection. You can find the code here:
https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/rpc/user/UserServer.java#L102

Thanks,
Sorabh

________________________________
From: Paul Rogers <prog...@mapr.com>
Sent: Thursday, June 22, 2017 2:19:50 PM
To: user@drill.apache.org
Subject: Re: Drill Session ID between Nodes

Hi John,

I do not believe that session IDs are global. Each Drillbit maintains its
own concept of sessions. A global session would require some centralized
registry of sessions, which Drill does not have.

Would be great if someone can confirm…

- Paul

On Jun 22, 2017, at 12:14 PM, John Omernik <j...@omernik.com> wrote:

When I log onto a Drill node and get a session ID, if I connect to another
Drill node in the cluster, will the session ID be valid?

I am guessing not, but want to validate.

My conundrum: I have my Drill cluster running in such a way that the
connections to the nodes are load balanced via DNS. However, if I get a
different IP from DNS while in a session, the session appears to be
invalidated, which forces me to log on again...






