Rereading your post, I think there is some confusion between embedded
(single-Drillbit) mode and distributed mode.
When you run multiple Drillbits in distributed mode, you will (or should) be
enabling authentication, so each user logs in to "a" Drillbit.
It does not matter which one; it's
One user can run a query with store.format set to CSV while another user
runs a query with store.format set to Parquet at the same time. The work
for query one, whether it runs on bit 1 or bit 2, will know to store that
query's output as CSV, and the work for query two, wherever it runs, will
write Parquet.
Essentially,
Nice! So, to clarify, is this accurate?
1) Log in, then ALTER SESSION SET store.format for that user.
2) Session stickiness (e.g. HAProxy or whatever else supports it) ensures
that the user reaches the same planner/Drillbit, which holds the session
info.
3) The planner will farm out work to
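Step 2 is the part the balancer has to get right. Here is a minimal sketch of the difference, assuming a balancer that hashes some client key (the `client_id` and the Drillbit hostnames below are hypothetical) over a fixed backend list. This is not HAProxy's actual algorithm; HAProxy offers several stickiness mechanisms (for example source hashing or stick tables), and this only shows the idea.

```python
import hashlib
import itertools

# Hypothetical Drillbit endpoints, for illustration only.
DRILLBITS = ["drillbit1:31010", "drillbit2:31010", "drillbit3:31010"]


def sticky_route(client_id: str, backends: list[str]) -> str:
    """Pick a backend deterministically from a hash of the client key.

    As long as the backend list does not change, the same client_id always
    lands on the same Drillbit, i.e. the one holding its session options.
    """
    digest = hashlib.sha256(client_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(backends)
    return backends[index]


def round_robin(backends: list[str]):
    """Rotate through backends without regard to which client is asking."""
    return itertools.cycle(backends)


if __name__ == "__main__":
    rr = round_robin(DRILLBITS)
    for attempt in range(3):
        # Sticky routing repeats the same backend; round-robin moves on.
        print(attempt, "sticky:", sticky_route("alice", DRILLBITS),
              "round-robin:", next(rr))
```

Note that plain hashing remaps clients when a Drillbit is added or removed, which drops their sessions; that is one reason real balancers tend to use cookies or stick tables instead.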
Hmmm
So, if user 1 sets store.format to CSV on Drillbit 1 and work gets
farmed out to Drillbit 2, this session setting will "travel" with the
user from Drillbit to Drillbit? We were originally thinking that this
would be the case if the session information were retained in ZooKeeper,
Hi Joe,
To answer your question about how options "travel"...
Drill maintains system options in ZK. Session options are maintained per
connection on the Foreman Drillbit to which the user connects. This is why a
simple round-robin load balancer does not work, and why load balancing has to be
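A simplified model of those two scopes may make the round-robin problem concrete. It is a sketch, not Drill's implementation: a dict stands in for the system options stored in ZK, a hypothetical `ForemanConnection` class holds session options for one connection, and only the SYSTEM and SESSION scopes discussed here are modeled.

```python
from dataclasses import dataclass, field

# Stand-in for system options that Drill keeps in ZooKeeper (shared by all Drillbits).
SYSTEM_OPTIONS = {"store.format": "parquet"}


@dataclass
class ForemanConnection:
    """Hypothetical model of one client connection on the Foreman Drillbit."""
    user: str
    session_options: dict = field(default_factory=dict)

    def alter_session(self, name: str, value: str) -> None:
        # Stored only on this connection; nothing is written to ZooKeeper.
        self.session_options[name] = value

    def effective_option(self, name: str) -> str:
        # A session value overrides the system value; otherwise fall back to system.
        return self.session_options.get(name, SYSTEM_OPTIONS[name])


if __name__ == "__main__":
    alice = ForemanConnection("alice")
    alice.alter_session("store.format", "csv")
    print(alice.effective_option("store.format"))  # csv

    # If a round-robin balancer sends alice to another Drillbit, she gets a
    # fresh connection that never saw her ALTER SESSION.
    alice_elsewhere = ForemanConnection("alice")
    print(alice_elsewhere.effective_option("store.format"))  # parquet
```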
Paul, let's talk about the race condition you mentioned.
Let's use a real option here for clarity: store.format.
SYSTEM store.format is parquet.
Scenario 1: I log on, set SESSION store.format to csv, and run CREATE
TABLE foo AS SELECT * FROM bar. The SESSION variable is read from my
login and
Hi All,
To summarize, SESSION options are part of the query plan and are distributed
along with the query (not through ZK). So scenario 1 will always be fine: since
SESSION options have only one distribution path, everything Just Works.
Session options are set per connection, and tend to
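To make "part of the query plan" concrete, here is a toy sketch: the Foreman folds the session's effective options into the plan it ships, and each worker reads the format from the plan rather than from its own state or from ZK. The function names and the JSON plan layout are hypothetical and are not Drill's actual plan format.

```python
import json

# Stand-in for the SYSTEM options kept in ZK.
SYSTEM_OPTIONS = {"store.format": "parquet"}


def build_plan(sql: str, session_options: dict) -> str:
    """Foreman side: resolve effective options once and embed them in the plan."""
    options = {**SYSTEM_OPTIONS, **session_options}  # session overrides system
    plan = {"sql": sql, "options": options, "fragments": [0, 1]}
    return json.dumps(plan)  # this is what travels to the other Drillbits


def run_fragment(serialized_plan: str, fragment_id: int, drillbit: str) -> str:
    """Worker side: take options from the plan, never from local session state."""
    plan = json.loads(serialized_plan)
    fmt = plan["options"]["store.format"]
    return f"{drillbit} fragment {fragment_id}: writing {fmt}"


if __name__ == "__main__":
    csv_plan = build_plan("CREATE TABLE foo AS SELECT * FROM bar", {"store.format": "csv"})
    default_plan = build_plan("CREATE TABLE baz AS SELECT * FROM bar", {})
    print(run_fragment(csv_plan, 1, "drillbit2"))      # drillbit2 fragment 1: writing csv
    print(run_fragment(default_plan, 0, "drillbit1"))  # drillbit1 fragment 0: writing parquet
```

Because the worker never consults the user's session, it does not matter which Drillbit runs a fragment; only the Foreman needs the session, which is why stickiness matters only at login.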
Thanks, everybody, for all of your thoughtful insights and contributions
here. This has been enormously helpful!
Perhaps it would be good to document some basic HA recipes, in addition
to explaining these underlying concepts? For example, HAProxy + sticky
sessions + Drill, Traefik + sticky