Hi,
I use ALTER SESSION to change the store format, and it all works well.
I have scheduled ETL running, but at times I need to provide a
file in CSV format, so I use sqlline for that.
It doesn't affect my other scheduled ETLs either.
For me, Drill is set up distributed through
Thanks everybody for all of your thoughtful insight and contributions
here. This has been enormously helpful!
Perhaps it would be good to document some basic HA recipes, in addition
to explaining these underlying concepts? For example, HAProxy + sticky
sessions + Drill, Traefik + sticky
Hi All,
To summarize, SESSION options are part of the query plan and are distributed along
with the query (not through ZK). So, scenario 1 will always be fine. Since there is
only one distribution path for SESSION options, everything Just Works.
Session options are set per connection, and tend to
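As a rough illustration of the summary above, here is a toy Python model. It is not Drill's code, and every class and function name in it is hypothetical. It assumes the Foreman resolves options once at planning time (session values over system values) and ships the resolved values inside the plan, so a fragment sees the same value on whichever Drillbit it runs.

```python
from dataclasses import dataclass, field

# Toy model only: all names are hypothetical, not Drill's API.
SYSTEM_OPTIONS = {"store.format": "parquet"}  # stands in for options kept in ZK


@dataclass
class Session:
    """Per-connection state held by the Foreman the user connected to."""
    options: dict = field(default_factory=dict)

    def alter_session(self, name, value):
        self.options[name] = value


@dataclass(frozen=True)
class QueryPlan:
    sql: str
    options: tuple  # resolved options travel with the plan


def plan_query(session, sql):
    # Session values override system values at planning time.
    resolved = {**SYSTEM_OPTIONS, **session.options}
    return QueryPlan(sql=sql, options=tuple(sorted(resolved.items())))


def run_fragment(drillbit, plan):
    # A fragment reads options from the plan, not from its own Drillbit.
    return drillbit, dict(plan.options)["store.format"]


etl = Session()                      # keeps the system default
adhoc = Session()
adhoc.alter_session("store.format", "csv")

etl_plan = plan_query(etl, "CREATE TABLE foo AS SELECT * FROM bar")
adhoc_plan = plan_query(adhoc, "CREATE TABLE foo_csv AS SELECT * FROM bar")

results = [run_fragment(bit, p)
           for p in (etl_plan, adhoc_plan)
           for bit in ("drillbit-1", "drillbit-2")]
```

In this model the ad hoc session writes CSV on both Drillbits while the ETL session keeps writing Parquet, because neither fragment consults per-Drillbit state.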
Paul, let's talk about this race condition you mention.
Let's use a real option here for clarity: store.format.
SYSTEM store.format is parquet
Scenario 1: I log on, I set SESSION store.format to csv and run CREATE
TABLE foo as select * from bar. The SESSION variable is read from my
login and
Hi Joe,
To answer your question about how options "travel"...
Drill maintains system options in ZK. Session options are maintained per
connection on the Foreman Drillbit to which the user connects. This is why a
simple round-robin load balancer does not work: why load balancing has to be
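To make the round-robin problem concrete, here is a small Python sketch. It is a toy, not HAProxy or Drill code: the host names are made up, and hashing the client ID is only a stand-in for whatever stickiness mechanism a real balancer uses. It counts how many distinct Foremen one client reaches under each routing policy.

```python
import hashlib
from itertools import cycle

DRILLBITS = ["drillbit-1", "drillbit-2", "drillbit-3"]  # hypothetical hosts


def round_robin_router(drillbits):
    """Each new request goes to the next Drillbit, regardless of client."""
    ring = cycle(drillbits)
    return lambda client_id: next(ring)


def sticky_router(drillbits):
    """The same client ID always maps to the same Drillbit."""
    def route(client_id):
        digest = hashlib.sha256(client_id.encode()).digest()
        return drillbits[int.from_bytes(digest[:8], "big") % len(drillbits)]
    return route


def foremen_seen(router, client_id, requests=4):
    # Set of Foremen this client reaches over several requests.
    return {router(client_id) for _ in range(requests)}


rr_foremen = foremen_seen(round_robin_router(DRILLBITS), "etl-user")
sticky_foremen = foremen_seen(sticky_router(DRILLBITS), "etl-user")
```

Under round-robin the client lands on several Foremen, and only the one where the session option was set knows about it; under the sticky policy it always reaches the same one.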
Nice! So, to clarify, is this accurate?
1) login, alter session set to define store.format for that user
2) session stickiness (i.e. HAProxy or whatever else will support this)
will ensure that user gets same planner/drillbit, which contains the
session info
3) planner will farm out work to
You can have one user logged in using a store.format of CSV in one query,
while another user uses a store.format of Parquet at the same time. The work
from query one, whether on bit 1 or bit 2, will know to store that query as CSV, and
the work from query two, wherever it runs, will be stored as Parquet.
Essentially,
Hmmm
So, if user 1 sets the store.format to CSV on Drillbit 1 and work gets
farmed out to Drillbit 2, this session setting will "travel" with the
user from drillbit to drillbit? We were originally thinking that this
would be the case if the session information was retained in Zookeeper,
Rereading your post, I think there is some confusion between embedded/single-drillbit
mode and distributed mode.
When you run multiple drillbits in distributed mode, you will (should) be
enabling authentication. Thus each user will log in to "a" drillbit.
There is no concern on which one, it's
Thanks for your response John!
We are using Drill both in an ETL context, as well as for general
warehouse queries. One Drill user uses store format set to Parquet while
the other uses store format set to CSV to read and write from HDFS. We
are currently using Kubernetes Services rather than
Are these ETL-ish queries? store.format should only apply when Drill
is writing data; when it is reading, it uses the filenames and other hints
to read.
Thus, if you do HA, say with DNS (like in the other thread) and prior
to running your CREATE TABLE AS (I am assuming this is what you