Re: Session handling with multiple drillbits

2018-09-10 Thread Divya Gehlot
Hi, I do use ALTER SESSION to change the store format, and it all works well. I have scheduled ETL running, but at times I have a use case to provide the file in CSV format, so I use sqlline to do so, and it doesn't even affect my other scheduled ETLs. For me drill set up distributed through

Re: Session handling with multiple drillbits

2018-09-05 Thread Joe Auty
Thanks everybody for all of your thoughtful insight and contributions here. This has been enormously helpful! Perhaps it would be good to document some basic HA recipes, in addition to explaining these underlying concepts? For example, HAProxy + sticky sessions + Drill, Traefik + sticky

Re: Session handling with multiple drillbits

2018-09-05 Thread Paul Rogers
Hi All, To summarize, SESSION options are part of the query plan and distributed along with the query (not through ZK). So, scenario 1 will always be fine. Since there is only one distribution path for SESSION options, everything Just Works. Session options are set per connection, and tend to

Re: Session handling with multiple drillbits

2018-09-05 Thread John Omernik
Paul, let's talk about this race condition you mention. Let's use a real option here for clarity: store.format. SYSTEM store.format is parquet. Scenario 1: I log on, I set SESSION store.format to csv and run CREATE TABLE foo AS SELECT * FROM bar. The SESSION variable is read from my login and
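Scenario 1 can be pictured with a small toy model. This is only a sketch of the lookup order described in the thread (a SESSION value overrides the SYSTEM value for that connection only); the `Session` class and its methods are hypothetical and are not Drill's API or implementation.

```python
# Toy model only: NOT Drill's implementation or API.
# SYSTEM options are shared by everyone; in the scenario store.format is parquet.
SYSTEM_OPTIONS = {"store.format": "parquet"}


class Session:
    """Hypothetical stand-in for one user connection."""

    def __init__(self):
        # Options set with ALTER SESSION live only on this connection.
        self.options = {}

    def alter_session(self, name, value):
        self.options[name] = value

    def effective(self, name):
        # A SESSION value wins; otherwise fall back to the SYSTEM value.
        return self.options.get(name, SYSTEM_OPTIONS[name])


john = Session()
other = Session()
john.alter_session("store.format", "csv")

print(john.effective("store.format"))   # john's CTAS would write csv
print(other.effective("store.format"))  # the other user still sees the SYSTEM value
```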

Re: Session handling with multiple drillbits

2018-09-05 Thread Paul Rogers
Hi Joe, To answer your question about how options "travel"... Drill maintains system options in ZK. Session options are maintained per connection on the Foreman Drillbit to which the user connects. This is why a simple round-robin load balancer does not work: why load balancing has to be
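The round-robin problem can be illustrated with a toy simulation. It is a sketch under stated assumptions, not Drill or HAProxy code: it assumes each request can be routed independently by the balancer, and the `Drillbit` class, `round_robin`, `sticky`, and `run` are hypothetical names invented for the illustration.

```python
import itertools


class Drillbit:
    """Hypothetical Foreman: keeps session options per client, in memory only."""

    def __init__(self, name):
        self.name = name
        self.sessions = {}  # client id -> session options held by this drillbit

    def handle(self, client, option=None):
        opts = self.sessions.setdefault(client, {})
        if option:
            opts.update(option)  # models ALTER SESSION
        # Fall back to an assumed SYSTEM default of parquet.
        return opts.get("store.format", "parquet")


def round_robin(bits):
    # Each request goes to the next drillbit, regardless of client.
    cycle = itertools.cycle(bits)
    return lambda client: next(cycle)


def sticky(bits):
    # Each client is pinned to the drillbit it was first assigned to.
    assigned = {}

    def choose(client):
        return assigned.setdefault(client, bits[len(assigned) % len(bits)])

    return choose


def run(make_chooser):
    bits = [Drillbit("bit1"), Drillbit("bit2")]
    choose = make_chooser(bits)
    choose("joe").handle("joe", {"store.format": "csv"})  # request 1: set option
    return choose("joe").handle("joe")                    # request 2: run query


print(run(round_robin))  # second request lands on a drillbit without the session
print(run(sticky))       # second request reaches the drillbit holding the session
```

With round-robin routing the second request sees only the default, because the session option was stored on a different drillbit; with sticky routing it sees the value set in the first request.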

Re: Session handling with multiple drillbits

2018-09-05 Thread Joe Auty
Nice! So, to clarify, is this accurate? 1) login, alter session set to define store.format for that user 2) session stickiness (i.e. HAProxy or whatever else will support this) will ensure that user gets same planner/drillbit, which contains the session info 3) planner will farm out work to

Re: Session handling with multiple drillbits

2018-09-05 Thread John Omernik
You can have one user logged in using store.format of CSV in one query, while another user uses store.format of parquet at the same time. The work from query one, whether on bit 1 or 2, will know to store that query as CSV, and the work from query two, wherever it is, will be parquet. Essentially,

Re: Session handling with multiple drillbits

2018-09-05 Thread Joe Auty
Hmmm. So, if user 1 sets the store.format to CSV on Drillbit 1 and work gets farmed out to Drillbit 2, this session setting will "travel" with the user from drillbit to drillbit? We were originally thinking that this would be the case if the session information was retained in ZooKeeper,

Re: Session handling with multiple drillbits

2018-09-05 Thread John Omernik
Rereading your post, I think there is some concern between embedded/single drillbit mode, and distributed mode. When you run multiple drillbits in distributed mode, you will (should) be enabling authentication. Thus each user will log in to "a" drillbit. There is no concern on which one, it's

Re: Session handling with multiple drillbits

2018-09-04 Thread Joe Auty
Thanks for your response John! We are using Drill both in an ETL context, as well as for general warehouse queries. One Drill user uses store format set to Parquet while the other uses store format set to CSV to read and write from HDFS. We are currently using Kubernetes Services rather than

Re: Session handling with multiple drillbits

2018-09-04 Thread John Omernik
Are these ETL-ish type queries? store.format should only apply when Drill is writing data; when it is reading, it uses the filenames and other hints to read. Thus, if you do HA, say with DNS (like in the other thread) and prior to running your CREATE TABLE AS (I am assuming this is what you