Are these ETL-type queries? store.format should only apply when Drill is
writing data. When it is reading, it uses the filenames and other hints
to decide how to read.

Thus, if you do HA, say with DNS (like in the other thread), then prior to
running your CREATE TABLE AS (I am assuming this is what you are doing)
you can do ALTER SESSION SET `store.format` = 'parquet'

Instead of using ALTER SYSTEM, you can use ALTER SESSION so the setting only
applies to the current session, regardless of which drillbit is the foreman.
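
For example, something along these lines. This is only a rough sketch: the
dfs.tmp workspace, the table names, and the source path are made up for
illustration, so substitute your own.

```sql
-- Applies only to this session; the system-wide default is untouched.
ALTER SESSION SET `store.format` = 'parquet';

-- Hypothetical CTAS: the output is written as Parquet because of the
-- session option above. Workspace, table name, and path are placeholders.
CREATE TABLE dfs.tmp.`orders_parquet` AS
SELECT * FROM dfs.`/data/incoming/orders.json`;

-- Switch the same session to CSV for the next write.
ALTER SESSION SET `store.format` = 'csv';

CREATE TABLE dfs.tmp.`orders_csv` AS
SELECT * FROM dfs.`/data/incoming/orders.json`;
```

The ALTER SESSION and the CTAS have to go over the same connection, since
the option lives with the session.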

John


On Tue, Sep 4, 2018 at 1:00 PM, Joe Auty <j...@thinkdataworks.com> wrote:

> Hello,
>
> We need to have some queries executed with store.format set to parquet and
> some with this option set to CSV. To date we have experimented with setting
> the store format for sessions controlled by using two separate user logins
> as a sort of context switch, but I'm wondering if the group here might have
> suggestions for a better way to handle this, particularly one that will
> scale a little better for us?
>
> The main problem we have with this approach is in introducing multiple
> drillbits/HA and assuring that the session and the settings we need are
> respected across all drillbits (whether with an HAProxy + sticky session
> approach or any other approach). There is a more general thread (which I've
> chosen not to hijack) about HA Drill from a more general standpoint, you
> might think of my question here as being similar, but with the need for a
> context switch to support multiple Drill configurations/session options.
>
> Here are the various attempts and approaches we have come up with so far.
> I'm wondering if you'd have any general advice as to which approach would
> be best for us to take, considering future plans for Drill itself. For
> example, if need be we can write our own plugin(s) if this is the smartest
> approach:
>
> - embedding the store.format option into the query itself by chaining
> multiple queries together separated by a comma (it appears that this
> doesn't work at all)
> - look into writing some sort of plugin to allow us to scale our current
> approach somehow (I realize that this is vague)
> - a "foreman" approach where we stick with our current approach and direct
> all requests to our "foreman"/master with the hope and expectation that it
> will farm out work to the workers/slaves
> - multiple clusters set with different settings
>
> Each of these approaches seems to have its pros and cons. To reiterate:
> what approach do you think would be the smartest and most future-proof
> approach for us to take?
>
> Thanks in advance!
>
