I can add a JIRA on that if you'd like. Based on what I am seeing, though, that setting should be on a per-drillbit basis, and things should spill correctly, right?
Can you think of any way to force a test of this?

On Mon, Nov 9, 2015 at 1:52 PM, Jacques Nadeau <[email protected]> wrote:

> The sys.boot data is the current node's configuration as seen from
> sys.drillbits. Right now, we don't have a way of looking across nodes. Good
> feature request though.
>
> --
> Jacques Nadeau
> CTO and Co-Founder, Dremio
>
> On Mon, Nov 9, 2015 at 11:00 AM, John Omernik <[email protected]> wrote:
>
> > That's actually a good follow-up: if sys.boot (and others) are drillbit
> > specific, do we have a way to query sys.boot across all bits to show
> > values?
> >
> > On Mon, Nov 9, 2015 at 12:58 PM, John Omernik <[email protected]> wrote:
> >
> > > I did
> > >
> > > select * from boot where name like '%sort%'
> > >
> > > and I did get
> > >
> > > string_val - [ # env var DRILL_SPILLLOC
> > > "/var/mapr/local/onedrillbithostname/drillspill"
> > > ]
> > >
> > > onedrillbithostname is one of my 5 nodes. I suppose this is only going to
> > > be the foreman of the query (or perhaps the node that ZK gave the JDBC
> > > connection to connect to). That seems well and good, then. Looks like the
> > > value is getting propagated well. I'd love to see data in those directories,
> > > to ensure I don't have some silly situation where, since sys.boot only
> > > shows one value, all the nodes try to use the same spill location
> > > (i.e. proving that each bit is truly writing to its own location), but
> > > this all looks very promising.
> > >
> > > On Mon, Nov 9, 2015 at 12:51 PM, John Omernik <[email protected]> wrote:
> > >
> > >> Just for future readers of the user list: Jacques posted to the JIRA
> > >> that HOCON variables likely already have this.
> > >> I created some scripts for my MapR setup (MapRTech folks, please take a
> > >> look at the JIRA; I think I effectively created a local volume correctly
> > >> using maprcli in the drill-env.sh).
> > >>
> > >> The one last piece of the puzzle is how to test that this is working. The
> > >> drillbits start with no errors, but I'd like to validate it's all working
> > >> as intended, i.e. can I force my memory low? What type of query would cause
> > >> a spill? If Drill tries to avoid spilling as much as possible, this may be
> > >> hard to prove... perhaps a query that shows this is set up right under the
> > >> hood?
> > >>
> > >> John
> > >>
> > >> On Mon, Nov 9, 2015 at 8:28 AM, John Omernik <[email protected]> wrote:
> > >>
> > >>> Hey all, after speaking to Andries, I've gone ahead and created a JIRA
> > >>> to support variables in the drill-override.conf file:
> > >>>
> > >>> https://issues.apache.org/jira/browse/DRILL-4052
> > >>>
> > >>> This will be a huge help and offer great flexibility for administrators
> > >>> looking to organize their Drill clusters. Please comment with ideas if you
> > >>> have thoughts on the subject!
> > >>>
> > >>> John
> > >>>
> > >>> On Fri, Sep 25, 2015 at 10:20 AM, Andy Pernsteiner <[email protected]> wrote:
> > >>>
> > >>>> That was considered, and we may elect to do that or some variation
> > >>>> (separate mount point going to the target directory per node, but where the
> > >>>> local mount point is identical across all cluster nodes). I was hoping
> > >>>> that Drill had a way of parsing options within the config. If not, I’ll
> > >>>> file a JIRA for enhancement, since this sort of thing would be useful for a
> > >>>> number of scenarios.
> > >>>>
> > >>>> Andy Pernsteiner
> > >>>> Manager, Field Enablement
> > >>>> ph: 206.228.0737
> > >>>>
> > >>>> www.mapr.com
> > >>>> Now Available - Free Hadoop On-Demand Training
> > >>>>
> > >>>> From: kbotzum <[email protected]>
> > >>>> Reply: [email protected] <[email protected]>
> > >>>> Date: September 24, 2015 at 5:36:31 PM
> > >>>> To: [email protected] <[email protected]>
> > >>>> Cc: Andries Engelbrecht <[email protected]>
> > >>>> Subject: Re: Setting drill.exec.sort.external.spill.directories
> > >>>>
> > >>>> How about a symbolic link from the local file system on each node to
> > >>>> the node specific tmp dir? A little hacky but workable. You could do that
> > >>>> once and then copy the drill config without concern.
> > >>>>
> > >>>> fyi, many eons ago a file system known as AFS had special vars that
> > >>>> would expand in pathnames to handle this type of thing transparently. My
> > >>>> memory is fuzzy but I think we had @sys, @host, and probably a few others.
> > >>>>
> > >>>> Keys
> > >>>> _______________________________
> > >>>> Keys Botzum
> > >>>> Senior Principal Technologist
> > >>>> [email protected]
> > >>>> 443-718-0098
> > >>>> MapR Technologies
> > >>>> http://www.mapr.com
> > >>>>
> > >>>> On Sep 24, 2015, at 5:30 PM, Andy Pernsteiner <[email protected]> wrote:
> > >>>>
> > >>>> > One question for those in the know: Is there a way to use shell (or other)
> > >>>> > variables in these options? I'd much prefer $HOSTNAME, as opposed to
> > >>>> > having to set the variable differently on each node in my cluster.
> > >>>> >
> > >>>> > On Thu, Sep 24, 2015 at 5:22 PM, Andy Pernsteiner <[email protected]> wrote:
> > >>>> >
> > >>>> >> So, I *think* I got things working. I had some inconsistencies in what I
> > >>>> >> would see depending on which user I had launched sqlline as, but I can’t
> > >>>> >> reproduce them reliably.
> > >>>> >>
> > >>>> >> In any case, here’s what I put in the config:
> > >>>> >>
> > >>>> >> drill.exec: {
> > >>>> >>   cluster-id: "se1-drillbits",
> > >>>> >>   zk.connect: "10.10.15.10:5181,10.10.15.11:5181,10.10.15.12:5181",
> > >>>> >>   sys.store.provider.zk.blobroot: "maprfs:///user/mapr/profiles",
> > >>>> >>   sort.external.spill.directories: [ "/var/mapr/local/se-node10.se.lab/drillspill" ],
> > >>>> >>   sort.external.spill.fs: "maprfs:///",
> > >>>> >>   impersonation: {
> > >>>> >>     enabled: true,
> > >>>> >>     max_chained_user_hops: 3
> > >>>> >>   }
> > >>>> >> }
> > >>>> >>
> > >>>> >> Note: putting a shell variable ($HOSTNAME) did not seem to work (I’d get
> > >>>> >> errors when running queries that resulted in a spill to disk, complaining
> > >>>> >> about directory permissions, likely because it couldn’t resolve the path).
> > >>>> >>
> > >>>> >> If I can figure out the original issue I had (e.g. if I can reproduce it), I
> > >>>> >> will file a JIRA.
> > >>>> >>
> > >>>> >> Andy Pernsteiner
> > >>>> >>
> > >>>> >> From: Andries Engelbrecht <[email protected]>
> > >>>> >> Reply: [email protected] <[email protected]>
> > >>>> >> Date: September 24, 2015 at 4:21:50 PM
> > >>>> >> To: [email protected] <[email protected]>
> > >>>> >> Subject: Re: Setting drill.exec.sort.external.spill.directories
> > >>>> >>
> > >>>> >> Maybe try
> > >>>> >>
> > >>>> >> sort.external.spill.directories: [ "/var/mapr/local/$hostname/drillspill" ],
> > >>>> >>
> > >>>> >> --Andries
> > >>>> >>
> > >>>> >>> On Sep 24, 2015, at 12:38 PM, Andy Pernsteiner <[email protected]> wrote:
> > >>>> >>>
> > >>>> >>> I’m trying to do some experimentation and set the
> > >>>> >>> drill.exec.sort.external.spill.directories value. Since this option appears
> > >>>> >>> as a ‘boot’ option (https://drill.apache.org/docs/start-up-options/),
> > >>>> >>> I believe the right way is to set this in drill-override.conf on each node.
> > >>>> >>>
> > >>>> >>> I tried doing this via the following:
> > >>>> >>>
> > >>>> >>> drill.exec: {
> > >>>> >>>   cluster-id: "se1-drillbits",
> > >>>> >>>   zk.connect: "10.10.15.10:5181,10.10.15.11:5181,10.10.15.12:5181",
> > >>>> >>>   sys.store.provider.zk.blobroot: "maprfs:///user/mapr/profiles",
> > >>>> >>>   sort.external.spill.directories: [ "/var/mapr/$hostname/drillspill" ],
> > >>>> >>>   sort.external.spill.fs: "maprfs:///",
> > >>>> >>>   impersonation: {
> > >>>> >>>     enabled: true,
> > >>>> >>>     max_chained_user_hops: 3
> > >>>> >>>   }
> > >>>> >>> }
> > >>>> >>>
> > >>>> >>> I also tried setting via:
> > >>>> >>>
> > >>>> >>> sort: {
> > >>>> >>>   purge.threshold : 100,
> > >>>> >>>   external: {
> > >>>> >>>     batch.size : 4000,
> > >>>> >>>     spill: {
> > >>>> >>>       batch.size : 4000,
> > >>>> >>>       group.size : 100,
> > >>>> >>>       threshold : 200,
> > >>>> >>>       directories : [ "/var/mapr/$hostname/drillspill" ],
> > >>>> >>>       fs : "maprfs:///"
> > >>>> >>>     }
> > >>>> >>>   }
> > >>>> >>> },
> > >>>> >>>
> > >>>> >>> But then looking at the sys.boot table after restarting the drillbits,
> > >>>> >>> I still see the default values:
> > >>>> >>>
> > >>>> >>> 0: jdbc:drill:> select * from sys.boot where name like '%spill%';
> > >>>> >>> +------+------+------+--------+---------+------------+----------+-----------+
> > >>>> >>> | name | kind | type | status | num_val | string_val | bool_val | float_val |
> > >>>> >>> +------+------+------+--------+---------+------------+----------+-----------+
> > >>>> >>> | drill.exec.sort.external.spill.batch.size | LONG | BOOT | BOOT | 4000 | null | null | null |
> > >>>> >>> | drill.exec.sort.external.spill.directories | STRING | BOOT | BOOT | null | [
> > >>>> >>> # jar:file:/opt/mapr/drill/drill-1.1.0/jars/drill-java-exec-1.1.0.jar!/drill-module.conf: 145
> > >>>> >>> "/tmp/drill/spill"
> > >>>> >>> ] | null | null |
> > >>>> >>> | drill.exec.sort.external.spill.fs | STRING | BOOT | BOOT | null | "file:///" | null | null |
> > >>>> >>> | drill.exec.sort.external.spill.group.size | LONG | BOOT | BOOT | 40000 | null | null | null |
> > >>>> >>> | drill.exec.sort.external.spill.threshold | LONG | BOOT | BOOT | 40000 | null | null | null |
> > >>>> >>> +------+------+------+--------+---------+------------+----------+-----------+
> > >>>> >>>
> > >>>> >>> Note that I’ve tried removing the shell ‘$hostname’ variable (in case it
> > >>>> >>> causes issues), no dice.
> > >>>> >>>
> > >>>> >>> What’s the right way to set these values?
> > >>>> >>>
> > >>>> >>> Andy Pernsteiner
> > >>>> >
> > >>>> > --
> > >>>> > Andy Pernsteiner
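
For readers following the thread: a minimal sketch of what the HOCON variable approach mentioned earlier might look like in drill-override.conf. The variable name DRILL_SPILLLOC is taken from the sys.boot output quoted in the thread. The export line in drill-env.sh and the use of `hostname -f` are assumptions, not the actual scripts attached to the JIRA. HOCON substitutions such as ${DRILL_SPILLLOC} fall back to an environment variable of the same name when no config key matches, and they are not expanded inside quoted strings, which would be consistent with "$hostname" in quotes being taken literally.

```hocon
# drill-override.conf (sketch only)
# Assumes drill-env.sh exports a per-node value before the drillbit starts, e.g.
#   export DRILL_SPILLLOC="/var/mapr/local/$(hostname -f)/drillspill"
# (that export line is hypothetical)
drill.exec: {
  cluster-id: "se1-drillbits",
  zk.connect: "10.10.15.10:5181,10.10.15.11:5181,10.10.15.12:5181",
  # Unquoted substitution: resolved from the environment on each node,
  # so the same file can be copied to every drillbit unchanged.
  sort.external.spill.directories: [ ${DRILL_SPILLLOC} ],
  sort.external.spill.fs: "maprfs:///"
}
```

Since sys.boot reports only the configuration of the node answering the query, checking the resolved value on each node would presumably mean connecting to each drillbit in turn and running the same `select * from sys.boot where name like '%spill%'` query.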
