The sys.boot data is the configuration of the current node, i.e. the drillbit your client is connected to (the one flagged as current in sys.drillbits). Right now, we don't have a way of looking across nodes. Good feature request, though.
--
Jacques Nadeau
CTO and Co-Founder, Dremio

On Mon, Nov 9, 2015 at 11:00 AM, John Omernik wrote:

That's actually a good follow-up: if sys.boot (and others) are drillbit-specific, do we have a way to query sys.boot across all bits to show the values?

On Mon, Nov 9, 2015 at 12:58 PM, John Omernik wrote:

I ran

    select * from boot where name like '%sort%'

and got

    string_val - [
        # env var DRILL_SPILLLOC
        "/var/mapr/local/onedrillbithostname/drillspill"
    ]

onedrillbithostname is one of my 5 nodes. I suppose this is only the foreman of the query (or perhaps the node that ZooKeeper handed the JDBC connection to), which seems well and good. It looks like the value is propagating correctly. I'd still love to see data in those directories, to rule out some silly situation where, since sys.boot only shows one value, all the nodes try to use the same spill location (i.e., to prove that each bit is truly writing to its own location), but this all looks very promising.

On Mon, Nov 9, 2015 at 12:51 PM, John Omernik wrote:

Just for future readers of the user list: Jacques posted to the JIRA that HOCON variables likely already cover this. I created some scripts for my MapR setup (MapRTech folks, please take a look at the JIRA; I think I effectively created a local volume correctly using maprcli in drill-env.sh).

The last piece of the puzzle is how to test that this is working. The drillbits start with no errors, but I'd like to validate that everything works as intended. Can I force my memory low? What type of query would cause a spill? If Drill tries to avoid spilling as much as possible, this may be hard to prove. Perhaps there is a query that shows this is set up right under the hood?
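Editor's sketch of the HOCON approach: Drill's .conf files are HOCON, parsed by the Typesafe Config library, which resolves ${NAME} substitutions against the config itself and, failing that, against environment variables. That is consistent with the "# env var DRILL_SPILLLOC" origin comment in the sys.boot output quoted above. The fragment below is a minimal sketch, assuming drill-env.sh on each node exports DRILL_SPILLLOC with that node's own spill path; John's actual scripts are not shown in the thread, so the export line in the comment is a hypothetical example.

```hocon
# drill-override.conf (sketch): the same file can be copied to every node.
# Assumes drill-env.sh exports DRILL_SPILLLOC before the drillbit starts,
# e.g. (hypothetical): export DRILL_SPILLLOC=/var/mapr/local/$(hostname -f)/drillspill
drill.exec: {
  sort.external.spill: {
    # ${DRILL_SPILLLOC} is HOCON substitution syntax, not shell syntax, and it
    # is not expanded inside quoted strings, so it stays unquoted here.
    # If the variable is unset, resolution should fail at startup with an
    # unresolved-substitution error rather than silently using the default.
    directories: [ ${DRILL_SPILLLOC} ],
    fs: "maprfs:///"
  }
}
```

Because sys.boot reflects only the drillbit the client is connected to, checking another node's resolved value means connecting to that node directly, for example with a jdbc:drill:drillbit=<host> URL in sqlline, and rerunning the same sys.boot query.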
John

On Mon, Nov 9, 2015 at 8:28 AM, John Omernik wrote:

Hey all, after speaking to Andries, I've gone ahead and created a JIRA to support variables in the drill-override.conf file:

https://issues.apache.org/jira/browse/DRILL-4052

This will be a huge help and add a lot of flexibility for administrators looking to organize their Drill clusters. Please comment if you have thoughts or ideas on the subject!

John

On Fri, Sep 25, 2015 at 10:20 AM, Andy Pernsteiner wrote:

That was considered, and we may elect to do that or some variation (a separate mount point going to the target directory on each node, where the local mount point is identical across all cluster nodes). I was hoping that Drill had a way of parsing variables within the config. If not, I'll file a JIRA for the enhancement, since this sort of thing would be useful in a number of scenarios.

Andy Pernsteiner
Manager, Field Enablement
ph: 206.228.0737
www.mapr.com
Now Available - Free Hadoop On-Demand Training

From: kbotzum
Date: September 24, 2015 at 5:36:31 PM
Cc: Andries Engelbrecht
Subject: Re: Setting drill.exec.sort.external.spill.directories

How about a symbolic link from the local file system on each node to the node-specific tmp dir? A little hacky, but workable. You could do that once and then copy the Drill config around without concern.

FYI, many eons ago a file system known as AFS had special variables that would expand in pathnames to handle this type of thing transparently.
My memory is fuzzy, but I think we had @sys, @host, and probably a few others.

Keys
_______________________________
Keys Botzum
Senior Principal Technologist
443-718-0098
MapR Technologies
http://www.mapr.com

On Sep 24, 2015, at 5:30 PM, Andy Pernsteiner wrote:

One question for those in the know: is there a way to use shell (or other) variables in these options? I'd much prefer $HOSTNAME, as opposed to having to set the value differently on each node in my cluster.

On Thu, Sep 24, 2015 at 5:22 PM, Andy Pernsteiner wrote:

So, I *think* I got things working. I saw some inconsistencies depending on which user I had launched sqlline as, but I can't reproduce them reliably.

In any case, here's what I put in the config:

    drill.exec: {
      cluster-id: "se1-drillbits",
      zk.connect: "10.10.15.10:5181,10.10.15.11:5181,10.10.15.12:5181",
      sys.store.provider.zk.blobroot: "maprfs:///user/mapr/profiles",
      sort.external.spill.directories: [ "/var/mapr/local/se-node10.se.lab/drillspill" ],
      sort.external.spill.fs: "maprfs:///",
      impersonation: {
        enabled: true,
        max_chained_user_hops: 3
      }
    }

Note: putting in a shell variable ($HOSTNAME) did not seem to work. Queries that spilled to disk failed with errors complaining about directory permissions, likely because the path couldn't be resolved.

If I can figure out the original issue I had (i.e., if I can reproduce it), I will file a JIRA.
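Editor's sketch on the $HOSTNAME question: shell syntax such as $HOSTNAME or $hostname inside a quoted HOCON string is kept literally, so the spill path would contain the text "$hostname", which fits the permission errors described in the message above. In HOCON the substitution form is ${HOSTNAME}, and because substitutions are not expanded inside quoted strings, the path has to be built by concatenating quoted pieces with the unquoted substitution. This sketch assumes two things: HOSTNAME is actually exported into the drillbit's environment (bash sets it as a shell variable but does not necessarily export it, so drill-env.sh may need an explicit export), and its value matches the per-node directory name under /var/mapr/local.

```hocon
# Sketch: build a per-node spill path from the HOSTNAME environment variable.
# Assumes HOSTNAME is exported to the drillbit process (e.g. from drill-env.sh).
drill.exec: {
  sort.external.spill: {
    # Quoted pieces plus the unquoted ${HOSTNAME} concatenate into one string,
    # e.g. "/var/mapr/local/se-node10.se.lab/drillspill" on a node whose
    # HOSTNAME is se-node10.se.lab. No spaces between the pieces, otherwise
    # the spaces become part of the path.
    directories: [ "/var/mapr/local/"${HOSTNAME}"/drillspill" ],
    fs: "maprfs:///"
  }
}
```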
From: Andries Engelbrecht
Date: September 24, 2015 at 4:21:50 PM
Subject: Re: Setting drill.exec.sort.external.spill.directories

Maybe try

    sort.external.spill.directories: [ "/var/mapr/local/$hostname/drillspill" ],

-Andries

On Sep 24, 2015, at 12:38 PM, Andy Pernsteiner wrote:

I'm trying to do some experimentation and set the drill.exec.sort.external.spill.directories value. Since this option appears as a 'boot' option (https://drill.apache.org/docs/start-up-options/), I believe the right way is to set it in drill-override.conf on each node.
I tried doing this via the following:

    drill.exec: {
      cluster-id: "se1-drillbits",
      zk.connect: "10.10.15.10:5181,10.10.15.11:5181,10.10.15.12:5181",
      sys.store.provider.zk.blobroot: "maprfs:///user/mapr/profiles",
      sort.external.spill.directories: [ "/var/mapr/$hostname/drillspill" ],
      sort.external.spill.fs: "maprfs:///",
      impersonation: {
        enabled: true,
        max_chained_user_hops: 3
      }
    }

I also tried setting it via:

    sort: {
      purge.threshold : 100,
      external: {
        batch.size : 4000,
        spill: {
          batch.size : 4000,
          group.size : 100,
          threshold : 200,
          directories : [ "/var/mapr/$hostname/drillspill" ],
          fs : "maprfs:///"
        }
      }
    },

But when I look at the sys.boot table after restarting the drillbits, I still see the default values:

    0: jdbc:drill:> select * from sys.boot where name like '%spill%';
    +--------------------------------------------+--------+------+--------+---------+------------+----------+-----------+
    | name                                       | kind   | type | status | num_val | string_val | bool_val | float_val |
    +--------------------------------------------+--------+------+--------+---------+------------+----------+-----------+
    | drill.exec.sort.external.spill.batch.size  | LONG   | BOOT | BOOT   | 4000    | null       | null     | null      |
    | drill.exec.sort.external.spill.directories | STRING | BOOT | BOOT   | null    | [
        # jar:file:/opt/mapr/drill/drill-1.1.0/jars/drill-java-exec-1.1.0.jar!/drill-module.conf: 145
        "/tmp/drill/spill"
    ]          | null     | null      |
    | drill.exec.sort.external.spill.fs          | STRING | BOOT | BOOT   | null    | "file:///" | null     | null      |
    | drill.exec.sort.external.spill.group.size  | LONG   | BOOT | BOOT   | 40000   | null       | null     | null      |
    | drill.exec.sort.external.spill.threshold   | LONG   | BOOT | BOOT   | 40000   | null       | null     | null      |
    +--------------------------------------------+--------+------+--------+---------+------------+----------+-----------+

Note that I've tried removing the shell '$hostname' variable (in case it causes issues); no dice.

What's the right way to set these values?
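Editor's sketch on the nested attempt: HOCON keys are relative to the object they appear in, so a sort: { ... } block only maps to drill.exec.sort.external.spill.* when it sits inside drill.exec: { ... }. Placed at the top level of drill-override.conf, it would define sort.external.spill.* instead, which Drill does not read. The sys.boot output in the original question (group.size and threshold still at 40000 rather than 100 and 200) suggests the block was not picked up, although the thread does not show where it sat in the file. The fragment below keeps the original values and straight quotes, and replaces the literal $hostname path with a ${DRILL_SPILLLOC} substitution; that variable is an assumption (it must be exported per node, e.g. in drill-env.sh) and was not part of the original config.

```hocon
# Sketch: the nested form of the original attempt, placed under drill.exec so
# the keys resolve to drill.exec.sort.* and drill.exec.sort.external.spill.*.
drill.exec: {
  sort: {
    purge.threshold: 100,
    external: {
      batch.size: 4000,
      spill: {
        batch.size: 4000,
        group.size: 100,
        threshold: 200,
        # Assumption: DRILL_SPILLLOC is exported per node before startup.
        directories: [ ${DRILL_SPILLLOC} ],
        fs: "maprfs:///"
      }
    }
  }
}
```

After restarting the drillbits, the same sys.boot query should report 100 and 200 for group.size and threshold if the file was read, which makes those two non-default values a quick check that the block is in the right place.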
