Hey all, after speaking to Andries, I've gone ahead and created a JIRA to support variables in the drill-override.conf file:
https://issues.apache.org/jira/browse/DRILL-4052

This would be a huge help and give administrators a lot of flexibility in organizing their Drill clusters. Please comment with ideas if you have thoughts on the subject!

John

On Fri, Sep 25, 2015 at 10:20 AM, Andy Pernsteiner <[email protected]> wrote:

> That was considered, and we may elect to do that or some variation
> (a separate mount point going to the target directory per node, but where the
> local mount point is identical across all cluster nodes). I was hoping
> that Drill had a way of parsing options within the config. If not, I’ll
> file a JIRA for enhancement, since this sort of thing would be useful for a
> number of scenarios.
>
> Andy Pernsteiner
> Manager, Field Enablement
> ph: 206.228.0737
>
> www.mapr.com
> Now Available - Free Hadoop On-Demand Training
>
> From: kbotzum <[email protected]>
> Reply: [email protected]
> Date: September 24, 2015 at 5:36:31 PM
> To: [email protected]
> Cc: Andries Engelbrecht <[email protected]>
> Subject: Re: Setting drill.exec.sort.external.spill.directories
>
> How about a symbolic link from the local file system on each node to the
> node-specific tmp dir? A little hacky but workable. You could do that once
> and then copy the drill config without concern.
>
> FYI, many eons ago a file system known as AFS had special vars that would
> expand in pathnames to handle this type of thing transparently. My memory
> is fuzzy but I think we had @sys, @host, and probably a few others.
>
> Keys
> _______________________________
> Keys Botzum
> Senior Principal Technologist
> [email protected]
> 443-718-0098
> MapR Technologies
> http://www.mapr.com
>
> On Sep 24, 2015, at 5:30 PM, Andy Pernsteiner <[email protected]> wrote:
>
> > One question for those in the know: Is there a way to use shell (or other)
> > variables in these options? I'd much prefer $HOSTNAME, as opposed to
> > having to set the variable differently on each node in my cluster.
> >
> > On Thu, Sep 24, 2015 at 5:22 PM, Andy Pernsteiner <[email protected]> wrote:
> >
> >> So, I *think* I got things working. I had some inconsistencies in what I
> >> would see depending on which user I had launched sqlline as, but I can’t
> >> reproduce them reliably.
> >>
> >> In any case, here’s what I put in the config:
> >>
> >>   drill.exec: {
> >>     cluster-id: "se1-drillbits",
> >>     zk.connect: "10.10.15.10:5181,10.10.15.11:5181,10.10.15.12:5181",
> >>     sys.store.provider.zk.blobroot: "maprfs:///user/mapr/profiles",
> >>     *sort.external.spill.directories: [ "/var/mapr/local/se-node10.se.lab/drillspill" ],*
> >>     *sort.external.spill.fs: "maprfs:///",*
> >>     impersonation: {
> >>       enabled: true,
> >>       max_chained_user_hops: 3
> >>     }
> >>   }
> >>
> >> Note: putting a shell variable ($HOSTNAME) did not seem to work (I’d get
> >> errors when running queries that resulted in a spill to disk, complaining
> >> about directory permissions, likely because it couldn’t resolve the path).
> >>
> >> If I can figure out the original issue I had (e.g., if I can reproduce it), I
> >> will file a JIRA.
> >>
> >> Andy Pernsteiner
> >>
> >> From: Andries Engelbrecht <[email protected]>
> >> Reply: [email protected]
> >> Date: September 24, 2015 at 4:21:50 PM
> >> To: [email protected]
> >> Subject: Re: Setting drill.exec.sort.external.spill.directories
> >>
> >> Maybe try
> >>
> >>   sort.external.spill.directories: [ "/var/mapr/local/$hostname/drillspill" ],
> >>
> >> --Andries
> >>
> >>> On Sep 24, 2015, at 12:38 PM, Andy Pernsteiner <[email protected]> wrote:
> >>>
> >>> I’m trying to do some experimentation and set the
> >>> drill.exec.sort.external.spill.directories value. Since this option appears
> >>> as a ‘boot’ option (https://drill.apache.org/docs/start-up-options/),
> >>> I believe the right way is to set this in drill-override.conf on each node.
> >>>
> >>> I tried doing this via the following:
> >>>
> >>>   drill.exec: {
> >>>     cluster-id: "se1-drillbits",
> >>>     zk.connect: "10.10.15.10:5181,10.10.15.11:5181,10.10.15.12:5181",
> >>>     sys.store.provider.zk.blobroot: "maprfs:///user/mapr/profiles",
> >>>     sort.external.spill.directories: [ "/var/mapr/$hostname/drillspill" ],
> >>>     sort.external.spill.fs: "maprfs:///",
> >>>     impersonation: {
> >>>       enabled: true,
> >>>       max_chained_user_hops: 3
> >>>     }
> >>>   }
> >>>
> >>> I also tried setting it via:
> >>>
> >>>   sort: {
> >>>     purge.threshold : 100,
> >>>     external: {
> >>>       batch.size : 4000,
> >>>       spill: {
> >>>         batch.size : 4000,
> >>>         group.size : 100,
> >>>         threshold : 200,
> >>>         directories : [ "/var/mapr/$hostname/drillspill" ],
> >>>         fs : "maprfs:///"
> >>>       }
> >>>     }
> >>>   },
> >>>
> >>> But then looking at the sys.boot table after restarting the drillbits,
> >>> I still see the default values:
> >>>
> >>> 0: jdbc:drill:> select * from sys.boot where name like '%spill%';
> >>> +------+------+------+--------+---------+------------+----------+-----------+
> >>> | name | kind | type | status | num_val | string_val | bool_val | float_val |
> >>> +------+------+------+--------+---------+------------+----------+-----------+
> >>> | drill.exec.sort.external.spill.batch.size | LONG | BOOT | BOOT | 4000 | null | null | null |
> >>> | drill.exec.sort.external.spill.directories | STRING | BOOT | BOOT | null | [
> >>>     # jar:file:/opt/mapr/drill/drill-1.1.0/jars/drill-java-exec-1.1.0.jar!/drill-module.conf: 145
> >>>     "/tmp/drill/spill"
> >>> ] | null | null |
> >>> | drill.exec.sort.external.spill.fs | STRING | BOOT | BOOT | null | "file:///" | null | null |
> >>> | drill.exec.sort.external.spill.group.size | LONG | BOOT | BOOT | 40000 | null | null | null |
> >>> | drill.exec.sort.external.spill.threshold | LONG | BOOT | BOOT | 40000 | null | null | null |
> >>> +------+------+------+--------+---------+------------+----------+-----------+
> >>>
> >>> Note that I’ve tried removing the shell ‘$hostname’ variable (in case it
> >>> causes issues); no dice.
> >>>
> >>> What’s the right way to set these values?
> >>>
> >>> Andy Pernsteiner
> >
> > --
> > Andy Pernsteiner
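
Until something like DRILL-4052 is available, one possible workaround for the per-node spill path discussed in this thread is to generate drill-override.conf on each node from a shared template. This is only a sketch under stated assumptions: the template text, the ${hostname} placeholder (expanded by this script, not by Drill), and the render_override helper are hypothetical and are not part of Drill. The keys and values are copied from the config quoted above.

```python
import socket
from string import Template

# Hypothetical shared template. ${hostname} is a placeholder that this
# script expands; Drill itself never sees it.
TEMPLATE = Template('''drill.exec: {
  cluster-id: "se1-drillbits",
  zk.connect: "10.10.15.10:5181,10.10.15.11:5181,10.10.15.12:5181",
  sort.external.spill.directories: [ "/var/mapr/local/${hostname}/drillspill" ],
  sort.external.spill.fs: "maprfs:///"
}
''')


def render_override(hostname: str) -> str:
    """Return drill-override.conf text with the placeholder replaced."""
    return TEMPLATE.substitute(hostname=hostname)


if __name__ == "__main__":
    # socket.gethostname() may return a short name; adjust this if the
    # directory on your cluster is named after the fully qualified name.
    print(render_override(socket.gethostname()), end="")
```

Running this on each node and redirecting the output to that node's drill-override.conf would keep the template identical across the cluster while each drillbit gets its own spill directory. Since these are boot options, the drillbits would still need a restart to pick up the change.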
