That's actually a good follow-up: if sys.boot (and the other system tables) are drillbit-specific, do we have a way to query sys.boot across all bits and show each bit's values?
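
One possible way to check, sketched under assumptions: if sys.boot only reflects the drillbit acting as foreman (as I suspect below), then connecting to each bit directly instead of through ZooKeeper should show that bit's own value. The host names here are placeholders, and 31010 is assumed to be the user port (it is Drill's default; adjust if you have overridden it). The helper name `direct_urls` is just for illustration.

```python
# Sketch: build one direct-connection JDBC URL per drillbit so that sys.boot
# can be queried on each node in turn. Host names are placeholders (assumption).

DRILLBIT_HOSTS = ["node1", "node2", "node3", "node4", "node5"]  # hypothetical
USER_PORT = 31010  # assumed: Drill's default user port

# Columns taken from the sys.boot output quoted later in this thread.
BOOT_QUERY = "select name, string_val from sys.boot where name like '%spill%';"


def direct_urls(hosts, port=USER_PORT):
    """Return one JDBC URL per host that bypasses ZooKeeper-based selection."""
    return [f"jdbc:drill:drillbit={host}:{port}" for host in hosts]


if __name__ == "__main__":
    for url in direct_urls(DRILLBIT_HOSTS):
        # In sqlline: "!connect <url>", then run the query on that connection.
        print(url)
        print("  " + BOOT_QUERY)
```

Running the query once per URL and comparing string_val should show whether each bit resolved its own spill directory.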

On Mon, Nov 9, 2015 at 12:58 PM, John Omernik <[email protected]> wrote:

> I did
>
> select * from boot where name like '%sort%'
>
> and I did get
>
> string_val - [
>     # env var DRILL_SPILLLOC
>     "/var/mapr/local/onedrillbithostname/drillspill"
> ]
>
> onedrillbithostname is one of my 5 nodes. I suppose this is only going to
> be the foreman of the query (or perhaps the node that zk gave the JDBC
> client to connect to). That seems well and good then. Looks like the value
> is getting propagated well. I'd love to see data in those directories, to
> ensure I don't have some silly situation where, since sys.boot only shows
> one value, all the nodes try to use the same spill location (i.e. proving
> that each bit is truly writing to its own location), but this all looks
> very promising.
>
> On Mon, Nov 9, 2015 at 12:51 PM, John Omernik <[email protected]> wrote:
>
>> Just for future readers of the user list: Jacques posted to the JIRA
>> that HOCON variables likely already support this. I created some scripts
>> for my MapR setup (MapRTech folks, please take a look at the JIRA; I think
>> I effectively created a local volume correctly using maprcli in
>> drill-env.sh).
>>
>> The one last piece of the puzzle is how to test that this is working. The
>> drillbits start with no errors, but I'd like to validate it's all working
>> as intended, i.e. can I force my memory low? What type of query would
>> cause a spill? If Drill tries to avoid spilling as much as possible, this
>> may be hard to prove... perhaps there is a query that shows this is set up
>> right under the hood?
>>
>> John
>>
>> On Mon, Nov 9, 2015 at 8:28 AM, John Omernik <[email protected]> wrote:
>>
>>> Hey all, after speaking to Andries, I've gone ahead and created a JIRA
>>> to support variables in the drill-override.conf file:
>>>
>>> https://issues.apache.org/jira/browse/DRILL-4052
>>>
>>> This will be a huge help and give great flexibility to administrators
>>> looking to organize their Drill clusters. Please comment with ideas if
>>> you have thoughts on the subject!
>>>
>>> John
>>>
>>> On Fri, Sep 25, 2015 at 10:20 AM, Andy Pernsteiner <[email protected]> wrote:
>>>
>>>> That was considered, and we may elect to do that or some variation
>>>> (a separate mount point going to the target directory per node, but
>>>> where the local mount point is identical across all cluster nodes). I
>>>> was hoping that Drill had a way of parsing options within the config.
>>>> If not, I’ll file a JIRA for enhancement, since this sort of thing
>>>> would be useful for a number of scenarios.
>>>>
>>>> Andy Pernsteiner
>>>> Manager, Field Enablement
>>>> ph: 206.228.0737
>>>>
>>>> www.mapr.com
>>>> Now Available - Free Hadoop On-Demand Training
>>>>
>>>> From: kbotzum <[email protected]>
>>>> Reply: [email protected] <[email protected]>
>>>> Date: September 24, 2015 at 5:36:31 PM
>>>> To: [email protected] <[email protected]>
>>>> Cc: Andries Engelbrecht <[email protected]>
>>>> Subject: Re: Setting drill.exec.sort.external.spill.directories
>>>>
>>>> How about a symbolic link from the local file system on each node to
>>>> the node-specific tmp dir? A little hacky but workable. You could do
>>>> that once and then copy the Drill config without concern.
>>>>
>>>> FYI, many eons ago a file system known as AFS had special vars that
>>>> would expand in pathnames to handle this type of thing transparently.
>>>> My memory is fuzzy but I think we had @sys, @host, and probably a few
>>>> others.
>>>>
>>>> Keys
>>>> _______________________________
>>>> Keys Botzum
>>>> Senior Principal Technologist
>>>> [email protected]
>>>> 443-718-0098
>>>> MapR Technologies
>>>> http://www.mapr.com
>>>>
>>>> On Sep 24, 2015, at 5:30 PM, Andy Pernsteiner <[email protected]> wrote:
>>>>
>>>> > One question for those in the know: Is there a way to use shell (or
>>>> > other) variables in these options? I'd much prefer $HOSTNAME, as
>>>> > opposed to having to set the variable differently on each node in my
>>>> > cluster.
>>>> >
>>>> > On Thu, Sep 24, 2015 at 5:22 PM, Andy Pernsteiner <[email protected]> wrote:
>>>> >
>>>> >> So, I *think* I got things working. I had some inconsistencies in
>>>> >> what I would see depending on which user I had launched sqlline as,
>>>> >> but I can’t reproduce them reliably.
>>>> >>
>>>> >> In any case, here’s what I put in the config:
>>>> >>
>>>> >> drill.exec: {
>>>> >>   cluster-id: "se1-drillbits",
>>>> >>   zk.connect: "10.10.15.10:5181,10.10.15.11:5181,10.10.15.12:5181",
>>>> >>   sys.store.provider.zk.blobroot: "maprfs:///user/mapr/profiles",
>>>> >>   sort.external.spill.directories: [ "/var/mapr/local/se-node10.se.lab/drillspill" ],
>>>> >>   sort.external.spill.fs: "maprfs:///",
>>>> >>   impersonation: {
>>>> >>     enabled: true,
>>>> >>     max_chained_user_hops: 3
>>>> >>   }
>>>> >> }
>>>> >>
>>>> >> Note: putting a shell variable ($HOSTNAME) in the path did not seem
>>>> >> to work (I’d get errors when running queries that resulted in a spill
>>>> >> to disk, complaining about directory permissions, likely because it
>>>> >> couldn’t resolve the path).
>>>> >>
>>>> >> If I can figure out the original issue I had (e.g. if I can
>>>> >> reproduce it), I will file a JIRA.
>>>> >>
>>>> >> Andy Pernsteiner
>>>> >>
>>>> >> From: Andries Engelbrecht <[email protected]>
>>>> >> Reply: [email protected] <[email protected]>
>>>> >> Date: September 24, 2015 at 4:21:50 PM
>>>> >> To: [email protected] <[email protected]>
>>>> >> Subject: Re: Setting drill.exec.sort.external.spill.directories
>>>> >>
>>>> >> Maybe try
>>>> >>
>>>> >> sort.external.spill.directories: [ "/var/mapr/local/$hostname/drillspill" ],
>>>> >>
>>>> >> -Andries
>>>> >>
>>>> >>> On Sep 24, 2015, at 12:38 PM, Andy Pernsteiner <[email protected]> wrote:
>>>> >>>
>>>> >>> I’m trying to do some experimentation and set the
>>>> >>> drill.exec.sort.external.spill.directories value. Since this option
>>>> >>> appears as a ‘boot’ option
>>>> >>> ( https://drill.apache.org/docs/start-up-options/ ), I believe the
>>>> >>> right way is to set this in drill-override.conf on each node.
>>>> >>>
>>>> >>> I tried doing this via the following:
>>>> >>>
>>>> >>> drill.exec: {
>>>> >>>   cluster-id: "se1-drillbits",
>>>> >>>   zk.connect: "10.10.15.10:5181,10.10.15.11:5181,10.10.15.12:5181",
>>>> >>>   sys.store.provider.zk.blobroot: "maprfs:///user/mapr/profiles",
>>>> >>>   sort.external.spill.directories: [ "/var/mapr/$hostname/drillspill" ],
>>>> >>>   sort.external.spill.fs: "maprfs:///",
>>>> >>>   impersonation: {
>>>> >>>     enabled: true,
>>>> >>>     max_chained_user_hops: 3
>>>> >>>   }
>>>> >>> }
>>>> >>>
>>>> >>> I also tried setting via:
>>>> >>>
>>>> >>> sort: {
>>>> >>>   purge.threshold : 100,
>>>> >>>   external: {
>>>> >>>     batch.size : 4000,
>>>> >>>     spill: {
>>>> >>>       batch.size : 4000,
>>>> >>>       group.size : 100,
>>>> >>>       threshold : 200,
>>>> >>>       directories : [ "/var/mapr/$hostname/drillspill" ],
>>>> >>>       fs : "maprfs:///"
>>>> >>>     }
>>>> >>>   }
>>>> >>> },
>>>> >>>
>>>> >>> But then looking at the sys.boot table after restarting the
>>>> >>> drillbits, I still see the default values:
>>>> >>>
>>>> >>> 0: jdbc:drill:> select * from sys.boot where name like '%spill%';
>>>> >>> +------+------+------+--------+---------+------------+----------+-----------+
>>>> >>> | name | kind | type | status | num_val | string_val | bool_val | float_val |
>>>> >>> +------+------+------+--------+---------+------------+----------+-----------+
>>>> >>> | drill.exec.sort.external.spill.batch.size | LONG | BOOT | BOOT | 4000 | null | null | null |
>>>> >>> | drill.exec.sort.external.spill.directories | STRING | BOOT | BOOT | null | [
>>>> >>>     # jar:file:/opt/mapr/drill/drill-1.1.0/jars/drill-java-exec-1.1.0.jar!/drill-module.conf: 145
>>>> >>>     "/tmp/drill/spill"
>>>> >>> ] | null | null |
>>>> >>> | drill.exec.sort.external.spill.fs | STRING | BOOT | BOOT | null | "file:///" | null | null |
>>>> >>> | drill.exec.sort.external.spill.group.size | LONG | BOOT | BOOT | 40000 | null | null | null |
>>>> >>> | drill.exec.sort.external.spill.threshold | LONG | BOOT | BOOT | 40000 | null | null | null |
>>>> >>> +------+------+------+--------+---------+------------+----------+-----------+
>>>> >>>
>>>> >>> Note that I’ve tried removing the shell ‘$hostname’ variable (in
>>>> >>> case it causes issues), no dice.
>>>> >>>
>>>> >>> What’s the right way to set these values?
>>>> >>>
>>>> >>> Andy Pernsteiner
