Does the config in the JIRA create the volumes on MapR-FS? Reduce the direct memory in drill-env.sh to 2G or less and do a large sort (order by) and see if it forces it to spill.
The solution does look interesting and will be very helpful on large nodes
with higher MapR-FS throughput.

-Andries

> On Nov 9, 2015, at 10:51 AM, John Omernik <[email protected]> wrote:
>
> Just for future readers of the user list: Jacques posted to the JIRA that
> HOCON variables likely already have this. I created some scripts for my
> MapR setup (MapRTech folks, please take a look at the JIRA, I think I
> created effectively a local volume correctly using maprcli in the
> drill-env.sh)
>
> The one last piece of the puzzle is how to test that this is working. The
> drillbits start with no errors, but I'd like to validate it's all working
> as intended, i.e. can I force my memory low? What type of query would cause
> a spill? If Drill tries to not use spill as much as possible this may be
> hard to prove... perhaps a query that shows this is set up right under the
> hood?
>
> John
>
> On Mon, Nov 9, 2015 at 8:28 AM, John Omernik <[email protected]> wrote:
>
>> Hey all, after speaking to Andries, I've gone ahead and created a JIRA to
>> support variables in the drill-override.conf file:
>>
>> https://issues.apache.org/jira/browse/DRILL-4052
>>
>> This will be a huge help and a great flexibility for administrators
>> looking to organize their drill clusters. Please comment with ideas if you
>> have thoughts on the subject!
>>
>> John
>>
>> On Fri, Sep 25, 2015 at 10:20 AM, Andy Pernsteiner <[email protected]> wrote:
>>
>>> That was considered, and we may elect to do that or some variation
>>> (separate mount point going to the target directory per node, but where the
>>> local mount point is identical across all cluster nodes). I was hoping
>>> that Drill had a way of parsing options within the config. If not, I’ll
>>> file a JIRA for enhancement, since this sort of thing would be useful for a
>>> number of scenarios.
>>>
>>> Andy Pernsteiner
>>> Manager, Field Enablement
>>> ph: 206.228.0737
>>>
>>> www.mapr.com
>>> Now Available - Free Hadoop On-Demand Training
>>>
>>> From: kbotzum <[email protected]>
>>> Reply: [email protected] <[email protected]>>
>>> Date: September 24, 2015 at 5:36:31 PM
>>> To: [email protected] <[email protected]>>
>>> Cc: Andries Engelbrecht <[email protected]>>
>>> Subject: Re: Setting drill.exec.sort.external.spill.directories
>>>
>>> How about a symbolic link from the local file system on each node to the
>>> node specific tmp dir? A little hacky but workable. You could do that once
>>> and then copy the drill config without concern.
>>>
>>> fyi, many eons ago a file system known as AFS had special vars that would
>>> expand in pathnames to handle this type of thing transparently. My memory
>>> is fuzzy but I think we had @sys, @host, and probably a few others.
>>>
>>> Keys
>>> _______________________________
>>> Keys Botzum
>>> Senior Principal Technologist
>>> [email protected]
>>> 443-718-0098
>>> MapR Technologies
>>> http://www.mapr.com
>>>
>>> On Sep 24, 2015, at 5:30 PM, Andy Pernsteiner <[email protected]> wrote:
>>>
>>>> One question for those in the know: Is there a way to use shell (or other)
>>>> variables in these options? I'd much prefer $HOSTNAME, as opposed to
>>>> having to set the variable differently on each node in my cluster.
>>>>
>>>> On Thu, Sep 24, 2015 at 5:22 PM, Andy Pernsteiner <[email protected]> wrote:
>>>>
>>>>> So, I *think* I got things working. I had some inconsistencies in what I
>>>>> would see depending on which user I had launched sqlline as, but I can’t
>>>>> reproduce them reliably.
>>>>>
>>>>> In any case, here’s what I put in the config:
>>>>>
>>>>> drill.exec: {
>>>>>   cluster-id: "se1-drillbits",
>>>>>   zk.connect: "10.10.15.10:5181,10.10.15.11:5181,10.10.15.12:5181",
>>>>>   sys.store.provider.zk.blobroot: "maprfs:///user/mapr/profiles",
>>>>>   *sort.external.spill.directories: [ "/var/mapr/local/se-node10.se.lab/drillspill" ],*
>>>>>   *sort.external.spill.fs: "maprfs:///",*
>>>>>   impersonation: {
>>>>>     enabled: true,
>>>>>     max_chained_user_hops: 3
>>>>>   }
>>>>> }
>>>>>
>>>>> Note: putting a shell variable ($HOSTNAME) did not seem to work (I’d get
>>>>> errors when running queries that resulted in a spill to disk, complaining
>>>>> about directory permissions, likely because it couldn’t resolve the path).
>>>>>
>>>>> If I can figure out the original issue I had (e.g.: if I can reproduce), I
>>>>> will file a JIRA.
>>>>>
>>>>> Andy Pernsteiner
>>>>>
>>>>> From: Andries Engelbrecht <[email protected]>
>>>>> Reply: [email protected] <[email protected]>>
>>>>> Date: September 24, 2015 at 4:21:50 PM
>>>>> To: [email protected] <[email protected]>>
>>>>> Subject: Re: Setting drill.exec.sort.external.spill.directories
>>>>>
>>>>> Maybe try
>>>>>
>>>>> sort.external.spill.directories: [ "/var/mapr/local/$hostname/drillspill" ],
>>>>>
>>>>> -Andries
>>>>>
>>>>>> On Sep 24, 2015, at 12:38 PM, Andy Pernsteiner <[email protected]> wrote:
>>>>>>
>>>>>> I’m trying to do some experimentation and set the
>>>>>> drill.exec.sort.external.spill.directories value. Since this option
>>>>>> appears as a ‘boot’ option
>>>>>> ( https://drill.apache.org/docs/start-up-options/ ), I believe the right
>>>>>> way is to set this in drill-override.conf on each node.
>>>>>>
>>>>>> I tried doing this via the following:
>>>>>>
>>>>>> drill.exec: {
>>>>>>   cluster-id: "se1-drillbits",
>>>>>>   zk.connect: "10.10.15.10:5181,10.10.15.11:5181,10.10.15.12:5181",
>>>>>>   sys.store.provider.zk.blobroot: "maprfs:///user/mapr/profiles",
>>>>>>   sort.external.spill.directories: [ "/var/mapr/$hostname/drillspill" ],
>>>>>>   sort.external.spill.fs: "maprfs:///",
>>>>>>   impersonation: {
>>>>>>     enabled: true,
>>>>>>     max_chained_user_hops: 3
>>>>>>   }
>>>>>> }
>>>>>>
>>>>>> I also tried setting via:
>>>>>>
>>>>>> sort: {
>>>>>>   purge.threshold : 100,
>>>>>>   external: {
>>>>>>     batch.size : 4000,
>>>>>>     spill: {
>>>>>>       batch.size : 4000,
>>>>>>       group.size : 100,
>>>>>>       threshold : 200,
>>>>>>       directories : [ "/var/mapr/$hostname/drillspill" ],
>>>>>>       fs : "maprfs:///"
>>>>>>     }
>>>>>>   }
>>>>>> },
>>>>>>
>>>>>> But then looking at the sys.boot table after restarting the drill bits,
>>>>>> I still see the default values:
>>>>>>
>>>>>> 0: jdbc:drill:> select * from sys.boot where name like '%spill%';
>>>>>> +------+------+------+--------+---------+------------+----------+-----------+
>>>>>> | name | kind | type | status | num_val | string_val | bool_val | float_val |
>>>>>> +------+------+------+--------+---------+------------+----------+-----------+
>>>>>> | drill.exec.sort.external.spill.batch.size | LONG | BOOT | BOOT | 4000 | null | null | null |
>>>>>> | drill.exec.sort.external.spill.directories | STRING | BOOT | BOOT | null | [
>>>>>> # jar:file:/opt/mapr/drill/drill-1.1.0/jars/drill-java-exec-1.1.0.jar!/drill-module.conf: 145
>>>>>> "/tmp/drill/spill"
>>>>>> ] | null | null |
>>>>>> | drill.exec.sort.external.spill.fs | STRING | BOOT | BOOT | null | "file:///" | null | null |
>>>>>> | drill.exec.sort.external.spill.group.size | LONG | BOOT | BOOT | 40000 | null | null | null |
>>>>>> | drill.exec.sort.external.spill.threshold | LONG | BOOT | BOOT | 40000 | null | null | null |
>>>>>> +------+------+------+--------+---------+------------+----------+-----------+
>>>>>>
>>>>>> Note that I’ve tried removing the shell ‘$hostname’ variable (in case it
>>>>>> causes issues), no dice.
>>>>>>
>>>>>> What’s the right way to set these values?
>>>>>>
>>>>>> Andy Pernsteiner
>>>>
>>>> --
>>>> Andy Pernsteiner
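
For anyone finding this thread later, here is a minimal sketch of what the HOCON substitution mentioned for DRILL-4052 might look like in drill-override.conf. It is untested here and rests on a few assumptions: that Drill resolves HOCON substitutions when it loads the file, that HOSTNAME is exported into the drillbit's environment (for example from drill-env.sh, since the shell does not necessarily export it on its own), and that a per-node directory with that name already exists on MapR-FS. HOCON does not expand substitutions inside quoted strings, so the path is built by concatenating quoted pieces around an unquoted ${HOSTNAME}.

```hocon
drill.exec: {
  # Path built by value concatenation: writing "${HOSTNAME}" inside the
  # quotes would be kept as a literal string, not substituted.
  # HOSTNAME is assumed to be exported in the drillbit's environment.
  sort.external.spill.directories: [ "/var/mapr/local/"${HOSTNAME}"/drillspill" ],
  sort.external.spill.fs: "maprfs:///"
}
```

If this works as assumed, the sys.boot query quoted above (select * from sys.boot where name like '%spill%') should show the resolved per-node path after the drillbits restart. Lowering direct memory in drill-env.sh and running a large ORDER BY, as Andries suggests, would then be one way to check that spill files actually land there.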
