For the benefit of future readers of the user list: Jacques posted on the
JIRA that HOCON variable substitution likely already covers this. I created
some scripts for my MapR setup. (MapRTech folks, please take a look at the
JIRA; I think I correctly created what is effectively a local volume using
maprcli in drill-env.sh.)
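
For anyone trying the HOCON route, here is a minimal sketch of what the
substitution might look like in drill-override.conf. It assumes that HOSTNAME
is exported in the drillbit's environment (for example from drill-env.sh), and
that Drill resolves substitutions when it loads this file, which the JIRA
comment suggests but this thread does not confirm. In HOCON, substitutions are
not expanded inside quoted strings, so the variable has to sit outside the
quotes and be concatenated with the quoted pieces:

```
drill.exec: {
  # Assumption: HOSTNAME is exported to the drillbit process.
  # ${HOSTNAME} sits outside the quotes so HOCON treats it as a substitution.
  sort.external.spill.directories: [ "/var/mapr/local/"${HOSTNAME}"/drillspill" ],
  sort.external.spill.fs: "maprfs:///"
}
```

If the variable is not set, a required substitution like this should make
config loading fail instead of quietly using a literal path.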

The last piece of the puzzle is how to test that this is working. The
drillbits start with no errors, but I'd like to validate that it's all
working as intended. For example, can I force my memory low? What type of
query would cause a spill? If Drill avoids spilling as much as possible, this
may be hard to prove. Perhaps there is a query that shows this is set up
right under the hood?

John

On Mon, Nov 9, 2015 at 8:28 AM, John Omernik <[email protected]> wrote:

> Hey all, after speaking to Andries, I've gone ahead and created a JIRA to
> support variables in the drill-override.conf file:
>
> https://issues.apache.org/jira/browse/DRILL-4052
>
> This will be a huge help and a great flexibility for administrators
> looking to organize their drill clusters.  Please comment with ideas if you
> have thoughts on the subject!
>
> John
>
>
>
> On Fri, Sep 25, 2015 at 10:20 AM, Andy Pernsteiner <
> [email protected]> wrote:
>
>> That was considered, and we may elect to do that or some variation
>> (separate mount point going to the target directory per node, but where the
>> local mount point is identical across all cluster nodes).  I was hoping
>> that Drill had a way of parsing options within the config.  If not, I’ll
>> file a JIRA for enhancement, since this sort of thing would be useful for a
>> number of scenarios.
>>
>>
>>
>>  Andy Pernsteiner
>>  Manager, Field Enablement
>> ph: 206.228.0737
>>
>> www.mapr.com
>> Now Available - Free Hadoop On-Demand Training
>>
>>
>>
>> From: kbotzum <[email protected]>
>> Reply: [email protected] <[email protected]>
>> Date: September 24, 2015 at 5:36:31 PM
>> To: [email protected] <[email protected]>
>> Cc: Andries Engelbrecht <[email protected]>
>> Subject:  Re: Setting drill.exec.sort.external.spill.directories
>>
>> How about a symbolic link from the local file system on each node to the
>> node specific tmp dir? A little hacky but workable. You could do that once
>> and then copy the drill config without concern.
>>
>> fyi, many eons ago a file system known as AFS had special vars that would
>> expand in pathnames to handle this type of thing transparently. My memory
>> is fuzzy but I think we had @sys, @host, and probably a few others.
>>
>> Keys
>> _______________________________
>> Keys Botzum
>> Senior Principal Technologist
>> [email protected]
>> 443-718-0098
>> MapR Technologies
>> http://www.mapr.com
>>
>>
>>
>> On Sep 24, 2015, at 5:30 PM, Andy Pernsteiner <[email protected]>
>> wrote:
>>
>> > One question for those in the know: Is there a way to use shell (or other)
>> > variables in these options? I'd much prefer $HOSTNAME, as opposed to
>> > having to set the variable differently on each node in my cluster.
>> >
>> >
>> >
>> > On Thu, Sep 24, 2015 at 5:22 PM, Andy Pernsteiner <[email protected]> wrote:
>> >
>> >> So, I *think* I got things working. I had some inconsistencies in what I
>> >> would see depending on which user I had launched sqlline as, but I can’t
>> >> reproduce them reliably.
>> >>
>> >> In any case, here’s what I put in the config:
>> >>
>> >> drill.exec: {
>> >> cluster-id: "se1-drillbits",
>> >> zk.connect: "10.10.15.10:5181,10.10.15.11:5181,10.10.15.12:5181",
>> >> sys.store.provider.zk.blobroot: "maprfs:///user/mapr/profiles",
>> >> * sort.external.spill.directories: [ "/var/mapr/local/se-node10.se.lab/drillspill" ],*
>> >> * sort.external.spill.fs: "maprfs:///",*
>> >> impersonation: {
>> >> enabled: true,
>> >> max_chained_user_hops: 3
>> >> }
>> >> }
>> >>
>> >> Note: putting a shell variable ($HOSTNAME) in the path did not seem to work (I’d get
>> >> errors when running queries that resulted in a spill to disk, complaining
>> >> about directory permissions, likely because it couldn’t resolve the path).
>> >>
>> >> If I can figure out the original issue I had (i.e., if I can reproduce it), I
>> >> will file a JIRA.
>> >>
>> >>
>> >>
>> >> Andy Pernsteiner
>> >> Manager, Field Enablement
>> >> ph: 206.228.0737
>> >>
>> >> www.mapr.com
>> >>
>> >>
>> >>
>> >> From: Andries Engelbrecht <[email protected]>
>> >> Reply: [email protected] <[email protected]>
>> >> Date: September 24, 2015 at 4:21:50 PM
>> >> To: [email protected] <[email protected]>
>> >> Subject: Re: Setting drill.exec.sort.external.spill.directories
>> >>
>> >> Maybe try
>> >>
>> >> sort.external.spill.directories: [ "/var/mapr/local/$hostname/drillspill" ],
>> >>
>> >> —Andries
>> >>
>> >>> On Sep 24, 2015, at 12:38 PM, Andy Pernsteiner <
>> >> [email protected]> wrote:
>> >>>
>> >>> I’m trying to do some experimentation and set the
>> >>> drill.exec.sort.external.spill.directories value. Since this option appears
>> >>> as a ‘boot’ option ( https://drill.apache.org/docs/start-up-options/ ),
>> >>> I believe the right way is to set this in drill-override.conf on each node.
>> >>>
>> >>> I tried doing this via the following:
>> >>>
>> >>>
>> >>> drill.exec: {
>> >>> cluster-id: "se1-drillbits",
>> >>> zk.connect: "10.10.15.10:5181,10.10.15.11:5181,10.10.15.12:5181",
>> >>> sys.store.provider.zk.blobroot: "maprfs:///user/mapr/profiles",
>> >>> sort.external.spill.directories: [ "/var/mapr/$hostname/drillspill" ],
>> >>> sort.external.spill.fs: "maprfs:///",
>> >>> impersonation: {
>> >>> enabled: true,
>> >>> max_chained_user_hops: 3
>> >>> }
>> >>> }
>> >>>
>> >>> I also tried setting via:
>> >>>
>> >>> sort: {
>> >>> purge.threshold : 100,
>> >>> external: {
>> >>> batch.size : 4000,
>> >>> spill: {
>> >>> batch.size : 4000,
>> >>> group.size : 100,
>> >>> threshold : 200,
>> >>> directories : [ "/var/mapr/$hostname/drillspill" ],
>> >>> fs : "maprfs:///"
>> >>> }
>> >>> }
>> >>> },
>> >>>
>> >>>
>> >>> But then looking at the sys.boot table after restarting the drillbits,
>> >>> I still see the default values:
>> >>>
>> >>> 0: jdbc:drill:> select * from sys.boot where name like '%spill%';
>> >>> +------+------+------+--------+---------+------------+----------+-----------+
>> >>> | name | kind | type | status | num_val | string_val | bool_val | float_val |
>> >>> +------+------+------+--------+---------+------------+----------+-----------+
>> >>> | drill.exec.sort.external.spill.batch.size | LONG | BOOT | BOOT | 4000 | null | null | null |
>> >>> | drill.exec.sort.external.spill.directories | STRING | BOOT | BOOT | null | [
>> >>> # jar:file:/opt/mapr/drill/drill-1.1.0/jars/drill-java-exec-1.1.0.jar!/drill-module.conf: 145
>> >>> "/tmp/drill/spill"
>> >>> ] | null | null |
>> >>> | drill.exec.sort.external.spill.fs | STRING | BOOT | BOOT | null | "file:///" | null | null |
>> >>> | drill.exec.sort.external.spill.group.size | LONG | BOOT | BOOT | 40000 | null | null | null |
>> >>> | drill.exec.sort.external.spill.threshold | LONG | BOOT | BOOT | 40000 | null | null | null |
>> >>> +------+------+------+--------+---------+------------+----------+-----------+
>> >>>
>> >>> Note that I’ve tried removing the shell ‘$hostname’ variable (in case it
>> >>> causes issues), no dice.
>> >>>
>> >>> What’s the right way to set these values?
>> >>>
>> >>>
>> >>>
>> >>>
>> >>>
>> >>>
>> >>> Andy Pernsteiner
>> >>> Manager, Field Enablement
>> >>> ph: 206.228.0737
>> >>>
>> >>> www.mapr.com
>> >>>
>> >>>
>> >>
>> >>
>> >
>> >
>> > --
>> > Andy Pernsteiner
>> > Manager, Field Enablement
>> > ph: 206.228.0737
>> >
>> > www.mapr.com
>> >
>>
>>
>
