Re: Setting drill.exec.sort.external.spill.directories

Andries Engelbrecht Mon, 09 Nov 2015 12:27:26 -0800

Does the config in the JIRA create the volumes on MapR-FS?

Reduce the direct memory in drill-env.sh to 2G or less and do a large sort 
(order by) and see if it forces it to spill.
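
A minimal sketch of that test, assuming the stock Drill 1.x drill-env.sh, where the direct memory cap is set through DRILL_MAX_DIRECT_MEMORY (check the variable name against the file shipped with your version). The table and column in the comment are hypothetical placeholders:

```shell
# Sketch of a drill-env.sh fragment: cap direct memory so a large sort spills sooner.
# DRILL_MAX_DIRECT_MEMORY is assumed to be the variable your drill-env.sh reads.
DRILL_MAX_DIRECT_MEMORY="2G"
export DRILL_MAX_DIRECT_MEMORY

# After restarting the drillbits, run a sort whose input is larger than the cap,
# for example (hypothetical table and column names):
#   SELECT * FROM dfs.tmp.big_table ORDER BY some_column;
# and watch whether files appear under the configured spill directory.
```

If the spill settings took effect, files should show up in the spill directory while the query runs; if nothing appears, the sort may still have fit in memory.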


The solution does look interesting and will be very helpful on large nodes with 
higher MapR-FS throughput.
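
For readers following the quoted thread below, here is a rough sketch of how a per-node spill path might be derived in drill-env.sh. DRILL_SPILL_HOST is a hypothetical variable name, and using `uname -n` assumes the node name it returns matches the directory name under /var/mapr/local. HOCON does resolve `${VAR}` substitutions from environment variables, but only outside quoted strings; whether a given Drill version resolves them in drill-override.conf is an assumption to verify:

```shell
# Sketch for drill-env.sh: export a per-node name that the config can reference.
# DRILL_SPILL_HOST is a hypothetical name chosen for this example.
DRILL_SPILL_HOST="$(uname -n)"
export DRILL_SPILL_HOST

# The per-node spill directory this would correspond to.
SPILL_DIR="/var/mapr/local/${DRILL_SPILL_HOST}/drillspill"
echo "spill directory for this node: ${SPILL_DIR}"

# In drill-override.conf the substitution has to sit outside the quotes, so the
# path is built by concatenation (untested assumption for Drill):
#   sort.external.spill.directories: [ "/var/mapr/local/"${DRILL_SPILL_HOST}"/drillspill" ]
```

Exporting the variable matters: a plain shell variable such as bash's HOSTNAME is not necessarily visible to the drillbit JVM unless it is exported.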


—Andries


> On Nov 9, 2015, at 10:51 AM, John Omernik <[email protected]> wrote:
> 
> For future readers of the user list: Jacques posted to the JIRA that HOCON
> variables likely already support this. I created some scripts for my MapR
> setup (MapRTech folks, please take a look at the JIRA; I think I effectively
> created a local volume correctly using maprcli in drill-env.sh).
> 
> The one last piece of the puzzle is how to test that this is working. The
> drillbits start with no errors, but I'd like to validate that it's all
> working as intended. Can I force my memory low? What type of query would
> cause a spill? If Drill avoids spilling as much as possible, this may be
> hard to prove. Is there perhaps a query that shows this is set up right
> under the hood?
> 
> John
> 
> On Mon, Nov 9, 2015 at 8:28 AM, John Omernik <[email protected]> wrote:
> 
>> Hey all, after speaking to Andries, I've gone ahead and created a JIRA to
>> support variables in the drill-override.conf file:
>> 
>> https://issues.apache.org/jira/browse/DRILL-4052
>> 
>> This will be a huge help and add great flexibility for administrators
>> looking to organize their Drill clusters.  Please comment with ideas if you
>> have thoughts on the subject!
>> 
>> John
>> 
>> 
>> 
>> On Fri, Sep 25, 2015 at 10:20 AM, Andy Pernsteiner <
>> [email protected]> wrote:
>> 
>>> That was considered, and we may elect to do that or some variation
>>> (separate mount point going to the target directory per node, but where the
>>> local mount point is identical across all cluster nodes).  I was hoping
>>> that Drill had a way of parsing options within the config.  If not, I’ll
>>> file a JIRA for enhancement, since this sort of thing would be useful for a
>>> number of scenarios.
>>> 
>>> 
>>> 
>>> Andy Pernsteiner
>>> Manager, Field Enablement
>>> ph: 206.228.0737
>>> 
>>> www.mapr.com
>>> Now Available - Free Hadoop On-Demand Training
>>> 
>>> 
>>> 
>>> From: kbotzum <[email protected]>
>>> Reply: [email protected] <[email protected]>>
>>> Date: September 24, 2015 at 5:36:31 PM
>>> To: [email protected] <[email protected]>>
>>> Cc: Andries Engelbrecht <[email protected]>>
>>> Subject:  Re: Setting drill.exec.sort.external.spill.directories
>>> 
>>> How about a symbolic link from the local file system on each node to the
>>> node specific tmp dir? A little hacky but workable. You could do that once
>>> and then copy the drill config without concern.
>>> 
>>> fyi, many eons ago a file system known as AFS had special vars that would
>>> expand in pathnames to handle this type of thing transparently. My memory
>>> is fuzzy but I think we had @sys, @host, and probably a few others.
>>> 
>>> Keys
>>> _______________________________
>>> Keys Botzum
>>> Senior Principal Technologist
>>> [email protected]
>>> 443-718-0098
>>> MapR Technologies
>>> http://www.mapr.com
>>> 
>>> 
>>> 
>>> On Sep 24, 2015, at 5:30 PM, Andy Pernsteiner <[email protected]>
>>> wrote:
>>> 
>>>> One question for those in the know: Is there a way to use shell (or other)
>>>> variables in these options? I'd much prefer $HOSTNAME, as opposed to
>>>> having to set the variable differently on each node in my cluster.
>>>> 
>>>> 
>>>> 
>>>> On Thu, Sep 24, 2015 at 5:22 PM, Andy Pernsteiner <[email protected]> wrote:
>>>> 
>>>>> So, I *think* I got things working. I had some inconsistencies in what I
>>>>> would see depending on which user I had launched sqlline as, but I can’t
>>>>> reproduce them reliably.
>>>>> 
>>>>> In any case, here’s what I put in the config:
>>>>> 
>>>>> drill.exec: {
>>>>>   cluster-id: "se1-drillbits",
>>>>>   zk.connect: "10.10.15.10:5181,10.10.15.11:5181,10.10.15.12:5181",
>>>>>   sys.store.provider.zk.blobroot: "maprfs:///user/mapr/profiles",
>>>>>   sort.external.spill.directories: [ "/var/mapr/local/se-node10.se.lab/drillspill" ],
>>>>>   sort.external.spill.fs: "maprfs:///",
>>>>>   impersonation: {
>>>>>     enabled: true,
>>>>>     max_chained_user_hops: 3
>>>>>   }
>>>>> }
>>>>> 
>>>>> Note: putting a shell variable ($HOSTNAME) did not seem to work (I’d get
>>>>> errors when running queries that resulted in a spill to disk, complaining
>>>>> about directory permissions, likely because it couldn’t resolve the path).
>>>>> 
>>>>> If I can figure out the original issue I had (i.e., if I can reproduce it),
>>>>> I will file a JIRA.
>>>>> 
>>>>> 
>>>>> 
>>>>> Andy Pernsteiner
>>>>> Manager, Field Enablement
>>>>> ph: 206.228.0737
>>>>> 
>>>>> www.mapr.com
>>>>> 
>>>>> Now Available - Free Hadoop On-Demand Training
>>>>> <http://www.mapr.com/training?utm_source=Email&utm_medium=Signature&utm_campaign=Free%20available>
>>>>> 
>>>>> 
>>>>> From: Andries Engelbrecht <[email protected]>
>>>>> <[email protected]>
>>>>> Reply: [email protected] <[email protected]>>
>>>>> <[email protected]>
>>>>> Date: September 24, 2015 at 4:21:50 PM
>>>>> To: [email protected] <[email protected]>> <
>>> [email protected]>
>>>>> Subject: Re: Setting drill.exec.sort.external.spill.directories
>>>>> 
>>>>> Maybe try
>>>>> 
>>>>> sort.external.spill.directories: [ "/var/mapr/local/$hostname/drillspill" ],
>>>>> 
>>>>> —Andries
>>>>> 
>>>>>> On Sep 24, 2015, at 12:38 PM, Andy Pernsteiner <[email protected]> wrote:
>>>>>> 
>>>>>> I’m trying to do some experimentation and set the
>>>>>> drill.exec.sort.external.spill.directories value. Since this option appears
>>>>>> as a ‘boot’ option ( https://drill.apache.org/docs/start-up-options/ ),
>>>>>> I believe the right way is to set this in drill-override.conf on each node.
>>>>>> 
>>>>>> I tried doing this via the following:
>>>>>> 
>>>>>> 
>>>>>> drill.exec: {
>>>>>>   cluster-id: "se1-drillbits",
>>>>>>   zk.connect: "10.10.15.10:5181,10.10.15.11:5181,10.10.15.12:5181",
>>>>>>   sys.store.provider.zk.blobroot: "maprfs:///user/mapr/profiles",
>>>>>>   sort.external.spill.directories: [ "/var/mapr/$hostname/drillspill" ],
>>>>>>   sort.external.spill.fs: "maprfs:///",
>>>>>>   impersonation: {
>>>>>>     enabled: true,
>>>>>>     max_chained_user_hops: 3
>>>>>>   }
>>>>>> }
>>>>>> 
>>>>>> I also tried setting via:
>>>>>> 
>>>>>> sort: {
>>>>>>   purge.threshold : 100,
>>>>>>   external: {
>>>>>>     batch.size : 4000,
>>>>>>     spill: {
>>>>>>       batch.size : 4000,
>>>>>>       group.size : 100,
>>>>>>       threshold : 200,
>>>>>>       directories : [ "/var/mapr/$hostname/drillspill" ],
>>>>>>       fs : "maprfs:///"
>>>>>>     }
>>>>>>   }
>>>>>> },
>>>>>> 
>>>>>> 
>>>>>> But then, looking at the sys.boot table after restarting the drillbits,
>>>>>> I still see the default values:
>>>>>> 
>>>>>> 0: jdbc:drill:> select * from sys.boot where name like '%spill%';
>>>>>> +------+------+------+--------+---------+------------+----------+-----------+
>>>>>> | name | kind | type | status | num_val | string_val | bool_val | float_val |
>>>>>> +------+------+------+--------+---------+------------+----------+-----------+
>>>>>> | drill.exec.sort.external.spill.batch.size | LONG | BOOT | BOOT | 4000 | null | null | null |
>>>>>> | drill.exec.sort.external.spill.directories | STRING | BOOT | BOOT | null | [
>>>>>> # jar:file:/opt/mapr/drill/drill-1.1.0/jars/drill-java-exec-1.1.0.jar!/drill-module.conf: 145
>>>>>> "/tmp/drill/spill"
>>>>>> ] | null | null |
>>>>>> | drill.exec.sort.external.spill.fs | STRING | BOOT | BOOT | null | "file:///" | null | null |
>>>>>> | drill.exec.sort.external.spill.group.size | LONG | BOOT | BOOT | 40000 | null | null | null |
>>>>>> | drill.exec.sort.external.spill.threshold | LONG | BOOT | BOOT | 40000 | null | null | null |
>>>>>> +------+------+------+--------+---------+------------+----------+-----------+
>>>>> 
>>>>>> 
>>>>>> Note that I’ve tried removing the shell ‘$hostname’ variable (in case it
>>>>>> causes issues); no dice.
>>>>>> 
>>>>>> What’s the right way to set these values?
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> Andy Pernsteiner
>>>>>> Manager, Field Enablement
>>>>>> ph: 206.228.0737
>>>>>> 
>>>>>> www.mapr.com
>>>>>> Now Available - Free Hadoop On-Demand Training
>>>>>> 
>>>>>> 
>>>>> 
>>>>> 
>>>> 
>>>> 
>>>> --
>>>> Andy Pernsteiner
>>>> Manager, Field Enablement
>>>> ph: 206.228.0737
>>>> 
>>>> www.mapr.com
>>>> 
>>>> Now Available - Free Hadoop On-Demand Training
>>>> <http://www.mapr.com/training?utm_source=Email&utm_medium=Signature&utm_campaign=Free%20available>
>>> 
>>> 
>>
