That was considered, and we may elect to do that or some variation (a separate 
mount point per node going to the target directory, with the local mount 
point identical across all cluster nodes).  I was hoping that Drill had a 
way of expanding variables in config options.  If not, I’ll file a JIRA 
enhancement request, since this sort of thing would be useful in a number of scenarios.
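
Until something like that exists, one workaround is to keep a single template and render drill-override.conf on each node. This is only a sketch: the `@HOSTNAME@` placeholder and the `render_conf` helper are made up for illustration and are not Drill syntax, and the file names in the usage line are assumptions.

```shell
#!/bin/sh
# render_conf TEMPLATE OUTPUT [HOST]
# Copies TEMPLATE to OUTPUT, replacing every @HOSTNAME@ with HOST
# (defaults to this node's hostname). @HOSTNAME@ is a hypothetical
# placeholder chosen for this sketch, not something Drill understands.
render_conf() {
    template="$1"
    output="$2"
    host="${3:-$(hostname)}"
    # '|' is used as the sed delimiter because paths contain '/'.
    sed "s|@HOSTNAME@|$host|g" "$template" > "$output"
}

# Example (file names are assumptions):
# render_conf drill-override.conf.template drill-override.conf
```

The template is what gets copied around the cluster; each node renders its own copy before the drillbit is restarted.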



 Andy Pernsteiner
 Manager, Field Enablement
ph: 206.228.0737

www.mapr.com
Now Available - Free Hadoop On-Demand Training



From: kbotzum <[email protected]>
Reply: [email protected] <[email protected]>
Date: September 24, 2015 at 5:36:31 PM
To: [email protected] <[email protected]>
Cc: Andries Engelbrecht <[email protected]>
Subject: Re: Setting drill.exec.sort.external.spill.directories

How about a symbolic link from the local file system on each node to the 
node-specific tmp dir? A little hacky but workable. You could do that once and then 
copy the Drill config to every node without concern.  
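
A minimal sketch of that idea follows. It assumes the node-specific directory is reachable through a local path (for example via an NFS mount) and that the spill file system is the local one; the `link_spill_dir` helper and the example paths are hypothetical.

```shell
#!/bin/sh
# link_spill_dir TARGET LINK
# Creates TARGET (the node-specific directory) if needed, then points the
# symlink LINK at it. LINK is the path that would be identical on every
# node and is what the shared config would reference.
link_spill_dir() {
    target="$1"
    link="$2"
    mkdir -p "$target" "$(dirname "$link")" || return 1
    # -f replaces an existing link; -n stops ln from following an
    # existing symlink to a directory and creating the link inside it.
    ln -sfn "$target" "$link"
}

# Example, run once per node (paths are assumptions):
# link_spill_dir "/var/mapr/local/$(hostname -f)/drillspill" /var/mapr/local/drillspill
```

If LINK already exists as a real directory rather than a symlink, `ln` will create the link inside it, so check for that case first.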

FYI, many eons ago a file system known as AFS had special variables that would 
expand in pathnames to handle this type of thing transparently. My memory is 
fuzzy, but I think we had @sys, @host, and probably a few others.  

Keys  
_______________________________  
Keys Botzum  
Senior Principal Technologist  
[email protected]  
443-718-0098  
MapR Technologies  
http://www.mapr.com  



On Sep 24, 2015, at 5:30 PM, Andy Pernsteiner <[email protected]> wrote:  

> One question for those in the know: Is there a way to use shell (or other)  
> variables in these options? I'd much prefer to use $HOSTNAME, as opposed to  
> having to set the option differently on each node in my cluster.  
>  
>  
>  
> On Thu, Sep 24, 2015 at 5:22 PM, Andy Pernsteiner <[email protected]> wrote:  
>  
>> So, I *think* I got things working. I saw some inconsistencies depending on  
>> which user I had launched sqlline as, but I can’t reproduce them  
>> reliably.  
>>  
>> In any case, here’s what I put in the config:  
>>  
>> drill.exec: {  
>> cluster-id: "se1-drillbits",  
>> zk.connect: "10.10.15.10:5181,10.10.15.11:5181,10.10.15.12:5181",  
>> sys.store.provider.zk.blobroot: "maprfs:///user/mapr/profiles",  
>> sort.external.spill.directories: [ "/var/mapr/local/se-node10.se.lab/drillspill" ],  
>> sort.external.spill.fs: "maprfs:///",  
>> impersonation: {  
>> enabled: true,  
>> max_chained_user_hops: 3  
>> }  
>> }  
>>  
>> Note: putting a shell variable ($HOSTNAME) in the path did not seem to work (I’d get  
>> errors complaining about directory permissions when running queries that spilled to  
>> disk, likely because the path couldn’t be resolved).  
>>  
>> If I can figure out the original issue I had (e.g.: if I can reproduce), I  
>> will file a JIRA.  
>>  
>>  
>>  
>>  
>>  
>> From: Andries Engelbrecht <[email protected]>  
>> Reply: [email protected] <[email protected]>  
>> Date: September 24, 2015 at 4:21:50 PM  
>> To: [email protected] <[email protected]>  
>> Subject: Re: Setting drill.exec.sort.external.spill.directories  
>>  
>> Maybe try  
>>  
>> sort.external.spill.directories: [ "/var/mapr/local/$hostname/drillspill" ],  
>>  
>> —Andries  
>>  
>>> On Sep 24, 2015, at 12:38 PM, Andy Pernsteiner <[email protected]> wrote:  
>>>  
>>> I’m trying to do some experimentation and set the  
>>> drill.exec.sort.external.spill.directories value. Since this option appears  
>>> as a ‘boot’ option ( https://drill.apache.org/docs/start-up-options/ ),  
>>> I believe the right way is to set this in drill-override.conf on each node.  
>>>  
>>> I tried doing this via the following:  
>>>  
>>>  
>>> drill.exec: {  
>>> cluster-id: "se1-drillbits",  
>>> zk.connect: "10.10.15.10:5181,10.10.15.11:5181,10.10.15.12:5181",  
>>> sys.store.provider.zk.blobroot: "maprfs:///user/mapr/profiles",  
>>> sort.external.spill.directories: [ "/var/mapr/$hostname/drillspill" ],  
>>> sort.external.spill.fs: "maprfs:///",  
>>> impersonation: {  
>>> enabled: true,  
>>> max_chained_user_hops: 3  
>>> }  
>>> }  
>>>  
>>> I also tried setting via:  
>>>  
>>> sort: {  
>>> purge.threshold : 100,  
>>> external: {  
>>> batch.size : 4000,  
>>> spill: {  
>>> batch.size : 4000,  
>>> group.size : 100,  
>>> threshold : 200,  
>>> directories : [ "/var/mapr/$hostname/drillspill" ],  
>>> fs : "maprfs:///"  
>>> }  
>>> }  
>>> },  
>>>  
>>>  
>>> But then looking at the sys.boot table after restarting the drillbits,  
>>> I still see the default values:  
>>>  
>>> 0: jdbc:drill:> select * from sys.boot where name like '%spill%';  
>>> +------+------+------+--------+---------+------------+----------+-----------+  
>>> | name | kind | type | status | num_val | string_val | bool_val | float_val |  
>>> +------+------+------+--------+---------+------------+----------+-----------+  
>>> | drill.exec.sort.external.spill.batch.size | LONG | BOOT | BOOT | 4000 | null | null | null |  
>>> | drill.exec.sort.external.spill.directories | STRING | BOOT | BOOT | null | [  
>>> # jar:file:/opt/mapr/drill/drill-1.1.0/jars/drill-java-exec-1.1.0.jar!/drill-module.conf: 145  
>>> "/tmp/drill/spill"  
>>> ] | null | null |  
>>> | drill.exec.sort.external.spill.fs | STRING | BOOT | BOOT | null | "file:///" | null | null |  
>>> | drill.exec.sort.external.spill.group.size | LONG | BOOT | BOOT | 40000 | null | null | null |  
>>> | drill.exec.sort.external.spill.threshold | LONG | BOOT | BOOT | 40000 | null | null | null |  
>>> +------+------+------+--------+---------+------------+----------+-----------+  
>>>  
>>> Note that I’ve tried removing the shell ‘$hostname’ variable (in case it  
>>> causes issues), no dice.  
>>>  
>>> What’s the right way to set these values?  
>>>  
>>>  
>>>  
>>>  
>>>  
>>>  
>>>  
>>>  
>>  
>>  
>  
>  
