That's actually a good follow-up: if sys.boot (and others) are
drillbit-specific, do we have a way to query sys.boot across all bits and show
their values?



On Mon, Nov 9, 2015 at 12:58 PM, John Omernik <[email protected]> wrote:

> I did
>
> select * from boot where name like '%sort%'
>
> and I did get
>
> string_val - [ # env var DRILL_SPILLLOC
>                     "/var/mapr/local/onedrillbithostname/drillspill"
>         ]
>
> onedrillbithostname is one of my 5 nodes. I suppose this reflects only the
> foreman of the query (or perhaps the node that ZooKeeper handed the JDBC
> connection to). That seems well and good, then. The value looks like it is
> being propagated correctly. I'd love to see data in those directories, to
> make sure I don't have some silly situation where, since sys.boot shows only
> one value, all the nodes try to use the same spill location (i.e., to prove
> that each bit is truly writing to its own location). But this all looks
> very promising.
>
>
>
>
> On Mon, Nov 9, 2015 at 12:51 PM, John Omernik <[email protected]> wrote:
>
>> Just for future readers of the user list: Jacques posted to the JIRA that
>> HOCON variables likely already support this. I created some scripts for my
>> MapR setup (MapRTech folks, please take a look at the JIRA; I think I
>> correctly created what is effectively a local volume using maprcli in
>> drill-env.sh).
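A minimal sketch of that approach, assuming drill-env.sh exports a per-node environment variable named DRILL_SPILLLOC (the name shown in the sys.boot output quoted above) before the drillbit starts. HOCON resolves a ${...} substitution against the process environment when the key is not defined in the config itself, so every node can share an identical drill-override.conf:

```hocon
drill.exec: {
  # DRILL_SPILLLOC is assumed to be exported by drill-env.sh on each node,
  # pointing at that node's own local-volume spill directory.
  sort.external.spill.directories: [ ${DRILL_SPILLLOC} ],
  sort.external.spill.fs: "maprfs:///"
}
```

If DRILL_SPILLLOC is missing from a node's environment, the substitution cannot be resolved, so config loading on that node would likely fail at startup rather than silently falling back to the default directory.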
>>
>> The last piece of the puzzle is how to test that this is working. The
>> drillbits start with no errors, but I'd like to validate that it's all
>> working as intended, e.g., can I force my memory low? What type of query
>> would cause a spill? If Drill tries to avoid spilling as much as possible,
>> this may be hard to prove. Perhaps there is a query that shows this is set
>> up correctly under the hood?
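One way to make a spill easier to provoke for testing, offered as a sketch rather than a verified recipe: temporarily lower the external-sort spill settings (the same keys tried in the original question further down), restart the drillbits, and run a large ORDER BY. That smaller values make the sort spill after fewer batches is an assumption about this Drill version; the numbers are illustrative only.

```hocon
drill.exec: {
  sort.external.spill: {
    # Test-only values; the sys.boot defaults shown later in this thread are 40000.
    # Assumption: lower values make the external sort spill sooner.
    threshold: 200,
    group.size: 100
  }
}
```

Once spill files show up in each node's own directory, these overrides can be removed to restore the defaults.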
>>
>> John
>>
>> On Mon, Nov 9, 2015 at 8:28 AM, John Omernik <[email protected]> wrote:
>>
>>> Hey all, after speaking to Andries, I've gone ahead and created a JIRA
>>> to support variables in the drill-override.conf file:
>>>
>>> https://issues.apache.org/jira/browse/DRILL-4052
>>>
>>> This will be a huge help and give administrators great flexibility in
>>> organizing their Drill clusters. Please comment with ideas if you have
>>> thoughts on the subject!
>>>
>>> John
>>>
>>>
>>>
>>> On Fri, Sep 25, 2015 at 10:20 AM, Andy Pernsteiner <
>>> [email protected]> wrote:
>>>
>>>> That was considered, and we may elect to do that or some variation (a
>>>> separate mount point going to the target directory on each node, where
>>>> the local mount point is identical across all cluster nodes). I was
>>>> hoping that Drill had a way of parsing options within the config. If
>>>> not, I'll file a JIRA for an enhancement, since this sort of thing would
>>>> be useful for a number of scenarios.
>>>>
>>>>
>>>>
>>>>  Andy Pernsteiner
>>>>  Manager, Field Enablement
>>>> ph: 206.228.0737
>>>>
>>>> www.mapr.com
>>>> Now Available - Free Hadoop On-Demand Training
>>>>
>>>>
>>>>
>>>> From: kbotzum <[email protected]>
>>>> Reply: [email protected] <[email protected]>
>>>> Date: September 24, 2015 at 5:36:31 PM
>>>> To: [email protected] <[email protected]>
>>>> Cc: Andries Engelbrecht <[email protected]>
>>>> Subject:  Re: Setting drill.exec.sort.external.spill.directories
>>>>
>>>> How about a symbolic link from the local file system on each node to the
>>>> node-specific tmp dir? A little hacky, but workable. You could do that
>>>> once and then copy the Drill config without concern.
>>>>
>>>> FYI, many eons ago a file system known as AFS had special variables that
>>>> would expand in pathnames to handle this type of thing transparently. My
>>>> memory is fuzzy, but I think we had @sys, @host, and probably a few others.
>>>>
>>>> Keys
>>>> _______________________________
>>>> Keys Botzum
>>>> Senior Principal Technologist
>>>> [email protected]
>>>> 443-718-0098
>>>> MapR Technologies
>>>> http://www.mapr.com
>>>>
>>>>
>>>>
>>>> On Sep 24, 2015, at 5:30 PM, Andy Pernsteiner <
>>>> [email protected]> wrote:
>>>>
>>>> > One question for those in the know: is there a way to use shell (or
>>>> > other) variables in these options? I'd much prefer $HOSTNAME, as opposed
>>>> > to having to set the variable differently on each node in my cluster.
>>>> >
>>>> >
>>>> >
>>>> > On Thu, Sep 24, 2015 at 5:22 PM, Andy Pernsteiner <[email protected]>
>>>> > wrote:
>>>> >
>>>> >> So, I *think* I got things working. I had some inconsistencies in what I
>>>> >> would see depending on which user I had launched sqlline as, but I can’t
>>>> >> reproduce them reliably.
>>>> >>
>>>> >> In any case, here’s what I put in the config:
>>>> >>
>>>> >> drill.exec: {
>>>> >>   cluster-id: "se1-drillbits",
>>>> >>   zk.connect: "10.10.15.10:5181,10.10.15.11:5181,10.10.15.12:5181",
>>>> >>   sys.store.provider.zk.blobroot: "maprfs:///user/mapr/profiles",
>>>> >>   # Changed settings: the spill location is hard-coded for this node
>>>> >>   sort.external.spill.directories: [ "/var/mapr/local/se-node10.se.lab/drillspill" ],
>>>> >>   sort.external.spill.fs: "maprfs:///",
>>>> >>   impersonation: {
>>>> >>     enabled: true,
>>>> >>     max_chained_user_hops: 3
>>>> >>   }
>>>> >> }
>>>> >>
>>>> >> Note: putting a shell variable ($HOSTNAME) in the path did not seem to
>>>> >> work. I’d get errors when running queries that spilled to disk,
>>>> >> complaining about directory permissions, likely because it couldn’t
>>>> >> resolve the path.
>>>> >>
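For reference, HOCON does not expand $HOSTNAME (or $hostname) inside a quoted string; the text is kept literally, which would fit the path errors described above. Its substitution syntax is ${NAME} written outside the quotes, and a quoted string and a substitution placed next to each other on one line are concatenated. A sketch, assuming HOSTNAME is exported into the drillbit's environment (bash sets it as a shell variable but does not necessarily export it) and matches the directory name under /var/mapr/local:

```hocon
drill.exec: {
  # "/var/mapr/$HOSTNAME/..." would stay literal; ${HOSTNAME} outside the quotes
  # is a HOCON substitution, resolved from the environment if not in the config.
  sort.external.spill.directories: [ "/var/mapr/local/"${HOSTNAME}"/drillspill" ],
  sort.external.spill.fs: "maprfs:///"
}
```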
>>>> >> If I can figure out the original issue I had (i.e., if I can reproduce
>>>> >> it), I will file a JIRA.
>>>> >>
>>>> >>
>>>> >>
>>>> >> From: Andries Engelbrecht <[email protected]>
>>>> >> Reply: [email protected] <[email protected]>
>>>> >> Date: September 24, 2015 at 4:21:50 PM
>>>> >> To: [email protected] <[email protected]>
>>>> >> Subject: Re: Setting drill.exec.sort.external.spill.directories
>>>> >>
>>>> >> Maybe try
>>>> >>
>>>> >> sort.external.spill.directories: [ "/var/mapr/local/$hostname/drillspill" ],
>>>> >>
>>>> >> —Andries
>>>> >>
>>>> >>> On Sep 24, 2015, at 12:38 PM, Andy Pernsteiner <[email protected]> wrote:
>>>> >>>
>>>> >>> I’m trying to do some experimentation and set the
>>>> >>> drill.exec.sort.external.spill.directories value. Since this option
>>>> >>> appears as a ‘boot’ option (https://drill.apache.org/docs/start-up-options/),
>>>> >>> I believe the right way is to set it in drill-override.conf on each node.
>>>> >>>
>>>> >>> I tried doing this via the following:
>>>> >>>
>>>> >>>
>>>> >>> drill.exec: {
>>>> >>>   cluster-id: "se1-drillbits",
>>>> >>>   zk.connect: "10.10.15.10:5181,10.10.15.11:5181,10.10.15.12:5181",
>>>> >>>   sys.store.provider.zk.blobroot: "maprfs:///user/mapr/profiles",
>>>> >>>   sort.external.spill.directories: [ "/var/mapr/$hostname/drillspill" ],
>>>> >>>   sort.external.spill.fs: "maprfs:///",
>>>> >>>   impersonation: {
>>>> >>>     enabled: true,
>>>> >>>     max_chained_user_hops: 3
>>>> >>>   }
>>>> >>> }
>>>> >>>
>>>> >>> I also tried setting via:
>>>> >>>
>>>> >>> sort: {
>>>> >>>   purge.threshold : 100,
>>>> >>>   external: {
>>>> >>>     batch.size : 4000,
>>>> >>>     spill: {
>>>> >>>       batch.size : 4000,
>>>> >>>       group.size : 100,
>>>> >>>       threshold : 200,
>>>> >>>       directories : [ "/var/mapr/$hostname/drillspill" ],
>>>> >>>       fs : "maprfs:///"
>>>> >>>     }
>>>> >>>   }
>>>> >>> },
>>>> >>>
>>>> >>>
>>>> >>> But then, looking at the sys.boot table after restarting the drillbits,
>>>> >>> I still see the default values:
>>>> >>>
>>>> >>> 0: jdbc:drill:> select * from sys.boot where name like '%spill%';
>>>> >>> +------+------+------+--------+---------+------------+----------+-----------+
>>>> >>> | name | kind | type | status | num_val | string_val | bool_val | float_val |
>>>> >>> +------+------+------+--------+---------+------------+----------+-----------+
>>>> >>> | drill.exec.sort.external.spill.batch.size | LONG | BOOT | BOOT | 4000 | null | null | null |
>>>> >>> | drill.exec.sort.external.spill.directories | STRING | BOOT | BOOT | null | [
>>>> >>>     # jar:file:/opt/mapr/drill/drill-1.1.0/jars/drill-java-exec-1.1.0.jar!/drill-module.conf: 145
>>>> >>>     "/tmp/drill/spill"
>>>> >>> ] | null | null |
>>>> >>> | drill.exec.sort.external.spill.fs | STRING | BOOT | BOOT | null | "file:///" | null | null |
>>>> >>> | drill.exec.sort.external.spill.group.size | LONG | BOOT | BOOT | 40000 | null | null | null |
>>>> >>> | drill.exec.sort.external.spill.threshold | LONG | BOOT | BOOT | 40000 | null | null | null |
>>>> >>> +------+------+------+--------+---------+------------+----------+-----------+
>>>> >>>
>>>> >>> Note that I’ve tried removing the shell ‘$hostname’ variable (in case it
>>>> >>> causes issues), no dice.
>>>> >>>
>>>> >>> What’s the right way to set these values?
>>>> >>>
>>>> >>>
>>>> >>>
>>>> >>
>>>> >>
>>>> >
>>>> >
>>>>
>>>>
>>>
>>
>
