I can add a JIRA on that if you'd like.

Based on what I am seeing, though, that setting should be applied on a
per-drillbit basis, and things should spill correctly, right?

Can you think of any way to force-test this?

On Mon, Nov 9, 2015 at 1:52 PM, Jacques Nadeau <[email protected]> wrote:

> The sys.boot data is the current node's configuration as seen from
> sys.drillbits. Right now, we don't have a way of looking across nodes. Good
> feature request though.
>
> --
> Jacques Nadeau
> CTO and Co-Founder, Dremio
>
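Until there is a built-in way to look across nodes, one possible workaround is sketched below in Python under stated assumptions. It sends the same sys.boot query to each drillbit's REST endpoint (`/query.json` on the web port, 8047 by default), so that each node presumably acts as foreman and, per Jacques' note, reports its own boot configuration. The host names are hypothetical, and the response shape (a `rows` list of column-to-value maps) is an assumption to check against your Drill version.

```python
import json
import urllib.request

# Hypothetical drillbit host names; 8047 is Drill's default web port.
DRILLBITS = ["node1.example.com", "node2.example.com", "node3.example.com"]
SPILL_SQL = "SELECT name, string_val FROM sys.boot WHERE name LIKE '%spill%'"


def build_request(host: str, sql: str, port: int = 8047) -> urllib.request.Request:
    """Build a POST to one drillbit's /query.json endpoint (no network I/O here)."""
    body = json.dumps({"queryType": "SQL", "query": sql}).encode("utf-8")
    return urllib.request.Request(
        f"http://{host}:{port}/query.json",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def rows_from_response(payload: dict) -> list:
    """Assumed response shape: {"columns": [...], "rows": [{col: value, ...}]}."""
    return payload.get("rows", [])


def boot_values_per_node(hosts: list, sql: str = SPILL_SQL, timeout: float = 30.0) -> dict:
    """Send the same sys.boot query to each drillbit in turn; key the rows by host."""
    results = {}
    for host in hosts:
        with urllib.request.urlopen(build_request(host, sql), timeout=timeout) as resp:
            results[host] = rows_from_response(json.load(resp))
    return results
```

Usage: `for host, rows in boot_values_per_node(DRILLBITS).items(): print(host, rows)`. Querying nodes one at a time keeps the sketch simple; each request is an ordinary query, so it needs whatever authentication your cluster enforces.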
> On Mon, Nov 9, 2015 at 11:00 AM, John Omernik <[email protected]> wrote:
>
> > That's actually a good follow-up: if sys.boot (and others) are
> > drillbit-specific, do we have a way to query sys.boot across all bits to
> > show their values?
> >
> >
> >
> > On Mon, Nov 9, 2015 at 12:58 PM, John Omernik <[email protected]> wrote:
> >
> > > I did
> > >
> > > select * from boot where name like '%sort%'
> > >
> > > and I did get
> > >
> > > string_val - [ # env var DRILL_SPILLLOC
> > >                     "/var/mapr/local/onedrillbithostname/drillspill"
> > >         ]
> > >
> > > onedrillbithostname is one of my 5 nodes. I suppose this is only going
> > > to be the foreman of the query (or perhaps the node that ZooKeeper gave
> > > the JDBC connection to). That seems well and good, then. It looks like
> > > the value is being propagated correctly. I'd love to see data in those
> > > directories, though, to make sure I don't have some silly situation
> > > where, since sys.boot only shows one value, all the nodes try to use the
> > > same spill location (i.e., proving that each bit is truly writing to its
> > > own location). But this all looks very promising.
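A small check along these lines is sketched below in Python. It assumes each node's spill directory is reachable as a local path from wherever you run it (for example through an NFS mount of the cluster, which is an assumption here), and that you have already collected a host-to-directory map by some means. It flags any directory claimed by more than one host and counts files under a directory after a spilling query has run. The function names are illustrative.

```python
import os
from collections import defaultdict


def find_shared_dirs(spill_dirs: dict) -> dict:
    """Return {directory: [hosts]} for directories used by more than one host,
    i.e. the 'all nodes share one spill location' situation to rule out."""
    users = defaultdict(list)
    for host, path in spill_dirs.items():
        # normpath so "/a/b" and "/a/b/" count as the same directory
        users[os.path.normpath(path)].append(host)
    return {path: sorted(hosts) for path, hosts in users.items() if len(hosts) > 1}


def count_files(root: str) -> int:
    """Count regular files anywhere under root; 0 if root does not exist."""
    total = 0
    for _dirpath, _dirnames, filenames in os.walk(root):
        total += len(filenames)
    return total
```

An empty result from `find_shared_dirs` plus a non-zero `count_files` on each node's directory after a spilling query would be reasonable evidence that each bit writes to its own location.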
> > >
> > >
> > >
> > >
> > > On Mon, Nov 9, 2015 at 12:51 PM, John Omernik <[email protected]> wrote:
> > >
> > >> Just for future readers of the user list: Jacques posted to the JIRA
> > >> that HOCON variables likely already support this. I created some
> > >> scripts for my MapR setup (MapRTech folks, please take a look at the
> > >> JIRA; I think I correctly created what is effectively a local volume
> > >> using maprcli in drill-env.sh).
> > >>
> > >> The last piece of the puzzle is how to test that this is working. The
> > >> drillbits start with no errors, but I'd like to validate that it's all
> > >> working as intended. For example, can I force my memory low? What type
> > >> of query would cause a spill? If Drill tries to avoid spilling as much
> > >> as possible, this may be hard to prove. Perhaps there is a query that
> > >> shows this is set up correctly under the hood?
> > >>
> > >> John
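One way to try to force a spill is sketched below in Python, with hypothetical file names and sizes: generate a large unsorted CSV on a file system the `dfs` storage plugin can read, shrink the per-node query memory budget for the session, and run a full ORDER BY. This assumes your Drill version sizes sort memory from `planner.memory.max_query_memory_per_node`; if no spill appears, lower the budget or raise the row count. It avoids LIMIT, since ORDER BY with LIMIT is typically planned as a top-N that does not need to spill.

```python
import csv
import random

# Hypothetical budget: 100 MB per node; lower it further if nothing spills.
SMALL_MEMORY_BYTES = 100 * 1024 * 1024


def write_test_csv(path: str, rows: int, seed: int = 42) -> None:
    """Write an unsorted CSV (random key, row number, padding) for sort testing."""
    rng = random.Random(seed)
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        for i in range(rows):
            writer.writerow([rng.randrange(10**9), i, "x" * 100])


def spill_test_statements(csv_path: str, memory_bytes: int = SMALL_MEMORY_BYTES) -> list:
    """SQL to paste into sqlline: shrink the memory budget, then sort the whole file."""
    return [
        f"ALTER SESSION SET `planner.memory.max_query_memory_per_node` = {memory_bytes}",
        # Headerless .csv files are exposed by Drill as a `columns` array.
        f"SELECT columns[0], columns[1], columns[2] FROM dfs.`{csv_path}` "
        "ORDER BY CAST(columns[0] AS BIGINT)",
    ]
```

For example, `write_test_csv("/path/on/dfs/spill_test.csv", 2_000_000)` followed by the two statements from `spill_test_statements(...)`; the sorted result streams back to sqlline, and the spill directories can then be inspected while or after it runs.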
> > >>
> > >> On Mon, Nov 9, 2015 at 8:28 AM, John Omernik <[email protected]> wrote:
> > >>
> > >>> Hey all, after speaking to Andries, I've gone ahead and created a JIRA
> > >>> to support variables in the drill-override.conf file:
> > >>>
> > >>> https://issues.apache.org/jira/browse/DRILL-4052
> > >>>
> > >>> This will be a huge help and give administrators a lot of flexibility
> > >>> in organizing their Drill clusters. Please comment with ideas if you
> > >>> have thoughts on the subject!
> > >>>
> > >>> John
> > >>>
> > >>>
> > >>>
> > >>> On Fri, Sep 25, 2015 at 10:20 AM, Andy Pernsteiner <[email protected]> wrote:
> > >>>
> > >>>> That was considered, and we may elect to do that or some variation
> > >>>> (a separate mount point going to the target directory on each node,
> > >>>> but where the local mount point is identical across all cluster
> > >>>> nodes). I was hoping that Drill had a way of expanding variables in
> > >>>> option values within the config. If not, I'll file a JIRA for an
> > >>>> enhancement, since this sort of thing would be useful for a number of
> > >>>> scenarios.
> > >>>>
> > >>>>
> > >>>>
> > >>>>  Andy Pernsteiner
> > >>>>  Manager, Field Enablement
> > >>>> ph: 206.228.0737
> > >>>>
> > >>>> www.mapr.com
> > >>>> Now Available - Free Hadoop On-Demand Training
> > >>>>
> > >>>>
> > >>>>
> > >>>> From: kbotzum <[email protected]>
> > >>>> Reply: [email protected] <[email protected]>
> > >>>> Date: September 24, 2015 at 5:36:31 PM
> > >>>> To: [email protected] <[email protected]>
> > >>>> Cc: Andries Engelbrecht <[email protected]>
> > >>>> Subject: Re: Setting drill.exec.sort.external.spill.directories
> > >>>>
> > >>>> How about a symbolic link from the local file system on each node to
> > >>>> the node-specific tmp dir? A little hacky, but workable. You could do
> > >>>> that once and then copy the Drill config without concern.
> > >>>>
> > >>>> FYI, many eons ago a file system known as AFS had special variables
> > >>>> that would expand in pathnames to handle this type of thing
> > >>>> transparently. My memory is fuzzy, but I think we had @sys, @host, and
> > >>>> probably a few others.
> > >>>>
> > >>>> Keys
> > >>>> _______________________________
> > >>>> Keys Botzum
> > >>>> Senior Principal Technologist
> > >>>> [email protected]
> > >>>> 443-718-0098
> > >>>> MapR Technologies
> > >>>> http://www.mapr.com
> > >>>>
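Keys' symlink idea can be scripted once per node. The Python below is a minimal sketch under assumptions: the fixed link path and the target template (including the `/mapr/my.cluster.com` NFS-style mount) are hypothetical placeholders, and the host name comes from `socket.gethostname()` unless passed in. Every node then gets the same link path, so the same drill-override.conf can be copied everywhere.

```python
import os
import socket
from typing import Optional

# Hypothetical paths: one fixed link name shared by all nodes, and a
# node-specific target built from the host name.
LINK_PATH = "/opt/drill-spill"
TARGET_TEMPLATE = "/mapr/my.cluster.com/var/mapr/local/{host}/drillspill"


def ensure_spill_link(link_path: str = LINK_PATH,
                      target_template: str = TARGET_TEMPLATE,
                      host: Optional[str] = None) -> str:
    """Point link_path at this node's own spill directory and return the target.
    Raises FileExistsError if link_path exists and is not a symlink."""
    host = host or socket.gethostname()
    target = target_template.format(host=host)
    if os.path.islink(link_path):
        if os.readlink(link_path) == target:
            return target  # already correct, nothing to do
        os.remove(link_path)  # stale link, e.g. left over from a host rename
    os.symlink(target, link_path)
    return target
```

With such a link in place, the config would presumably point `sort.external.spill.directories` at the link path with `sort.external.spill.fs: "file:///"`, since the link lives on the local file system rather than in maprfs.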
> > >>>>
> > >>>>
> > >>>> On Sep 24, 2015, at 5:30 PM, Andy Pernsteiner <[email protected]> wrote:
> > >>>>
> > >>>> > One question for those in the know: is there a way to use shell (or
> > >>>> > other) variables in these options? I'd much prefer $HOSTNAME, as
> > >>>> > opposed to having to set the variable differently on each node in my
> > >>>> > cluster.
> > >>>> >
> > >>>> >
> > >>>> >
> > >>>> > On Thu, Sep 24, 2015 at 5:22 PM, Andy Pernsteiner <[email protected]> wrote:
> > >>>> >
> > >>>> >> So, I *think* I got things working. I had some inconsistencies in
> > >>>> >> what I would see depending on which user I had launched sqlline as,
> > >>>> >> but I can’t reproduce them reliably.
> > >>>> >>
> > >>>> >> In any case, here’s what I put in the config:
> > >>>> >>
> > >>>> >> drill.exec: {
> > >>>> >>   cluster-id: "se1-drillbits",
> > >>>> >>   zk.connect: "10.10.15.10:5181,10.10.15.11:5181,10.10.15.12:5181",
> > >>>> >>   sys.store.provider.zk.blobroot: "maprfs:///user/mapr/profiles",
> > >>>> >>   sort.external.spill.directories: [ "/var/mapr/local/se-node10.se.lab/drillspill" ],
> > >>>> >>   sort.external.spill.fs: "maprfs:///",
> > >>>> >>   impersonation: {
> > >>>> >>     enabled: true,
> > >>>> >>     max_chained_user_hops: 3
> > >>>> >>   }
> > >>>> >> }
> > >>>> >>
> > >>>> >> Note: putting a shell variable ($HOSTNAME) in the path did not seem
> > >>>> >> to work (I’d get errors when running queries that resulted in a
> > >>>> >> spill to disk, complaining about directory permissions, likely
> > >>>> >> because it couldn’t resolve the path).
> > >>>> >>
> > >>>> >> If I can figure out the original issue I had (i.e., if I can
> > >>>> >> reproduce it), I will file a JIRA.
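The $HOSTNAME failure is consistent with HOCON's substitution rules, and Jacques later pointed to HOCON variables as the likely answer. The Python below is a toy model, not the Typesafe Config parser, illustrating the rules that matter here: text inside double quotes is literal, `${NAME}` outside quotes is a substitution that can fall back to an environment variable (this toy consults only the environment, whereas real HOCON checks config paths first), `${?NAME}` is optional, and a bare `$` is not allowed in unquoted HOCON text.

```python
import os
import re
from typing import Mapping, Optional

# Toy tokenizer: a "quoted string", a ${NAME} or ${?NAME} substitution,
# a run of other unquoted text, or any single leftover character.
TOKEN = re.compile(r'"((?:[^"\\]|\\.)*)"|\$\{(\??)([A-Za-z0-9_.-]+)\}|([^"$]+)|(.)', re.S)


def resolve_value(expr: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Resolve a simplified HOCON value concatenation against environment variables."""
    env = os.environ if env is None else env
    out = []
    for m in TOKEN.finditer(expr):
        quoted, optional, name, plain, bad = m.groups()
        if quoted is not None:
            out.append(quoted)  # quoted text is never expanded, "$HOSTNAME" stays literal
        elif name is not None:
            if name in env:
                out.append(env[name])
            elif not optional:
                raise KeyError(f"unresolved substitution ${{{name}}}")
            # an undefined ${?NAME} contributes nothing to a string concatenation
        elif plain is not None:
            out.append(plain)
        else:
            raise ValueError(f"unexpected character {bad!r} outside quotes")
    return "".join(out)
```

Under these rules, two plausible config forms (hedged; verify against your Drill version) are `sort.external.spill.directories: [ ${DRILL_SPILLLOC} ]` with `DRILL_SPILLLOC` exported from drill-env.sh, which matches the `# env var DRILL_SPILLLOC` origin John later saw in sys.boot, or `[ "/var/mapr/local/"${HOSTNAME}"/drillspill" ]`. Note that names are case-sensitive (`$hostname` is not `$HOSTNAME`), and HOSTNAME is a shell variable that may not be exported to the drillbit's JVM.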
> > >>>> >>
> > >>>> >>
> > >>>> >>
> > >>>> >>
> > >>>> >>
> > >>>> >> From: Andries Engelbrecht <[email protected]>
> > >>>> >> Reply: [email protected] <[email protected]>
> > >>>> >> Date: September 24, 2015 at 4:21:50 PM
> > >>>> >> To: [email protected] <[email protected]>
> > >>>> >> Subject: Re: Setting drill.exec.sort.external.spill.directories
> > >>>> >>
> > >>>> >> Maybe try
> > >>>> >>
> > >>>> >> sort.external.spill.directories: [ "/var/mapr/local/$hostname/drillspill" ],
> > >>>> >>
> > >>>> >> —Andries
> > >>>> >>
> > >>>> >>> On Sep 24, 2015, at 12:38 PM, Andy Pernsteiner <[email protected]> wrote:
> > >>>> >>>
> > >>>> >>> I’m trying to do some experimentation and set the
> > >>>> >>> drill.exec.sort.external.spill.directories value. Since this option
> > >>>> >>> appears as a ‘boot’ option (https://drill.apache.org/docs/start-up-options/),
> > >>>> >>> I believe the right way is to set this in drill-override.conf on
> > >>>> >>> each node.
> > >>>> >>>
> > >>>> >>> I tried doing this via the following:
> > >>>> >>>
> > >>>> >>>
> > >>>> >>> drill.exec: {
> > >>>> >>>   cluster-id: "se1-drillbits",
> > >>>> >>>   zk.connect: "10.10.15.10:5181,10.10.15.11:5181,10.10.15.12:5181",
> > >>>> >>>   sys.store.provider.zk.blobroot: "maprfs:///user/mapr/profiles",
> > >>>> >>>   sort.external.spill.directories: [ "/var/mapr/$hostname/drillspill" ],
> > >>>> >>>   sort.external.spill.fs: "maprfs:///",
> > >>>> >>>   impersonation: {
> > >>>> >>>     enabled: true,
> > >>>> >>>     max_chained_user_hops: 3
> > >>>> >>>   }
> > >>>> >>> }
> > >>>> >>>
> > >>>> >>> I also tried setting via:
> > >>>> >>>
> > >>>> >>> sort: {
> > >>>> >>>   purge.threshold : 100,
> > >>>> >>>   external: {
> > >>>> >>>     batch.size : 4000,
> > >>>> >>>     spill: {
> > >>>> >>>       batch.size : 4000,
> > >>>> >>>       group.size : 100,
> > >>>> >>>       threshold : 200,
> > >>>> >>>       directories : [ "/var/mapr/$hostname/drillspill" ],
> > >>>> >>>       fs : "maprfs:///"
> > >>>> >>>     }
> > >>>> >>>   }
> > >>>> >>> },
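In HOCON a dotted key is shorthand for nested objects, so the two snippets above are meant to target the same paths. Flattening the nesting shows which full path each value lands on, which is the form sys.boot reports. The Python below assumes the config has already been parsed into plain dicts; the data mirrors the second snippet and is illustrative. One thing it makes concrete: the second snippet starts at `sort`, so it only maps to `drill.exec.sort.external.spill.directories` if it sits inside the `drill.exec` block, and the email does not show where it was placed.

```python
def flatten(config: dict, prefix: str = "") -> dict:
    """Flatten nested objects into dotted paths, e.g. {"a": {"b": 1}} -> {"a.b": 1}.
    Dotted string keys such as "purge.threshold" end up on the same paths as
    the equivalent nesting would."""
    flat = {}
    for key, value in config.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat
```

For example, `flatten({"drill.exec": {"sort.external.spill.fs": "maprfs:///"}})` yields the key `drill.exec.sort.external.spill.fs`, the same path that the nested form produces when wrapped in `drill` and `exec`.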
> > >>>> >>>
> > >>>> >>>
> > >>>> >>> But then, looking at the sys.boot table after restarting the
> > >>>> >>> drillbits, I still see the default values:
> > >>>> >>>
> > >>>> >>> 0: jdbc:drill:> select * from sys.boot where name like '%spill%';
> > >>>> >>> +------+------+------+--------+---------+------------+----------+-----------+
> > >>>> >>> | name | kind | type | status | num_val | string_val | bool_val | float_val |
> > >>>> >>> +------+------+------+--------+---------+------------+----------+-----------+
> > >>>> >>> | drill.exec.sort.external.spill.batch.size | LONG | BOOT | BOOT | 4000 | null | null | null |
> > >>>> >>> | drill.exec.sort.external.spill.directories | STRING | BOOT | BOOT | null | [
> > >>>> >>>     # jar:file:/opt/mapr/drill/drill-1.1.0/jars/drill-java-exec-1.1.0.jar!/drill-module.conf: 145
> > >>>> >>>     "/tmp/drill/spill"
> > >>>> >>> ] | null | null |
> > >>>> >>> | drill.exec.sort.external.spill.fs | STRING | BOOT | BOOT | null | "file:///" | null | null |
> > >>>> >>> | drill.exec.sort.external.spill.group.size | LONG | BOOT | BOOT | 40000 | null | null | null |
> > >>>> >>> | drill.exec.sort.external.spill.threshold | LONG | BOOT | BOOT | 40000 | null | null | null |
> > >>>> >>> +------+------+------+--------+---------+------------+----------+-----------+
> > >>>> >>
> > >>>> >>>
> > >>>> >>> Note that I’ve tried removing the shell ‘$hostname’ variable (in
> > >>>> >>> case it causes issues), no dice.
> > >>>> >>>
> > >>>> >>> What’s the right way to set these values?
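The sys.boot outputs quoted in this thread embed the origin of each value as a `# ...` comment inside `string_val`: the default above points at `drill-module.conf` inside the Drill jar, while John's later result reads `# env var DRILL_SPILLLOC`. The Python below is a small heuristic built on that observation (the comment format is inferred from these two outputs, not from documentation) for telling whether an override actually took effect.

```python
import re
from typing import Optional

# Origin comment as seen in this thread's sys.boot output: "# <origin>" on its own line.
ORIGIN = re.compile(r"#\s*(.+)")


def value_origin(string_val: Optional[str]) -> Optional[str]:
    """Return the origin comment embedded in a sys.boot string_val, or None."""
    m = ORIGIN.search(string_val or "")
    return m.group(1).strip() if m else None


def looks_like_default(string_val: Optional[str]) -> bool:
    """Heuristic: True if the value still comes from a drill-module.conf inside a jar."""
    origin = value_origin(string_val) or ""
    return origin.startswith("jar:") and "drill-module.conf" in origin
```

Applied to the table above, `looks_like_default` returns True for the `spill.directories` row, which matches the observation that the drill-override.conf setting was not being picked up.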
> > >>>> >>>
> > >>>> >>>
> > >>>> >>>
> > >>>> >>>
> > >>>> >>>
> > >>>> >>>
> > >>>> >>>
> > >>>> >>>
> > >>>> >>
> > >>>> >>
> > >>>> >
> > >>>> >
> > >>>>
> > >>>>
> > >>>
> > >>
> > >
> >
>
