Re: Setting drill.exec.sort.external.spill.directories

John Omernik Mon, 09 Nov 2015 12:41:38 -0800

I am running watch ls on one of the drillspill directories in one window.

In Drill I ran:


select * from `largetesttable` order by state limit 100

(state is the mailing abbreviation for the state; I'd assume ordering on it
would force a large sort, even with limit 100.)

Nothing shows up in my watch window, and the query errors out with

"RESOURCE ERROR: One or more nodes ran out of memory while executing the
query"

Is this due to a poor test case, or are my settings in drill-override.conf
not being used properly?


Yes, the config in the JIRA creates the volumes as needed. Looking in the
MapR UI, the volumes appear to be created as I'd expect, similar to the
logs, mapred, and metrics local volumes. That part seems to be working; now
to validate that the spill directories are working.

:)
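
For readers following along, here is a minimal sketch of what a HOCON
substitution for the per-node spill path could look like. It is not the
config from the JIRA. It assumes HOSTNAME is exported in the drillbit's
environment (for example from drill-env.sh), and it reuses the
/var/mapr/local/<hostname>/drillspill layout quoted further down this thread.
HOCON falls back to environment variables when a substitution is not defined
in the config itself.

```
# Sketch only: assumes HOSTNAME is exported to the drillbit process;
# the shell may not export it by default.
drill.exec: {
  sort.external.spill: {
    # The substitution must sit outside the quotes; the quoted and
    # substituted pieces are concatenated into a single string.
    directories: [ "/var/mapr/local/"${HOSTNAME}"/drillspill" ],
    fs: "maprfs:///"
  }
}
```

With ${HOSTNAME} the config fails to load if the variable is unset, which
makes a missing export obvious at drillbit startup; the optional form
${?HOSTNAME} would silently drop the piece instead.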



On Mon, Nov 9, 2015 at 2:26 PM, Andries Engelbrecht <
[email protected]> wrote:

> Does the config in the JIRA create the volumes on MapR-FS?
>
> Reduce the direct memory in drill-env.sh to 2G or less and do a large sort
> (order by) and see if it forces it to spill.
>
> The solution does look interesting and will be very helpful on large nodes
> with higher MapR-FS throughput.
>
>
> —Andries
>
>
> > On Nov 9, 2015, at 10:51 AM, John Omernik <[email protected]> wrote:
> >
> > Just for future readers of the user list: Jacques posted to the JIRA that
> > HOCON variables likely already support this. I created some scripts for my
> > MapR setup (MapRTech folks, please take a look at the JIRA; I think I
> > effectively created a local volume correctly using maprcli in
> > drill-env.sh).
> >
> > The one last piece of the puzzle is how to test that this is working. The
> > drillbits start with no errors, but I'd like to validate that it's all
> > working as intended, i.e. can I force my memory low? What type of query
> > would cause a spill? If Drill tries to avoid spilling as much as possible,
> > this may be hard to prove... perhaps there is a query that shows this is
> > set up right under the hood?
> >
> > John
> >
> > On Mon, Nov 9, 2015 at 8:28 AM, John Omernik <[email protected]> wrote:
> >
> >> Hey all, after speaking to Andries, I've gone ahead and created a JIRA to
> >> support variables in the drill-override.conf file:
> >>
> >> https://issues.apache.org/jira/browse/DRILL-4052
> >>
> >> This will be a huge help and offer great flexibility for administrators
> >> looking to organize their Drill clusters. Please comment with ideas if you
> >> have thoughts on the subject!
> >>
> >> John
> >>
> >>
> >>
> >> On Fri, Sep 25, 2015 at 10:20 AM, Andy Pernsteiner <
> >> [email protected]> wrote:
> >>
> >>> That was considered, and we may elect to do that or some variation
> >>> (a separate mount point going to the target directory per node, but where
> >>> the local mount point is identical across all cluster nodes). I was hoping
> >>> that Drill had a way of parsing options within the config. If not, I’ll
> >>> file a JIRA for enhancement, since this sort of thing would be useful for
> >>> a number of scenarios.
> >>>
> >>>
> >>>
> >>> Andy Pernsteiner
> >>> Manager, Field Enablement
> >>> ph: 206.228.0737
> >>>
> >>> www.mapr.com
> >>> Now Available - Free Hadoop On-Demand Training
> >>>
> >>>
> >>>
> >>> From: kbotzum <[email protected]>
> >>> Reply: [email protected] <[email protected]>>
> >>> Date: September 24, 2015 at 5:36:31 PM
> >>> To: [email protected] <[email protected]>>
> >>> Cc: Andries Engelbrecht <[email protected]>>
> >>> Subject:  Re: Setting drill.exec.sort.external.spill.directories
> >>>
> >>> How about a symbolic link from the local file system on each node to the
> >>> node-specific tmp dir? A little hacky but workable. You could do that once
> >>> and then copy the drill config without concern.
> >>>
> >>> fyi, many eons ago a file system known as AFS had special vars that would
> >>> expand in pathnames to handle this type of thing transparently. My memory
> >>> is fuzzy but I think we had @sys, @host, and probably a few others.
> >>>
> >>> Keys
> >>> _______________________________
> >>> Keys Botzum
> >>> Senior Principal Technologist
> >>> [email protected]
> >>> 443-718-0098
> >>> MapR Technologies
> >>> http://www.mapr.com
> >>>
> >>>
> >>>
> >>> On Sep 24, 2015, at 5:30 PM, Andy Pernsteiner <[email protected]>
> >>> wrote:
> >>>
> >>>> One question for those in the know: Is there a way to use shell (or
> >>>> other) variables in these options? I'd much prefer $HOSTNAME, as opposed
> >>>> to having to set the variable differently on each node in my cluster.
> >>>>
> >>>>
> >>>>
> >>>> On Thu, Sep 24, 2015 at 5:22 PM, Andy Pernsteiner <[email protected]>
> >>>> wrote:
> >>>>
> >>>>> So, I *think* I got things working. I saw some inconsistencies
> >>>>> depending on which user I had launched sqlline as, but I can’t
> >>>>> reproduce them reliably.
> >>>>>
> >>>>> In any case, here’s what I put in the config:
> >>>>>
> >>>>> drill.exec: {
> >>>>>   cluster-id: "se1-drillbits",
> >>>>>   zk.connect: "10.10.15.10:5181,10.10.15.11:5181,10.10.15.12:5181",
> >>>>>   sys.store.provider.zk.blobroot: "maprfs:///user/mapr/profiles",
> >>>>>   sort.external.spill.directories: [ "/var/mapr/local/se-node10.se.lab/drillspill" ],
> >>>>>   sort.external.spill.fs: "maprfs:///",
> >>>>>   impersonation: {
> >>>>>     enabled: true,
> >>>>>     max_chained_user_hops: 3
> >>>>>   }
> >>>>> }
> >>>>>
> >>>>> Note: putting a shell variable ($HOSTNAME) in the path did not seem to
> >>>>> work (I’d get errors when running queries that resulted in a spill to
> >>>>> disk, complaining about directory permissions, likely because it
> >>>>> couldn’t resolve the path).
> >>>>>
> >>>>> If I can figure out the original issue I had (e.g., if I can reproduce
> >>>>> it), I will file a JIRA.
> >>>>>
> >>>>>
> >>>>>
> >>>>> Andy Pernsteiner
> >>>>>
> >>>>> From: Andries Engelbrecht <[email protected]>
> >>>>> <[email protected]>
> >>>>> Reply: [email protected] <[email protected]>>
> >>>>> <[email protected]>
> >>>>> Date: September 24, 2015 at 4:21:50 PM
> >>>>> To: [email protected] <[email protected]>> <
> >>> [email protected]>
> >>>>> Subject: Re: Setting drill.exec.sort.external.spill.directories
> >>>>>
> >>>>> Maybe try
> >>>>>
> >>>>> sort.external.spill.directories: [ "/var/mapr/local/$hostname/drillspill" ],
> >>>>>
> >>>>> —Andries
> >>>>>
> >>>>>> On Sep 24, 2015, at 12:38 PM, Andy Pernsteiner <
> >>>>> [email protected]> wrote:
> >>>>>>
> >>>>>> I’m trying to do some experimentation and set the
> >>>>>> drill.exec.sort.external.spill.directories value. Since this option
> >>>>>> appears as a ‘boot’ option
> >>>>>> ( https://drill.apache.org/docs/start-up-options/ ), I believe the
> >>>>>> right way is to set this in drill-override.conf on each node.
> >>>>>>
> >>>>>> I tried doing this via the following:
> >>>>>>
> >>>>>>
> >>>>>> drill.exec: {
> >>>>>>   cluster-id: "se1-drillbits",
> >>>>>>   zk.connect: "10.10.15.10:5181,10.10.15.11:5181,10.10.15.12:5181",
> >>>>>>   sys.store.provider.zk.blobroot: "maprfs:///user/mapr/profiles",
> >>>>>>   sort.external.spill.directories: [ "/var/mapr/$hostname/drillspill" ],
> >>>>>>   sort.external.spill.fs: "maprfs:///",
> >>>>>>   impersonation: {
> >>>>>>     enabled: true,
> >>>>>>     max_chained_user_hops: 3
> >>>>>>   }
> >>>>>> }
> >>>>>>
> >>>>>> I also tried setting via:
> >>>>>>
> >>>>>> sort: {
> >>>>>>   purge.threshold : 100,
> >>>>>>   external: {
> >>>>>>     batch.size : 4000,
> >>>>>>     spill: {
> >>>>>>       batch.size : 4000,
> >>>>>>       group.size : 100,
> >>>>>>       threshold : 200,
> >>>>>>       directories : [ "/var/mapr/$hostname/drillspill" ],
> >>>>>>       fs : "maprfs:///"
> >>>>>>     }
> >>>>>>   }
> >>>>>> },
> >>>>>>
> >>>>>>
> >>>>>> But then, looking at the sys.boot table after restarting the
> >>>>>> drillbits, I still see the default values:
> >>>>>> 0: jdbc:drill:> select * from sys.boot where name like '%spill%';
> >>>>>> +------+------+------+--------+---------+------------+----------+-----------+
> >>>>>> | name | kind | type | status | num_val | string_val | bool_val | float_val |
> >>>>>> +------+------+------+--------+---------+------------+----------+-----------+
> >>>>>> | drill.exec.sort.external.spill.batch.size | LONG | BOOT | BOOT | 4000 | null | null | null |
> >>>>>> | drill.exec.sort.external.spill.directories | STRING | BOOT | BOOT | null | [
> >>>>>> # jar:file:/opt/mapr/drill/drill-1.1.0/jars/drill-java-exec-1.1.0.jar!/drill-module.conf: 145
> >>>>>> "/tmp/drill/spill"
> >>>>>> ] | null | null |
> >>>>>> | drill.exec.sort.external.spill.fs | STRING | BOOT | BOOT | null | "file:///" | null | null |
> >>>>>> | drill.exec.sort.external.spill.group.size | LONG | BOOT | BOOT | 40000 | null | null | null |
> >>>>>> | drill.exec.sort.external.spill.threshold | LONG | BOOT | BOOT | 40000 | null | null | null |
> >>>>>> +------+------+------+--------+---------+------------+----------+-----------+
> >>>>>
> >>>>>>
> >>>>>> Note that I’ve tried removing the shell ‘$hostname’ variable (in case
> >>>>>> it causes issues), no dice.
> >>>>>>
> >>>>>> What’s the right way to set these values?
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> Andy Pernsteiner
> >>>>>>
> >>>>>>
> >>>>>
> >>>>>
> >>>>
> >>>>
> >>>> --
> >>>> Andy Pernsteiner
> >>>
> >>>
> >>
>
>
