Drill did not automatically set it to that value; it set it to 12, which is
likely 0.7 times the core count (or close to it) on a 16 core machine. But I
have 7 nodes with different core counts, so is this setting per drillbit or
is it a cluster-wide setting? Is it possible to set this in drill-override
on a per-node basis, or does Drill handle that for us? And if I do an
ALTER SESSION, does it change things cluster-wide?

The reason I am asking is that I am running this in Marathon and assigning 6
cores to each drillbit (this is a resource-constrained cluster). Since I am
using cgroups, as I understand it, if there is CPU contention then cgroups
will limit Drill to 6 shares; otherwise it will allow Drill to use more
cores.

So as it pertains to this setting, should I set it to the number of cores
per node (as it is likely doing now), or should I use the number of CPU
shares I am setting? And if I go by cores per node, how do I handle
different sized nodes (16 core nodes vs. 24 core nodes, for example)?
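
For what it's worth, here is how the arithmetic works out for the core
counts mentioned in this thread. This is only a sketch in Python: the
function name `width_max_per_node` is made up for illustration and is not a
Drill API. It rounds up because the figures quoted in the thread (3 for 4
cores, 23 for 32 cores, and the 12 observed on a 16 core node) are all
consistent with rounding up; that is an inference from those numbers, not a
confirmed description of what Drill does internally.

```python
def width_max_per_node(cores_per_node, drillbits_per_node=1):
    """Hypothetical helper: drillbits per node * cores per node * 0.7, rounded up.

    Integer arithmetic (7/10) avoids floating-point surprises near whole numbers.
    """
    scaled = drillbits_per_node * cores_per_node * 7
    return -(-scaled // 10)  # ceiling division


if __name__ == "__main__":
    # 6 = cores assigned per drillbit in Marathon; 16, 24, 32 = node sizes from the thread
    for cores in (4, 6, 16, 24, 32):
        print(f"{cores:>2} cores -> {width_max_per_node(cores)}")
```

Under these assumptions a 16 core node gives 12 and a 24 core node gives 17,
while sizing from the 6 assigned cores gives 5, so which basis is used
changes the value considerably.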



On Mon, Feb 15, 2016 at 1:37 PM, Abdel Hakim Deneche <[email protected]>
wrote:

> so yes, you are correct, you should set it to 1 x 32 x 0.7
>
> Btw, Drill should already have set this option to 32 x 0.7
>
> On Mon, Feb 15, 2016 at 11:36 AM, Abdel Hakim Deneche <[email protected]> wrote:
>
> > Don't be, it took me quite some time to figure out this one either =P
> >
> > the "number of active drillbits" refers to the number of Drillbits running
> > on each node of the cluster. Generally, you have 1 active Drillbit per node.
> >
> > On Mon, Feb 15, 2016 at 11:22 AM, John Omernik <[email protected]> wrote:
> >
> >> I am really sorry for being dense here, but based on your comment and the
> >> docs, if you had sixteen 32 core machines but only one drillbit running
> >> per node, you'd still use 1 (one drillbit per node) * 32 (the number of
> >> cores) * 0.7 (the modifier in the docs) to get 23 as the number to set for
> >> planner.width.max_per_node, not 16 * 32 * 0.7. The docs are confusing
> >> (see below): you can read the first term as the number of active
> >> drillbits in the cluster, which on a sixteen node cluster with one per
> >> node would be 16 * 32 (cores per node) * 0.7. But I think you are saying
> >> that we should be taking 1 drillbit per node * 32 * 0.7 ... correct?
> >>
> >> Quote from the docs:
> >> number of active drillbits (typically one per node) * number of cores per
> >> node * 0.7
> >>
> >> On Mon, Feb 15, 2016 at 11:15 AM, Abdel Hakim Deneche <[email protected]> wrote:
> >>
> >> > No, it's the maximum number of threads each drillbit will be able to spawn
> >> > for every major fragment of a query.
> >> >
> >> > If you run a query on a cluster of 32 core machines, and the query plan
> >> > contains multiple major fragments, each major fragment will have "at most"
> >> > 32 x 0.7 = 23 minor fragments (or threads) running in parallel on every
> >> > drillbit. The "at most" is important here, as other factors limit how many
> >> > minor fragments can run in parallel, for example the nature and size of the
> >> > data.
> >> >
> >> > On Mon, Feb 15, 2016 at 7:41 AM, John Omernik <[email protected]> wrote:
> >> >
> >> > > https://drill.apache.org/docs/configuring-resources-for-a-shared-drillbit/#configuring-query-queuing
> >> > >
> >> > >
> >> > > *On this page, on the setting planner.width.max_per_node it says the
> >> > > below.  In the equation of number of active drillbits * number of cores
> >> > > per node * 0.7, is the number of active drillbits the number of drill bits
> >> > > PER NODE (as this setting is per node) or is that the number of active
> >> > > drill bits per cluster?  The example is unclear because it only shows an
> >> > > example on a single node cluster.  (Typically 1 per node doesn't clarify
> >> > > whether that number should be per node or per drill bit)*
> >> > >
> >> > > *Thanks!*
> >> > >
> >> > >
> >> > >
> >> > > The maximum width per node defines the maximum degree of parallelism for
> >> > > any fragment of a query, but the setting applies at the level of a single
> >> > > node in the cluster. The *default* maximum degree of parallelism per node
> >> > > is calculated as follows, with the theoretical maximum automatically scaled
> >> > > back (and rounded down) so that only 70% of the actual available capacity
> >> > > is taken into account: number of active drillbits (typically one per node)
> >> > > * number of cores per node * 0.7
> >> > >
> >> > > For example, on a single-node test system with 2 cores and hyper-threading
> >> > > enabled: 1 * 4 * 0.7 = 3
> >> > >
> >> >
> >> >
> >> >
> >> > --
> >> >
> >> > Abdelhakim Deneche
> >> >
> >> > Software Engineer
> >> >
> >> >   <http://www.mapr.com/>
> >> >
> >> >
> >> > Now Available - Free Hadoop On-Demand Training
> >> > <
> >> >
> >>
> http://www.mapr.com/training?utm_source=Email&utm_medium=Signature&utm_campaign=Free%20available
> >> > >
> >> >
> >>
> >
>
