Someone may want to confirm this, but I think Drill will properly set the
default value (number of cores x 0.7), and it will be specific to each
node. However, when you query the option from sys.options, it will show
you the value on the "foreman" node for that specific query.
Once you set it manually using ALTER, it will be the same value on all
nodes.
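
To double-check what your foreman resolved, here is a minimal sketch of
querying the option (sys.options column names such as num_val vary a bit
between Drill versions, so treat the exact columns as an assumption):

    SELECT name, num_val
    FROM sys.options
    WHERE name = 'planner.width.max_per_node';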

I don't think there is a way, for now, to change this using
drill-override.conf; you may want to create a JIRA for this.

Again, someone else may want to give more informed advice here, but
setting this option to 5 (6 cores x 0.7, rounded up) will help limit how
much CPU Drill uses. Please be aware that this option only controls the
"width" of queries; you may still end up with more threads running
simultaneously in various stages of a query. For example, Drill can spawn
up to 16 threads when it's reading Parquet metadata during planning.
There is ongoing work to improve Drill's resource management.
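
If you do pin it manually, a minimal sketch (the value 5 is just the
6-core example above; adjust it to your nodes):

    -- affects only the current connection
    ALTER SESSION SET `planner.width.max_per_node` = 5;

    -- persists and applies to every drillbit in the cluster
    ALTER SYSTEM SET `planner.width.max_per_node` = 5;

Keep in mind that ALTER SYSTEM sets a single value cluster-wide, which is
why the mixed node sizes you describe below are awkward with this option
today.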

On Mon, Feb 15, 2016 at 11:41 AM, John Omernik <j...@omernik.com> wrote:

> Drill did not automatically set that; it set it to 12, which is likely
> .7 or close to it on a 16-core machine. But I have 7 nodes with
> different core counts, so is this setting per drillbit, or is it a
> cluster-wide setting? Is it possible to set this in the drill-override
> based on the node itself, or does Drill handle that for us? And if I do
> an ALTER SESSION, does it change things cluster-wide?
>
> The reason I am asking is that I am running this in Marathon and
> assigning 6 cores to each drillbit (this is a resource-constrained
> cluster). Since I am using cgroups, as I understand it, if there is CPU
> contention then cgroups will limit Drill to 6 shares; otherwise it will
> allow Drill to use more cores.
>
> So as it pertains to this setting, should I set it to the number of
> cores per node (as Drill is likely setting it now), or should I use the
> number of CPU shares I am assigning? And if I go with cores per node,
> how do I handle different-sized nodes (16-core nodes vs. 24-core nodes,
> for example)?
>
>
>
> On Mon, Feb 15, 2016 at 1:37 PM, Abdel Hakim Deneche <adene...@maprtech.com> wrote:
>
> > so yes, you are correct, you should set it to 1 x 32 x 0.7
> >
> > Btw, Drill should already have set this option to 32 x 0.7
> >
> > On Mon, Feb 15, 2016 at 11:36 AM, Abdel Hakim Deneche <adene...@maprtech.com> wrote:
> >
> > > Don't be, it took me quite some time to figure this one out too =P
> > >
> > > the "number of active drillbits" refers to the number of Drillbits
> > running
> > > on each node of the cluster. Generally, you have 1 active Drillbit per
> > node.
> > >
> > > On Mon, Feb 15, 2016 at 11:22 AM, John Omernik <j...@omernik.com> wrote:
> > >
> > >> I am really sorry for being dense here, but based on your comment
> > >> then, and the docs, if you had sixteen 32-core machines but only
> > >> one drillbit running per node, you'd still use 1 (one drillbit per
> > >> node) * 32 (the number of cores) * 0.7 (the modifier in the docs)
> > >> to get 23 as the number to set for planner.width.max_per_node, not
> > >> 16 * 32 * 0.7. A reading of the docs is confusing (see below): you
> > >> can read it as the total number of active drillbits, which on a
> > >> sixteen-node cluster, one per node, would be 16 * 32 (cores per
> > >> node) * 0.7. But I think you are saying that we should be taking
> > >> 1 drillbit per node * 32 * 0.7 ... correct?
> > >>
> > >> Quote from the docs:
> > >> number of active drillbits (typically one per node) * number of
> > >> cores per node * 0.7
> > >>
> > >> On Mon, Feb 15, 2016 at 11:15 AM, Abdel Hakim Deneche <adene...@maprtech.com> wrote:
> > >>
> > >> > No, it's the maximum number of threads each drillbit will be
> > >> > able to spawn for every major fragment of a query.
> > >> >
> > >> > If you run a query on a cluster of 32-core machines, and the
> > >> > query plan contains multiple major fragments, each major fragment
> > >> > will have "at most" 32 x 0.7 = 23 minor fragments (or threads)
> > >> > running in parallel on every drillbit. The "at most" is important
> > >> > here, as other factors limit how many minor fragments can run in
> > >> > parallel, for example the nature and size of the data.
> > >> >
> > >> > On Mon, Feb 15, 2016 at 7:41 AM, John Omernik <j...@omernik.com> wrote:
> > >> >
> > >> > > https://drill.apache.org/docs/configuring-resources-for-a-shared-drillbit/#configuring-query-queuing
> > >> > >
> > >> > > On this page, for the setting planner.width.max_per_node, it
> > >> > > says the below. In the equation "number of active drillbits *
> > >> > > number of cores per node * 0.7", is the number of active
> > >> > > drillbits the number of drillbits PER NODE (as this setting is
> > >> > > per node), or is it the number of active drillbits per cluster?
> > >> > > The example is unclear because it only shows a single-node
> > >> > > cluster. ("Typically 1 per node" doesn't clarify whether that
> > >> > > number should be per node or per drillbit.)
> > >> > >
> > >> > > Thanks!
> > >> > >
> > >> > >
> > >> > >
> > >> > > The maximum width per node defines the maximum degree of
> > >> > > parallelism for any fragment of a query, but the setting
> > >> > > applies at the level of a single node in the cluster. The
> > >> > > *default* maximum degree of parallelism per node is calculated
> > >> > > as follows, with the theoretical maximum automatically scaled
> > >> > > back (and rounded down) so that only 70% of the actual
> > >> > > available capacity is taken into account: number of active
> > >> > > drillbits (typically one per node) * number of cores per node
> > >> > > * 0.7
> > >> > >
> > >> > > For example, on a single-node test system with 2 cores and
> > >> > > hyper-threading enabled: 1 * 4 * 0.7 = 3
> > >> > >



-- 

Abdelhakim Deneche

Software Engineer

  <http://www.mapr.com/>


