Drill did not automatically set that; it set it to 12, which is likely 0.7 (or close to it) times the core count on a 16-core machine. But I have 7 nodes with different core counts, so is this setting per Drillbit, or is it cluster-wide? Is it possible to set this in drill-override based on the node itself, or does Drill handle that for us? And if I do an ALTER SESSION, does it change things cluster-wide?
The reason I am asking is that I am running this in Marathon and assigning 6 cores to each Drillbit (this is a resource-constrained cluster). Since I am using cgroups, as I understand it, if there is CPU contention then cgroups will limit Drill to 6 shares; otherwise it will allow Drill to use more cores. So, as it pertains to this setting, should I set it to the number of cores per node (as it is likely doing now), or should I use the number of CPU shares I am assigning? And if I go by cores per node, how do I handle differently sized nodes (16-core nodes vs. 24-core nodes, for example)?

On Mon, Feb 15, 2016 at 1:37 PM, Abdel Hakim Deneche <[email protected]> wrote:

> so yes, you are correct, you should set it to 1 x 32 x 0.7
>
> Btw, Drill should already have set this option to 32 x 0.7
>
> On Mon, Feb 15, 2016 at 11:36 AM, Abdel Hakim Deneche <[email protected]> wrote:
>
> > Don't be, it took me quite some time to figure out this one either =P
> >
> > the "number of active drillbits" refers to the number of Drillbits
> > running on each node of the cluster. Generally, you have 1 active
> > Drillbit per node.
> >
> > On Mon, Feb 15, 2016 at 11:22 AM, John Omernik <[email protected]> wrote:
> >
> >> I am really sorry for being dense here, but based on your comment
> >> then, and the docs then if you had sixteen 32 core machines, but only
> >> one drill bit running per node, you'd still use 1 (one drill bit per
> >> node) * 32 (the number of cores) * 0.7 (the modifier in the docs) to
> >> get 23 as the number to set for planner.width_max_per_node Not
> >> 16 * 32 * 0.7. A reading of the docs is confusing (see below) you can
> >> read that as number of active drill bits, which on a sixteen node
> >> cluster, one per node would be 16 * 32 (cores per node) * 0.7. But I
> >> think you are saying that we should be taking 1 drill bit per node
> >> * 32 * 0.7 ... correct?
> >>
> >> Quote from the docs:
> >> number of active drillbits (typically one per node) * number of cores
> >> per node * 0.7
> >>
> >> On Mon, Feb 15, 2016 at 11:15 AM, Abdel Hakim Deneche <[email protected]> wrote:
> >>
> >> > No, it's the maximum number of threads each drillbit will be able
> >> > to spawn for every major fragment of a query.
> >> >
> >> > If you run a query on a cluster of 32 core machines, and the query
> >> > plan contains multiple major fragments, each major fragment will
> >> > have "at most" 32 x 0.7= 23 minor fragments (or threads) running in
> >> > parallel on every drillbit. The "at most" is important here, as
> >> > other factors limit how many minor fragments can run in parallel,
> >> > for example nature and size of the data.
> >> >
> >> > On Mon, Feb 15, 2016 at 7:41 AM, John Omernik <[email protected]> wrote:
> >> >
> >> > > https://drill.apache.org/docs/configuring-resources-for-a-shared-drillbit/#configuring-query-queuing
> >> > >
> >> > > On this page, on the setting planner.width.max_per_node it says
> >> > > the below. In the equation, of number of active drillbits *
> >> > > number of cores per node * 0.7, is the number of active drillbits
> >> > > the number of drill bits PER NODE (as this setting is per node)
> >> > > or is that the number of active drill bits per cluster? The
> >> > > example is unclear because it only shows an example on a single
> >> > > node cluster. (Typically 1 per node doesn't clarify whether that
> >> > > number should be per node or per drill bit)
> >> > >
> >> > > Thanks!
> >> > >
> >> > > The maximum width per node defines the maximum degree of
> >> > > parallelism for any fragment of a query, but the setting applies
> >> > > at the level of a single node in the cluster. The *default*
> >> > > maximum degree of parallelism per node is calculated as follows,
> >> > > with the theoretical maximum automatically scaled back (and
> >> > > rounded down) so that only 70% of the actual available capacity
> >> > > is taken into account: number of active drillbits (typically one
> >> > > per node) * number of cores per node * 0.7
> >> > >
> >> > > For example, on a single-node test system with 2 cores and
> >> > > hyper-threading enabled: 1 * 4 * 0.7 = 3
> >> >
> >> > --
> >> > Abdelhakim Deneche
> >> > Software Engineer
> >> > <http://www.mapr.com/>
> >> >
> >> > Now Available - Free Hadoop On-Demand Training
> >> > <http://www.mapr.com/training?utm_source=Email&utm_medium=Signature&utm_campaign=Free%20available>
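As a rough check of the arithmetic discussed in this thread, here is a minimal sketch of the docs formula (drillbits per node * cores per node * 0.7) for the core counts mentioned above. The function name and its parameters are made up for illustration and are not part of Drill. It prints both the rounded-down and the rounded-up result, because the docs say "rounded down" while the values quoted in the thread (3 for 4 cores, 12 for 16 cores, 23 for 32 cores) match rounding up. Which one Drill actually applies is not confirmed here.

```python
def width_per_node(cores, drillbits_per_node=1, percent=70):
    """Return (rounded_down, rounded_up) for drillbits * cores * 0.7.

    Hypothetical helper for illustration only; not a Drill API.
    """
    # Integer math (70 / 100) avoids floating-point error in cores * 0.7.
    scaled = drillbits_per_node * cores * percent
    rounded_down = scaled // 100
    rounded_up = -(-scaled // 100)  # ceiling division on integers
    return rounded_down, rounded_up


# Core counts mentioned in the thread: the docs example (4), the
# Marathon allocation (6), and the 16-, 24- and 32-core nodes.
for cores in (4, 6, 16, 24, 32):
    down, up = width_per_node(cores)
    print(f"{cores:>2} cores: rounded down = {down}, rounded up = {up}")
```

If the value is derived per node from the local core count, the 16-core and 24-core nodes would end up with different widths (11 or 12 versus 16 or 17), which is the heart of the per-node versus cluster-wide question above.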
