So yes, you are correct: you should set it to 1 x 32 x 0.7. Btw, Drill should already have set this option to 32 x 0.7.
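For anyone checking the arithmetic in this thread, here is a minimal sketch of the per-node calculation. The function name `width_per_node` is hypothetical and is not part of Drill. The rounding direction is an assumption: rounding up reproduces the figures quoted in the thread (23 for 32 cores, 3 for 4 cores), whereas the "rounded down" wording in the quoted docs would give 22 and 2.

```python
def width_per_node(drillbits_per_node: int, cores_per_node: int,
                   round_up: bool = True) -> int:
    """Drillbits on ONE node * cores on that node * 0.7.

    Hypothetical helper for illustration only, not a Drill API.
    round_up=True is an assumption that matches the thread's numbers.
    """
    # Work in tenths (0.7 == 7/10) so the result is exact integer math.
    tenths = drillbits_per_node * cores_per_node * 7
    if round_up:
        return (tenths + 9) // 10  # integer ceiling of tenths / 10
    return tenths // 10            # integer floor of tenths / 10


# One drillbit per node on 32-core machines (the case discussed here).
per_node = width_per_node(1, 32)

# The docs' single-node example: 2 cores with hyper-threading = 4.
docs_example = width_per_node(1, 4)

# The misreading: plugging in the cluster-wide drillbit count (16 nodes).
misread = width_per_node(16, 32)

print(per_node, docs_example, misread)
```

The cluster size never enters the correct calculation: sixteen nodes with one drillbit each still use 1 as the first factor, because the setting applies per node.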

--
Abdelhakim Deneche
Software Engineer
<http://www.mapr.com/>

On Mon, Feb 15, 2016 at 11:36 AM, Abdel Hakim Deneche <[email protected]> wrote:

> Don't be, it took me quite some time to figure this one out too =P
>
> The "number of active drillbits" refers to the number of Drillbits running
> on each node of the cluster. Generally, you have 1 active Drillbit per node.
>
> On Mon, Feb 15, 2016 at 11:22 AM, John Omernik <[email protected]> wrote:
>
>> I am really sorry for being dense here, but based on your comment and
>> the docs, if you had sixteen 32-core machines but only one drill bit
>> running per node, you'd still use 1 (one drill bit per node) * 32 (the
>> number of cores) * 0.7 (the modifier in the docs) to get 23 as the number
>> to set for planner.width.max_per_node, not 16 * 32 * 0.7. The docs are
>> confusing (see below): you can read the first factor as the number of
>> active drill bits in the cluster, which on a sixteen-node cluster with
>> one per node would give 16 * 32 (cores per node) * 0.7. But I think you
>> are saying that we should be taking 1 drill bit per node * 32 * 0.7 ...
>> correct?
>>
>> Quote from the docs:
>> number of active drillbits (typically one per node) * number of cores per
>> node * 0.7
>>
>> On Mon, Feb 15, 2016 at 11:15 AM, Abdel Hakim Deneche <[email protected]> wrote:
>>
>> > No, it's the maximum number of threads each drillbit will be able to
>> > spawn for every major fragment of a query.
>> >
>> > If you run a query on a cluster of 32-core machines, and the query plan
>> > contains multiple major fragments, each major fragment will have "at
>> > most" 32 x 0.7 = 23 minor fragments (or threads) running in parallel on
>> > every drillbit. The "at most" is important here, as other factors limit
>> > how many minor fragments can run in parallel, for example the nature and
>> > size of the data.
>> >
>> > On Mon, Feb 15, 2016 at 7:41 AM, John Omernik <[email protected]> wrote:
>> >
>> > > https://drill.apache.org/docs/configuring-resources-for-a-shared-drillbit/#configuring-query-queuing
>> > >
>> > > On this page, on the setting planner.width.max_per_node, it says the
>> > > below. In the equation "number of active drillbits * number of cores
>> > > per node * 0.7", is the number of active drillbits the number of
>> > > drill bits PER NODE (as this setting is per node), or is it the
>> > > number of active drill bits per cluster? The example is unclear
>> > > because it only shows a single-node cluster. ("Typically one per
>> > > node" doesn't clarify whether that number should be per node or per
>> > > drill bit.)
>> > >
>> > > Thanks!
>> > >
>> > > The maximum width per node defines the maximum degree of parallelism
>> > > for any fragment of a query, but the setting applies at the level of
>> > > a single node in the cluster. The *default* maximum degree of
>> > > parallelism per node is calculated as follows, with the theoretical
>> > > maximum automatically scaled back (and rounded down) so that only 70%
>> > > of the actual available capacity is taken into account:
>> > >
>> > > number of active drillbits (typically one per node) * number of
>> > > cores per node * 0.7
>> > >
>> > > For example, on a single-node test system with 2 cores and
>> > > hyper-threading enabled: 1 * 4 * 0.7 = 3
