Don't be; it took me quite some time to figure this one out too =P The "number of active drillbits" refers to the number of Drillbits running on each node of the cluster, not the total across the cluster. Generally, you have 1 active Drillbit per node.
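
To make the arithmetic concrete, here is a minimal Python sketch of the per-node calculation discussed in this thread. The function name and its parameters are hypothetical, chosen only for illustration; they are not part of Drill. One assumption to flag: the quoted docs say the result is "rounded down", but both worked numbers in this thread (32 cores giving 23, and 4 logical cores giving 3) only come out if the value is rounded up, so the sketch rounds up to reproduce them.

```python
def max_width_per_node(cores_per_node: int, drillbits_per_node: int = 1) -> int:
    """Estimate the per-node width from the formula quoted in this thread.

    Formula: drillbits on ONE node * cores on that node * 0.7.
    The cluster size does not appear in it at all.

    Assumption: the result is rounded up, which matches the worked
    examples in the thread (23 and 3). Integer arithmetic is used
    (multiply by 7, ceiling-divide by 10) to avoid float rounding noise.
    """
    if cores_per_node < 1 or drillbits_per_node < 1:
        raise ValueError("cores_per_node and drillbits_per_node must be >= 1")
    scaled = drillbits_per_node * cores_per_node * 7
    return (scaled + 9) // 10  # ceiling of scaled / 10


# Sixteen 32-core machines, one Drillbit per node: only the per-node
# figures matter, so the answer is the same as for a single machine.
print(max_width_per_node(cores_per_node=32))  # 23

# Single-node test system, 2 cores with hyper-threading (4 logical cores).
print(max_width_per_node(cores_per_node=4))   # 3
```

Multiplying by the number of nodes as well (16 * 32 * 0.7) would give a figure in the hundreds, which is the misreading the question was about.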

On Mon, Feb 15, 2016 at 11:22 AM, John Omernik <[email protected]> wrote:

> I am really sorry for being dense here, but based on your comment then, and
> the docs then if you had sixteen 32 core machines, but only one drill bit
> running per node, you'd still use 1 (one drill bit per node) * 32 (the
> number of cores) * 0.7 (the modifier in the docs) to get 23 as the number
> to set for planner.width_max_per_node Not 16 * 32 * 0.7. A reading of the
> docs is confusing (see below) you can read that as number of active drill
> bits, which on a sixteen node cluster, one per node would be 16 * 32 (cores
> per node) * 0.7. But I think you are saying that we should be taking 1
> drill bit per node * 32 * 0.7 ... correct?
>
> Quote from the docs:
> number of active drillbits (typically one per node) * number of cores per
> node * 0.7
>
> On Mon, Feb 15, 2016 at 11:15 AM, Abdel Hakim Deneche <[email protected]>
> wrote:
>
> > No, it's the maximum number of threads each drillbit will be able to spawn
> > for every major fragment of a query.
> >
> > If you run a query on a cluster of 32 core machines, and the query plan
> > contains multiple major fragments, each major fragment will have "at most"
> > 32 x 0.7= 23 minor fragments (or threads) running in parallel on every
> > drillbit. The "at most" is important here, as other factors limit how many
> > minor fragments can run in parallel, for example nature and size of the
> > data.
> >
> > On Mon, Feb 15, 2016 at 7:41 AM, John Omernik <[email protected]> wrote:
> >
> > > https://drill.apache.org/docs/configuring-resources-for-a-shared-drillbit/#configuring-query-queuing
> > >
> > > On this page, on the setting planner.width.max_per_node it says the
> > > below. In the equation, of number of active drillbits * number of cores
> > > per node * 0.7, is the number of active drillbits the number of drill bits
> > > PER NODE (as this setting is per node) or is that the number of active
> > > drill bits per cluster? The example is unclear because it only shows an
> > > example on a single node cluster. (Typically 1 per node doesn't clarify
> > > whether that number should be per node or per drill bit)
> > >
> > > Thanks!
> > >
> > > The maximum width per node defines the maximum degree of parallelism for
> > > any fragment of a query, but the setting applies at the level of a single
> > > node in the cluster. The *default* maximum degree of parallelism per node
> > > is calculated as follows, with the theoretical maximum automatically scaled
> > > back (and rounded down) so that only 70% of the actual available capacity
> > > is taken into account: number of active drillbits (typically one per node)
> > > * number of cores per node * 0.7
> > >
> > > For example, on a single-node test system with 2 cores and hyper-threading
> > > enabled: 1 * 4 * 0.7 = 3

-- 
Abdelhakim Deneche
Software Engineer
<http://www.mapr.com/>
