I didn't notice at first that Adam said "no matter who the foreman is".
Another suspicion I have is that our current logic for assigning work will assign it to the exact same nodes every time we query a particular table. Changing the affinity factor may change which nodes are chosen, but the assignment will still be the same every time. That is my suspicion, but I am not sure why shutting down the drillbit would improve performance; I would expect that shutting down the drillbit would simply make a different drillbit the hotspot.

On Wed, Mar 25, 2015 at 12:16 PM, Jacques Nadeau <[email protected]> wrote:

> On Steven's point, the node that the client connects to is not currently
> randomized. Given your description of the behavior, I'm not sure whether
> you're hitting 2512 or just general undesirable distribution.
>
> On Wed, Mar 25, 2015 at 10:18 AM, Steven Phillips <[email protected]> wrote:
>
> > This is a known issue:
> >
> > https://issues.apache.org/jira/browse/DRILL-2512
> >
> > On Wed, Mar 25, 2015 at 8:13 AM, Andries Engelbrecht
> > <[email protected]> wrote:
> >
> > > What version of Drill are you running?
> > >
> > > Any hints when looking at the query profiles? Is the node that is
> > > being hammered the foreman for the queries, and are most of the major
> > > fragments tied to the foreman?
> > >
> > > —Andries
> > >
> > > On Mar 25, 2015, at 12:00 AM, Adam Gilmore <[email protected]> wrote:
> > >
> > > > Hi guys,
> > > >
> > > > I'm trying to understand how this could be possible. I have a Hadoop
> > > > cluster set up with a name node and two data nodes. All have
> > > > identical specs in terms of CPU/RAM etc.
> > > >
> > > > The two data nodes have a replicated HDFS setup where I'm storing
> > > > some Parquet files.
> > > >
> > > > A Drill cluster (with ZooKeeper) is running with Drillbits on all
> > > > three servers.
> > > >
> > > > When I submit a query to *any* of the Drillbits, no matter who the
> > > > foreman is, one particular data node gets picked to do the vast
> > > > majority of the work.
> > > > We've even added three more task nodes to the cluster, and
> > > > everything still puts a huge load on that one particular server.
> > > >
> > > > There is nothing unique about this data node. HDFS is fully
> > > > replicated (no unreplicated blocks) to the other data node.
> > > >
> > > > I know that Drill tries for data locality, so I'm wondering if this
> > > > is the cause, but it is essentially swamping this data node with
> > > > 100% CPU usage while leaving the others barely doing any work.
> > > >
> > > > As soon as we shut down the Drillbit on this data node, query
> > > > performance increases significantly.
> > > >
> > > > Any thoughts on how I can troubleshoot why Drill is picking that
> > > > particular node?
> >
> > --
> > Steven Phillips
> > Software Engineer
> >
> > mapr.com

--
Steven Phillips
Software Engineer

mapr.com
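[Editor's note: Steven's suspicion above — that work is assigned to the exact same nodes every time a given table is queried — can be illustrated with a small simulation. This is not Drill's actual assignment code; the node names, block counts, and both assignment functions are hypothetical, a minimal sketch of why deterministic tie-breaking among equally local replicas produces a single hotspot while randomized tie-breaking spreads the load.]

```python
import random
from collections import Counter

# Hypothetical cluster: 100 Parquet blocks, each replicated on two data nodes.
BLOCKS = list(range(100))
REPLICAS = {b: ["node-a", "node-b"] for b in BLOCKS}

def assign_deterministic(blocks, replicas):
    """Always pick the first replica of each block.

    If the replica list for a table never changes between queries, the
    same node is chosen every time -- that node becomes the hotspot.
    """
    return Counter(replicas[b][0] for b in blocks)

def assign_randomized(blocks, replicas):
    """Break ties among equally local replicas at random, spreading load."""
    return Counter(random.choice(replicas[b]) for b in blocks)

random.seed(42)  # fixed seed so the demo is reproducible
hot = assign_deterministic(BLOCKS, REPLICAS)
spread = assign_randomized(BLOCKS, REPLICAS)

print(hot)     # all 100 blocks land on node-a
print(spread)  # roughly half on each node
```

Under this (assumed) model, shutting down the hotspot drillbit just promotes whichever replica is now first in the list, which matches Steven's expectation that a different drillbit would become the hotspot rather than the load evening out.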
