This is a known issue: https://issues.apache.org/jira/browse/DRILL-2512

On Wed, Mar 25, 2015 at 8:13 AM, Andries Engelbrecht <[email protected]> wrote:

> What version of Drill are you running?
>
> Any hints when looking at the query profiles? Is the node that is being
> hammered the foreman for the queries, and are most of the major fragments
> tied to the foreman?
>
> -Andries
>
>
> On Mar 25, 2015, at 12:00 AM, Adam Gilmore <[email protected]> wrote:
>
> > Hi guys,
> >
> > I'm trying to understand how this could be possible. I have a Hadoop
> > cluster set up with a name node and two data nodes. All have identical
> > specs in terms of CPU/RAM etc.
> >
> > The two data nodes have a replicated HDFS setup where I'm storing some
> > Parquet files.
> >
> > A Drill cluster (with Zookeeper) is running with Drillbits on all three
> > servers.
> >
> > When I submit a query to *any* of the Drillbits, no matter who the
> > foreman is, one particular data node gets picked to do the vast majority
> > of the work.
> >
> > We've even added three more task nodes to the cluster and everything
> > still puts a huge load on one particular server.
> >
> > There is nothing unique about this data node. HDFS is fully replicated
> > (no unreplicated blocks) to the other data node.
> >
> > I know that Drill tries to get data locality, so I'm wondering if this
> > is the cause, but this is essentially swamping this data node with 100%
> > CPU usage while leaving the others barely doing any work.
> >
> > As soon as we shut down the Drillbit on this data node, query
> > performance increases significantly.
> >
> > Any thoughts on how I can troubleshoot why Drill is picking that
> > particular node?
> >

--
Steven Phillips
Software Engineer
mapr.com
