Hi Lalit,

Your profile hints that the query is stuck in Major Fragment 06-xx-xx, which is fed data from 16-xx-xx via 11-Exchange.
Judging by the operators' overview and the similarity with the other major fragments, only this one seems to be stuck at completing the sort. Could you provide the JStack from any of the nodes hosting fragments of 06-xx-xx?

Thanks,
Kunal

From: Lalit Mishra [mailto:[email protected]]
Sent: Thursday, January 25, 2018 4:03 AM
To: [email protected]
Subject: Re: Queries getting stuck in RUNNING state occasionally

Hello Timothy,

PFA the profile file (it exceeded the message limit, so I had to gzip it). Please excuse the length of the query; it is a long query unioned 5 times. I have tried to reproduce the problem with a smaller query, but have failed so far.

Yes, we are using MapR 6.0.

Thanks,
Lalit Mishra

On Thu, Jan 25, 2018 at 2:37 AM, Timothy Farkas <[email protected]> wrote:

On 2018/01/23 14:04:42, Lalit Mishra <[email protected]> wrote:
> Hello,
>
> We are using Drill 1.11 (under YARN) on a 3-node cluster.
> Occasionally a query remains stuck in the RUNNING state. The same
> query runs successfully on multiple occasions. I did not capture any
> information the previous times this occurred, but have collected the
> following on the latest occurrence -
>
> - Full json profile
> - Thread dumps on all three nodes
>
> I can provide these if needed.
>
> In the thread dumps there are 107 threads tagged to the query id.
> 105 of them are stuck with the following stack trace -
>
> 2598df8d-8573-5e29-292c-fb343c99d280:frag:6:3 id=266 state=WAITING
>     - waiting on <0x4a20ff6e> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>     - locked <0x4a20ff6e> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>     at sun.misc.Unsafe.park(Native Method)
>     at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>     at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
>     at java.util.concurrent.LinkedBlockingDeque.takeFirst(LinkedBlockingDeque.java:492)
>     at java.util.concurrent.LinkedBlockingDeque.take(LinkedBlockingDeque.java:680)
>     at org.apache.drill.exec.work.batch.UnlimitedRawBatchBuffer$UnlimitedBufferQueue.take(UnlimitedRawBatchBuffer.java:61)
>     at org.apache.drill.exec.work.batch.BaseRawBatchBuffer.getNext(BaseRawBatchBuffer.java:170)
>     at org.apache.drill.exec.physical.impl.unorderedreceiver.UnorderedReceiverBatch.getNextBatch(UnorderedReceiverBatch.java:141)
>     at org.apache.drill.exec.physical.impl.unorderedreceiver.UnorderedReceiverBatch.next(UnorderedReceiverBatch.java:159)
>     at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119)
>     at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:109)
>     at org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:51)
>     at org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext(ProjectRecordBatch.java:134)
>     at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:164)
>     at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119)
>     at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:109)
>     at org.apache.drill.exec.physical.impl.xsort.managed.ExternalSortBatch.loadBatch(ExternalSortBatch.java:406)
>     at org.apache.drill.exec.physical.impl.xsort.managed.ExternalSortBatch.load(ExternalSortBatch.java:357)
>     at org.apache.drill.exec.physical.impl.xsort.managed.ExternalSortBatch.innerNext(ExternalSortBatch.java:302)
>     at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:164)
>     at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119)
>     at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:109)
>     at org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:51)
>     at org.apache.drill.exec.physical.impl.svremover.RemovingRecordBatch.innerNext(RemovingRecordBatch.java:93)
>     at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:164)
>     at org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:105)
>     at org.apache.drill.exec.physical.impl.SingleSenderCreator$SingleSenderRootExec.innerNext(SingleSenderCreator.java:92)
>     at org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:95)
>     at org.apache.drill.exec.work.fragment.FragmentExecutor$1.run(FragmentExecutor.java:234)
>     at org.apache.drill.exec.work.fragment.FragmentExecutor$1.run(FragmentExecutor.java:227)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:422)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1595)
>     at org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:227)
>     at org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at java.lang.Thread.run(Thread.java:748)
>
>     Locked synchronizers: count = 1
>     - java.util.concurrent.ThreadPoolExecutor$Worker@45083904
>
>
> While 2 are stuck with -
>
> 2598df8d-8573-5e29-292c-fb343c99d280:frag:0:0 id=390 state=WAITING
>     - waiting on <0x730eeaf1> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>     - locked <0x730eeaf1> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>     at sun.misc.Unsafe.park(Native Method)
>     at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>     at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
>     at java.util.concurrent.LinkedBlockingDeque.takeFirst(LinkedBlockingDeque.java:492)
>     at java.util.concurrent.LinkedBlockingDeque.take(LinkedBlockingDeque.java:680)
>     at org.apache.drill.exec.work.batch.UnlimitedRawBatchBuffer$UnlimitedBufferQueue.take(UnlimitedRawBatchBuffer.java:61)
>     at org.apache.drill.exec.work.batch.BaseRawBatchBuffer.getNext(BaseRawBatchBuffer.java:170)
>     at org.apache.drill.exec.physical.impl.mergereceiver.MergingRecordBatch.getNext(MergingRecordBatch.java:147)
>     at org.apache.drill.exec.physical.impl.mergereceiver.MergingRecordBatch.innerNext(MergingRecordBatch.java:241)
>     at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:164)
>     at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119)
>     at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:109)
>     at org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:51)
>     at org.apache.drill.exec.physical.impl.limit.LimitRecordBatch.innerNext(LimitRecordBatch.java:115)
>     at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:164)
>     at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119)
>     at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:109)
>     at org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:51)
>     at org.apache.drill.exec.physical.impl.svremover.RemovingRecordBatch.innerNext(RemovingRecordBatch.java:93)
>     at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:164)
>     at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119)
>     at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:109)
>     at org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:51)
>     at org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext(ProjectRecordBatch.java:134)
>     at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:164)
>     at org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:105)
>     at org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.innerNext(ScreenCreator.java:81)
>     at org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:95)
>     at org.apache.drill.exec.work.fragment.FragmentExecutor$1.run(FragmentExecutor.java:234)
>     at org.apache.drill.exec.work.fragment.FragmentExecutor$1.run(FragmentExecutor.java:227)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:422)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1595)
>     at org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:227)
>     at org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at java.lang.Thread.run(Thread.java:748)
>
>     Locked synchronizers: count = 1
>     - java.util.concurrent.ThreadPoolExecutor$Worker@378527f8
>
>
> Any help with regards to figuring out what is going wrong will be
> appreciated. Thanks in advance!
>
> Thanks,
> Lalit Mishra
>

Hi Lalit,

The stack traces you provided indicate that downstream operators are waiting for data to be sent by upstream operators that are blocked. This could mean that a scan operator is blocked reading from a data source, or it could mean that an operator like Sort or HashAgg is getting stuck. Can you please provide the query you are using along with the json profile?

Also, please note that Apache Drill does not have YARN support yet; the PR is pending here: https://github.com/apache/drill/pull/1011. So are you using MapR's proprietary distribution of Drill?

Thanks,
Tim
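
For anyone reading this thread later: both stack traces above end in LinkedBlockingDeque.take(), which is what a receiver looks like when nothing upstream is delivering batches. The sketch below is only an illustration of that pattern using the plain JDK, not Drill code; the class name, method name and thread name are all hypothetical. It starts a consumer on a deque that no producer ever fills and reports the consumer's thread state once it has parked.

```java
import java.util.concurrent.LinkedBlockingDeque;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration; none of these names come from Drill.
public class BlockedReceiverSketch {

    // Starts a consumer on a deque that no producer ever fills and
    // returns the consumer's thread state once it has parked in take().
    static Thread.State stateOfStarvedConsumer() {
        LinkedBlockingDeque<String> incoming = new LinkedBlockingDeque<>();
        Thread consumer = new Thread(() -> {
            try {
                incoming.take(); // blocks: the deque stays empty
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // released by the cleanup below
            }
        }, "hypothetical-frag-6-3");
        consumer.setDaemon(true);
        consumer.start();

        Thread.State observed = consumer.getState();
        try {
            // Poll for up to five seconds until the consumer has parked.
            long deadline = System.nanoTime() + TimeUnit.SECONDS.toNanos(5);
            while (observed != Thread.State.WAITING && System.nanoTime() < deadline) {
                Thread.sleep(10);
                observed = consumer.getState();
            }
            // Clean up: unblock the consumer so the sketch terminates.
            consumer.interrupt();
            consumer.join(1000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return observed;
    }

    public static void main(String[] args) {
        System.out.println("consumer state: " + stateOfStarvedConsumer());
    }
}
```

A thread in this state is only a symptom. As Tim notes, the useful question is why the upstream side never puts anything on the queue, which is why the profile and a JStack from the nodes running the upstream fragments matter more than the 107 waiting threads themselves.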
