Hello Timothy,

Please find attached the profile file (it exceeded the message size limit, so I had to gzip it). Please excuse the length of the query; it is a long query unioned 5 times. I have tried to reproduce the problem with a smaller query, but have failed so far.
Yes, we are using MapR 6.0.

Thanks,
Lalit Mishra

On Thu, Jan 25, 2018 at 2:37 AM, Timothy Farkas <[email protected]> wrote:

> On 2018/01/23 14:04:42, Lalit Mishra <[email protected]> wrote:
> > Hello,
> >
> > We are using drill 1.11 (under yarn) on a 3 node cluster.
> > Occasionally a query would remain stuck in the RUNNING state. The same
> > query runs successfully on multiple occasions. I have not captured any
> > information previous times this occurred, but have collected following on
> > the latest occurrence -
> >
> > - Full json profile
> > - Thread dumps on all three nodes
> >
> > I can provide these if needed.
> >
> > In the thread-dumps there are 107 threads tagged to the query id.
> > 105 of them are stuck with following stack-trace -
> >
> > 2598df8d-8573-5e29-292c-fb343c99d280:frag:6:3 id=266 state=WAITING
> >     - waiting on <0x4a20ff6e> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
> >     - locked <0x4a20ff6e> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
> >     at sun.misc.Unsafe.park(Native Method)
> >     at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
> >     at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
> >     at java.util.concurrent.LinkedBlockingDeque.takeFirst(LinkedBlockingDeque.java:492)
> >     at java.util.concurrent.LinkedBlockingDeque.take(LinkedBlockingDeque.java:680)
> >     at org.apache.drill.exec.work.batch.UnlimitedRawBatchBuffer$UnlimitedBufferQueue.take(UnlimitedRawBatchBuffer.java:61)
> >     at org.apache.drill.exec.work.batch.BaseRawBatchBuffer.getNext(BaseRawBatchBuffer.java:170)
> >     at org.apache.drill.exec.physical.impl.unorderedreceiver.UnorderedReceiverBatch.getNextBatch(UnorderedReceiverBatch.java:141)
> >     at org.apache.drill.exec.physical.impl.unorderedreceiver.UnorderedReceiverBatch.next(UnorderedReceiverBatch.java:159)
> >     at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119)
> >     at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:109)
> >     at org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:51)
> >     at org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext(ProjectRecordBatch.java:134)
> >     at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:164)
> >     at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119)
> >     at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:109)
> >     at org.apache.drill.exec.physical.impl.xsort.managed.ExternalSortBatch.loadBatch(ExternalSortBatch.java:406)
> >     at org.apache.drill.exec.physical.impl.xsort.managed.ExternalSortBatch.load(ExternalSortBatch.java:357)
> >     at org.apache.drill.exec.physical.impl.xsort.managed.ExternalSortBatch.innerNext(ExternalSortBatch.java:302)
> >     at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:164)
> >     at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119)
> >     at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:109)
> >     at org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:51)
> >     at org.apache.drill.exec.physical.impl.svremover.RemovingRecordBatch.innerNext(RemovingRecordBatch.java:93)
> >     at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:164)
> >     at org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:105)
> >     at org.apache.drill.exec.physical.impl.SingleSenderCreator$SingleSenderRootExec.innerNext(SingleSenderCreator.java:92)
> >     at org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:95)
> >     at org.apache.drill.exec.work.fragment.FragmentExecutor$1.run(FragmentExecutor.java:234)
> >     at org.apache.drill.exec.work.fragment.FragmentExecutor$1.run(FragmentExecutor.java:227)
> >     at java.security.AccessController.doPrivileged(Native Method)
> >     at javax.security.auth.Subject.doAs(Subject.java:422)
> >     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1595)
> >     at org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:227)
> >     at org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38)
> >     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> >     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> >     at java.lang.Thread.run(Thread.java:748)
> >
> >     Locked synchronizers: count = 1
> >     - java.util.concurrent.ThreadPoolExecutor$Worker@45083904
> >
> >
> > While 2 are stuck with -
> >
> > 2598df8d-8573-5e29-292c-fb343c99d280:frag:0:0 id=390 state=WAITING
> >     - waiting on <0x730eeaf1> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
> >     - locked <0x730eeaf1> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
> >     at sun.misc.Unsafe.park(Native Method)
> >     at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
> >     at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
> >     at java.util.concurrent.LinkedBlockingDeque.takeFirst(LinkedBlockingDeque.java:492)
> >     at java.util.concurrent.LinkedBlockingDeque.take(LinkedBlockingDeque.java:680)
> >     at org.apache.drill.exec.work.batch.UnlimitedRawBatchBuffer$UnlimitedBufferQueue.take(UnlimitedRawBatchBuffer.java:61)
> >     at org.apache.drill.exec.work.batch.BaseRawBatchBuffer.getNext(BaseRawBatchBuffer.java:170)
> >     at org.apache.drill.exec.physical.impl.mergereceiver.MergingRecordBatch.getNext(MergingRecordBatch.java:147)
> >     at org.apache.drill.exec.physical.impl.mergereceiver.MergingRecordBatch.innerNext(MergingRecordBatch.java:241)
> >     at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:164)
> >     at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119)
> >     at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:109)
> >     at org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:51)
> >     at org.apache.drill.exec.physical.impl.limit.LimitRecordBatch.innerNext(LimitRecordBatch.java:115)
> >     at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:164)
> >     at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119)
> >     at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:109)
> >     at org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:51)
> >     at org.apache.drill.exec.physical.impl.svremover.RemovingRecordBatch.innerNext(RemovingRecordBatch.java:93)
> >     at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:164)
> >     at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119)
> >     at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:109)
> >     at org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:51)
> >     at org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext(ProjectRecordBatch.java:134)
> >     at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:164)
> >     at org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:105)
> >     at org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.innerNext(ScreenCreator.java:81)
> >     at org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:95)
> >     at org.apache.drill.exec.work.fragment.FragmentExecutor$1.run(FragmentExecutor.java:234)
> >     at org.apache.drill.exec.work.fragment.FragmentExecutor$1.run(FragmentExecutor.java:227)
> >     at java.security.AccessController.doPrivileged(Native Method)
> >     at javax.security.auth.Subject.doAs(Subject.java:422)
> >     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1595)
> >     at org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:227)
> >     at org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38)
> >     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> >     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> >     at java.lang.Thread.run(Thread.java:748)
> >
> >     Locked synchronizers: count = 1
> >     - java.util.concurrent.ThreadPoolExecutor$Worker@378527f8
> >
> >
> > Any help with regards to figuring out what is going wrong will be
> > appreciated. Thanks in advance!
> >
> > Thanks,
> > Lalit Mishra
>
> Hi Lalit,
>
> The stack traces you provided indicate that down stream operators are
> waiting for data to be sent by upstream operators which are blocked. This
> could mean that a scan operator is blocked reading from a data source, or
> it could mean that an operator like Sort or HashAgg is getting stuck. Can
> you please provide the query you are using along with the json profile?
>
> Also please note that Apache Drill does not have YARN support yet, the PR
> is pending here https://github.com/apache/drill/pull/1011 . So are you
> using MapR's proprietary distribution of Drill?
>
> Thanks,
> Tim
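
For anyone triaging a similar hang, the per-operator thread counts mentioned above (105 threads waiting in one place, 2 in another) can be produced mechanically from a saved thread dump. The following is a minimal sketch, not part of Drill: the function name `summarize` is hypothetical, and it assumes the dump uses the same header format as the traces quoted above (`<query-id>:frag:<major>:<minor> id=<n> state=<STATE>`) with each stack frame on a single unwrapped `at ...` line. It groups a query's fragment threads by thread state and by the first frame found under `org.apache.drill.exec.physical.impl`, which is the operator the thread is waiting in.

```python
import re
from collections import Counter

# Assumed header format, taken from the thread dump quoted in this thread:
#   2598df8d-8573-5e29-292c-fb343c99d280:frag:6:3 id=266 state=WAITING
HEADER = re.compile(
    r"^(?P<query>[0-9a-f-]{36}):frag:\d+:\d+ id=\d+ state=(?P<state>\w+)"
)

# First Drill operator frame in a stack, e.g.
#   at org.apache.drill.exec.physical.impl.mergereceiver.MergingRecordBatch.getNext(...)
OPERATOR_FRAME = re.compile(
    r"^\s*at (?P<frame>org\.apache\.drill\.exec\.physical\.impl\.[\w.$]+)\("
)


def summarize(dump_text, query_id):
    """Count fragment threads of one query by (state, innermost operator frame)."""
    counts = Counter()
    state = None  # state of the thread being scanned, or None if not of interest
    for line in dump_text.splitlines():
        header = HEADER.match(line.strip())
        if header:
            # A new thread starts; only track it if it belongs to our query.
            state = header.group("state") if header.group("query") == query_id else None
            continue
        if state is None:
            continue
        frame = OPERATOR_FRAME.match(line)
        if frame:
            counts[(state, frame.group("frame"))] += 1
            state = None  # only the innermost operator frame is counted per thread
    return counts
```

It could be used as `summarize(open("threads.txt").read(), "2598df8d-8573-5e29-292c-fb343c99d280")`, where `threads.txt` is a hypothetical file holding the dump from one node. A result dominated by receiver frames, as in the traces above, points at threads waiting for upstream data; it does not by itself show which upstream operator is stuck.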
stuck_query_profile.tgz
Description: GNU Zip compressed data
