On 2018/01/23 14:04:42, Lalit Mishra <[email protected]> wrote: 
> Hello,
> 
> We are using Drill 1.11 (under YARN) on a 3 node cluster.
> Occasionally a query remains stuck in the RUNNING state, although the same
> query runs successfully on other occasions. I did not capture any
> information the previous times this occurred, but I have collected the
> following on the latest occurrence -
> 
>    - Full json profile
>    - Thread dumps on all three nodes
> 
> I can provide these if needed.
> 
> In the thread-dumps there are 107 threads tagged to the query id.
> 105 of them are stuck with the following stack trace -
> 
> 2598df8d-8573-5e29-292c-fb343c99d280:frag:6:3 id=266 state=WAITING
>     - waiting on <0x4a20ff6e> (a
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>     - locked <0x4a20ff6e> (a
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>     at sun.misc.Unsafe.park(Native Method)
>     at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>     at
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
>     at
> java.util.concurrent.LinkedBlockingDeque.takeFirst(LinkedBlockingDeque.java:492)
>     at
> java.util.concurrent.LinkedBlockingDeque.take(LinkedBlockingDeque.java:680)
>     at
> org.apache.drill.exec.work.batch.UnlimitedRawBatchBuffer$UnlimitedBufferQueue.take(UnlimitedRawBatchBuffer.java:61)
>     at
> org.apache.drill.exec.work.batch.BaseRawBatchBuffer.getNext(BaseRawBatchBuffer.java:170)
>     at
> org.apache.drill.exec.physical.impl.unorderedreceiver.UnorderedReceiverBatch.getNextBatch(UnorderedReceiverBatch.java:141)
>     at
> org.apache.drill.exec.physical.impl.unorderedreceiver.UnorderedReceiverBatch.next(UnorderedReceiverBatch.java:159)
>     at
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119)
>     at
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:109)
>     at
> org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:51)
>     at
> org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext(ProjectRecordBatch.java:134)
>     at
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:164)
>     at
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119)
>     at
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:109)
>     at
> org.apache.drill.exec.physical.impl.xsort.managed.ExternalSortBatch.loadBatch(ExternalSortBatch.java:406)
>     at
> org.apache.drill.exec.physical.impl.xsort.managed.ExternalSortBatch.load(ExternalSortBatch.java:357)
>     at
> org.apache.drill.exec.physical.impl.xsort.managed.ExternalSortBatch.innerNext(ExternalSortBatch.java:302)
>     at
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:164)
>     at
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119)
>     at
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:109)
>     at
> org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:51)
>     at
> org.apache.drill.exec.physical.impl.svremover.RemovingRecordBatch.innerNext(RemovingRecordBatch.java:93)
>     at
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:164)
>     at
> org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:105)
>     at
> org.apache.drill.exec.physical.impl.SingleSenderCreator$SingleSenderRootExec.innerNext(SingleSenderCreator.java:92)
>     at
> org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:95)
>     at
> org.apache.drill.exec.work.fragment.FragmentExecutor$1.run(FragmentExecutor.java:234)
>     at
> org.apache.drill.exec.work.fragment.FragmentExecutor$1.run(FragmentExecutor.java:227)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:422)
>     at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1595)
>     at
> org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:227)
>     at
> org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38)
>     at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at java.lang.Thread.run(Thread.java:748)
> 
>     Locked synchronizers: count = 1
>       - java.util.concurrent.ThreadPoolExecutor$Worker@45083904
> 
> 
> While 2 are stuck with -
> 
> 2598df8d-8573-5e29-292c-fb343c99d280:frag:0:0 id=390 state=WAITING
>     - waiting on <0x730eeaf1> (a
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>     - locked <0x730eeaf1> (a
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>     at sun.misc.Unsafe.park(Native Method)
>     at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>     at
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
>     at
> java.util.concurrent.LinkedBlockingDeque.takeFirst(LinkedBlockingDeque.java:492)
>     at
> java.util.concurrent.LinkedBlockingDeque.take(LinkedBlockingDeque.java:680)
>     at
> org.apache.drill.exec.work.batch.UnlimitedRawBatchBuffer$UnlimitedBufferQueue.take(UnlimitedRawBatchBuffer.java:61)
>     at
> org.apache.drill.exec.work.batch.BaseRawBatchBuffer.getNext(BaseRawBatchBuffer.java:170)
>     at
> org.apache.drill.exec.physical.impl.mergereceiver.MergingRecordBatch.getNext(MergingRecordBatch.java:147)
>     at
> org.apache.drill.exec.physical.impl.mergereceiver.MergingRecordBatch.innerNext(MergingRecordBatch.java:241)
>     at
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:164)
>     at
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119)
>     at
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:109)
>     at
> org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:51)
>     at
> org.apache.drill.exec.physical.impl.limit.LimitRecordBatch.innerNext(LimitRecordBatch.java:115)
>     at
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:164)
>     at
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119)
>     at
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:109)
>     at
> org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:51)
>     at
> org.apache.drill.exec.physical.impl.svremover.RemovingRecordBatch.innerNext(RemovingRecordBatch.java:93)
>     at
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:164)
>     at
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119)
>     at
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:109)
>     at
> org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:51)
>     at
> org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext(ProjectRecordBatch.java:134)
>     at
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:164)
>     at
> org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:105)
>     at
> org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.innerNext(ScreenCreator.java:81)
>     at
> org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:95)
>     at
> org.apache.drill.exec.work.fragment.FragmentExecutor$1.run(FragmentExecutor.java:234)
>     at
> org.apache.drill.exec.work.fragment.FragmentExecutor$1.run(FragmentExecutor.java:227)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:422)
>     at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1595)
>     at
> org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:227)
>     at
> org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38)
>     at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at java.lang.Thread.run(Thread.java:748)
> 
>     Locked synchronizers: count = 1
>       - java.util.concurrent.ThreadPoolExecutor$Worker@378527f8
> 
> 
> Any help with regards to figuring out what is going wrong will be
> appreciated. Thanks in advance!
> 
> Thanks,
> Lalit Mishra
> 

Hi Lalit,

The stack traces you provided show downstream operators (the
UnorderedReceiverBatch and MergingRecordBatch receivers) parked in
BaseRawBatchBuffer.getNext(), waiting for batches from upstream operators
that are themselves blocked. This could mean that a scan operator is blocked
reading from a data source, or that an operator such as Sort or HashAgg is
stuck. Can you please provide the query you are using along with the JSON
profile?
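
In the meantime, a rough way to see which fragments are parked in which
receiver is to group the dump's threads by major fragment and innermost
operator frame. Below is a minimal sketch in Python, assuming the dump uses
the thread-name and frame format shown in your message and that threads are
separated by blank lines. The function name and regular expressions are my
own illustration and are not part of Drill.

```python
import re
from collections import Counter

# Thread header as it appears in the quoted dump, e.g.
# "2598df8d-8573-5e29-292c-fb343c99d280:frag:6:3 id=266 state=WAITING"
HEADER = re.compile(
    r"(?P<query>[0-9a-f-]{36}):frag:(?P<major>\d+):(?P<minor>\d+)"
    r" id=\d+ state=(?P<state>\w+)"
)

# Any frame under Drill's physical operator package; group 1 ends at the class.
OPERATOR = re.compile(
    r"org\.apache\.drill\.exec\.physical\.impl\.([\w.$]+)\.\w+\("
)


def summarize_waiting_fragments(dump_text, query_id):
    """Count threads of one query by (major fragment, state, innermost operator).

    Hypothetical helper: it assumes blank lines separate threads and that the
    first operator frame in a thread is the one the thread is parked in.
    """
    counts = Counter()
    current = None  # (major, state) of the thread being read, if it is ours
    for line in dump_text.splitlines():
        # A blank line (or a bare "> " from a quoted email) ends a thread.
        if not line.strip(" >\t"):
            current = None
            continue
        header = HEADER.search(line)
        if header:
            current = None
            if header["query"] == query_id:
                current = (int(header["major"]), header["state"])
            continue
        if current is None:
            continue
        frame = OPERATOR.search(line)
        if frame:
            # Keep only the simple class name, e.g. "UnorderedReceiverBatch".
            operator = frame.group(1).rsplit(".", 1)[-1]
            counts[(current[0], current[1], operator)] += 1
            current = None  # ignore the outer frames of this thread
    return counts
```

Running this over the dump from each node and comparing the result with the
profile may show which major fragments the receivers are still waiting on.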

Also, please note that Apache Drill does not have YARN support yet; the PR is
pending here: https://github.com/apache/drill/pull/1011 . So are you using
MapR's proprietary distribution of Drill?

Thanks,
Tim
