Hello Timothy,

Please find attached the profile file (it exceeded the message size limit,
so I had to gzip it). Please excuse the length of the query; it is a long
query unioned 5 times. I have tried to reproduce the problem with a smaller
query, but have failed so far.

Yes, we are using MapR 6.0.

Thanks,
Lalit Mishra

On Thu, Jan 25, 2018 at 2:37 AM, Timothy Farkas <[email protected]>
wrote:

>
>
> On 2018/01/23 14:04:42, Lalit Mishra <[email protected]>
> wrote:
> > Hello,
> >
> > We are using Drill 1.11 (under YARN) on a 3-node cluster.
> > Occasionally a query remains stuck in the RUNNING state, although the
> > same query has run successfully on multiple other occasions. I did not
> > capture any information the previous times this occurred, but I have
> > collected the following on the latest occurrence -
> >
> >    - Full json profile
> >    - Thread dumps on all three nodes
> >
> > I can provide these if needed.
> >
> > In the thread dumps there are 107 threads tagged with the query id.
> > 105 of them are stuck with the following stack trace -
> >
> > 2598df8d-8573-5e29-292c-fb343c99d280:frag:6:3 id=266 state=WAITING
> >     - waiting on <0x4a20ff6e> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
> >     - locked <0x4a20ff6e> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
> >     at sun.misc.Unsafe.park(Native Method)
> >     at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
> >     at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
> >     at java.util.concurrent.LinkedBlockingDeque.takeFirst(LinkedBlockingDeque.java:492)
> >     at java.util.concurrent.LinkedBlockingDeque.take(LinkedBlockingDeque.java:680)
> >     at org.apache.drill.exec.work.batch.UnlimitedRawBatchBuffer$UnlimitedBufferQueue.take(UnlimitedRawBatchBuffer.java:61)
> >     at org.apache.drill.exec.work.batch.BaseRawBatchBuffer.getNext(BaseRawBatchBuffer.java:170)
> >     at org.apache.drill.exec.physical.impl.unorderedreceiver.UnorderedReceiverBatch.getNextBatch(UnorderedReceiverBatch.java:141)
> >     at org.apache.drill.exec.physical.impl.unorderedreceiver.UnorderedReceiverBatch.next(UnorderedReceiverBatch.java:159)
> >     at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119)
> >     at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:109)
> >     at org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:51)
> >     at org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext(ProjectRecordBatch.java:134)
> >     at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:164)
> >     at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119)
> >     at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:109)
> >     at org.apache.drill.exec.physical.impl.xsort.managed.ExternalSortBatch.loadBatch(ExternalSortBatch.java:406)
> >     at org.apache.drill.exec.physical.impl.xsort.managed.ExternalSortBatch.load(ExternalSortBatch.java:357)
> >     at org.apache.drill.exec.physical.impl.xsort.managed.ExternalSortBatch.innerNext(ExternalSortBatch.java:302)
> >     at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:164)
> >     at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119)
> >     at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:109)
> >     at org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:51)
> >     at org.apache.drill.exec.physical.impl.svremover.RemovingRecordBatch.innerNext(RemovingRecordBatch.java:93)
> >     at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:164)
> >     at org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:105)
> >     at org.apache.drill.exec.physical.impl.SingleSenderCreator$SingleSenderRootExec.innerNext(SingleSenderCreator.java:92)
> >     at org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:95)
> >     at org.apache.drill.exec.work.fragment.FragmentExecutor$1.run(FragmentExecutor.java:234)
> >     at org.apache.drill.exec.work.fragment.FragmentExecutor$1.run(FragmentExecutor.java:227)
> >     at java.security.AccessController.doPrivileged(Native Method)
> >     at javax.security.auth.Subject.doAs(Subject.java:422)
> >     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1595)
> >     at org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:227)
> >     at org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38)
> >     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> >     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> >     at java.lang.Thread.run(Thread.java:748)
> >
> >     Locked synchronizers: count = 1
> >       - java.util.concurrent.ThreadPoolExecutor$Worker@45083904
> >
> >
> > The remaining 2 are stuck with -
> >
> > 2598df8d-8573-5e29-292c-fb343c99d280:frag:0:0 id=390 state=WAITING
> >     - waiting on <0x730eeaf1> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
> >     - locked <0x730eeaf1> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
> >     at sun.misc.Unsafe.park(Native Method)
> >     at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
> >     at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
> >     at java.util.concurrent.LinkedBlockingDeque.takeFirst(LinkedBlockingDeque.java:492)
> >     at java.util.concurrent.LinkedBlockingDeque.take(LinkedBlockingDeque.java:680)
> >     at org.apache.drill.exec.work.batch.UnlimitedRawBatchBuffer$UnlimitedBufferQueue.take(UnlimitedRawBatchBuffer.java:61)
> >     at org.apache.drill.exec.work.batch.BaseRawBatchBuffer.getNext(BaseRawBatchBuffer.java:170)
> >     at org.apache.drill.exec.physical.impl.mergereceiver.MergingRecordBatch.getNext(MergingRecordBatch.java:147)
> >     at org.apache.drill.exec.physical.impl.mergereceiver.MergingRecordBatch.innerNext(MergingRecordBatch.java:241)
> >     at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:164)
> >     at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119)
> >     at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:109)
> >     at org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:51)
> >     at org.apache.drill.exec.physical.impl.limit.LimitRecordBatch.innerNext(LimitRecordBatch.java:115)
> >     at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:164)
> >     at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119)
> >     at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:109)
> >     at org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:51)
> >     at org.apache.drill.exec.physical.impl.svremover.RemovingRecordBatch.innerNext(RemovingRecordBatch.java:93)
> >     at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:164)
> >     at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119)
> >     at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:109)
> >     at org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:51)
> >     at org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext(ProjectRecordBatch.java:134)
> >     at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:164)
> >     at org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:105)
> >     at org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.innerNext(ScreenCreator.java:81)
> >     at org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:95)
> >     at org.apache.drill.exec.work.fragment.FragmentExecutor$1.run(FragmentExecutor.java:234)
> >     at org.apache.drill.exec.work.fragment.FragmentExecutor$1.run(FragmentExecutor.java:227)
> >     at java.security.AccessController.doPrivileged(Native Method)
> >     at javax.security.auth.Subject.doAs(Subject.java:422)
> >     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1595)
> >     at org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:227)
> >     at org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38)
> >     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> >     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> >     at java.lang.Thread.run(Thread.java:748)
> >
> >     Locked synchronizers: count = 1
> >       - java.util.concurrent.ThreadPoolExecutor$Worker@378527f8
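
For anyone repeating the tally of fragment threads per query, here is a
minimal sketch of how the count could be scripted. It assumes the dump is
available as plain lines and that each thread header has the shape seen in
the two traces above (<queryId>:frag:<major>:<minor> id=<n> state=<STATE>).
The class and method names are hypothetical and are not part of Drill.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class FragmentThreadCounter {
    // Assumed header shape, copied from the dumps in this thread:
    // <queryId>:frag:<major>:<minor> id=<n> state=<STATE>
    private static final Pattern HEADER =
        Pattern.compile("^(\\S+):frag:(\\d+):(\\d+) id=(\\d+) state=(\\w+)");

    /** Counts the fragment threads of one query, grouped by thread state. */
    public static Map<String, Integer> countByState(List<String> dumpLines, String queryId) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : dumpLines) {
            Matcher m = HEADER.matcher(line.trim());
            // Stack frames and lock lines do not match the header pattern and are skipped.
            if (m.find() && m.group(1).equals(queryId)) {
                counts.merge(m.group(5), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList(
            "2598df8d-8573-5e29-292c-fb343c99d280:frag:6:3 id=266 state=WAITING",
            "    at sun.misc.Unsafe.park(Native Method)",
            "2598df8d-8573-5e29-292c-fb343c99d280:frag:0:0 id=390 state=WAITING");
        // Prints {WAITING=2} for the two headers quoted in this thread.
        System.out.println(countByState(sample, "2598df8d-8573-5e29-292c-fb343c99d280"));
    }
}
```

Grouping by the first org.apache.drill frame instead of by state would
separate the receiver types (UnorderedReceiverBatch vs MergingRecordBatch),
but that needs unwrapped stack frames, so it is left out of this sketch.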
> >
> >
> > Any help in figuring out what is going wrong would be appreciated.
> > Thanks in advance!
> >
> > Thanks,
> > Lalit Mishra
> >
>
> Hi Lalit,
>
> The stack traces you provided indicate that downstream operators are
> waiting for data to be sent by upstream operators that are blocked. This
> could mean that a scan operator is blocked reading from a data source, or
> it could mean that an operator like Sort or HashAgg is getting stuck. Can
> you please provide the query you are using along with the JSON profile?
>
> Also, please note that Apache Drill does not have YARN support yet; the PR
> is pending here: https://github.com/apache/drill/pull/1011 . So are you
> using MapR's proprietary distribution of Drill?
>
> Thanks,
> Tim
>
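
To illustrate the point about downstream operators waiting on upstream
ones: the sketch below uses only the JDK class that appears in the traces
(java.util.concurrent.LinkedBlockingDeque) and none of Drill's own code. A
consumer thread calls take() on an empty deque and stays in state WAITING
until something is put into it, which is the same state the fragment
threads report. The class name, thread name, and the "batch" payload are
made up for the example.

```java
import java.util.concurrent.LinkedBlockingDeque;
import java.util.concurrent.TimeUnit;

final class StarvedReceiverSketch {
    /**
     * Starts a consumer that blocks in take() on an empty deque, records its
     * thread state while nothing has been delivered, then delivers one item
     * and records the state again after the consumer has finished.
     */
    public static String observe() throws InterruptedException {
        LinkedBlockingDeque<String> incoming = new LinkedBlockingDeque<>();
        Thread receiver = new Thread(() -> {
            try {
                incoming.take(); // parks until an item arrives
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "sketch:frag:0:0"); // hypothetical name, only mimics the dump format
        receiver.setDaemon(true);
        receiver.start();

        // Wait (up to 5 seconds) for the consumer to park inside take().
        long deadline = System.nanoTime() + TimeUnit.SECONDS.toNanos(5);
        while (receiver.getState() != Thread.State.WAITING && System.nanoTime() < deadline) {
            Thread.sleep(10);
        }
        Thread.State starved = receiver.getState();

        // Stand-in for the upstream side finally sending data.
        incoming.put("batch");
        receiver.join(5000);
        return starved + " -> " + receiver.getState();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(observe());
    }
}
```

If the put() never happens, the consumer stays parked indefinitely, so a
dump of such threads only shows who is waiting. The cause has to be looked
for on the sending side.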

Attachment: stuck_query_profile.tgz
Description: GNU Zip compressed data
