Victoria Markman created DRILL-3705:
---------------------------------------

             Summary: Query runs out of memory, reported as FAILED and leaves thread running
                 Key: DRILL-3705
                 URL: https://issues.apache.org/jira/browse/DRILL-3705
             Project: Apache Drill
          Issue Type: Bug
          Components: Execution - Flow
    Affects Versions: 1.2.0
            Reporter: Victoria Markman
            Assignee: Chris Westin


Single-node Drill installation
DRILL_MAX_DIRECT_MEMORY="2G"
DRILL_HEAP="1G"

Execute TPC-DS query 15 at SF100 (Parquet) with the settings above. The problem
reproduces 2 out of 3 times.
{code}
SELECT ca.ca_zip,
       Sum(cs.cs_sales_price)
FROM   catalog_sales    cs,
       customer         c,
       customer_address ca,
       date_dim         dd
WHERE  cs.cs_bill_customer_sk = c.c_customer_sk
       AND c.c_current_addr_sk = ca.ca_address_sk
       AND ( Substr(ca.ca_zip, 1, 5) IN ( '85669', '86197', '88274', '83405',
                                       '86475', '85392', '85460', '80348',
                                       '81792' )
              OR ca.ca_state IN ( 'CA', 'WA', 'GA' )
              OR cs.cs_sales_price > 500 )
       AND cs.cs_sold_date_sk = dd.d_date_sk
       AND dd.d_qoy = 1
       AND dd.d_year = 1998
GROUP  BY ca.ca_zip
ORDER  BY ca.ca_zip
LIMIT 100;
{code}

The query runs out of memory and is reported as FAILED, which is the expected
result, but it leaves a fragment thread running.

Snippet from jstack:
{code}
"2a2451ec-09d8-9f26-e856-5fd349ae72fd:frag:4:0" daemon prio=10 tid=0x00007f5074140000 nid=0x3000 waiting on condition [0x00007f5055b66000]
   java.lang.Thread.State: WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for  <0x00000000c012b038> (a java.util.concurrent.Semaphore$NonfairSync)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:994)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1303)
        at java.util.concurrent.Semaphore.acquire(Semaphore.java:472)
        at org.apache.drill.exec.ops.SendingAccountor.waitForSendComplete(SendingAccountor.java:48)
        - locked <0x00000000c012b068> (a org.apache.drill.exec.ops.SendingAccountor)
        at org.apache.drill.exec.ops.FragmentContext.waitForSendComplete(FragmentContext.java:436)
        at org.apache.drill.exec.physical.impl.BaseRootExec.close(BaseRootExec.java:112)
        at org.apache.drill.exec.work.fragment.FragmentExecutor.closeOutResources(FragmentExecutor.java:341)
        at org.apache.drill.exec.work.fragment.FragmentExecutor.cleanup(FragmentExecutor.java:173)
        at org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:292)
        at org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
{code}
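
The trace above shows the fragment thread parked in Semaphore.acquire inside SendingAccountor.waitForSendComplete. One way such a wait could become permanent (an assumption, not something confirmed by this report) is that a completion for an outstanding send is never delivered, for example because the data connection was closed. The sketch below illustrates that general pattern with a hypothetical class, SendAccountingSketch, which is not Drill's implementation; it uses a timed wait so that the demonstration terminates instead of hanging.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

/** Hypothetical stand-in for a send accountor; NOT Drill's SendingAccountor. */
public class SendAccountingSketch {
    // One permit is released for every send that completes.
    private final Semaphore completed = new Semaphore(0);
    // Number of sends handed to the network layer and not yet waited for.
    private int outstanding = 0;

    /** Record that one message has been handed to the network layer. */
    public synchronized void increment() {
        outstanding++;
    }

    /** Called by the (hypothetical) RPC layer when a send succeeds or fails. */
    public void sendComplete() {
        completed.release();
    }

    /**
     * Wait until every recorded send has completed or the timeout expires.
     * Returns true only if all outstanding sends were accounted for.
     * An untimed acquire() here would block forever on a lost completion.
     */
    public synchronized boolean waitForSendComplete(long timeoutMillis) {
        try {
            boolean ok = completed.tryAcquire(outstanding, timeoutMillis, TimeUnit.MILLISECONDS);
            if (ok) {
                outstanding = 0;
            }
            return ok;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
            return false;
        }
    }

    public static void main(String[] args) {
        SendAccountingSketch acct = new SendAccountingSketch();
        acct.increment();
        acct.increment();
        acct.sendComplete(); // the second completion is never delivered
        boolean done = acct.waitForSendComplete(200);
        System.out.println("all sends accounted for: " + done); // prints false
    }
}
```

In this sketch the waiter simply reports failure after the timeout; whether a timeout, or releasing permits when a connection closes, would be appropriate for Drill is left open.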

NPE in drillbit.log:
{code}
2015-08-24 23:52:04,486 [BitServer-5] ERROR o.a.d.exec.rpc.RpcExceptionHandler - Exception in RPC communication.  Connection: /10.10.88.133:31012 <--> /10.10.88.133:52417 (data server).  Closing connection.
io.netty.handler.codec.DecoderException: java.lang.NullPointerException
        at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:99) [netty-codec-4.0.27.Final.jar:4.0.27.Final]
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339) [netty-transport-4.0.27.Final.jar:4.0.27.Final]
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324) [netty-transport-4.0.27.Final.jar:4.0.27.Final]
        at io.netty.handler.timeout.ReadTimeoutHandler.channelRead(ReadTimeoutHandler.java:150) [netty-handler-4.0.27.Final.jar:4.0.27.Final]
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339) [netty-transport-4.0.27.Final.jar:4.0.27.Final]
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324) [netty-transport-4.0.27.Final.jar:4.0.27.Final]
        at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103) [netty-codec-4.0.27.Final.jar:4.0.27.Final]
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339) [netty-transport-4.0.27.Final.jar:4.0.27.Final]
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324) [netty-transport-4.0.27.Final.jar:4.0.27.Final]
        at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:242) [netty-codec-4.0.27.Final.jar:4.0.27.Final]
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339) [netty-transport-4.0.27.Final.jar:4.0.27.Final]
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324) [netty-transport-4.0.27.Final.jar:4.0.27.Final]
        at io.netty.channel.ChannelInboundHandlerAdapter.channelRead(ChannelInboundHandlerAdapter.java:86) [netty-transport-4.0.27.Final.jar:4.0.27.Final]
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339) [netty-transport-4.0.27.Final.jar:4.0.27.Final]
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324) [netty-transport-4.0.27.Final.jar:4.0.27.Final]
        at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:847) [netty-transport-4.0.27.Final.jar:4.0.27.Final]
        at io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.epollInReady(AbstractEpollStreamChannel.java:618) [netty-transport-native-epoll-4.0.27.Final-linux-x86_64.jar:na]
        at io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:329) [netty-transport-native-epoll-4.0.27.Final-linux-x86_64.jar:na]
        at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:250) [netty-transport-native-epoll-4.0.27.Final-linux-x86_64.jar:na]
        at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111) [netty-common-4.0.27.Final.jar:4.0.27.Final]
        at java.lang.Thread.run(Thread.java:745) [na:1.7.0_71]
Caused by: java.lang.NullPointerException: null
        at org.apache.drill.exec.work.batch.UnlimitedRawBatchBuffer$UnlimitedBufferQueue.checkForOutOfMemory(UnlimitedRawBatchBuffer.java:68) ~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
        at org.apache.drill.exec.work.batch.BaseRawBatchBuffer.handleOutOfMemory(BaseRawBatchBuffer.java:95) ~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
        at org.apache.drill.exec.work.batch.BaseRawBatchBuffer.enqueue(BaseRawBatchBuffer.java:83) ~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
        at org.apache.drill.exec.work.batch.AbstractDataCollector.batchArrived(AbstractDataCollector.java:105) ~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
        at org.apache.drill.exec.work.batch.IncomingBuffers.batchArrived(IncomingBuffers.java:75) ~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
        at org.apache.drill.exec.work.fragment.NonRootFragmentManager.handle(NonRootFragmentManager.java:73) ~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
        at org.apache.drill.exec.rpc.data.DataResponseHandlerImpl.handle(DataResponseHandlerImpl.java:48) ~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
        at org.apache.drill.exec.rpc.data.DataServer.send(DataServer.java:176) ~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
        at org.apache.drill.exec.rpc.data.DataServer.handle(DataServer.java:142) ~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
        at org.apache.drill.exec.rpc.data.DataServer.handle(DataServer.java:51) ~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
        at org.apache.drill.exec.rpc.RpcBus$InboundHandler.decode(RpcBus.java:233) ~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
        at org.apache.drill.exec.rpc.RpcBus$InboundHandler.decode(RpcBus.java:205) ~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
        at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:89) [netty-codec-4.0.27.Final.jar:4.0.27.Final]
        ... 20 common frames omitted
2015-08-24 23:52:04,489 [BitClient-1] ERROR o.a.d.exec.rpc.RpcExceptionHandler - Exception in RPC communication.  Connection: /10.10.88.133:52417 <--> /10.10.88.133:31012 (data client).  Closing connection.
java.io.IOException: syscall:read(...)() failed: Connection reset by peer
2015-08-24 23:52:04,489 [BitClient-1] INFO  o.a.drill.exec.rpc.data.DataClient - Channel closed /10.10.88.133:52417 <--> /10.10.88.133:31012.
2015-08-24 23:52:04,505 [BitServer-6] ERROR o.a.d.exec.rpc.RpcExceptionHandler - Exception in RPC communication.  Connection: /10.10.88.133:31012 <--> /10.10.88.133:52418 (data server).  Closing connection.
io.netty.handler.codec.DecoderException: java.lang.NullPointerException
{code}

Attached:
    drillbit.log
    2a2451ec-09d8-9f26-e856-5fd349ae72fd.sys.drill (query profile)
    jstack.txt (stack output for the running drillbit)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
