2018-03-08 14:50 GMT+01:00 Ivan Kelly <iv...@apache.org>: > It just occurred to me that this could be a problem with the recycler. > If we recycle a buffer too early, but then keep using it, another user > could pick it up, and between them they could corrupt the data that > would cause it to be unhandled. >
Maybe current work from Sijie, Andrey and JV which Sijie is backporting to branch-4.6 could resolve the issue > > -Ivan > > On Thu, Mar 8, 2018 at 1:49 PM, Enrico Olivelli <eolive...@gmail.com> > wrote: > > > > > > 2018-03-08 11:57 GMT+01:00 Ivan Kelly <iv...@apache.org>: > >> > >> Hmm, yes, looks messed up. Do you have a reliable repro of this? > >> > >> "at io.netty.channel.DefaultChannelPipeline.onUnhandledInboundMessage" > >> <- it seems the message itself wasn't handled. Do you have any more > >> bookie side logs leading up to this event? > > > > > > > > The only suspicious logs are for logger named > > "org.apache.bookkeeper.bookie.EntryLogger" > > > > > > 18-03-08-09-22-40 Created new entry log file > > /data/qasql/bookie/./bookie_data/current/4b0.log for logId 1200. > > 18-03-08-09-32-41 Failed to get ledgers map index from: 53.log : No > > ledgers map index found on entryLogId53 > > 18-03-08-09-33-10 Failed to get ledgers map index from: 54.log : No > > ledgers map index found on entryLogId54 > > 18-03-08-09-33-10 Failed to get ledgers map index from: 375.log : > No > > ledgers map index found on entryLogId375 > > 18-03-08-09-33-13 Failed to get ledgers map index from: 878.log : > No > > ledgers map index found on entryLogId878 > > 18-03-08-09-33-22 Failed to get ledgers map index from: 1197.log : > No > > ledgers map index found on entryLogId1197 > > 18-03-08-09-33-57 Failed to get ledgers map index from: 1198.log : > No > > ledgers map index found on entryLogId1198 > > 18-03-08-09-34-54 Failed to get ledgers map index from: 1199.log : > No > > ledgers map index found on entryLogId1199 > > > > I will try to run Bookie with level debug, but it produces lots of spam > > > > Thanks > > Enrico > > > >> > >> > >> -Ivan > >> > >> On Thu, Mar 8, 2018 at 9:18 AM, Enrico Olivelli <eolive...@gmail.com> > >> wrote: > >> > Hi all, > >> > I am seeing this bad errors in some test environments with 4.6.1 > >> > The errors appear during rolling restart of the application, the test > >> > env is > >> > made of 6 machines: > >> > - 3 bookies > >> > - 3 client machines (with several BK clients, of different > sub-systems) > >> > - running with 4.6.1 both client and servers > >> > > >> > *I do not have* reports of this errors from production, already > running > >> > 4.6.1 for the last month > >> > > >> > But the problem is quite scary > >> > > >> > This is a sample of relevant errors on clients (in this case Majordodo > >> > brokers, with log level = INFO) > >> > > >> > logs/org.apache.bookkeeper.client.PendingAddOp: > >> > 18-03-07-09-51-43 Write of ledger entry to quorum failed: > L366634 > >> > E2557 > >> > 18-03-07-09-51-43 Write of ledger entry to quorum failed: > L366634 > >> > E2558 > >> > 18-03-07-09-51-43 Write of ledger entry to quorum failed: > L366634 > >> > E2559 > >> > 18-03-07-15-59-55 Failed to write entry (366680, 1865): Bookie > >> > operation timeout > >> > 18-03-07-15-59-55 Failed to write entry (366680, 1865): Bookie > >> > operation timeout > >> > 18-03-07-16-00-00 Failed to write entry (366680, 1865): Bookie > >> > operation timeout > >> > 18-03-07-16-00-00 Failed to write entry (366680, 1865): Bookie > >> > operation timeout > >> > 18-03-07-16-00-05 Failed to write entry (366680, 1865): Bookie > >> > operation timeout > >> > > >> > > >> > org.apache.bookkeeper.proto.PerChannelBookieClient: > >> > > >> > 18-03-07-10-06-44 Unexpected exception caught by bookie client > >> > channel > >> > handler > >> > 18-03-07-10-06-44 io.netty.handler.codec.DecoderException: > >> > java.lang.IllegalStateException: Received unknown response : op code > = 6 > >> > io.netty.handler.codec.DecoderException: > >> > java.lang.IllegalStateException: > >> > Received unknown response : op code = 6 > >> > at > >> > > >> > io.netty.handler.codec.MessageToMessageDecoder.channelRead( > MessageToMessageDecoder.java:98) > >> > at > >> > > >> > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead( > AbstractChannelHandlerContext.java:362) > >> > at > >> > > >> > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead( > AbstractChannelHandlerContext.java:348) > >> > at > >> > > >> > io.netty.channel.AbstractChannelHandlerContext.fireChannelRead( > AbstractChannelHandlerContext.java:340) > >> > at > >> > > >> > io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead( > ByteToMessageDecoder.java:310) > >> > at > >> > > >> > io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead( > ByteToMessageDecoder.java:297) > >> > at > >> > > >> > io.netty.handler.codec.ByteToMessageDecoder.callDecode( > ByteToMessageDecoder.java:413) > >> > at > >> > > >> > io.netty.handler.codec.ByteToMessageDecoder.channelRead( > ByteToMessageDecoder.java:265) > >> > at > >> > > >> > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead( > AbstractChannelHandlerContext.java:362) > >> > at > >> > > >> > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead( > AbstractChannelHandlerContext.java:348) > >> > at > >> > > >> > io.netty.channel.AbstractChannelHandlerContext.fireChannelRead( > AbstractChannelHandlerContext.java:340) > >> > at > >> > > >> > io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead( > DefaultChannelPipeline.java:1414) > >> > at > >> > > >> > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead( > AbstractChannelHandlerContext.java:362) > >> > at > >> > > >> > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead( > AbstractChannelHandlerContext.java:348) > >> > at > >> > > >> > io.netty.channel.DefaultChannelPipeline.fireChannelRead( > DefaultChannelPipeline.java:945) > >> > at > >> > > >> > io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe. > epollInReady(AbstractEpollStreamChannel.java:800) > >> > at > >> > > >> > io.netty.channel.epoll.EpollEventLoop.processReady( > EpollEventLoop.java:404) > >> > at > >> > io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:304) > >> > at > >> > > >> > io.netty.util.concurrent.SingleThreadEventExecutor$5. > run(SingleThreadEventExecutor.java:886) > >> > at java.base/java.lang.Thread.run(Thread.java:844) > >> > Caused by: java.lang.IllegalStateException: Received unknown > response : > >> > op > >> > code = 6 > >> > at > >> > > >> > org.apache.bookkeeper.proto.BookieProtoEncoding$ > ResponseEnDeCoderPreV3.decode(BookieProtoEncoding.java:294) > >> > at > >> > > >> > org.apache.bookkeeper.proto.BookieProtoEncoding$ > ResponseDecoder.decode(BookieProtoEncoding.java:478) > >> > at > >> > > >> > io.netty.handler.codec.MessageToMessageDecoder.channelRead( > MessageToMessageDecoder.java:88) > >> > > >> > > >> > ... 19 more > >> > > >> > > >> > Errors on Bookie side, with log level = INFO) > >> > > >> > > >> > io.netty.channel.DefaultChannelPipeline: > >> > 8-03-07-09-51-38 An exceptionCaught() event was fired, and it > >> > reached > >> > at the tail of the pipeline. It usually means the last handler in the > >> > pipeline did not handle the exception. > >> > 18-03-07-09-51-38 io.netty.util.IllegalReferenceCountException: > >> > refCnt: 0, increment: 1 > >> > io.netty.util.IllegalReferenceCountException: refCnt: 0, increment: 1 > >> > at > >> > > >> > io.netty.buffer.AbstractReferenceCountedByteBuf.release0( > AbstractReferenceCountedByteBuf.java:100) > >> > at > >> > > >> > io.netty.buffer.AbstractReferenceCountedByteBuf.release( > AbstractReferenceCountedByteBuf.java:84) > >> > at > >> > io.netty.util.ReferenceCountUtil.release(ReferenceCountUtil.java:88) > >> > at > >> > > >> > io.netty.channel.DefaultChannelPipeline.onUnhandledInboundMessage( > DefaultChannelPipeline.java:1202) > >> > at > >> > > >> > io.netty.channel.DefaultChannelPipeline$TailContext.channelRead( > DefaultChannelPipeline.java:1299) > >> > at > >> > > >> > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead( > AbstractChannelHandlerContext.java:362) > >> > at > >> > > >> > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead( > AbstractChannelHandlerContext.java:348) > >> > at > >> > > >> > io.netty.channel.AbstractChannelHandlerContext.fireChannelRead( > AbstractChannelHandlerContext.java:340) > >> > at > >> > > >> > io.netty.channel.ChannelInboundHandlerAdapter.channelRead( > ChannelInboundHandlerAdapter.java:86) > >> > at > >> > > >> > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead( > AbstractChannelHandlerContext.java:362) > >> > at > >> > > >> > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead( > AbstractChannelHandlerContext.java:348) > >> > at > >> > > >> > io.netty.channel.AbstractChannelHandlerContext.fireChannelRead( > AbstractChannelHandlerContext.java:340) > >> > at > >> > > >> > org.apache.bookkeeper.proto.BookieRequestHandler.channelRead( > BookieRequestHandler.java:78) > >> > at > >> > > >> > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead( > AbstractChannelHandlerContext.java:362) > >> > at > >> > > >> > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead( > AbstractChannelHandlerContext.java:348) > >> > at > >> > > >> > io.netty.channel.AbstractChannelHandlerContext.fireChannelRead( > AbstractChannelHandlerContext.java:340) > >> > at > >> > > >> > io.netty.channel.ChannelInboundHandlerAdapter.channelRead( > ChannelInboundHandlerAdapter.java:86) > >> > at > >> > > >> > org.apache.bookkeeper.proto.AuthHandler$ServerSideHandler. > channelRead(AuthHandler.java:88) > >> > at > >> > > >> > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead( > AbstractChannelHandlerContext.java:362) > >> > at > >> > > >> > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead( > AbstractChannelHandlerContext.java:348) > >> > at > >> > > >> > io.netty.channel.AbstractChannelHandlerContext.fireChannelRead( > AbstractChannelHandlerContext.java:340) > >> > at > >> > > >> > io.netty.handler.codec.MessageToMessageDecoder.channelRead( > MessageToMessageDecoder.java:102) > >> > at > >> > > >> > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead( > AbstractChannelHandlerContext.java:362) > >> > at > >> > > >> > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead( > AbstractChannelHandlerContext.java:348) > >> > at > >> > > >> > io.netty.channel.AbstractChannelHandlerContext.fireChannelRead( > AbstractChannelHandlerContext.java:340) > >> > at > >> > > >> > io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead( > ByteToMessageDecoder.java:310) > >> > at > >> > > >> > io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead( > ByteToMessageDecoder.java:297) > >> > at > >> > > >> > io.netty.handler.codec.ByteToMessageDecoder.callDecode( > ByteToMessageDecoder.java:413) > >> > at > >> > > >> > io.netty.handler.codec.ByteToMessageDecoder.channelRead( > ByteToMessageDecoder.java:265) > >> > at > >> > > >> > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead( > AbstractChannelHandlerContext.java:362) > >> > at > >> > > >> > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead( > AbstractChannelHandlerContext.java:348) > >> > at > >> > > >> > io.netty.channel.AbstractChannelHandlerContext.fireChannelRead( > AbstractChannelHandlerContext.java:340) > >> > at > >> > > >> > io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead( > DefaultChannelPipeline.java:1414) > >> > at > >> > > >> > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead( > AbstractChannelHandlerContext.java:362) > >> > at > >> > > >> > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead( > AbstractChannelHandlerContext.java:348) > >> > at > >> > > >> > io.netty.channel.DefaultChannelPipeline.fireChannelRead( > DefaultChannelPipeline.java:945) > >> > at > >> > > >> > io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe. > epollInReady(AbstractEpollStreamChannel.java:800) > >> > at > >> > > >> > io.netty.channel.epoll.EpollEventLoop.processReady( > EpollEventLoop.java:404) > >> > at > >> > io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:304) > >> > at > >> > > >> > io.netty.util.concurrent.SingleThreadEventExecutor$5. > run(SingleThreadEventExecutor.java:886) > >> > at java.base/java.lang.Thread.run(Thread.java:844) > >> > > >> > > > > > >