Il gio 8 mar 2018, 22:22 Enrico Olivelli <eolive...@gmail.com> ha scritto:
> > > Il gio 8 mar 2018, 22:08 Sijie Guo <guosi...@gmail.com> ha scritto: > >> This is even interesting with v3 protocol. Currently the v3 protocol >> copies the byte string and it doesn’t even use pooled buffer. So I am not >> sure if the issue comes from bookie or an issue at the netty layer. It >> would be good to get a repo. >> > > Interesting idea! I have upgraded to latest netty version both bookie and > clients in this new version of my application! > > In production I have an older version of Netty, > 4.1.21, not 4.1.22 I cannot check which version now, I will report back the version within > some hour, when I will have access to my work laptop > > Enrico > > >> Sijie >> >> On Thu, Mar 8, 2018 at 12:29 PM Enrico Olivelli <eolive...@gmail.com> >> wrote: >> >>> >>> >>> Il gio 8 mar 2018, 20:52 Sijie Guo <guosi...@gmail.com> ha scritto: >>> >>>> On Thu, Mar 8, 2018 at 7:42 AM, Enrico Olivelli <eolive...@gmail.com> >>>> wrote: >>>> >>>>> >>>>> >>>>> 2018-03-08 14:50 GMT+01:00 Ivan Kelly <iv...@apache.org>: >>>>> >>>>>> It just occurred to me that this could be a problem with the recycler. >>>>>> If we recycle a buffer too early, but then keep using it, another user >>>>>> could pick it up, and between them they could corrupt the data that >>>>>> would cause it to be unhandled. >>>>>> >>>>> >>>>> >>>>> Maybe current work from Sijie, Andrey and JV which Sijie is >>>>> backporting to branch-4.6 could resolve the issue >>>>> >>>> >>>> >>>> I think that is unrelated. The change Andrey, JV and me dealt with is >>>> the PendingAddOp. It is on client side not bookie side. >>>> >>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>>> >>>>>> -Ivan >>>>>> >>>>>> On Thu, Mar 8, 2018 at 1:49 PM, Enrico Olivelli <eolive...@gmail.com> >>>>>> wrote: >>>>>> > >>>>>> > >>>>>> > 2018-03-08 11:57 GMT+01:00 Ivan Kelly <iv...@apache.org>: >>>>>> >> >>>>>> >> Hmm, yes, looks messed up. Do you have a reliable repro of this? >>>>>> >> >>>>>> >> "at >>>>>> io.netty.channel.DefaultChannelPipeline.onUnhandledInboundMessage" >>>>>> >> <- it seems the message itself wasn't handled. Do you have any more >>>>>> >> bookie side logs leading up to this event? >>>>>> > >>>>>> > >>>>>> > >>>>>> > The only suspicious logs are for logger named >>>>>> > "org.apache.bookkeeper.bookie.EntryLogger" >>>>>> > >>>>>> > >>>>>> > 18-03-08-09-22-40 Created new entry log file >>>>>> > /data/qasql/bookie/./bookie_data/current/4b0.log for logId 1200. >>>>>> > 18-03-08-09-32-41 Failed to get ledgers map index from: >>>>>> 53.log : No >>>>>> > ledgers map index found on entryLogId53 >>>>>> > 18-03-08-09-33-10 Failed to get ledgers map index from: >>>>>> 54.log : No >>>>>> > ledgers map index found on entryLogId54 >>>>>> > 18-03-08-09-33-10 Failed to get ledgers map index from: >>>>>> 375.log : No >>>>>> > ledgers map index found on entryLogId375 >>>>>> > 18-03-08-09-33-13 Failed to get ledgers map index from: >>>>>> 878.log : No >>>>>> > ledgers map index found on entryLogId878 >>>>>> > 18-03-08-09-33-22 Failed to get ledgers map index from: >>>>>> 1197.log : No >>>>>> > ledgers map index found on entryLogId1197 >>>>>> > 18-03-08-09-33-57 Failed to get ledgers map index from: >>>>>> 1198.log : No >>>>>> > ledgers map index found on entryLogId1198 >>>>>> > 18-03-08-09-34-54 Failed to get ledgers map index from: >>>>>> 1199.log : No >>>>>> > ledgers map index found on entryLogId1199 >>>>>> >>>>> >>>> >>>> This is also unrelated. It is a fine logging, since it means it >>>> attempts to read ledgers map index, but didn't find one. >>>> >>>> the ledgers map index is only written when it is enabled at the bookies. >>>> >>>> >>>>> > >>>>>> > I will try to run Bookie with level debug, but it produces lots of >>>>>> spam >>>>>> > >>>>>> > Thanks >>>>>> > Enrico >>>>>> > >>>>>> >> >>>>>> >> >>>>>> >> -Ivan >>>>>> >> >>>>>> >> On Thu, Mar 8, 2018 at 9:18 AM, Enrico Olivelli < >>>>>> eolive...@gmail.com> >>>>>> >> wrote: >>>>>> >> > Hi all, >>>>>> >> > I am seeing this bad errors in some test environments with 4.6.1 >>>>>> >> > The errors appear during rolling restart of the application, the >>>>>> test >>>>>> >> > env is >>>>>> >> > made of 6 machines: >>>>>> >> > - 3 bookies >>>>>> >> > - 3 client machines (with several BK clients, of different >>>>>> sub-systems) >>>>>> >> > - running with 4.6.1 both client and servers >>>>>> >> > >>>>>> >> > *I do not have* reports of this errors from production, already >>>>>> running >>>>>> >> > 4.6.1 for the last month >>>>>> >> > >>>>>> >> > But the problem is quite scary >>>>>> >> > >>>>>> >> > This is a sample of relevant errors on clients (in this case >>>>>> Majordodo >>>>>> >> > brokers, with log level = INFO) >>>>>> >> > >>>>>> >> > logs/org.apache.bookkeeper.client.PendingAddOp: >>>>>> >> > 18-03-07-09-51-43 Write of ledger entry to quorum failed: >>>>>> L366634 >>>>>> >> > E2557 >>>>>> >> > 18-03-07-09-51-43 Write of ledger entry to quorum failed: >>>>>> L366634 >>>>>> >> > E2558 >>>>>> >> > 18-03-07-09-51-43 Write of ledger entry to quorum failed: >>>>>> L366634 >>>>>> >> > E2559 >>>>>> >> > 18-03-07-15-59-55 Failed to write entry (366680, 1865): >>>>>> Bookie >>>>>> >> > operation timeout >>>>>> >> > 18-03-07-15-59-55 Failed to write entry (366680, 1865): >>>>>> Bookie >>>>>> >> > operation timeout >>>>>> >> > 18-03-07-16-00-00 Failed to write entry (366680, 1865): >>>>>> Bookie >>>>>> >> > operation timeout >>>>>> >> > 18-03-07-16-00-00 Failed to write entry (366680, 1865): >>>>>> Bookie >>>>>> >> > operation timeout >>>>>> >> > 18-03-07-16-00-05 Failed to write entry (366680, 1865): >>>>>> Bookie >>>>>> >> > operation timeout >>>>>> >> > >>>>>> >> > >>>>>> >> > org.apache.bookkeeper.proto.PerChannelBookieClient: >>>>>> >> > >>>>>> >> > 18-03-07-10-06-44 Unexpected exception caught by bookie >>>>>> client >>>>>> >> > channel >>>>>> >> > handler >>>>>> >> > 18-03-07-10-06-44 io.netty.handler.codec.DecoderException: >>>>>> >> > java.lang.IllegalStateException: Received unknown response : op >>>>>> code = 6 >>>>>> >> > io.netty.handler.codec.DecoderException: >>>>>> >> > java.lang.IllegalStateException: >>>>>> >> > Received unknown response : op code = 6 >>>>>> >> > at >>>>>> >> > >>>>>> >> > >>>>>> io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:98) >>>>>> >> > at >>>>>> >> > >>>>>> >> > >>>>>> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) >>>>>> >> > at >>>>>> >> > >>>>>> >> > >>>>>> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) >>>>>> >> > at >>>>>> >> > >>>>>> >> > >>>>>> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340) >>>>>> >> > at >>>>>> >> > >>>>>> >> > >>>>>> io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:310) >>>>>> >> > at >>>>>> >> > >>>>>> >> > >>>>>> io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:297) >>>>>> >> > at >>>>>> >> > >>>>>> >> > >>>>>> io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:413) >>>>>> >> > at >>>>>> >> > >>>>>> >> > >>>>>> io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:265) >>>>>> >> > at >>>>>> >> > >>>>>> >> > >>>>>> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) >>>>>> >> > at >>>>>> >> > >>>>>> >> > >>>>>> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) >>>>>> >> > at >>>>>> >> > >>>>>> >> > >>>>>> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340) >>>>>> >> > at >>>>>> >> > >>>>>> >> > >>>>>> io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1414) >>>>>> >> > at >>>>>> >> > >>>>>> >> > >>>>>> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) >>>>>> >> > at >>>>>> >> > >>>>>> >> > >>>>>> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) >>>>>> >> > at >>>>>> >> > >>>>>> >> > >>>>>> io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:945) >>>>>> >> > at >>>>>> >> > >>>>>> >> > >>>>>> io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.epollInReady(AbstractEpollStreamChannel.java:800) >>>>>> >> > at >>>>>> >> > >>>>>> >> > >>>>>> io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:404) >>>>>> >> > at >>>>>> >> > >>>>>> io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:304) >>>>>> >> > at >>>>>> >> > >>>>>> >> > >>>>>> io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:886) >>>>>> >> > at java.base/java.lang.Thread.run(Thread.java:844) >>>>>> >> > Caused by: java.lang.IllegalStateException: Received unknown >>>>>> response : >>>>>> >> > op >>>>>> >> > code = 6 >>>>>> >> > at >>>>>> >> > >>>>>> >> > org.apache.bookkeeper.proto.Bo >>>>>> okieProtoEncoding$ResponseEnDeCoderPreV3.decode(BookieProtoEncoding.java:294) >>>>>> >> > at >>>>>> >> > >>>>>> >> > org.apache.bookkeeper.proto.Bo >>>>>> okieProtoEncoding$ResponseDecoder.decode(BookieProtoEncoding.java:478) >>>>>> >> > at >>>>>> >> > >>>>>> >> > >>>>>> io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:88) >>>>>> >> > >>>>>> >> > >>>>>> >> > ... 19 more >>>>>> >> > >>>>>> >> > >>>>>> >> > Errors on Bookie side, with log level = INFO) >>>>>> >> > >>>>>> >> > >>>>>> >> > io.netty.channel.DefaultChannelPipeline: >>>>>> >> > 8-03-07-09-51-38 An exceptionCaught() event was fired, and >>>>>> it >>>>>> >> > reached >>>>>> >> > at the tail of the pipeline. It usually means the last handler >>>>>> in the >>>>>> >> > pipeline did not handle the exception. >>>>>> >> > 18-03-07-09-51-38 >>>>>> io.netty.util.IllegalReferenceCountException: >>>>>> >> > refCnt: 0, increment: 1 >>>>>> >> > io.netty.util.IllegalReferenceCountException: refCnt: 0, >>>>>> increment: 1 >>>>>> >> > at >>>>>> >> > >>>>>> >> > >>>>>> io.netty.buffer.AbstractReferenceCountedByteBuf.release0(AbstractReferenceCountedByteBuf.java:100) >>>>>> >> > at >>>>>> >> > >>>>>> >> > >>>>>> io.netty.buffer.AbstractReferenceCountedByteBuf.release(AbstractReferenceCountedByteBuf.java:84) >>>>>> >> > at >>>>>> >> > >>>>>> io.netty.util.ReferenceCountUtil.release(ReferenceCountUtil.java:88) >>>>>> >> > at >>>>>> >> > >>>>>> >> > >>>>>> io.netty.channel.DefaultChannelPipeline.onUnhandledInboundMessage(DefaultChannelPipeline.java:1202) >>>>>> >> > at >>>>>> >> > >>>>>> >> > >>>>>> io.netty.channel.DefaultChannelPipeline$TailContext.channelRead(DefaultChannelPipeline.java:1299) >>>>>> >> > at >>>>>> >> > >>>>>> >> > >>>>>> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) >>>>>> >> > at >>>>>> >> > >>>>>> >> > >>>>>> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) >>>>>> >> > at >>>>>> >> > >>>>>> >> > >>>>>> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340) >>>>>> >> > at >>>>>> >> > >>>>>> >> > >>>>>> io.netty.channel.ChannelInboundHandlerAdapter.channelRead(ChannelInboundHandlerAdapter.java:86) >>>>>> >> > at >>>>>> >> > >>>>>> >> > >>>>>> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) >>>>>> >> > at >>>>>> >> > >>>>>> >> > >>>>>> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) >>>>>> >> > at >>>>>> >> > >>>>>> >> > >>>>>> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340) >>>>>> >> > at >>>>>> >> > >>>>>> >> > org.apache.bookkeeper.proto.Bo >>>>>> okieRequestHandler.channelRead(BookieRequestHandler.java:78) >>>>>> >> > at >>>>>> >> > >>>>>> >> > >>>>>> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) >>>>>> >> > at >>>>>> >> > >>>>>> >> > >>>>>> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) >>>>>> >> > at >>>>>> >> > >>>>>> >> > >>>>>> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340) >>>>>> >> > at >>>>>> >> > >>>>>> >> > >>>>>> io.netty.channel.ChannelInboundHandlerAdapter.channelRead(ChannelInboundHandlerAdapter.java:86) >>>>>> >> > at >>>>>> >> > >>>>>> >> > org.apache.bookkeeper.proto.Au >>>>>> thHandler$ServerSideHandler.channelRead(AuthHandler.java:88) >>>>>> >> > at >>>>>> >> > >>>>>> >> > >>>>>> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) >>>>>> >> > at >>>>>> >> > >>>>>> >> > >>>>>> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) >>>>>> >> > at >>>>>> >> > >>>>>> >> > >>>>>> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340) >>>>>> >> > at >>>>>> >> > >>>>>> >> > >>>>>> io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:102) >>>>>> >> > at >>>>>> >> > >>>>>> >> > >>>>>> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) >>>>>> >> > at >>>>>> >> > >>>>>> >> > >>>>>> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) >>>>>> >> > at >>>>>> >> > >>>>>> >> > >>>>>> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340) >>>>>> >> > at >>>>>> >> > >>>>>> >> > >>>>>> io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:310) >>>>>> >> > at >>>>>> >> > >>>>>> >> > >>>>>> io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:297) >>>>>> >> > at >>>>>> >> > >>>>>> >> > >>>>>> io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:413) >>>>>> >> > at >>>>>> >> > >>>>>> >> > >>>>>> io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:265) >>>>>> >> > at >>>>>> >> > >>>>>> >> > >>>>>> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) >>>>>> >> > at >>>>>> >> > >>>>>> >> > >>>>>> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) >>>>>> >> > at >>>>>> >> > >>>>>> >> > >>>>>> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340) >>>>>> >> > at >>>>>> >> > >>>>>> >> > >>>>>> io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1414) >>>>>> >> > at >>>>>> >> > >>>>>> >> > >>>>>> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) >>>>>> >> > at >>>>>> >> > >>>>>> >> > >>>>>> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) >>>>>> >> > at >>>>>> >> > >>>>>> >> > >>>>>> io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:945) >>>>>> >> > at >>>>>> >> > >>>>>> >> > >>>>>> io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.epollInReady(AbstractEpollStreamChannel.java:800) >>>>>> >> > at >>>>>> >> > >>>>>> >> > >>>>>> io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:404) >>>>>> >> > at >>>>>> >> > >>>>>> io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:304) >>>>>> >> > at >>>>>> >> > >>>>>> >> > >>>>>> io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:886) >>>>>> >> > at java.base/java.lang.Thread.run(Thread.java:844) >>>>>> >>>>> >>>> questions: >>>> >>>> 1) which protocol version are you using? >>>> >>> >>> V3 >>> >>> >>>> 2) is the exception coming from read response or write response? it >>>> sound like write response. If it is write response, it would be surprised >>>> about the ref count. because we don't send any buffer back in write >>>> response. >>>> >>> >>> >>> There is no clue of read error, only those logs about write failures >>> >>> Thank you for support >>> Enrico >>> >>>> >>>> >>>>> >> > >>>>>> >> > >>>>>> > >>>>>> > >>>>>> >>>>> >>>>> >>>> -- >>> >>> >>> -- Enrico Olivelli >>> >> -- > > > -- Enrico Olivelli > -- -- Enrico Olivelli