So the problem here is:

- a corrupted request failed the V3 request decoder, so bookie switched to
use v2 request decoder. Once the switch happen, the bookie will always use
v2 request decoder decoding v3 request. then all your v3 requests will be
failing with unknown op and trigger the bytebuf double releasing issue.

- a corrupted response failed the V3 response decoder, so client switched
to use v3 response decoder. Once the switch happen, the client will always
use v2 request decoder decoding v3 response. so all the responses will be
failing with " Received unknown response : op code"

Although I don't know how the first request/response happened (it can be
any issue, even network corruption), the problem is once this happen,
either client/bookie will be forcing to use v2 request/response decoder and
never change. so the problem will remain until you restarted. And this is
the behavior that Enrico is seeing.

There are a couple of issues to address here:

1) we need to add a flag to disable falling back to use v2 request/response
coder and make sure it always use v3 protocol. In this way, we will
guarantee the problem not persist in the channel level.
2) we need to throw exception on unknown op code at request decode :
https://github.com/apache/bookkeeper/blob/master/bookkeeper-server/src/main/java/org/apache/bookkeeper/proto/BookieProtoEncoding.java#L195
. As what we did at response decoder :
https://github.com/apache/bookkeeper/blob/master/bookkeeper-server/src/main/java/org/apache/bookkeeper/proto/BookieProtoEncoding.java#L304
in https://github.com/apache/bookkeeper/issues/198


Details are listed as below:

-- 

Details:

- The client side stacktrace clearly showed that it is using v2 decoder on
decoding responses. That means client failed to parse v3 response and it
falls back to use v2 decoder on decoding responses. Because it is a
"corrupted" v3 response, so v2 decoder can't
find a right op code.  Then it throws illegal state exception.


*Caused by: java.lang.IllegalStateException: Received unknown response : op
code = 6        at
org.apache.bookkeeper.proto.BookieProtoEncoding$ResponseEnDeCoderPreV3.decode(BookieProtoEncoding.java:294)*

- For the stacktrace at bookie side:

1. It is clear that BookieRequestHandler#L77 is called. That means the
message is neither v2 request nor v3 request. It is a ByteBuf.

https://github.com/apache/bookkeeper/blob/master/bookkeeper-server/src/main/java/org/apache/bookkeeper/proto/BookieRequestHandler.java#L77

2. V3 request decoder is using protobuf to decode bytes array. If it is not
a valid v3 request, it will always throw exceptions. so the code mentioned
above will never be triggered

https://github.com/apache/bookkeeper/blob/master/bookkeeper-server/src/main/java/org/apache/bookkeeper/proto/BookieProtoEncoding.java#L344

3. The only case that BookieRequestHandler#L77 can happen is v2 request
decoder doesn't parse the buffer. This leads to the bug in

https://github.com/apache/bookkeeper/blob/master/bookkeeper-server/src/main/java/org/apache/bookkeeper/proto/BookieProtoEncoding.java#L194


- How this can happen?

If the client is using v3 protocol, but both client and bookie side
stacktraces show it is using v2 protocol, that means there was a corruption
causing client and bookie switching from v3 protocol to v2 protocol.

https://github.com/apache/bookkeeper/blob/master/bookkeeper-server/src/main/java/org/apache/bookkeeper/proto/BookieProtoEncoding.java#L502
https://github.com/apache/bookkeeper/blob/master/bookkeeper-server/src/main/java/org/apache/bookkeeper/proto/BookieProtoEncoding.java#L432





On Thu, Mar 8, 2018 at 12:18 AM, Enrico Olivelli <eolive...@gmail.com>
wrote:

> Hi all,
> I am seeing this bad errors in some test environments with 4.6.1
> The errors appear during rolling restart of the application, the test env
> is made of 6 machines:
> - 3 bookies
> - 3 client machines (with several BK clients, of different sub-systems)
> - running with 4.6.1 both client and servers
>
> *I do not have* reports of this errors from production, already running
> 4.6.1 for the last month
>
> But the problem is quite scary
>
> This is a sample of relevant errors on clients (in this case Majordodo
> brokers, with log level = INFO)
>
> logs/org.apache.bookkeeper.client.PendingAddOp:
> 18-03-07-09-51-43       Write of ledger entry to quorum failed: L366634
> E2557
> 18-03-07-09-51-43       Write of ledger entry to quorum failed: L366634
> E2558
> 18-03-07-09-51-43       Write of ledger entry to quorum failed: L366634
> E2559
> 18-03-07-15-59-55       Failed to write entry (366680, 1865): Bookie
> operation timeout
> 18-03-07-15-59-55       Failed to write entry (366680, 1865): Bookie
> operation timeout
> 18-03-07-16-00-00       Failed to write entry (366680, 1865): Bookie
> operation timeout
> 18-03-07-16-00-00       Failed to write entry (366680, 1865): Bookie
> operation timeout
> 18-03-07-16-00-05       Failed to write entry (366680, 1865): Bookie
> operation timeout
>
>
> org.apache.bookkeeper.proto.PerChannelBookieClient:
>
> 18-03-07-10-06-44       Unexpected exception caught by bookie client
> channel handler
> 18-03-07-10-06-44       io.netty.handler.codec.DecoderException:
> java.lang.IllegalStateException: Received unknown response : op code = 6
> io.netty.handler.codec.DecoderException: java.lang.IllegalStateException:
> Received unknown response : op code = 6
>         at io.netty.handler.codec.MessageToMessageDecoder.channelRead(
> MessageToMessageDecoder.java:98)
>         at io.netty.channel.AbstractChannelHandlerContext.
> invokeChannelRead(AbstractChannelHandlerContext.java:362)
>         at io.netty.channel.AbstractChannelHandlerContext.
> invokeChannelRead(AbstractChannelHandlerContext.java:348)
>         at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(
> AbstractChannelHandlerContext.java:340)
>         at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(
> ByteToMessageDecoder.java:310)
>         at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(
> ByteToMessageDecoder.java:297)
>         at io.netty.handler.codec.ByteToMessageDecoder.callDecode(
> ByteToMessageDecoder.java:413)
>         at io.netty.handler.codec.ByteToMessageDecoder.channelRead(
> ByteToMessageDecoder.java:265)
>         at io.netty.channel.AbstractChannelHandlerContext.
> invokeChannelRead(AbstractChannelHandlerContext.java:362)
>         at io.netty.channel.AbstractChannelHandlerContext.
> invokeChannelRead(AbstractChannelHandlerContext.java:348)
>         at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(
> AbstractChannelHandlerContext.java:340)
>         at io.netty.channel.DefaultChannelPipeline$
> HeadContext.channelRead(DefaultChannelPipeline.java:1414)
>         at io.netty.channel.AbstractChannelHandlerContext.
> invokeChannelRead(AbstractChannelHandlerContext.java:362)
>         at io.netty.channel.AbstractChannelHandlerContext.
> invokeChannelRead(AbstractChannelHandlerContext.java:348)
>         at io.netty.channel.DefaultChannelPipeline.fireChannelRead(
> DefaultChannelPipeline.java:945)
>         at io.netty.channel.epoll.AbstractEpollStreamChannel$
> EpollStreamUnsafe.epollInReady(AbstractEpollStreamChannel.java:800)
>         at io.netty.channel.epoll.EpollEventLoop.processReady(
> EpollEventLoop.java:404)
>         at io.netty.channel.epoll.EpollEventLoop.run(
> EpollEventLoop.java:304)
>         at io.netty.util.concurrent.SingleThreadEventExecutor$5.
> run(SingleThreadEventExecutor.java:886)
>         at java.base/java.lang.Thread.run(Thread.java:844)
>





> Caused by: java.lang.IllegalStateException: Received unknown response :
> op code = 6
>         at org.apache.bookkeeper.proto.BookieProtoEncoding$
> ResponseEnDeCoderPreV3.decode(BookieProtoEncoding.java:294)
>         at org.apache.bookkeeper.proto.BookieProtoEncoding$
> ResponseDecoder.decode(BookieProtoEncoding.java:478)
>         at io.netty.handler.codec.MessageToMessageDecoder.channelRead(
> MessageToMessageDecoder.java:88)
>
>
>         ... 19 more
>







>
>
> Errors on Bookie side, with log level = INFO)
>
>
> io.netty.channel.DefaultChannelPipeline:
> 8-03-07-09-51-38       An exceptionCaught() event was fired, and it
> reached at the tail of the pipeline. It usually means the last handler in
> the pipeline did not handle the exception.
> 18-03-07-09-51-38       io.netty.util.IllegalReferenceCountException:
> refCnt: 0, increment: 1
> io.netty.util.IllegalReferenceCountException: refCnt: 0, increment: 1
>         at io.netty.buffer.AbstractReferenceCountedByteBuf.release0(
> AbstractReferenceCountedByteBuf.java:100)
>         at io.netty.buffer.AbstractReferenceCountedByteBuf.release(
> AbstractReferenceCountedByteBuf.java:84)
>         at io.netty.util.ReferenceCountUtil.release(
> ReferenceCountUtil.java:88)
>         at io.netty.channel.DefaultChannelPipeline.
> onUnhandledInboundMessage(DefaultChannelPipeline.java:1202)
>         at io.netty.channel.DefaultChannelPipeline$
> TailContext.channelRead(DefaultChannelPipeline.java:1299)
>         at io.netty.channel.AbstractChannelHandlerContext.
> invokeChannelRead(AbstractChannelHandlerContext.java:362)
>         at io.netty.channel.AbstractChannelHandlerContext.
> invokeChannelRead(AbstractChannelHandlerContext.java:348)
>         at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(
> AbstractChannelHandlerContext.java:340)
>         at io.netty.channel.ChannelInboundHandlerAdapter.channelRead(
> ChannelInboundHandlerAdapter.java:86)
>         at io.netty.channel.AbstractChannelHandlerContext.
> invokeChannelRead(AbstractChannelHandlerContext.java:362)
>         at io.netty.channel.AbstractChannelHandlerContext.
> invokeChannelRead(AbstractChannelHandlerContext.java:348)
>         at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(
> AbstractChannelHandlerContext.java:340)
>         at org.apache.bookkeeper.proto.BookieRequestHandler.channelRead(
> BookieRequestHandler.java:78)
>         at io.netty.channel.AbstractChannelHandlerContext.
> invokeChannelRead(AbstractChannelHandlerContext.java:362)
>         at io.netty.channel.AbstractChannelHandlerContext.
> invokeChannelRead(AbstractChannelHandlerContext.java:348)
>         at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(
> AbstractChannelHandlerContext.java:340)
>         at io.netty.channel.ChannelInboundHandlerAdapter.channelRead(
> ChannelInboundHandlerAdapter.java:86)
>         at org.apache.bookkeeper.proto.AuthHandler$ServerSideHandler.
> channelRead(AuthHandler.java:88)
>         at io.netty.channel.AbstractChannelHandlerContext.
> invokeChannelRead(AbstractChannelHandlerContext.java:362)
>         at io.netty.channel.AbstractChannelHandlerContext.
> invokeChannelRead(AbstractChannelHandlerContext.java:348)
>         at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(
> AbstractChannelHandlerContext.java:340)
>         at io.netty.handler.codec.MessageToMessageDecoder.channelRead(
> MessageToMessageDecoder.java:102)
>         at io.netty.channel.AbstractChannelHandlerContext.
> invokeChannelRead(AbstractChannelHandlerContext.java:362)
>         at io.netty.channel.AbstractChannelHandlerContext.
> invokeChannelRead(AbstractChannelHandlerContext.java:348)
>         at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(
> AbstractChannelHandlerContext.java:340)
>         at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(
> ByteToMessageDecoder.java:310)
>         at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(
> ByteToMessageDecoder.java:297)
>         at io.netty.handler.codec.ByteToMessageDecoder.callDecode(
> ByteToMessageDecoder.java:413)
>         at io.netty.handler.codec.ByteToMessageDecoder.channelRead(
> ByteToMessageDecoder.java:265)
>         at io.netty.channel.AbstractChannelHandlerContext.
> invokeChannelRead(AbstractChannelHandlerContext.java:362)
>         at io.netty.channel.AbstractChannelHandlerContext.
> invokeChannelRead(AbstractChannelHandlerContext.java:348)
>         at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(
> AbstractChannelHandlerContext.java:340)
>         at io.netty.channel.DefaultChannelPipeline$
> HeadContext.channelRead(DefaultChannelPipeline.java:1414)
>         at io.netty.channel.AbstractChannelHandlerContext.
> invokeChannelRead(AbstractChannelHandlerContext.java:362)
>         at io.netty.channel.AbstractChannelHandlerContext.
> invokeChannelRead(AbstractChannelHandlerContext.java:348)
>         at io.netty.channel.DefaultChannelPipeline.fireChannelRead(
> DefaultChannelPipeline.java:945)
>         at io.netty.channel.epoll.AbstractEpollStreamChannel$
> EpollStreamUnsafe.epollInReady(AbstractEpollStreamChannel.java:800)
>         at io.netty.channel.epoll.EpollEventLoop.processReady(
> EpollEventLoop.java:404)
>         at io.netty.channel.epoll.EpollEventLoop.run(
> EpollEventLoop.java:304)
>         at io.netty.util.concurrent.SingleThreadEventExecutor$5.
> run(SingleThreadEventExecutor.java:886)
>         at java.base/java.lang.Thread.run(Thread.java:844)
>
>
>

Reply via email to