Re: Help with bad errors on 4.6.1

2018-03-15 Thread Enrico Olivelli
2018-03-15 11:13 GMT+01:00 Ivan Kelly :

> > What is the difference in Channel#write/ByteBuf pooling in Java 9?
> Sounds like it could be an issue in netty itself. Java 9 removed a
> bunch of stuff around Unsafe, which I'm pretty sure netty was using
> for ByteBuf. Have you tried setting the pool debugging to paranoid?
>
> -Dio.netty.leakDetection.level=paranoid
>


Only with 'advanced', sorry. I will try again with 'paranoid'.

I tried attaching my laptop to the same BK cluster and running a
reproducer client, with no results: the bookies do not break.
It seems that the error occurs only between the machines of that group (network).

It looks like something very weird, maybe a mix of message size, network
settings and Java GC (pooled ByteBuf heap bufs are released on finalize(), as
far as I have understood from the Netty logs).

I hope that with 'paranoid' I will get more stack traces that point to
code references.

Thank you

Enrico


>
> I tried running my attempted repro on Java 9, but no cigar.
>
>
> -Ivan
>


Re: Help with bad errors on 4.6.1

2018-03-15 Thread Ivan Kelly
> What is the difference in Channel#write/ByteBuf pooling in Java 9?
Sounds like it could be an issue in netty itself. Java 9 removed a
bunch of stuff around Unsafe, which I'm pretty sure netty was using
for ByteBuf. Have you tried setting the pool debugging to paranoid?

-Dio.netty.leakDetection.level=paranoid
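
A minimal sketch for confirming that the flag above actually reached the JVM, for example by logging it at bookie startup. It uses only the JDK. The class `LeakLevelCheck` and its `parse` helper are hypothetical names, and the fallback to SIMPLE for a missing or unrecognized value is an assumption that mirrors Netty's documented default rather than Netty's own parsing code.

```java
import java.util.Locale;

class LeakLevelCheck {
    // The four level names used by Netty's leak detector.
    enum Level { DISABLED, SIMPLE, ADVANCED, PARANOID }

    // Hypothetical helper: map the raw property value to a level name.
    // Assumption: unset or unrecognized values fall back to SIMPLE.
    static Level parse(String raw) {
        if (raw == null) {
            return Level.SIMPLE;
        }
        try {
            return Level.valueOf(raw.trim().toUpperCase(Locale.ROOT));
        } catch (IllegalArgumentException e) {
            return Level.SIMPLE;
        }
    }

    public static void main(String[] args) {
        // Reads the same system property that the -D flag sets.
        Level level = parse(System.getProperty("io.netty.leakDetection.level"));
        System.out.println("leak detection level: " + level);
    }
}
```

Run with `java -Dio.netty.leakDetection.level=paranoid LeakLevelCheck.java`; if the printed level is not PARANOID, the flag did not reach this JVM.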


I tried running my attempted repro on Java 9, but no cigar.


-Ivan


Re: Help with bad errors on 4.6.1

2018-03-15 Thread Enrico Olivelli
Very latest news:
I have narrowed the problem down to ResponseEnDecoderV3#encode: when I use
UnpooledByteBufAllocator.DEFAULT instead of the allocator from the channel,
the error disappears.

So the problem is in the encoding of the responses when using Java 9 and
pooled ByteBufs.
This is consistent with the errors on the client side about corrupted
responses when the client is on Java 8 and the server is on Java 9.
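
The root cause is not identified at this point in the thread. Purely as an illustration of one class of bug that matches these symptoms (corrupted responses that vanish with an unpooled allocator), here is a JDK-only toy sketch. `ToyPool` and `PooledReuseDemo` are hypothetical and are not Netty or BookKeeper code; the simulated early release is an assumption, not a diagnosis.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayDeque;

class PooledReuseDemo {
    // Hypothetical toy pool: hands released arrays back out again,
    // the way a pooled allocator recycles memory.
    static final class ToyPool {
        private final ArrayDeque<byte[]> free = new ArrayDeque<>();

        byte[] allocate(int size) {
            byte[] b = free.poll();
            return (b != null && b.length == size) ? b : new byte[size];
        }

        void release(byte[] b) {
            free.push(b);
        }
    }

    static void fill(byte[] buf, String s) {
        byte[] src = s.getBytes(StandardCharsets.US_ASCII);
        System.arraycopy(src, 0, buf, 0, buf.length);
    }

    // Returns what a delayed write of the first response would put on the wire.
    static String run(boolean pooled) {
        ToyPool pool = new ToyPool();
        byte[] first = pooled ? pool.allocate(8) : new byte[8];
        fill(first, "RESP-001");
        // Simulated bug: the buffer is released before its write has happened.
        if (pooled) {
            pool.release(first);
        }
        byte[] second = pooled ? pool.allocate(8) : new byte[8];
        fill(second, "RESP-002");
        // The delayed write of the first response reads the buffer only now.
        return new String(first, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        System.out.println("pooled:   " + run(true));   // prints RESP-002 (corrupted)
        System.out.println("unpooled: " + run(false));  // prints RESP-001 (intact)
    }
}
```

With the toy pool, the second allocation receives the same array and overwrites the first response; with fresh arrays, nothing is shared, so the same lifetime mistake goes unnoticed.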

I am now testing with the bookie on Java 8 and clients on Java 9, and the
problem seems to be the same: I receive corrupted messages on the bookie.

Does this ring any bells?

What is the difference in Channel#write/ByteBuf pooling in Java 9?

Enrico

2018-03-15 5:21 GMT+01:00 Enrico Olivelli :

> Latest findings, some good news, and some very bad.
>
> Good news:
> I was wrong, I did not switch the system back to Java 8 correctly.
>
> The problem is on the bookie side and occurs only if the bookie is on Java 9.
>
> Bad news:
> I have a fix. The fix is to use unpooled ByteBufs in serializeProtobuf:
>
> private static ByteBuf serializeProtobuf(MessageLite msg, ByteBufAllocator allocator) {
>     int size = msg.getSerializedSize();
>     // Workaround: bypass the (pooled) allocator passed in and use an unpooled buffer.
>     ByteBuf buf = Unpooled.buffer(size, size);
>     ...
>
> I will keep tracking down the cause; I think it is on the read path
> (not sure).
>
> On the client side we have a flag to disable pooled ByteBufs in the channel
> allocator. The most trivial fix at the moment is to do the same on the
> bookie side, as a hotfix for branch 4.6.
>
> Before jumping to this extreme hotfix, I will dig into the issue. Now
> that I know the problem occurs ONLY on Java 9 and on the bookie, it
> will be simpler to find a reproducer.
>
> The point remains that in my other systems and in the test cases there
> is no failure.
>
> Honestly, I have no Java 9 bookies in production, only Java 8 bookies. Maybe
> this is why no one has ever reported this problem
> from production.
>
> Enrico
>
>
>
>
> 2018-03-14 17:27 GMT+01:00 Ivan Kelly :
>
>> >> > @Ivan
>> >> > I wonder if some tests on Jepsen with bookie restarts may find this
>> >> > kind of issue, given that it is not a network/SO problem
>> >> If Jepsen can catch it, then a normal integration test can too.
>>
>> I attempted a repro for this using the integration test stuff.
>> Running for 2-3 hours in a loop, no bug hit. Perhaps I'm not doing
>> exactly what you are doing.
>>
>> https://github.com/ivankelly/bookkeeper/blob/enrico-bug/tests/integration/enrico-bug/src/test/java/org/apache/bookkeeper/tests/integration/TestEnricoBug.java
>>
>> -Ivan
>>
>
>