Re: Help with bad errors on 4.6.1

2018-03-27 Thread Enrico Olivelli
End of this story. With this patch the problem does not occur anymore: https://github.com/apache/bookkeeper/pull/1293 The patch does not address the problem directly; the root cause is still unknown, which is very bad. But with that change no error is reported anymore, so actually it is enough to

Re: Help with bad errors on 4.6.1

2018-03-09 Thread Enrico Olivelli
2018-03-09 8:59 GMT+01:00 Sijie Guo:
> Sent out a PR for the issues that I observed:
> https://github.com/apache/bookkeeper/pull/1240
Other findings:
- my problem is not related to jdk9, it happens with jdk8 too
- the "tailing reader" is able to make progress and follow

Re: Help with bad errors on 4.6.1

2018-03-09 Thread Sijie Guo
Sent out a PR for the issues that I observed: https://github.com/apache/bookkeeper/pull/1240
On Thu, Mar 8, 2018 at 10:47 PM, Sijie Guo wrote:
> So the problem here is:
> - a corrupted request failed the V3 request decoder, so bookie switched to
> use v2 request decoder.

Re: Help with bad errors on 4.6.1

2018-03-08 Thread Sijie Guo
So the problem here is: - a corrupted request failed the V3 request decoder, so the bookie switched to the v2 request decoder. Once the switch happens, the bookie will always use the v2 request decoder to decode v3 requests. Then all your v3 requests will fail with unknown op and trigger the bytebuf
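To make the failure mode described above concrete, here is a minimal sketch of a decoder whose fallback is sticky. It is not BookKeeper code: the class `StickyDecoderSketch`, the `ConnectionDecoder` type, and the string "frames" are all hypothetical stand-ins, and the only behavior it tries to show is that one corrupted request can flip a connection to the older decoder for good.

```java
import java.util.ArrayList;
import java.util.List;

public class StickyDecoderSketch {

    /** Hypothetical per-connection decoder; names and frame format are assumptions. */
    static class ConnectionDecoder {
        // Once set, this flag is never cleared: the fallback is permanent.
        private boolean useV2 = false;

        String decode(String frame) {
            if (!useV2) {
                try {
                    return decodeV3(frame);
                } catch (IllegalArgumentException e) {
                    // V3 parsing failed, so assume the peer speaks v2 from now on.
                    useV2 = true;
                }
            }
            return decodeV2(frame);
        }

        private static String decodeV3(String frame) {
            if (!frame.startsWith("v3:")) {
                throw new IllegalArgumentException("not a v3 frame");
            }
            return "v3 op " + frame.substring(3);
        }

        private static String decodeV2(String frame) {
            if (!frame.startsWith("v2:")) {
                return "unknown op";
            }
            return "v2 op " + frame.substring(3);
        }
    }

    /** Feeds one good, one corrupted, and one more good v3 frame to the same decoder. */
    static List<String> run() {
        ConnectionDecoder decoder = new ConnectionDecoder();
        List<String> results = new ArrayList<>();
        results.add(decoder.decode("v3:add"));   // decoded as v3
        results.add(decoder.decode("#corrupt")); // fails v3, flips to v2
        results.add(decoder.decode("v3:read"));  // valid v3, but now handed to v2
        return results;
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

In this sketch the third frame is well-formed, yet it is reported as an unknown op because the earlier corrupted frame already switched the connection to the v2 path.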

Re: Help with bad errors on 4.6.1

2018-03-08 Thread Venkateswara Rao Jujjuri
On Thu, Mar 8, 2018 at 1:05 PM, Sijie Guo wrote:
> On Thu, Mar 8, 2018 at 12:33 PM Andrey Yegorov wrote:
>> I am looking more at the PendingAddOp and it looks like, in addition to
>> the case that Sijie has fixed, there is another scenario

Re: Help with bad errors on 4.6.1

2018-03-08 Thread Enrico Olivelli
On Thu, Mar 8, 2018, 22:22 Enrico Olivelli wrote:
> On Thu, Mar 8, 2018, 22:08 Sijie Guo wrote:
>> This is even interesting with v3 protocol. Currently the v3 protocol
>> copies the byte string and it doesn’t even use pooled buffer. So I am

Re: Help with bad errors on 4.6.1

2018-03-08 Thread Sijie Guo
This is even interesting with v3 protocol. Currently the v3 protocol copies the byte string and it doesn’t even use a pooled buffer. So I am not sure if the issue comes from the bookie or is an issue at the netty layer. It would be good to get a repro. Sijie On Thu, Mar 8, 2018 at 12:29 PM Enrico Olivelli

Re: Help with bad errors on 4.6.1

2018-03-08 Thread Enrico Olivelli
On Thu, Mar 8, 2018, 21:33 Andrey Yegorov wrote:
> I am looking more at the PendingAddOp and it looks like, in addition to the
> case that Sijie has fixed, there is another scenario where recycler can get
> triggered.
I think that this is another issue, but your

Re: Help with bad errors on 4.6.1

2018-03-08 Thread Andrey Yegorov
I am looking more at the PendingAddOp and it looks like, in addition to the case that Sijie has fixed, there is another scenario where the recycler can get triggered. I.e.: the first sendWriteRequest() in PendingAddOp.safeRun() fails in PCBC's writeAndFlush (channel == null or channel.writeAndFlush

Re: Help with bad errors on 4.6.1

2018-03-08 Thread Enrico Olivelli
On Thu, Mar 8, 2018, 20:52 Sijie Guo wrote:
> On Thu, Mar 8, 2018 at 7:42 AM, Enrico Olivelli wrote:
>> 2018-03-08 14:50 GMT+01:00 Ivan Kelly:
>>> It just occurred to me that this could be a problem with the recycler.

Re: Help with bad errors on 4.6.1

2018-03-08 Thread Sijie Guo
On Thu, Mar 8, 2018 at 7:42 AM, Enrico Olivelli wrote:
> 2018-03-08 14:50 GMT+01:00 Ivan Kelly:
>> It just occurred to me that this could be a problem with the recycler.
>> If we recycle a buffer too early, but then keep using it, another user

Re: Help with bad errors on 4.6.1

2018-03-08 Thread Enrico Olivelli
2018-03-08 14:50 GMT+01:00 Ivan Kelly:
> It just occurred to me that this could be a problem with the recycler.
> If we recycle a buffer too early, but then keep using it, another user
> could pick it up, and between them they could corrupt the data that
> would cause it to be
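The recycler hypothesis quoted above can be illustrated with a small sketch. This is a hand-rolled pool written only for illustration, not Netty's Recycler and not BookKeeper code; `EarlyRecycleSketch`, `Pool`, and `Buf` are hypothetical names. It shows how two users end up sharing one buffer when the first user recycles it while still holding a reference.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class EarlyRecycleSketch {

    /** Hypothetical pooled buffer; a StringBuilder stands in for the byte payload. */
    static class Buf {
        final StringBuilder data = new StringBuilder();
    }

    /** Hypothetical object pool that hands recycled instances back out. */
    static class Pool {
        private final Deque<Buf> free = new ArrayDeque<>();

        Buf acquire() {
            Buf b = free.poll();
            if (b == null) {
                b = new Buf();
            }
            b.data.setLength(0); // reset contents for the new owner
            return b;
        }

        void recycle(Buf b) {
            free.push(b);
        }
    }

    /** Returns what the second user sees after the first user keeps writing. */
    static String run() {
        Pool pool = new Pool();

        Buf first = pool.acquire();
        first.data.append("entry-1");
        pool.recycle(first);           // bug: recycled while still in use

        Buf second = pool.acquire();   // the pool hands out the same instance
        second.data.append("entry-2");

        first.data.append("|retry");   // first user writes through its stale reference
        return second.data.toString();
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

Here the second user's buffer ends up containing data it never wrote, which is the kind of corruption the quoted message speculates about.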