Looking at this PR from Sijie I noticed that there is a rate limiter for our internal subclass of ZooKeeper client. https://github.com/apache/bookkeeper/pull/264
The rate limiter is not enabled and cannot be enabled. I wonder whether I hit a bug in our getData or ZkRetryRunnable, or whether it is enough to enable the rate limiter.

@Sijie I left a comment on the PR. For me it is OK, but it seems to lack support for the client-side BookKeeper: it enables the rate limiter only on the Bookie.

-- Enrico

2017-07-19 11:27 GMT+02:00 Enrico Olivelli <eolive...@gmail.com>:

> On Wed, 19 Jul 2017, 11:11 Sijie Guo <guosi...@gmail.com> wrote:
>
>> On Wed, Jul 19, 2017 at 4:04 PM, Enrico Olivelli <eolive...@gmail.com>
>> wrote:
>>
>>> Hi,
>>> in some internal benchmarks we are experiencing openLedgerNoRecovery
>>> calls which remain hung.
>>> I see that basically that function calls ZooKeeper#getData.
>>>
>>> Does anyone have an idea of how it can happen?
>>
>> What version are you testing? Is it related to your recent change on bumping
>> the zookeeper version? If that's the case, we should consider rolling back the
>> zookeeper version.
>
> 3.5.1 and 3.5.3
>
>>> Is there any implicit timeout on ZK.getData()? I did not find any way
>>> and personally I never got into this problem.
>>
>> As far as I know, there is no timeout on zookeeper requests. It would be
>> a good question to the zookeeper community.
>
> I will do
>
>>> Maybe there is space for an improvement to add a timeout on
>>> openLedgerXXX operations, but anyway it is strange that the callback is
>>> never called.
>>>
>>> Unfortunately the problem happens only in integration tests, maybe I can
>>> work to reproduce it on a BK only test case.
>>>
>>> The case is simple: start ZK + 1 Bookie + 1 BookKeeper, create
>>> concurrently many ledgers, write and concurrently open them with
>>> openLedgerNoRecovery from other threads.
>>> The fact is that no error is on ZK logs and BK logs
>>
>> Can you turn on debugging log for the bookkeeper client and also
>> zookeeper? There might be logs for checking.
>
> Yes, I am logging at info, I will try at debug
>
>> Another solution is to do a TCP dump for tracing the zookeeper calls to
>> see if the getData request and response is received at both sides.
>>
>>> Any suggestion?
>
> Thank you again
> Enrico
>
>>> Thanks
>>>
>>> -- Enrico
>
> --
>
> -- Enrico Olivelli
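
On the idea in the quoted thread of adding a timeout around openLedgerXXX operations: below is a minimal sketch of how a callback-style call could be bounded so that a callback which is never invoked shows up as a timeout instead of a hung thread. It uses only the JDK. The names CallbackTimeout, AsyncCall and await are hypothetical and are not part of the BookKeeper or ZooKeeper APIs; the real asynchronous call (for example an async getData) would have to be adapted to the AsyncCall shape.

```java
import java.util.Optional;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Consumer;

// Hypothetical helper, not part of BookKeeper or ZooKeeper.
final class CallbackTimeout {

    /** Hypothetical stand-in for any callback-style asynchronous operation. */
    interface AsyncCall<T> {
        void start(Consumer<T> callback);
    }

    /**
     * Starts the call and waits at most the given time for its callback.
     * Returns the value passed to the callback, or an empty Optional if the
     * callback did not fire in time (or the waiting thread was interrupted).
     * Note: a callback that delivers null is also reported as empty.
     */
    static <T> Optional<T> await(AsyncCall<T> call, long timeout, TimeUnit unit) {
        CompletableFuture<T> result = new CompletableFuture<>();
        // The callback completes the future; if it never runs, get() times out.
        call.start(result::complete);
        try {
            return Optional.ofNullable(result.get(timeout, unit));
        } catch (TimeoutException e) {
            return Optional.empty();
        } catch (InterruptedException e) {
            // Restore the interrupt flag for the caller.
            Thread.currentThread().interrupt();
            return Optional.empty();
        } catch (ExecutionException e) {
            // Only reachable if the future is completed exceptionally elsewhere.
            throw new IllegalStateException(e.getCause());
        }
    }
}
```

A caller would treat an empty result as "the callback was not called within the timeout" and log or fail the test at that point, which at least makes the hang visible. It does not explain why the callback is missing, so debug logs or a TCP dump are still needed for that.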