On Wed, Jul 19, 2017, 11:11 Sijie Guo <[email protected]> wrote:

> On Wed, Jul 19, 2017 at 4:04 PM, Enrico Olivelli <[email protected]>
> wrote:
>
>> Hi,
>> in some internal benchmarks we are seeing openLedgerNoRecovery
>> calls that remain hung.
>> As far as I can see, that function basically calls ZooKeeper#getData.
>>
>> Does anyone have an idea of how this can happen?
>>
>
> What version are you testing? Is it related to your recent change bumping
> the zookeeper version? If that's the case, we should consider rolling back
> the zookeeper version.
>

3.5.1 and 3.5.3

>
>> Is there any implicit timeout on ZK.getData()? I did not find one,
>> and personally I have never run into this problem.
>>
>
> As far as I know, there is no timeout on zookeeper requests. It would be a
> good question for the zookeeper community.
>

I will ask them.

>
>> Maybe there is room for an improvement to add a timeout on openLedgerXXX
>> operations, but anyway it is strange that the callback is never called.
>>
>> Unfortunately the problem happens only in integration tests; maybe I can
>> work on reproducing it in a BK-only test case.
>>
>> The case is simple: start ZK + 1 Bookie + 1 BookKeeper, concurrently
>> create many ledgers, write to them, and concurrently open them with
>> openLedgerNoRecovery from other threads.
>> The fact is that there are no errors in the ZK logs or the BK logs.
>>
>
> Can you turn on debug logging for the bookkeeper client and also for
> zookeeper? There might be logs worth checking.
>

Yes, I am logging at INFO; I will try at DEBUG.

>
> Another option is to take a TCP dump to trace the zookeeper calls and
> see whether the getData request and response are received on both sides.
>
>
>> Any suggestions?
>>
>

Thank you again
Enrico

>
>> Thanks
>>
>> -- Enrico
>>

--
-- Enrico Olivelli
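
P.S. To make the timeout idea for openLedgerXXX operations more concrete, here is a minimal sketch of how a callback-based call could be wrapped so that the caller fails fast instead of hanging when the callback never fires. This is not BookKeeper code: `OpenWithTimeout`, `Callback` and `withTimeout` are hypothetical names, and treating a result code of 0 as success is an assumption. In a real test the lambda passed in would presumably invoke `asyncOpenLedgerNoRecovery` and forward the result of its open callback.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Consumer;

/** Hypothetical helper: adds a timeout around any callback-based async call. */
final class OpenWithTimeout {

    /** Hypothetical callback shape: a result code plus a value (assumption: rc == 0 means OK). */
    interface Callback<T> {
        void complete(int rc, T value);
    }

    // Single daemon thread used only to fire timeouts.
    private static final ScheduledExecutorService TIMER =
        Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "open-timeout");
            t.setDaemon(true);
            return t;
        });

    /**
     * Starts asyncOp and returns a future that completes with the callback's
     * value, or fails with TimeoutException if the callback is not invoked in time.
     */
    static <T> CompletableFuture<T> withTimeout(
            Consumer<Callback<T>> asyncOp, long timeout, TimeUnit unit) {
        CompletableFuture<T> result = new CompletableFuture<>();

        // If the callback never arrives, fail the future instead of hanging forever.
        ScheduledFuture<?> timer = TIMER.schedule(() -> {
            result.completeExceptionally(
                new TimeoutException("no callback after " + timeout + " " + unit));
        }, timeout, unit);

        // Once the future is done (either way), the timer is no longer needed.
        result.whenComplete((value, error) -> timer.cancel(false));

        // Kick off the operation; its callback completes the future.
        asyncOp.accept((rc, value) -> {
            if (rc == 0) {
                result.complete(value);
            } else {
                result.completeExceptionally(new IllegalStateException("rc=" + rc));
            }
        });
        return result;
    }
}
```

With something like this in the integration test, a hung open would surface as a `TimeoutException` on the returned future, which at least gives a precise timestamp to correlate with the DEBUG logs and the TCP dump. It does not explain why the callback is lost, it only makes the hang visible.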
