Hi,
in some internal benchmarks we are seeing openLedgerNoRecovery calls
that hang and never return.
As far as I can see, that function basically calls ZooKeeper#getData.

Does anyone have an idea of how this can happen?

Is there any implicit timeout on ZK.getData()? I did not find one, and
personally I have never run into this problem before.

Maybe there is room for an improvement here, adding a timeout to the
openLedgerXXX operations, but in any case it is strange that the callback
is never called.
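
To make the timeout idea concrete, here is a minimal client-side sketch. It
does not use the real BookKeeper API: AsyncOpen and openWithTimeout are
hypothetical names, and AsyncOpen only stands in for a callback-style open
such as the asynchronous variant of openLedgerNoRecovery. It bounds the wait
for the callback; it does not cancel the underlying operation or explain why
the callback is missing.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Consumer;

public class OpenWithTimeout {

    /** Hypothetical stand-in for a callback-based open operation. */
    public interface AsyncOpen<T> {
        void open(Consumer<T> onSuccess, Consumer<Throwable> onError);
    }

    /**
     * Starts the async operation and waits for its callback, but only up to
     * the given timeout. Throws TimeoutException if no callback arrives.
     */
    public static <T> T openWithTimeout(AsyncOpen<T> op, long timeout, TimeUnit unit)
            throws TimeoutException, ExecutionException, InterruptedException {
        CompletableFuture<T> result = new CompletableFuture<>();
        // Bridge the two callbacks into the future.
        op.open(result::complete, result::completeExceptionally);
        // Bounded wait instead of blocking forever on a lost callback.
        return result.get(timeout, unit);
    }

    public static void main(String[] args) throws Exception {
        // Callback fires immediately: the value is returned.
        String handle = openWithTimeout(
                (ok, err) -> ok.accept("ledger-1"), 1, TimeUnit.SECONDS);
        System.out.println("opened " + handle);

        // Callback never fires: the caller gets a TimeoutException.
        try {
            openWithTimeout((ok, err) -> { }, 200, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            System.out.println("timed out waiting for the open callback");
        }
    }
}
```

A wrapper like this would at least turn a silent hang into a visible error
in the benchmark, which may help narrow down where the callback gets lost.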

Unfortunately the problem happens only in integration tests; maybe I can
work on reproducing it in a BK-only test case.

The case is simple: start ZK + 1 Bookie + 1 BookKeeper client, concurrently
create many ledgers, write to them, and concurrently open them with
openLedgerNoRecovery from other threads.
The odd thing is that there are no errors in either the ZK logs or the BK logs.

Any suggestions?

Thanks

-- Enrico
