This issue seems to affect only ZooKeeper 3.4.3 (and not 3.3.5).
Basically it seems that after the truncate method is invoked, the
logStream member of the FileTxnLog is still pointing to the old position
in the file where it would have written the next entry before the
truncate happened. Since the log file is not rolled over and the stream
is not reset, a gap is created in the file, which is interpreted as the
end of that file when the log is read.
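
To illustrate the mechanism described above, here is a minimal
standalone sketch. It is not ZooKeeper code: the class name, the byte
counts (100, 40, 10) and the "stop at the first zero byte" reader are
all assumptions made for the demo. It only shows that a plain Java
FileOutputStream keeps its old offset when the file is shortened through
a second handle, so the next write leaves a zero-filled hole.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.file.Files;
import java.util.Arrays;

public class TruncateGapDemo {

    /** Writes, truncates through a second handle, writes again; returns the file bytes. */
    static byte[] writeTruncateWrite() throws IOException {
        File log = File.createTempFile("txnlog-demo", ".bin");
        log.deleteOnExit();
        try (FileOutputStream stream = new FileOutputStream(log)) {
            // 100 bytes of "old" entries; the stream's offset is now 100.
            byte[] old = new byte[100];
            Arrays.fill(old, (byte) 1);
            stream.write(old);

            // Truncate to 40 bytes via a separate handle. The open stream
            // is neither closed nor repositioned (the situation in question).
            try (RandomAccessFile raf = new RandomAccessFile(log, "rw")) {
                raf.setLength(40);
            }

            // The stale stream continues at its old offset (100), not at 40.
            byte[] fresh = new byte[10];
            Arrays.fill(fresh, (byte) 2);
            stream.write(fresh);
        }
        return Files.readAllBytes(log.toPath());
    }

    /** Hypothetical reader that treats the first zero byte as end of log. */
    static int readableLength(byte[] data) {
        for (int i = 0; i < data.length; i++) {
            if (data[i] == 0) {
                return i;
            }
        }
        return data.length;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = writeTruncateWrite();
        System.out.println("file length:     " + data.length);
        System.out.println("readable length: " + readableLength(data));
    }
}
```

On the usual platforms the file should end up longer than the truncation
point, with zero bytes between offset 40 and the new data, and the
simplified reader stops at offset 40, so the entries written after the
truncate are never seen.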
That means once this node becomes leader later on, it would send a
snapshot to all its peers that contains only the entries up to the
truncation point; all entries after that would not be sent. We had this
happen on 2 of 3 ZooKeeper servers in a test cluster while the network
connection was bad. Even after the nodes recovered we would lose all the
data every time the leader switched to one of those two nodes.
Furthermore (and this is something I have not been able to reproduce
reliably yet) it seems that in some situations, after some leader
changes, the transaction log file does not only contain a gap but simply
stops after the last entry before the truncation.
I have a small program that reproduces the error reliably for 3.4.3 but
not for 3.3.5. That seems to be related to the new leader in 3.3.5 not
sending the truncation message to the peer that was more advanced than
the new leader, but the actual problem also seems to be present in 3.3.5
(I just couldn't get the TRUNC message to be sent in my test).
Has anyone else encountered the same issue already?
I will create a ticket with the test that reproduces the issue later,
but first I need to spend some more time on that script. Things are a
little hard to reproduce because I have to pull a ZooKeeper server out
of the ensemble for some time without restarting it. To do so, I'm using
port forwarding instead of direct connections, which I can interrupt
even on localhost.
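
For anyone who wants to try something similar, an interruptible port
forwarder could look roughly like the sketch below. This is not the
script mentioned above; the class InterruptibleForwarder, its methods
and the buffer size are hypothetical names and values chosen for the
illustration. The idea would be to point each server's peer addresses at
a forwarder port instead of the real port, then call setBlocked(true) to
cut that server off without restarting it.

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class InterruptibleForwarder implements Closeable {
    private final ServerSocket listener;
    private final String targetHost;
    private final int targetPort;
    private final Set<Socket> open = ConcurrentHashMap.newKeySet();
    private volatile boolean blocked = false;

    public InterruptibleForwarder(int listenPort, String targetHost, int targetPort)
            throws IOException {
        this.listener = new ServerSocket(listenPort);
        this.targetHost = targetHost;
        this.targetPort = targetPort;
        startDaemon(this::acceptLoop);
    }

    public int getPort() {
        return listener.getLocalPort();
    }

    /** true: drop all current links and refuse new ones; false: forward again. */
    public void setBlocked(boolean value) {
        blocked = value;
        if (value) {
            for (Socket s : open) {
                closeQuietly(s);
            }
        }
    }

    private void acceptLoop() {
        while (!listener.isClosed()) {
            Socket client = null;
            try {
                client = listener.accept();
                if (blocked) {
                    closeQuietly(client);
                    continue;
                }
                Socket server = new Socket(targetHost, targetPort);
                open.add(client);
                open.add(server);
                if (blocked) { // blocked while we were connecting
                    closeQuietly(client);
                    closeQuietly(server);
                    continue;
                }
                pump(client, server);
                pump(server, client);
            } catch (IOException e) {
                // Listener closed or target unreachable: drop the client.
                if (client != null) {
                    closeQuietly(client);
                }
            }
        }
    }

    /** Copies bytes one way until either side closes, then closes both. */
    private void pump(Socket from, Socket to) {
        startDaemon(() -> {
            byte[] buf = new byte[4096];
            try {
                InputStream in = from.getInputStream();
                OutputStream out = to.getOutputStream();
                int n;
                while ((n = in.read(buf)) != -1) {
                    out.write(buf, 0, n);
                    out.flush();
                }
            } catch (IOException ignored) {
                // Connection was cut; fall through to cleanup.
            } finally {
                closeQuietly(from);
                closeQuietly(to);
            }
        });
    }

    private static void startDaemon(Runnable r) {
        Thread t = new Thread(r);
        t.setDaemon(true);
        t.start();
    }

    private void closeQuietly(Socket s) {
        open.remove(s);
        try {
            s.close();
        } catch (IOException ignored) {
            // Nothing useful to do here.
        }
    }

    @Override
    public void close() throws IOException {
        setBlocked(true);
        listener.close();
    }
}
```

While blocked, the forwarder still accepts connections and closes them
immediately, so the peers see a dropped connection rather than a refused
one. Whether that matches a real network failure closely enough is a
judgment call.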
What more information do you guys need to investigate the issue?