Hello,

We have a set of processing jobs (in Samza) using key-compacted Kafka logs as a 
durable key-value store.  Recently, after some network troubles that resulted 
in various parts of the infrastructure rebooting, we discovered that a key we 
expected to be "alive" had been compacted out of the log.
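
For context, the topics in question are configured more or less like this 
(the config names are standard Kafka topic-level settings; the values below 
are illustrative rather than our exact production values):

    cleanup.policy=compact
    min.cleanable.dirty.ratio=0.5
    delete.retention.ms=86400000
    segment.ms=604800000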

Because of the nature of the outage and our current level of logging, it is 
impossible to know whether the application layer was at fault and sent an 
erroneous tombstone to Kafka, or whether Kafka's log cleaner was at fault.  
Either way, it got me thinking about whether it is good practice to use Kafka 
as a long-term backing store for a key-value store.
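
To be concrete about what I mean by a tombstone: from the application side it 
is just a record produced with a null value for the key, roughly like the 
sketch below (Java; the broker address, topic, and key are made up for 
illustration):

    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import java.util.Properties;

    public class TombstoneSketch {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "broker:9092");
            props.put("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
            props.put("value.serializer",
                "org.apache.kafka.common.serialization.ByteArraySerializer");

            try (KafkaProducer<String, byte[]> producer = new KafkaProducer<>(props)) {
                // a null value for an existing key marks that key for removal
                // on the next compaction pass
                producer.send(new ProducerRecord<>("my-kv-topic", "some-key", null));
            }
        }
    }

So our worry is that either a bug produced a record like this for the key, or 
the cleaner removed the last non-null value without one.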

Are there best practices concerning data loss and integrity when certain 
messages are expected to live "forever" and never be reaped/compacted?  The 
basic log abstraction can assume that messages only have to live for their 
contracted amount of time/space; with key-compacted logs, however, the latest 
value for a key is supposed to survive indefinitely.

FWIW, we are deployed on top of ZFS in mirrored mode.

-Bart

