Greg Fodor created KAFKA-4317:
---------------------------------

             Summary: RocksDB checkpoint files lost on kill -9
                 Key: KAFKA-4317
                 URL: https://issues.apache.org/jira/browse/KAFKA-4317
             Project: Kafka
          Issue Type: Improvement
          Components: streams
    Affects Versions: 0.10.0.1
            Reporter: Greg Fodor
            Assignee: Guozhang Wang


Right now, the checkpoint files for logged RocksDB stores are written during a 
graceful shutdown and removed upon restoration. Unfortunately, this means that 
if the process is forcibly killed, no checkpoint files exist, so all RocksDB 
stores are rematerialized from scratch on the next launch.
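To make the failure mode concrete, here is a minimal model of the lifecycle 
described above. It is written in Python for brevity and is not Kafka's actual 
Java code. The file name, the one-line-per-partition format, and the functions 
`write_checkpoint` and `restore` are hypothetical. The only point it shows is 
the ordering: the checkpoint is written on graceful shutdown and deleted as soon 
as it is read, so a run that ends in kill -9 leaves nothing behind.

```python
import os
import tempfile

CHECKPOINT = "checkpoint"  # hypothetical file name for this model


def write_checkpoint(state_dir, offsets):
    """Persist {partition: offset} as 'partition offset' lines (graceful shutdown only)."""
    with open(os.path.join(state_dir, CHECKPOINT), "w") as f:
        for partition, offset in sorted(offsets.items()):
            f.write(f"{partition} {offset}\n")


def restore(state_dir):
    """Read the checkpoint, if any, and then delete it.

    Returns the offsets to resume restoration from; an empty dict means the
    store has to be rebuilt from the beginning of its changelog.
    """
    path = os.path.join(state_dir, CHECKPOINT)
    if not os.path.exists(path):
        return {}
    with open(path) as f:
        offsets = {p: int(o) for p, o in (line.split() for line in f)}
    os.remove(path)  # the checkpoint is consumed on restore
    return offsets


with tempfile.TemporaryDirectory() as d:
    write_checkpoint(d, {"changelog-0": 500})  # graceful shutdown
    print(restore(d))  # {'changelog-0': 500}: resume from offset 500
    # The restarted process is now killed with kill -9, so write_checkpoint
    # never runs again before the next launch.
    print(restore(d))  # {}: full rematerialization
```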

In a way, this is good, because it simulates bootstrapping a new node (for 
example, it's a good way to see how much I/O is used to rematerialize the 
stores); however, it leads to longer recovery times when a non-graceful 
shutdown occurs and we want to get the job up and running again.

There seem to be two possible approaches to consider:

- Simply do not remove checkpoint files on restore. This way, a kill -9 will 
result in replaying only the data written to the source topics since the last 
graceful shutdown.

- Continually update the checkpoint files (perhaps on commit) -- this would 
result in the least amount of overhead/latency in restarting, but the 
additional complexity may not be worth it.
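The two options can be combined, as sketched below in the same simplified 
Python model (again not Kafka's Java implementation). `read_checkpoint` leaves 
the file in place (first option), and a hypothetical `StoreTask.commit` rewrites 
it on every commit (second option). Writing to a temporary file and renaming it 
over the old one means a kill -9 mid-write leaves either the previous complete 
checkpoint or the new one, never a truncated file. All names, the file format, 
and the `.tmp` suffix are assumptions for illustration.

```python
import os
import tempfile

CHECKPOINT = "checkpoint"  # hypothetical file name


def write_checkpoint_atomically(state_dir, offsets):
    """Write offsets to a temp file, fsync it, then rename it over the old checkpoint."""
    final_path = os.path.join(state_dir, CHECKPOINT)
    tmp_path = final_path + ".tmp"
    with open(tmp_path, "w") as f:
        for partition, offset in sorted(offsets.items()):
            f.write(f"{partition} {offset}\n")
        f.flush()
        os.fsync(f.fileno())  # also survive an OS crash, not just a process kill
    os.replace(tmp_path, final_path)  # atomic rename within one POSIX filesystem


def read_checkpoint(state_dir):
    """First option: read the checkpoint on restore but leave it in place."""
    path = os.path.join(state_dir, CHECKPOINT)
    if not os.path.exists(path):
        return {}
    with open(path) as f:
        return {p: int(o) for p, o in (line.split() for line in f)}


class StoreTask:
    """Hypothetical task that checkpoints on every commit (second option)."""

    def __init__(self, state_dir):
        self.state_dir = state_dir
        self.offsets = read_checkpoint(state_dir)

    def apply(self, partition, offset):
        # Track the latest changelog offset applied to the local store.
        self.offsets[partition] = offset

    def commit(self):
        # A real store must be flushed to disk *before* this point; otherwise
        # the checkpoint could claim offsets the local store does not yet hold.
        write_checkpoint_atomically(self.state_dir, self.offsets)


with tempfile.TemporaryDirectory() as d:
    task = StoreTask(d)
    task.apply("changelog-0", 100)
    task.commit()
    task.apply("changelog-0", 250)  # processed but not yet committed
    # kill -9 here: the restarted task resumes from the last committed offset
    print(StoreTask(d).offsets)  # {'changelog-0': 100}
```

The cost of the second option is one small file write and fsync per commit. 
Restoring from an offset slightly behind the store's true position should be 
harmless as long as replaying changelog records is idempotent (each record 
overwrites a key), which is the assumption this sketch relies on.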



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
