Very detailed and clear explanation. Thanks a lot!
Jason

On Tue, Feb 19, 2013 at 11:28 PM, Jay Kreps <[email protected]> wrote:
> Yes, exactly. Here is the full story:
>
> When you restart kafka it checks if a clean shutdown was executed on
> the log (which would have left a marker file), if the shutdown was
> clean it assumes the log was fully flushed and uses it as is. If not
> (as in the case of a hard kill or machine crash) it executes recovery
> on the log. The recovery process validates the CRC of each message in
> the unflushed portion of the log and truncates the log to eliminate
> any partial writes that may have occurred while the server was killed.
> This process guarantees that only valid messages remain. There are
> actually a lot of corner cases in the case of a hard crash, depending
> on the OS/FS, you can also get random corrupt blocks so this process
> handles that case as well. In the case that you outline this would
> mean that the log would contain the 100 messages flushed to disk
> (assuming the last message was fully written) but not (obviously) the
> 50 messages only in RAM.
>
> That all obviously describes the unreplicated case in 0.7.x. In 0.8
> you have the option of having a replication factor with each topic,
> and so you only would lose the 50 messages in pagecache if you lost
> ALL the replicas. If you had another in-sync surviving replica then
> when the server came back up it would resync with the new leader who
> would have the full log and there would be no loss of committed
> messages.
>
> -Jay
>
>
> On Tue, Feb 19, 2013 at 8:03 PM, Jason Huang <[email protected]> wrote:
>> This starts to make sense to me.
>>
>> So a log segment file (000000000.log) may have some messages that's in
>> local filesystem hard drive, some messages that's in pagecache? Say if
>> a 0000000.log file has 150 messages and the first 100 has been flushed
>> to local hard drive and the last 50 is still in the pagecache, what
>> would happen if there is machine crash? Then when we restart the
>> server, we will see the 000000.log file with only 100 messages in it?
>>
>> Thanks,
>>
>> Jason
>>
>> On Wed, Feb 20, 2013 at 1:59 AM, Jay Kreps <[email protected]> wrote:
>>> To be clear: to lose data in the filesystem you need to hard kill the
>>> machine. A hard kill of the process will not cause that.
>>>
>>> -Jay
>>>
>>> On Tue, Feb 19, 2013 at 8:25 AM, Jun Rao <[email protected]> wrote:
>>>> Jason,
>>>>
>>>> Although messages are always written to the log segment file, they
>>>> initially are only in the file system's pagecache. As Swapnil mentioned
>>>> earlier, messages are flushed to disk periodically. If you do a clean
>>>> shutdown (kill -15), we close all log file, which should flush all dirty
>>>> data to disk. If you do a hard kill or your machine just crashed, the
>>>> unflushed data may be lost. The data that you saw in the .log file can be
>>>> just in the pagecache.
>>>>
>>>> Thanks,
>>>>
>>>> Jun
>>>>
>>>> On Tue, Feb 19, 2013 at 4:05 AM, Jason Huang <[email protected]> wrote:
>>>>
>>>>> Thanks for response.
>>>>>
>>>>> My confusion is that - once I see the message content in the .log
>>>>> file, doesn't that mean the message has already been flushed to the
>>>>> hard drive? Why would those messages still get lost if someone
>>>>> manually kill the process (or if the server crashes unexpectedly)?
>>>>>
>>>>> Jason
>>>>>
>>>>> On Tue, Feb 19, 2013 at 6:53 AM, Swapnil Ghike <[email protected]>
>>>>> wrote:
>>>>> > Correction - The flush happens based on *number of messages* and time
>>>>> > limits, whichever is hit first.
>>>>> >
>>>>> >
>>>>> >
>>>>> > On 2/19/13 3:50 AM, "Swapnil Ghike" <[email protected]> wrote:
>>>>> >
>>>>> >>The flush happens based on size and time limits,
>>>>> >>whichever is hit first.
>>>>> >
>>>>>
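
For anyone reading this thread later, the recovery step Jay describes (validate the CRC of each message, then truncate at the first partial or corrupt write) can be sketched roughly as below. This is only an illustration, not Kafka's code: the record framing (4-byte length, 4-byte CRC32, payload) and the names `encode`, `valid_prefix_length` and `recover` are assumptions made up for the example and do not match Kafka's real on-disk format.

```python
import struct
import zlib

# Hypothetical framing for this sketch only (NOT Kafka's on-disk format):
# 4-byte big-endian payload length, 4-byte CRC32 of the payload, payload.
HEADER = struct.Struct(">II")


def encode(payload: bytes) -> bytes:
    """Frame one message as length + CRC32 + payload."""
    return HEADER.pack(len(payload), zlib.crc32(payload)) + payload


def valid_prefix_length(log: bytes, start: int = 0) -> int:
    """Scan records from `start` and return the offset where validation stops.

    `start` stands in for the end of the portion known to be flushed, so
    only the possibly-unflushed tail is checked.
    """
    pos = start
    while pos + HEADER.size <= len(log):
        length, crc = HEADER.unpack_from(log, pos)
        end = pos + HEADER.size + length
        if end > len(log):
            break  # partial write: the payload was cut off by the crash
        if zlib.crc32(log[pos + HEADER.size:end]) != crc:
            break  # corrupt block: stored CRC does not match the payload
        pos = end
    return pos


def recover(log: bytes, start: int = 0) -> bytes:
    """Truncate the log so that only fully valid messages remain."""
    return log[:valid_prefix_length(log, start)]
```

In Jason's 150-message example, the 50 messages that never left RAM are simply absent from the file after the crash; a scan like this only matters for the last message that was partly on disk, or for blocks the filesystem corrupted.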

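The flush rule from Swapnil's correction (number of messages or a time limit, whichever is hit first) and Jun's point about the pagecache can be sketched in the same spirit. Again this is an assumption-laden toy, not the broker's implementation: the class `FlushingLog`, its parameters, and the choice to check the time limit only when a message is appended are all invented for the example.

```python
import os
import time


class FlushingLog:
    """Append-only file that fsyncs after N messages or T seconds, whichever comes first.

    Hypothetical sketch: the time limit is only checked on append; there is
    no background flusher thread here.
    """

    def __init__(self, path, max_messages=100, max_seconds=1.0, clock=time.monotonic):
        self._file = open(path, "ab")
        self._max_messages = max_messages
        self._max_seconds = max_seconds
        self._clock = clock
        self._unflushed = 0
        self._last_flush = clock()
        self.flush_count = 0

    def append(self, message: bytes) -> None:
        # write() only hands the bytes to buffers (userspace, then pagecache);
        # they are not guaranteed to be on disk yet.
        self._file.write(message)
        self._unflushed += 1
        too_many = self._unflushed >= self._max_messages
        too_old = self._clock() - self._last_flush >= self._max_seconds
        if too_many or too_old:
            self.flush()

    def flush(self) -> None:
        self._file.flush()             # userspace buffer -> OS pagecache
        os.fsync(self._file.fileno())  # pagecache -> disk
        self._unflushed = 0
        self._last_flush = self._clock()
        self.flush_count += 1

    def close(self) -> None:
        # Clean shutdown: flush everything that is still dirty, then close.
        self.flush()
        self._file.close()
```

Anything appended since the last `flush()` exists only in memory buffers, which is why it can be visible when reading the .log file and still disappear if the machine crashes.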