Hi, I have a question about the processing steps of the NameNode:
*Reference:* "Hadoop: The Definitive Guide", 3rd ed., by Tom White, page 340 (Ch. 10: HDFS > The file system image & edit log)

Text from the book:

.... When a filesystem client performs a write operation (such as creating or moving a file), it is first recorded in the edit log. The namenode also has an in-memory representation of the filesystem metadata, which it updates after the edit log has been modified. The in-memory metadata is used to serve read requests. The edit log is flushed and *synced* after every write before a success code is returned to the client. For namenodes that write to multiple directories, the write must be flushed and synced to every copy before returning successfully. This ensures that no operation is lost due to machine failure. ...

*Question 1:* Is the in-memory representation updated before or after returning to the client, or is it updated asynchronously while the status code is being sent to the client? I believe it should happen before the status is sent to the client.

*Question 2:* What does "synced after every write" mean here? A file has only one writer, so when a write operation on that file is recorded in the edit log and flushed, no other writer is working on the same file. However, other writers may be working on other files, and each of their operations also updates the edit log. So there will be multiple copies of the edit log, which will then be merged. Is this understanding correct?

*Question 3:* Sorry, I did not understand "For namenodes that *write to multiple directories*, the write must be flushed and synced to *every copy* before returning successfully." Especially the text in bold.

Thanks,
Amit Mittal
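
P.S. To make the quoted passage concrete, here is a minimal Java sketch of one possible reading of it. It is a toy model, not the actual NameNode code: the class `EditLogSketch`, its methods, the `edits` file name, and the `CREATE <path>` record format are all hypothetical. It assumes that "multiple directories" means the same edit record is appended to one edit log file per directory, that "flushed and synced" means forcing each file to disk, and (my assumption from Question 1) that the in-memory update happens before success is returned.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical toy model of the ordering described in the book passage.
// It is NOT the real NameNode implementation.
class EditLogSketch {
    // One edit log file per configured directory (assumed meaning of "every copy").
    private final List<Path> editFiles = new ArrayList<>();
    // Stand-in for the in-memory filesystem metadata that serves reads.
    private final Set<String> namespace = new HashSet<>();

    EditLogSketch(List<Path> dirs) {
        for (Path dir : dirs) {
            editFiles.add(dir.resolve("edits"));
        }
    }

    // Convenience factory: creates the given number of temporary directories.
    static EditLogSketch inTempDirs(int copies) {
        try {
            List<Path> dirs = new ArrayList<>();
            for (int i = 0; i < copies; i++) {
                dirs.add(Files.createTempDirectory("name-dir-" + i));
            }
            return new EditLogSketch(dirs);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // A "write operation": record it in every edit log copy, sync each copy,
    // update the in-memory view, and only then report success.
    synchronized boolean create(String path) {
        String record = "CREATE " + path + "\n";
        for (Path file : editFiles) {
            try (FileChannel ch = FileChannel.open(file,
                    StandardOpenOption.CREATE,
                    StandardOpenOption.WRITE,
                    StandardOpenOption.APPEND)) {
                ByteBuffer buf = ByteBuffer.wrap(record.getBytes(StandardCharsets.UTF_8));
                while (buf.hasRemaining()) {
                    ch.write(buf);
                }
                ch.force(true); // sync this copy to disk before moving on
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
        namespace.add(path); // in-memory update after the log is durable
        return true;         // the "success code" for the client
    }

    // A "read request": served from memory only.
    synchronized boolean exists(String path) {
        return namespace.contains(path);
    }

    // Returns the records currently stored in the given edit log copy.
    List<String> loggedOps(int copy) {
        try {
            return Files.readAllLines(editFiles.get(copy));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Under this reading, `EditLogSketch.inTempDirs(2).create("/user/amit/file1")` returns only after both copies contain the same record, and a later `exists("/user/amit/file1")` is answered from memory. Please correct me if the real ordering differs.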
