Agree. If, for whatever reason, you don't want to use more than one DataNode (let alone DataNodes across multiple racks for fault tolerance) for some non-critical use case, it is still recommended to call hsync on the output stream to get on-disk persistence, unless the single-DataNode setup exists only for deliberate resilience testing of hflush and data loss is not a concern.
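To make the distinction concrete: on an HDFS FSDataOutputStream, hflush() pushes buffered data to the DataNode pipeline so readers can see it, but the DataNodes may still hold it only in the OS page cache; hsync() additionally asks the DataNodes to fsync the block file, so it survives a power failure. The same flush-vs-fsync gap exists on a local filesystem, which this self-contained sketch uses as an analogy (the temp file name is arbitrary; no HDFS cluster is assumed):

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class FlushVsSync {
    public static void main(String[] args) throws IOException {
        Path p = Files.createTempFile("durability-demo", ".txt");
        try (FileChannel ch = FileChannel.open(p, StandardOpenOption.WRITE)) {
            ch.write(ByteBuffer.wrap("important record".getBytes(StandardCharsets.UTF_8)));
            // At this point the data may live only in the OS page cache:
            // visible to readers, but lost if the machine loses power.
            // This is the rough local-FS analogue of HDFS hflush().
            ch.force(true);
            // force(true) asks the kernel to fsync data (and metadata) to
            // the physical device -- the analogue of HDFS hsync(), which
            // tells each DataNode in the pipeline to do the same for the
            // block file. Durable, but noticeably slower per call.
        }
        System.out.println(Files.readAllLines(p).get(0));
        Files.deleteIfExists(p);
    }
}
```

With replication factor 1 there is no second DataNode to re-replicate from, so an un-synced block that was only hflush'd can vanish exactly as described below: an empty block file whose mtime is the power-off time.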
On Fri, Dec 30, 2022 at 9:04 AM Ayush Saxena <ayush...@gmail.com> wrote:

> The file was in progress? In that case this is possible; once the data
> gets persisted on the disk of the DataNode, data loss isn't possible.
>
> If someone did an hflush but not an hsync while writing, and the power
> loss happened immediately after that, then in that case too I feel there
> is a possibility that data might get lost after the restart.
>
> Otherwise, if the file was complete, I don't think data should get lost
> under any circumstance.
>
> -Ayush
>
>> On 30-Dec-2022, at 5:17 PM, hehaore...@gmail.com wrote:
>>
>> Hi,
>> A 1-replica HDFS cluster with a single DataNode. When the DataNode was
>> restarted after a power failure, it found a file with a missing block.
>> The block and meta files found in the storage path are empty, and their
>> last modification time is the power-off time. Besides the file being
>> written at the time, what else could cause this phenomenon?
>> I wish you a happy New Year.
>>
>> Hao He
>> Sent from Mail for Windows <https://go.microsoft.com/fwlink/?LinkId=550986>