Agree. Even if, for some reason, you don't want more than one datanode
(let alone datanodes across multiple racks for fault tolerance) for a
non-critical use case, it's still recommended to call hsync on the output
stream for on-disk persistence (unless the single-DN setup exists only
for deliberate resilience testing of hflush, and data loss is not a
concern).


On Fri, Dec 30, 2022 at 9:04 AM Ayush Saxena <ayush...@gmail.com> wrote:

> Was the file still in progress? In that case this is possible; once the
> data has been persisted to the datanode's disk, data loss is no longer
> possible.
>
> If someone called hflush but not hsync while writing, and the power loss
> happened immediately after that, then in that case too there is a
> possibility that data is lost after the restart.
>
> Otherwise, if the file was complete, I don't think data should be lost
> under any circumstance.
>
> -Ayush
>
>
> On 30-Dec-2022, at 5:17 PM, hehaore...@gmail.com wrote:
>
> 
>
> Hi,
>
> We have a 1-replica HDFS cluster with a single DataNode. When the
> DataNode was restarted after a power failure, it found a file with a
> missing block. The block and meta files found in the storage path are
> empty, and their last modification time is the power-off time. Besides
> the file being written at the time, what else could cause this?
>
> I wish you a happy New Year
>
>
>
> Hao He
>
> Sent from Mail <https://go.microsoft.com/fwlink/?LinkId=550986> for Windows
>
>
