[ https://issues.apache.org/jira/browse/HDDS-3155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17057213#comment-17057213 ]
Shashikant Banerjee edited comment on HDDS-3155 at 3/11/20, 5:01 PM:
---------------------------------------------------------------------
{code:java}
Currently, each flush operation in ozone generates a new chunk file in real
time on the disk. This approach is not very efficient at the moment.
{code}
This is not true. We never do a sync write to disk during flush/close. It does
not create a chunk file either during flush/close.
A chunk file is created only when a new chunk is written (every 4MB of data, by
default).
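The comment's point can be illustrated with a toy Java model. This is a hypothetical sketch, not Ozone client or datanode code: the class `ChunkFileModel`, its methods, and the in-memory list standing in for chunk files are all invented for illustration. It assumes, as the comment describes, that flush only hands buffered bytes onward (no sync to disk) and that a chunk file appears only when a new chunk (4MB by default) begins. Under those assumptions the number of chunk files depends on the total bytes written, not on how often the client flushes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy model, NOT Ozone code: chunk files are created only when
// a new chunk starts, never as a side effect of flush().
public class ChunkFileModel {
    // Assumed default chunk size, taken from the comment: 4MB.
    static final long CHUNK_SIZE = 4L * 1024 * 1024;

    private final List<Long> chunkFiles = new ArrayList<>(); // bytes held by each chunk file
    private long pending = 0; // bytes buffered on the client, not yet handed on
    private long flushes = 0;

    /** Buffer len bytes on the client side. */
    void write(long len) {
        pending += len;
    }

    /** Hand buffered bytes onward; no sync to disk and no new file per call. */
    void flush() {
        flushes++;
        while (pending > 0) {
            boolean lastIsFull = !chunkFiles.isEmpty()
                    && chunkFiles.get(chunkFiles.size() - 1) == CHUNK_SIZE;
            if (chunkFiles.isEmpty() || lastIsFull) {
                chunkFiles.add(0L); // a new chunk begins, so a new chunk file appears
            }
            int last = chunkFiles.size() - 1;
            long room = CHUNK_SIZE - chunkFiles.get(last);
            long n = Math.min(room, pending);
            chunkFiles.set(last, chunkFiles.get(last) + n); // append into the current chunk
            pending -= n;
        }
    }

    int chunkFileCount() { return chunkFiles.size(); }

    long flushCount() { return flushes; }

    public static void main(String[] args) {
        ChunkFileModel model = new ChunkFileModel();
        // 10,000 small events of 1KB each, flushed one by one.
        for (int i = 0; i < 10_000; i++) {
            model.write(1024);
            model.flush();
        }
        // prints "flushes=10000 chunkFiles=3"
        System.out.println("flushes=" + model.flushCount() + " chunkFiles=" + model.chunkFileCount());
    }
}
```

With 10,000 flushes of 1KB each (10,240,000 bytes in total), the model ends up with 3 chunk files, the same number a single large flush of the same data would produce.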
> Improved ozone flush implementation to make it faster.
> ------------------------------------------------------
>
> Key: HDDS-3155
> URL: https://issues.apache.org/jira/browse/HDDS-3155
> Project: Hadoop Distributed Data Store
> Issue Type: Improvement
> Reporter: mingchao zhao
> Priority: Major
> Attachments: amlog, stdout
>
>
> Background:
> When we run MapReduce on Ozone, we find that the job stays stuck for a long
> time after the Map and Reduce phases complete. The log is as follows:
> {code:java}
> //Refer to the attachment: stdout
> 20/03/05 14:43:30 INFO mapreduce.Job: map 100% reduce 33%
> 20/03/05 14:43:33 INFO mapreduce.Job: map 100% reduce 100%
> 20/03/05 15:29:52 INFO mapreduce.Job: Job job_1583385253878_0002 completed successfully
> {code}
> By looking at the AM's log (see the attached amlog for details), we found that
> the gap of over 40 minutes (from 14:43:33 to 15:29:52 in the log above) was
> spent by the AM writing the task log to Ozone.
> At present, after the MR job has run, the AM records the task information in
> a log on HDFS or Ozone. Moreover, the task information is flushed to HDFS or
> Ozone one by one
> ([details|https://github.com/apache/hadoop/blob/a55d6bba71c81c1c4e9d8cd11f55c78f10a548b0/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/jobhistory/JobHistoryEventHandler.java#L1640]).
> The problem appears when the number of map tasks is large.
> Currently, each flush operation in ozone generates a new chunk file in
> real time on the disk. This approach is not very efficient at the moment. For
> this, we can refer to the implementation of HDFS flush: instead of writing to
> disk on every call, flush writes the contents of the buffer to the datanode's
> OS buffer. First, we need to ensure that this content can be read by other
> datanodes.
>
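The HDFS behavior the issue refers to separates two steps: flush pushes data out of the client's buffer so that readers can see it, while forcing it to disk is a separate, more expensive call. The sketch below shows only a single-machine analogue using the JDK's `java.io` classes; it does not use the HDFS or Ozone client APIs, and the class `FlushVsSync` and its method are hypothetical. After `BufferedOutputStream.flush()` the bytes have been handed to the operating system, and an independent read of the file already sees them; `FileDescriptor.sync()` is the separate call that asks the OS to force them to the storage device.

```java
import java.io.BufferedOutputStream;
import java.io.FileDescriptor;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical single-machine analogue of "flush to OS buffer" vs "sync to disk".
public class FlushVsSync {
    /** Writes one record per call, flushing after each; syncs only if asked. */
    static String writeAndReadBack(Path file, String[] records, boolean syncAtEnd)
            throws IOException {
        try (FileOutputStream fos = new FileOutputStream(file.toFile());
             BufferedOutputStream out = new BufferedOutputStream(fos)) {
            for (String r : records) {
                out.write((r + "\n").getBytes(StandardCharsets.UTF_8));
                // flush(): move the user-space buffer into the OS (page cache).
                // This is a plain write to the OS, not an fsync.
                out.flush();
            }
            // An independent reader already sees the flushed bytes, before any sync.
            String visible = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
            if (syncAtEnd) {
                // sync(): ask the OS to force this file's buffers to the device.
                FileDescriptor fd = fos.getFD();
                fd.sync();
            }
            return visible;
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("history", ".log");
        try {
            String seen = writeAndReadBack(tmp, new String[] {"TASK_STARTED", "TASK_FINISHED"}, true);
            System.out.print(seen);
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```

Flushing after every record mirrors the one-by-one flushing described in the issue; in this analogue, deferring or skipping `sync()` is what keeps each individual flush cheap.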
--
This message was sent by Atlassian Jira
(v8.3.4#803005)