[jira] [Comment Edited] (HDDS-3155) Improved ozone client flush implementation to make it faster.

mingchao zhao (Jira) Wed, 18 Mar 2020 00:28:50 -0700


    [ 
https://issues.apache.org/jira/browse/HDDS-3155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17061383#comment-17061383
 ]


mingchao zhao edited comment on HDDS-3155 at 3/18/20, 7:25 AM:
---------------------------------------------------------------

Hi [~shashikant], thanks for your feedback.
 >>>>  If I actively call flush, as I did in my test program above, the data in 
 >>>>the client buffer will be flushed directly to the DN buffer via hflush 
 >>>>(writing to disk is not guaranteed). This is why HDFS MapReduce doesn't get 
 >>>>stuck.

>>In this case as well, if my understanding is correct, it will flush the data 
>>from the client buffer to DN in Ozone as well.
  
     The point is that the HDFS flush implementation differs from Ozone's once 
the buffer reaches the DN. HDFS flush does not necessarily persist data to 
disk, which is why HDFS performs better than Ozone in write tests. 
     I tested this on a single-node machine, where watchForCommit does not take 
effect. As in my previous test program, I called flush after every write and 
looped 8000 times. Ozone took 10 minutes, while HDFS took only a few seconds.
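
     To make the difference concrete, here is a rough local analogy of that 
test loop. It is not Ozone or HDFS code and does not use their client APIs. 
The class name FlushCostSketch, the record contents, and the temp-file names 
are all made up for illustration. It writes a small record 8000 times to a 
local file, once handing the data only to the OS buffer and once forcing it to 
disk after every write, which is roughly the cost difference described above.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Local-file analogy only: this is NOT the Ozone or HDFS client code path.
public class FlushCostSketch {

    // Hypothetical small record, standing in for one task-history event.
    static final byte[] RECORD = "task-event\n".getBytes(StandardCharsets.UTF_8);

    /**
     * Writes RECORD `rounds` times and returns the elapsed time in milliseconds.
     * With syncEveryFlush == false the data is only handed to the OS
     * (page cache); with true it is forced to the storage device each round.
     */
    static long writeLoop(Path file, int rounds, boolean syncEveryFlush) throws IOException {
        long start = System.nanoTime();
        try (FileChannel ch = FileChannel.open(file,
                StandardOpenOption.CREATE,
                StandardOpenOption.WRITE,
                StandardOpenOption.TRUNCATE_EXISTING)) {
            for (int i = 0; i < rounds; i++) {
                ByteBuffer buf = ByteBuffer.wrap(RECORD);
                while (buf.hasRemaining()) {
                    ch.write(buf);          // data reaches the OS buffer
                }
                if (syncEveryFlush) {
                    ch.force(false);        // persist file content to disk
                }
            }
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) throws IOException {
        int rounds = 8000;                  // same loop count as the test above
        Path osOnly = Files.createTempFile("flush-os-buffer", ".log");
        Path synced = Files.createTempFile("flush-sync", ".log");
        try {
            System.out.println("OS buffer only : " + writeLoop(osOnly, rounds, false) + " ms");
            System.out.println("sync per flush : " + writeLoop(synced, rounds, true) + " ms");
        } finally {
            Files.deleteIfExists(osOnly);
            Files.deleteIfExists(synced);
        }
    }
}
```

     The absolute numbers depend on the disk and file system, so this only 
shows the direction of the effect, not the Ozone or HDFS timings reported above.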


was (Author: micahzhao):
Hi [~shashikant], thanks for your feedback.
>>>>  If I actively call flush, as I did in my test program above, the data in 
>>>>the client buffer will be flushed directly to the DN buffer via hflush 
>>>>(writing to disk is not guaranteed). This is why HDFS MapReduce doesn't get 
>>>>stuck.

>>In this case as well, if my understanding is correct, it will flush the data 
>>from the client buffer to DN in Ozone as well.
 
    The key point is that the HDFS flush implementation differs from 
Ozone's once the buffer reaches the DN. HDFS flush does not necessarily persist 
data to disk, which is why HDFS performs better than Ozone in write tests. 
    I tested this on a single-node machine, where watchForCommit does not take 
effect. As in my previous test program, I called flush after every write and 
looped 8000 times. Ozone took 10 minutes, while HDFS took only a few seconds.

> Improved ozone client flush implementation to make it faster.
> -------------------------------------------------------------
>
>                 Key: HDDS-3155
>                 URL: https://issues.apache.org/jira/browse/HDDS-3155
>             Project: Hadoop Distributed Data Store
>          Issue Type: Improvement
>            Reporter: mingchao zhao
>            Priority: Major
>         Attachments: amlog, image-2020-03-12-16-48-08-391.png, 
> image-2020-03-12-17-47-57-770.png, stdout
>
>
> Background:
>     When we run MapReduce on Ozone, we find that the job is stuck for a 
> long time after the Map and Reduce phases complete. The log is as 
> follows:
> {code:java}
> //Refer to the attachment: stdout
> 20/03/05 14:43:30 INFO mapreduce.Job: map 100% reduce 33% 
> 20/03/05 14:43:33 INFO mapreduce.Job: map 100% reduce 100% 
> 20/03/05 15:29:52 INFO mapreduce.Job: Job job_1583385253878_0002 completed successfully
> {code}
>     Looking at the AM's log (refer to the amlog attachment for details), we 
> found that the more than 40 minutes were spent by the AM writing the task 
> log into Ozone.
>     At present, after an MR job finishes, the AM records the task 
> information in a log on HDFS or Ozone. Moreover, the task information is 
> flushed to HDFS or Ozone one entry at a time 
> ([details|https://github.com/apache/hadoop/blob/a55d6bba71c81c1c4e9d8cd11f55c78f10a548b0/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/jobhistory/JobHistoryEventHandler.java#L1640]).
>  The problem occurs when the number of map tasks is large. 
>      Currently, each flush operation in Ozone generates a new chunk file 
> on disk in real time, which is not very efficient. For this we can refer 
> to the HDFS flush implementation: instead of writing to disk each time, 
> flush writes the contents of the buffer to the datanode's OS buffer. First 
> of all, we need to ensure that this content can be read by other 
> datanodes.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
