[ https://issues.apache.org/jira/browse/HDDS-3155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17058462#comment-17058462 ]
mingchao zhao edited comment on HDDS-3155 at 3/17/20, 11:41 AM:
----------------------------------------------------------------

Sorry, the initial test didn't use the latest code. I tested again with the latest master code.

By testing HDDS-2717, I found that it does append chunks to a block. When the user writes, it opens the block file through a RandomAccessFile (the file stays open until the client's FSDataOutputStream closes) and writes to this file with a FileChannel. When the client calls flush, the client buffer's data is appended to the block file. Some performance is gained by not opening a new chunk file, but each flush still triggers an IO write. So the problems from my previous tests are still there.

Besides the problem of getting stuck while executing a MapReduce task, we also tested the write performance of HDFS and Ozone with TestDFSIO, and Ozone is much slower than HDFS. We found that the reason for these two problems is HDFS's optimization of flush:
1. If you do not actively call flush, then once the client's buffer is full it is automatically flushed to the DN's buffer through DFSOutputStream's hflush (with no guarantee of writing to disk). This is why HDFS behaves faster than Ozone.
2. If I actively call flush, as in my test program above, the data in the client buffer is flushed directly to the DN buffer via hflush (again, writing to disk is not guaranteed). This is why the HDFS MapReduce job doesn't get stuck.

HDFS has two persistence mechanisms: hflush ([no guarantee to write to disk|https://github.com/apache/hadoop/blob/ac4b556e2d44d3cd10b81c190ecee23e2dd66c10/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DFSOutputStream.java#L573]) and hsync ([guarantee to write to disk|https://github.com/apache/hadoop/blob/ac4b556e2d44d3cd10b81c190ecee23e2dd66c10/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DFSOutputStream.java#L599]).

Based on the current problem, I think we can add an interface with hflush semantics in Ozone: for example, keep the chunk in the client buffer until it reaches a certain size or until the close method is called. Hi [~adoroszlai], do you have any suggestions here?
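To make the HDDS-2717 append behavior above concrete, here is a minimal sketch of the pattern: one RandomAccessFile kept open per block, with each flush appended through its FileChannel. BlockFileAppender and its methods are illustrative names, not the real Ozone classes:

{code:java}
// Sketch of the HDDS-2717 append pattern described above: one file kept
// open per block, each flush appended through the same FileChannel.
// BlockFileAppender is an illustrative name, not the real Ozone class.
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;

public class BlockFileAppender implements AutoCloseable {
  private final RandomAccessFile file;  // stays open until the stream closes
  private final FileChannel channel;

  public BlockFileAppender(String blockFilePath) throws IOException {
    this.file = new RandomAccessFile(blockFilePath, "rw");
    this.channel = file.getChannel();
    this.channel.position(this.channel.size()); // append at the end
  }

  /** Called on each client flush: appends the buffered data, one IO write. */
  public void append(ByteBuffer data) throws IOException {
    while (data.hasRemaining()) {
      channel.write(data);
    }
  }

  @Override
  public void close() throws IOException {
    channel.close();
    file.close();
  }
}
{code}

The point is that the file-open cost is paid only once per block, but every append is still a real IO write, which matches the behavior observed above.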
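For comparison, a minimal sketch of the two HDFS persistence calls, using only the public FSDataOutputStream API (the path and payload are just placeholders):

{code:java}
// Minimal sketch of the two HDFS persistence calls discussed above.
// Assumes a configured fs.defaultFS; the path and data are illustrative.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import java.nio.charset.StandardCharsets;

public class FlushSemanticsDemo {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    try (FSDataOutputStream out = fs.create(new Path("/tmp/flush-demo"))) {
      out.write("event-1\n".getBytes(StandardCharsets.UTF_8));
      // hflush: pushes the client buffer to the datanodes so that new
      // readers can see the data, but does NOT guarantee it is on disk.
      out.hflush();

      out.write("event-2\n".getBytes(StandardCharsets.UTF_8));
      // hsync: additionally asks the datanodes to sync the data to disk,
      // so it survives a datanode restart.
      out.hsync();
    }
  }
}
{code}

Readers that open the file after hflush() can already see the data; only hsync() makes the datanodes pay the disk-write cost.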
> Improved ozone client flush implementation to make it faster.
> -------------------------------------------------------------
>
>                 Key: HDDS-3155
>                 URL: https://issues.apache.org/jira/browse/HDDS-3155
>             Project: Hadoop Distributed Data Store
>          Issue Type: Improvement
>            Reporter: mingchao zhao
>            Assignee: mingchao zhao
>            Priority: Major
>         Attachments: amlog, image-2020-03-12-16-48-08-391.png, image-2020-03-12-17-47-57-770.png, stdout
>
> Background:
> When we execute MapReduce on Ozone, the task gets stuck for a long time after Map and Reduce complete. The log is as follows:
> {code:java}
> //Refer to the attachment: stdout
> 20/03/05 14:43:30 INFO mapreduce.Job:  map 100% reduce 33%
> 20/03/05 14:43:33 INFO mapreduce.Job:  map 100% reduce 100%
> 20/03/05 15:29:52 INFO mapreduce.Job: Job job_1583385253878_0002 completed successfully{code}
> By looking at the AM's log (refer to the attachment amlog for details), we found that over 40 minutes were spent by the AM writing the task log into Ozone.
> At present, after MR execution the AM records the task information into a log on HDFS or Ozone, and the task information is flushed to HDFS or Ozone entry by entry ([details|https://github.com/apache/hadoop/blob/a55d6bba71c81c1c4e9d8cd11f55c78f10a548b0/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/jobhistory/JobHistoryEventHandler.java#L1640]). The problem occurs when the number of map tasks is large.
> Currently, each flush operation in Ozone generates a new chunk file on disk in real time, which is not very efficient. For this we can refer to the implementation of HDFS flush: instead of writing to disk, each flush writes the contents of the buffer to the datanode's OS buffer. Above all, we need to ensure that this content can still be read by other clients.
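For illustration, a hypothetical sketch of the buffered-flush idea proposed in this issue: nothing is shipped to the datanode until the client buffer reaches a threshold or the stream closes. BufferedChunkOutputStream and ChunkSender are made-up names standing in for the real Ozone client internals:

{code:java}
// Hypothetical sketch of the hflush-style buffering proposed above: the
// client accumulates writes and only ships them to the datanode when the
// buffer fills or the stream is closed. ChunkSender and its send() method
// are stand-ins for the real Ozone client internals.
import java.io.IOException;
import java.io.OutputStream;

public class BufferedChunkOutputStream extends OutputStream {
  private final byte[] buffer;
  private int count;
  private final ChunkSender sender; // stand-in for the Ozone datanode client

  public BufferedChunkOutputStream(ChunkSender sender, int bufferSize) {
    this.sender = sender;
    this.buffer = new byte[bufferSize];
  }

  @Override
  public void write(int b) throws IOException {
    if (count == buffer.length) {
      sendBuffer(); // buffer full: ship one chunk instead of one per flush
    }
    buffer[count++] = (byte) b;
  }

  @Override
  public void flush() throws IOException {
    // hflush-like semantics as proposed: flush deliberately does not force
    // an IO write; data is shipped when the buffer fills or on close().
  }

  @Override
  public void close() throws IOException {
    sendBuffer();
    sender.close();
  }

  private void sendBuffer() throws IOException {
    if (count > 0) {
      sender.send(buffer, 0, count);
      count = 0;
    }
  }

  /** Stand-in interface for whatever actually writes a chunk to the DN. */
  public interface ChunkSender extends java.io.Closeable {
    void send(byte[] buf, int off, int len) throws IOException;
  }
}
{code}

Under this scheme a flush-heavy writer such as JobHistoryEventHandler would no longer cause one chunk file (or one IO write) per event; durability could still be offered separately through an hsync-style call.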