[ 
https://issues.apache.org/jira/browse/HDDS-3240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

runzhiwang updated HDDS-3240:
-----------------------------
    Description: 
Currently a follower cannot create a container until the leader has finished 
creating it. But the follower and the leader could create the container in 
parallel rather than sequentially.

1. From the code, the [future 
thread|https://github.com/apache/hadoop-ozone/blob/master/hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/transport/server/ratis/ContainerStateMachine.java#L672]
 that runs getCachedStateMachineData in readStateMachineData and the [future 
thread|https://github.com/apache/hadoop-ozone/blob/master/hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/transport/server/ratis/ContainerStateMachine.java#L459]
 that runs createContainer in writeStateMachineData are the same 
[thread|https://github.com/apache/hadoop-ozone/blob/master/hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/transport/server/ratis/ContainerStateMachine.java#L505].
 Because `writeStateMachineData` is called before `readStateMachineData`, the 
leader must wait for `createContainer` to finish before it can run 
`getCachedStateMachineData` and append logs to the follower. So the leader and 
follower are not independent in createContainer: the follower must wait for the 
leader to finish `createContainer`.  
**How to improve it:**
I think this ordering can be improved by using different threads for 
`getCachedStateMachineData` and `createContainer`, with only [data = 
readStateMachineData(requestProto, term, 
logIndex)](https://github.com/apache/hadoop-ozone/blob/master/hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/transport/server/ratis/ContainerStateMachine.java#L619)
 using the same thread as `createContainer`. If 
[stateMachineDataCache.get(logIndex)](https://github.com/apache/hadoop-ozone/blob/master/hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/transport/server/ratis/ContainerStateMachine.java#L617)
 does not return null, the leader can get stateMachineData from the cache and 
need not wait for `createContainer` to finish, so the leader and follower can be 
independent. But if it returns null, the leader must finish `createContainer` 
and then append logs to the follower. So I think [data = 
readStateMachineData(requestProto, term, 
logIndex)](https://github.com/apache/hadoop-ozone/blob/master/hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/transport/server/ratis/ContainerStateMachine.java#L619)
 should use the same thread as `createContainer`, rather than the whole 
[getCachedStateMachineData](https://github.com/apache/hadoop-ozone/blob/master/hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/transport/server/ratis/ContainerStateMachine.java#L614).
 In this case the leader and follower are not independent, so opening RocksDB 
at commit time is still meaningful. What do you think?
2. From the Jaeger UI, you can also see that the follower currently creates the 
container only after the leader has finished creating it.
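
The thread-sharing effect described in item 1 can be illustrated with a toy sketch. This is not the ContainerStateMachine code: the class `ExecutorOrderingSketch`, the method `readFinishesBeforeCreate`, the latch, and the 500ms timeout are all hypothetical stand-ins. It only shows that a cache read queued behind a long-running task on the same single-thread executor has to wait, while a read on its own executor does not.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class ExecutorOrderingSketch {

  // Returns true if the simulated cache read completed while the simulated
  // createContainer was still running. All names here are hypothetical.
  static boolean readFinishesBeforeCreate(boolean sharedExecutor)
      throws Exception {
    ExecutorService createExecutor = Executors.newSingleThreadExecutor();
    ExecutorService readExecutor =
        sharedExecutor ? createExecutor : Executors.newSingleThreadExecutor();
    CountDownLatch createMayFinish = new CountDownLatch(1);
    try {
      // Stand-in for createContainer: occupies its thread until released.
      CompletableFuture<Void> create = CompletableFuture.runAsync(() -> {
        try {
          createMayFinish.await();
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
        }
      }, createExecutor);

      // Stand-in for a cache hit: returns immediately once it gets a thread.
      CompletableFuture<String> read =
          CompletableFuture.supplyAsync(() -> "cached-data", readExecutor);

      boolean readDoneFirst;
      try {
        read.get(500, TimeUnit.MILLISECONDS);
        readDoneFirst = true;
      } catch (TimeoutException e) {
        // The read is still queued behind the create task.
        readDoneFirst = false;
      }

      // Release the create task and let both futures complete.
      createMayFinish.countDown();
      create.get();
      read.get();
      return readDoneFirst;
    } finally {
      createExecutor.shutdown();
      readExecutor.shutdown();
    }
  }

  public static void main(String[] args) throws Exception {
    System.out.println("shared executor:    " + readFinishesBeforeCreate(true));
    System.out.println("separate executors: " + readFinishesBeforeCreate(false));
  }
}
```

With a shared executor the read cannot start until the create task is released, which mirrors the leader waiting for `createContainer` before it can append logs to the follower. With separate executors the read is expected to return well within the timeout.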

  was:
1. As the image shows, when the stage is 
DispatcherContext.WriteChunkStage.WRITE_DATA, the leader datanode executes two 
steps: KeyValueHandleHandler.handleCreateContainer and 
keyValueHandler.handleWriteChunk. After the leader finishes the two steps, the 
two follower datanodes execute the same two steps: 
KeyValueHandleHandler.handleCreateContainer and keyValueHandler.handleWriteChunk. 
2. The problem is that KeyValueHandleHandler.handleCreateContainer costs about 
300ms, so the leader and the two followers together cost 600ms for 
createContainer. The total cost of the whole write is about 1000ms, so it is 
wasteful for the leader and followers to run createContainer sequentially.
3. Besides, during createContainer, RocksDB.open costs about 200ms.

So I will try to:
A. have the leader datanode and the 2 follower datanodes create the container 
in parallel, not sequentially.
B. optimize RocksDB.open.
 !screenshot-1.png! 
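
As a back-of-envelope check of the numbers above, the sketch below works through the arithmetic. It assumes, beyond what the description states, that the two followers already run createContainer concurrently with each other and that nothing else in the write path changes. The class and method names are hypothetical, and the result is an estimate, not a measurement.

```java
class CreateContainerLatencyEstimate {

  // Leader runs createContainer first, then the followers run it
  // (assumed concurrent with each other), so the cost is paid twice.
  static long sequentialCreateMs(long createMs) {
    return 2 * createMs;
  }

  // Leader and followers run createContainer at the same time.
  static long parallelCreateMs(long createMs) {
    return createMs;
  }

  // Total write time if only the createContainer portion changes.
  static long estimatedTotalMs(long totalMs, long createMs) {
    return totalMs - sequentialCreateMs(createMs) + parallelCreateMs(createMs);
  }

  public static void main(String[] args) {
    long createMs = 300;  // approximate createContainer cost from the description
    long totalMs = 1000;  // approximate total write cost from the description
    System.out.println(sequentialCreateMs(createMs));        // 600
    System.out.println(parallelCreateMs(createMs));          // 300
    System.out.println(estimatedTotalMs(totalMs, createMs)); // 700
  }
}
```

Under these assumptions, creating containers in parallel would remove roughly 300ms from a write of about 1000ms.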


> Improve write efficiency by creating container in parallel.
> -----------------------------------------------------------
>
>                 Key: HDDS-3240
>                 URL: https://issues.apache.org/jira/browse/HDDS-3240
>             Project: Hadoop Distributed Data Store
>          Issue Type: Improvement
>            Reporter: runzhiwang
>            Assignee: runzhiwang
>            Priority: Major
>         Attachments: screenshot-1.png
>
>


