[ 
https://issues.apache.org/jira/browse/YARN-11656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17819659#comment-17819659
 ] 

ASF GitHub Bot commented on YARN-11656:
---------------------------------------

K0K0V0K commented on code in PR #6569:
URL: https://github.com/apache/hadoop/pull/6569#discussion_r1499304409


##########
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/event/Event.java:
##########
@@ -32,4 +32,14 @@ public interface Event<TYPE extends Enum<TYPE>> {
   TYPE getType();
   long getTimestamp();
   String toString();
+
+  /**
+   * In case of parallel execution of events in the same dispatcher,
+   * the result of this method will be used as a semaphore.
+   * If the method returns null, then a default semaphore will be used.
+   * @return the semaphore
+   */
+  default String getLockKey() {
+    return null;

Review Comment:
   Hi @slfan1989 !
   
   Thanks for the review.
   Yes, that is expected.
   If we don't specify a lockKey for an event, getLockKey() should return null, 
   so these events will be executed sequentially rather than in parallel. The 
   method is used in MultiDispatcherLocks.
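   
   To illustrate the idea, here is a minimal sketch with illustrative names 
   (not the actual MultiDispatcherLocks code from this PR): events that return 
   the same lock key, or that return null and therefore share the default 
   lock, are handled sequentially, while events with different keys can be 
   handled in parallel.
   
   ```java
   // Hypothetical sketch of lock selection by lock key; names are illustrative.
   import java.util.concurrent.ConcurrentHashMap;
   import java.util.concurrent.ConcurrentMap;
   import java.util.concurrent.locks.ReentrantLock;

   class LockKeySketch {
     // All events whose getLockKey() returns null share this lock,
     // so they are executed sequentially.
     private final ReentrantLock defaultLock = new ReentrantLock();
     // Events with distinct lock keys get distinct locks and can run in parallel.
     private final ConcurrentMap<String, ReentrantLock> locks = new ConcurrentHashMap<>();

     ReentrantLock lockFor(String lockKey) {
       return lockKey == null
           ? defaultLock
           : locks.computeIfAbsent(lockKey, k -> new ReentrantLock());
     }

     void handle(String lockKey, Runnable handler) {
       ReentrantLock lock = lockFor(lockKey);
       lock.lock();
       try {
         handler.run(); // handling is serialized per lock key
       } finally {
         lock.unlock();
       }
     }
   }
   ```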





> RMStateStore event queue blocked
> --------------------------------
>
>                 Key: YARN-11656
>                 URL: https://issues.apache.org/jira/browse/YARN-11656
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: yarn
>    Affects Versions: 3.4.1
>            Reporter: Bence Kosztolnik
>            Assignee: Bence Kosztolnik
>            Priority: Major
>              Labels: pull-request-available
>         Attachments: issue.png, log.png
>
>
> h2. Problem statement
>  
> I observed that a YARN cluster had both pending and available resources, yet 
> cluster utilization stayed around ~50%. The cluster was loaded with 200 
> parallel PI example jobs (from hadoop-mapreduce-examples), each configured 
> with 20 map and 20 reduce containers, on a 50-node cluster where each node 
> had 8 cores and plenty of memory (so CPU was the bottleneck resource).
> Finally, I realized the RM had an IO bottleneck and needed 1-20 seconds to 
> persist an RMStateStoreEvent (using FileSystemRMStateStore).
> To reduce the impact of the issue:
> - create a dispatcher that can persist events on parallel threads
> - create metrics for the RMStateStore event queue to make the problem easy 
> to identify if it occurs on a cluster
> {panel:title=Issue visible on UI2}
>  !issue.png|height=250!
> {panel}
> Another way to identify the issue is to check whether too much time is 
> required to store the application info after it reaches the NEW_SAVING state.
> {panel:title=How issue can look like in log}
>  !log.png|height=250!
> {panel}
> h2. Solution
> Created a *MultiDispatcher* class that implements the Dispatcher interface.
> The dispatcher creates a separate metrics object called _Event metrics for 
> "rm-state-store"_ where we can see:
> - how many unhandled events are currently in the event queue for each event 
> type
> - how many events were handled for each event type
> - the average execution time per event type
> The dispatcher has the following configs (the {} placeholder stands for the 
> dispatcher name, for example rm-state-store):
> ||Config name||Description||Default value||
> |yarn.dispatcher.multi-thread.{}.*default-pool-size*|Number of threads that 
> execute events in parallel|4|
> |yarn.dispatcher.multi-thread.{}.*max-pool-size*|If the event queue is full, 
> the number of execution threads scales up to this value|8|
> |yarn.dispatcher.multi-thread.{}.*keep-alive-seconds*|Idle execution threads 
> are destroyed after this many seconds|10|
> |yarn.dispatcher.multi-thread.{}.*queue-size*|Size of the event queue|1000000|
> |yarn.dispatcher.multi-thread.{}.*monitor-seconds*|The size of the event 
> queue is logged with this frequency in seconds (if not zero)|30|
> |yarn.dispatcher.multi-thread.{}.*graceful-stop-seconds*|After the stop 
> signal, the dispatcher waits this many seconds to process the incoming 
> events before terminating|60|
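> As an illustration only (not taken from the patch), the table above maps to 
> yarn-site.xml entries like the following for the rm-state-store dispatcher; 
> how the MultiDispatcher itself gets enabled for the state store is not shown 
> here:
> {code:xml}
> <!-- Hypothetical tuning of the rm-state-store dispatcher, built from the
>      property names and defaults listed in the table above. -->
> <property>
>   <name>yarn.dispatcher.multi-thread.rm-state-store.default-pool-size</name>
>   <value>4</value>
> </property>
> <property>
>   <name>yarn.dispatcher.multi-thread.rm-state-store.max-pool-size</name>
>   <value>8</value>
> </property>
> <property>
>   <name>yarn.dispatcher.multi-thread.rm-state-store.queue-size</name>
>   <value>1000000</value>
> </property>
> {code}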
> {panel:title=Example output from the RM JMX API}
> {noformat}
> ...
>     {
>       "name": "Hadoop:service=ResourceManager,name=Event metrics for 
> rm-state-store",
>       "modelerType": "Event metrics for rm-state-store",
>       "tag.Context": "yarn",
>       "tag.Hostname": CENSORED
>       "RMStateStoreEventType#STORE_APP_ATTEMPT_Current": 51,
>       "RMStateStoreEventType#STORE_APP_ATTEMPT_NumOps": 0,
>       "RMStateStoreEventType#STORE_APP_ATTEMPT_AvgTime": 0.0,
>       "RMStateStoreEventType#STORE_APP_Current": 124,
>       "RMStateStoreEventType#STORE_APP_NumOps": 46,
>       "RMStateStoreEventType#STORE_APP_AvgTime": 3318.25,
>       "RMStateStoreEventType#UPDATE_APP_Current": 31,
>       "RMStateStoreEventType#UPDATE_APP_NumOps": 16,
>       "RMStateStoreEventType#UPDATE_APP_AvgTime": 2629.6666666666665,
>       "RMStateStoreEventType#UPDATE_APP_ATTEMPT_Current": 31,
>       "RMStateStoreEventType#UPDATE_APP_ATTEMPT_NumOps": 12,
>       "RMStateStoreEventType#UPDATE_APP_ATTEMPT_AvgTime": 2048.6666666666665,
>       "RMStateStoreEventType#REMOVE_APP_Current": 12,
>       "RMStateStoreEventType#REMOVE_APP_NumOps": 3,
>       "RMStateStoreEventType#REMOVE_APP_AvgTime": 1378.0,
>       "RMStateStoreEventType#REMOVE_APP_ATTEMPT_Current": 0,
>       "RMStateStoreEventType#REMOVE_APP_ATTEMPT_NumOps": 0,
>       "RMStateStoreEventType#REMOVE_APP_ATTEMPT_AvgTime": 0.0,
>       "RMStateStoreEventType#FENCED_Current": 0,
>       "RMStateStoreEventType#FENCED_NumOps": 0,
>       "RMStateStoreEventType#FENCED_AvgTime": 0.0,
>       "RMStateStoreEventType#STORE_MASTERKEY_Current": 0,
>       "RMStateStoreEventType#STORE_MASTERKEY_NumOps": 0,
>       "RMStateStoreEventType#STORE_MASTERKEY_AvgTime": 0.0,
>       "RMStateStoreEventType#REMOVE_MASTERKEY_Current": 0,
>       "RMStateStoreEventType#REMOVE_MASTERKEY_NumOps": 0,
>       "RMStateStoreEventType#REMOVE_MASTERKEY_AvgTime": 0.0,
>       "RMStateStoreEventType#STORE_DELEGATION_TOKEN_Current": 0,
>       "RMStateStoreEventType#STORE_DELEGATION_TOKEN_NumOps": 0,
>       "RMStateStoreEventType#STORE_DELEGATION_TOKEN_AvgTime": 0.0,
>       "RMStateStoreEventType#REMOVE_DELEGATION_TOKEN_Current": 0,
>       "RMStateStoreEventType#REMOVE_DELEGATION_TOKEN_NumOps": 0,
>       "RMStateStoreEventType#REMOVE_DELEGATION_TOKEN_AvgTime": 0.0,
>       "RMStateStoreEventType#UPDATE_DELEGATION_TOKEN_Current": 0,
>       "RMStateStoreEventType#UPDATE_DELEGATION_TOKEN_NumOps": 0,
>       "RMStateStoreEventType#UPDATE_DELEGATION_TOKEN_AvgTime": 0.0,
>       "RMStateStoreEventType#UPDATE_AMRM_TOKEN_Current": 0,
>       "RMStateStoreEventType#UPDATE_AMRM_TOKEN_NumOps": 0,
>       "RMStateStoreEventType#UPDATE_AMRM_TOKEN_AvgTime": 0.0,
>       "RMStateStoreEventType#STORE_RESERVATION_Current": 0,
>       "RMStateStoreEventType#STORE_RESERVATION_NumOps": 0,
>       "RMStateStoreEventType#STORE_RESERVATION_AvgTime": 0.0,
>       "RMStateStoreEventType#REMOVE_RESERVATION_Current": 0,
>       "RMStateStoreEventType#REMOVE_RESERVATION_NumOps": 0,
>       "RMStateStoreEventType#REMOVE_RESERVATION_AvgTime": 0.0,
>       "RMStateStoreEventType#STORE_PROXY_CA_CERT_Current": 0,
>       "RMStateStoreEventType#STORE_PROXY_CA_CERT_NumOps": 0,
>       "RMStateStoreEventType#STORE_PROXY_CA_CERT_AvgTime": 0.0
>     },
> ...
> {noformat}
> {panel}
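> As a rough reading of the snapshot above (assuming AvgTime is reported in 
> milliseconds): STORE_APP_AvgTime is about 3318 ms and STORE_APP_Current 
> shows 124 events still queued, so with the default pool size of 4 threads, 
> draining that backlog alone would take roughly 124 * 3.3 s / 4, i.e. on the 
> order of 100 seconds.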
> h2. Testing
> I deployed the MultiDispatcher-enabled version of YARN to the cluster and 
> ran the following performance test:
> {code:bash}
> #!/bin/bash
> for i in {1..50}; 
> do
>       ssh root@$i-node-url 'nohup ./perf.sh 4 1>/dev/null 2>/dev/null &' &
> done
> sleep 300
> for i in {1..50}; 
> do
>       ssh root@$i-node-url "pkill -9 -f perf" &
> done
> sleep 5
> echo "DONE"
> {code}
> Each node ran the following perf script:
> {code:bash}
> #!/bin/bash
> while true
> do
>     if [ $(ps -o pid= -u hadoop | wc -l) -le $1 ]
>     then
>         hadoop jar /opt/hadoop-mapreduce-examples.jar pi 20 20 1>/dev/null 2>&1 &
>     fi
>     sleep 1
> done
> {code}
> This way, in 5 minutes (plus the wait until all jobs finished) I could 
> process 332 apps.
> When I ran the same test with the official build, it took 5 minutes just to 
> finish the first app; after that, 221 apps were finished.
> I also tested it with LeveldbRMStateStore and ZKRMStateStore and did not 
> find any problem with the implementation.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
