[ 
https://issues.apache.org/jira/browse/YARN-11656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bence Kosztolnik reassigned YARN-11656:
---------------------------------------

    Assignee: Bence Kosztolnik

> RMStateStore event queue blocked
> --------------------------------
>
>                 Key: YARN-11656
>                 URL: https://issues.apache.org/jira/browse/YARN-11656
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: yarn
>    Affects Versions: 3.4.1
>            Reporter: Bence Kosztolnik
>            Assignee: Bence Kosztolnik
>            Priority: Major
>         Attachments: issue.png, log.png
>
>
> h2. Problem statement
>  
> I observed a Yarn cluster that had both pending and available resources, yet 
> its utilization stayed around ~50%. The cluster was loaded with 200 parallel 
> PI example jobs (from hadoop-mapreduce-examples), each configured with 20 map 
> and 20 reduce containers, on a 50-node cluster where every node had 8 cores 
> and plenty of memory (so the CPU was the bottleneck).
> Eventually, I realized the RM had an IO bottleneck and needed 1~20 seconds to 
> persist a single RMStateStoreEvent (using FileSystemRMStateStore).
> To reduce the impact of the issue:
> - create a dispatcher that can persist events on parallel threads
> - create metric data for the RMStateStore event queue, so the problem can be 
> easily identified if it occurs on a cluster
> {panel:title=Issue visible on UI2}
>  !issue.png|height=250!
> {panel}
> Another way to identify the issue is to check whether too much time is 
> required to store an application's info after it reaches the NEW_SAVING state.
> {panel:title=How issue can look like in log}
>  !log.png|height=250!
> {panel}
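> The log symptom above can also be checked from the command line. A hedged 
> sketch: it assumes the RMAppImpl state-transition log message ("State change 
> from NEW_SAVING to SUBMITTED") and a hypothetical log file name.
> {code:bash}
> # List the most recent apps that left NEW_SAVING; if these lines lag far
> # behind the corresponding submission lines, the state-store queue is backed up.
> grep 'State change from NEW_SAVING to SUBMITTED' rm-resourcemanager.log | tail -n 5
> {code}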
> h2. Solution
> Created a *MultiDispatcher* class which implements the Dispatcher interface.
> The dispatcher creates a separate metrics object called _Event metrics for 
> "rm-state-store"_, where we can see:
> - how many unhandled events are currently in the event queue for each event 
> type
> - how many events have been handled for each event type
> - the average execution time for each event type
> The dispatcher has the following configs (the {} placeholder stands for the 
> dispatcher name, for example rm-state-store):
> ||Config name||Description||Default value||
> |yarn.dispatcher.multi-thread.{}.*default-pool-size*|Number of threads executing events in parallel|4|
> |yarn.dispatcher.multi-thread.{}.*max-pool-size*|If the event queue is full, the number of execution threads scales up to this value|8|
> |yarn.dispatcher.multi-thread.{}.*keep-alive-seconds*|Idle execution threads are destroyed after this many seconds|10|
> |yarn.dispatcher.multi-thread.{}.*queue-size*|Size of the event queue|1,000,000|
> |yarn.dispatcher.multi-thread.{}.*monitor-seconds*|The size of the event queue is logged with this frequency (if not zero)|30|
> |yarn.dispatcher.multi-thread.{}.*graceful-stop-seconds*|After the stop signal, the dispatcher waits this many seconds to process the incoming events before terminating them|60|
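> For reference, setting these properties for the rm-state-store dispatcher in 
> yarn-site.xml could look like the sketch below (property names are taken from 
> the table above; the values are illustrative, not recommendations):
> {code:xml}
> <property>
>   <name>yarn.dispatcher.multi-thread.rm-state-store.default-pool-size</name>
>   <value>4</value>
> </property>
> <property>
>   <name>yarn.dispatcher.multi-thread.rm-state-store.max-pool-size</name>
>   <value>8</value>
> </property>
> <property>
>   <name>yarn.dispatcher.multi-thread.rm-state-store.queue-size</name>
>   <value>1000000</value>
> </property>
> {code}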
> {panel:title=Example output from RM JMX api}
> {noformat}
> ...
>     {
>       "name": "Hadoop:service=ResourceManager,name=Event metrics for rm-state-store",
>       "modelerType": "Event metrics for rm-state-store",
>       "tag.Context": "yarn",
>       "tag.Hostname": CENSORED,
>       "RMStateStoreEventType#STORE_APP_ATTEMPT_Current": 51,
>       "RMStateStoreEventType#STORE_APP_ATTEMPT_NumOps": 0,
>       "RMStateStoreEventType#STORE_APP_ATTEMPT_AvgTime": 0.0,
>       "RMStateStoreEventType#STORE_APP_Current": 124,
>       "RMStateStoreEventType#STORE_APP_NumOps": 46,
>       "RMStateStoreEventType#STORE_APP_AvgTime": 3318.25,
>       "RMStateStoreEventType#UPDATE_APP_Current": 31,
>       "RMStateStoreEventType#UPDATE_APP_NumOps": 16,
>       "RMStateStoreEventType#UPDATE_APP_AvgTime": 2629.6666666666665,
>       "RMStateStoreEventType#UPDATE_APP_ATTEMPT_Current": 31,
>       "RMStateStoreEventType#UPDATE_APP_ATTEMPT_NumOps": 12,
>       "RMStateStoreEventType#UPDATE_APP_ATTEMPT_AvgTime": 2048.6666666666665,
>       "RMStateStoreEventType#REMOVE_APP_Current": 12,
>       "RMStateStoreEventType#REMOVE_APP_NumOps": 3,
>       "RMStateStoreEventType#REMOVE_APP_AvgTime": 1378.0,
>       "RMStateStoreEventType#REMOVE_APP_ATTEMPT_Current": 0,
>       "RMStateStoreEventType#REMOVE_APP_ATTEMPT_NumOps": 0,
>       "RMStateStoreEventType#REMOVE_APP_ATTEMPT_AvgTime": 0.0,
>       "RMStateStoreEventType#FENCED_Current": 0,
>       "RMStateStoreEventType#FENCED_NumOps": 0,
>       "RMStateStoreEventType#FENCED_AvgTime": 0.0,
>       "RMStateStoreEventType#STORE_MASTERKEY_Current": 0,
>       "RMStateStoreEventType#STORE_MASTERKEY_NumOps": 0,
>       "RMStateStoreEventType#STORE_MASTERKEY_AvgTime": 0.0,
>       "RMStateStoreEventType#REMOVE_MASTERKEY_Current": 0,
>       "RMStateStoreEventType#REMOVE_MASTERKEY_NumOps": 0,
>       "RMStateStoreEventType#REMOVE_MASTERKEY_AvgTime": 0.0,
>       "RMStateStoreEventType#STORE_DELEGATION_TOKEN_Current": 0,
>       "RMStateStoreEventType#STORE_DELEGATION_TOKEN_NumOps": 0,
>       "RMStateStoreEventType#STORE_DELEGATION_TOKEN_AvgTime": 0.0,
>       "RMStateStoreEventType#REMOVE_DELEGATION_TOKEN_Current": 0,
>       "RMStateStoreEventType#REMOVE_DELEGATION_TOKEN_NumOps": 0,
>       "RMStateStoreEventType#REMOVE_DELEGATION_TOKEN_AvgTime": 0.0,
>       "RMStateStoreEventType#UPDATE_DELEGATION_TOKEN_Current": 0,
>       "RMStateStoreEventType#UPDATE_DELEGATION_TOKEN_NumOps": 0,
>       "RMStateStoreEventType#UPDATE_DELEGATION_TOKEN_AvgTime": 0.0,
>       "RMStateStoreEventType#UPDATE_AMRM_TOKEN_Current": 0,
>       "RMStateStoreEventType#UPDATE_AMRM_TOKEN_NumOps": 0,
>       "RMStateStoreEventType#UPDATE_AMRM_TOKEN_AvgTime": 0.0,
>       "RMStateStoreEventType#STORE_RESERVATION_Current": 0,
>       "RMStateStoreEventType#STORE_RESERVATION_NumOps": 0,
>       "RMStateStoreEventType#STORE_RESERVATION_AvgTime": 0.0,
>       "RMStateStoreEventType#REMOVE_RESERVATION_Current": 0,
>       "RMStateStoreEventType#REMOVE_RESERVATION_NumOps": 0,
>       "RMStateStoreEventType#REMOVE_RESERVATION_AvgTime": 0.0,
>       "RMStateStoreEventType#STORE_PROXY_CA_CERT_Current": 0,
>       "RMStateStoreEventType#STORE_PROXY_CA_CERT_NumOps": 0,
>       "RMStateStoreEventType#STORE_PROXY_CA_CERT_AvgTime": 0.0
>     },
> ...
> {noformat}
> {panel}
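> The backlog can also be spotted from the command line. A minimal sketch, 
> assuming the standard Hadoop JMX servlet at /jmx with its qry parameter (the 
> host and port below are placeholders): it prints every event type whose 
> _Current counter, i.e. the number of queued events, is above zero.
> {code:bash}
> #!/bin/bash
> RM_URL="${1:-http://localhost:8088}"
> curl -s "$RM_URL/jmx?qry=Hadoop:service=ResourceManager,name=Event%20metrics%20for%20rm-state-store" |
>   awk -F'[":,]+' '/_Current/ && $3 > 0 { sub(/_Current$/, "", $2); print $2, "queued:", $3+0 }'
> {code}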
> h2. Testing
> I deployed the MultiDispatcher-enabled version of Yarn on the cluster and ran 
> the following performance test:
> {code:bash}
> #!/bin/bash
> for i in {1..50}; 
> do
>       ssh root@$i-node-url 'nohup ./perf.sh 4 1>/dev/null 2>/dev/null &' &
> done
> sleep 300
> for i in {1..50}; 
> do
>       ssh root@$i-node-url "pkill -9 -f perf" &
> done
> sleep 5
> echo "DONE"
> {code}
> Each node ran the following perf script:
> {code:bash}
> #!/bin/bash
> while true
> do
>     if [ $(ps -o pid= -u hadoop | wc -l) -le $1 ]
>     then
>     hadoop jar /opt/hadoop-mapreduce-examples.jar pi 20 20 1>/dev/null 2>&1 &
>     fi
>     sleep 1
> done
> {code}
> This way, in 5 minutes (plus waiting for all jobs to finish) I could process 
> 332 apps.
> When I ran the same test with the official build, it took 5 minutes just to 
> finish the first app, and in the end 221 apps were finished.
> I also tested it with LeveldbRMStateStore and ZKRMStateStore and did not find 
> any problem with the implementation.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
