[ 
https://issues.apache.org/jira/browse/YARN-11656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bence Kosztolnik updated YARN-11656:
------------------------------------
    Description: 
h2. Problem statement
 
I observed a YARN cluster that had both pending and available resources, yet 
cluster utilization stayed around ~50%. The cluster was loaded with 200 
parallel PI example jobs (from hadoop-mapreduce-examples), each configured with 
20 map and 20 reduce containers, on a 50-node cluster where every node had 8 
cores and plenty of memory (CPU was expected to be the bottleneck).
Eventually I realized the RM had an IO bottleneck and needed 1~20 seconds to 
persist a single RMStateStoreEvent (using FileSystemRMStateStore).

To reduce the impact of the issue:
- create a dispatcher that can persist events on parallel threads (see the 
sketch after this list)
- expose metrics for the RMStateStore event queue so the problem can easily be 
identified when it occurs on a cluster
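
As a rough illustration of the first point only (plain Java, not the actual 
MultiDispatcher patch; the handler interface and pool sizes are assumptions 
mirroring the defaults listed in the Solution section), state-store events 
could be handed to a bounded thread pool instead of a single dispatcher thread:

{code:java}
// Sketch only: slow state-store writes overlap by running on worker threads
// instead of blocking a single dispatcher thread.
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class ParallelStoreDispatcherSketch {
  // hypothetical stand-in for the real RMStateStore event handler
  interface StoreEventHandler {
    void handle(Object event);
  }

  private final ExecutorService pool = new ThreadPoolExecutor(
      4, 8,                                  // core / max pool size (cf. default-pool-size / max-pool-size)
      10, TimeUnit.SECONDS,                  // keep-alive of extra threads (cf. keep-alive-seconds)
      new ArrayBlockingQueue<>(1_000_000));  // bounded event queue (cf. queue-size)

  public void dispatch(Object event, StoreEventHandler handler) {
    // each event is persisted on its own worker thread
    pool.execute(() -> handler.handle(event));
  }
}
{code}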


{panel:title=Issue visible on UI2}
 !issue.png|height=250!
{panel}

Another way to identify the issue is to check whether too much time is spent 
storing application info after an app reaches the NEW_SAVING state.

{panel:title=How issue can look like in log}
 !log.png! 
{panel}

h2. Solution

Created a *MultiDispatcher* class which implements the Dispatcher interface.
The dispatcher creates a separate metric object called _Event metrics for 
"rm-state-store"_ where we can see:
- how many unhandled events are currently in the event queue for each event 
type
- how many events have been handled for each event type
- the average execution time for each event type (see the bookkeeping sketch 
below)
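
A rough sketch of the per-event-type bookkeeping behind such a metric object 
(plain Java with hypothetical names; the real implementation presumably 
registers these values through the Hadoop metrics framework):

{code:java}
// Sketch only: tracks queued count, handled count and average execution time
// per event type, matching the three values described above.
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.LongAdder;

public class EventMetricsSketch {
  private final ConcurrentHashMap<String, AtomicLong> queued = new ConcurrentHashMap<>();
  private final ConcurrentHashMap<String, LongAdder> handledCount = new ConcurrentHashMap<>();
  private final ConcurrentHashMap<String, LongAdder> totalTimeMs = new ConcurrentHashMap<>();

  // called when an event of the given type enters the queue
  public void onEnqueued(String eventType) {
    queued.computeIfAbsent(eventType, t -> new AtomicLong()).incrementAndGet();
  }

  // called after an event was handled, with its execution time
  public void onHandled(String eventType, long elapsedMs) {
    queued.computeIfAbsent(eventType, t -> new AtomicLong()).decrementAndGet();
    handledCount.computeIfAbsent(eventType, t -> new LongAdder()).increment();
    totalTimeMs.computeIfAbsent(eventType, t -> new LongAdder()).add(elapsedMs);
  }

  // average execution time for the given event type, in milliseconds
  public double avgExecutionMs(String eventType) {
    long count = handledCount.getOrDefault(eventType, new LongAdder()).sum();
    return count == 0 ? 0 : (double) totalTimeMs.get(eventType).sum() / count;
  }
}
{code}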

The dispatcher has the following configs (the {} placeholder stands for the 
dispatcher name, for example rm-state-store):

||Config name||Description||Default value||
|yarn.dispatcher.multi-thread.{}.*default-pool-size*|Number of threads that execute events in parallel|4|
|yarn.dispatcher.multi-thread.{}.*max-pool-size*|If the event queue is full, the pool scales up to this many execution threads|8|
|yarn.dispatcher.multi-thread.{}.*keep-alive-seconds*|Idle execution threads are destroyed after this many seconds|10|
|yarn.dispatcher.multi-thread.{}.*queue-size*|Size of the event queue|1000000|
|yarn.dispatcher.multi-thread.{}.*monitor-seconds*|The size of the event queue is logged at this interval in seconds (if not zero)|30|
|yarn.dispatcher.multi-thread.{}.*graceful-stop-seconds*|After the stop signal, the dispatcher waits this many seconds to process incoming events before terminating|60|
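
For example, tuning the state-store dispatcher in yarn-site.xml could look like 
the snippet below (the values are illustrative only; the property names follow 
the pattern above with rm-state-store substituted for the {} placeholder):

{code:xml}
<!-- Illustrative values only -->
<property>
  <name>yarn.dispatcher.multi-thread.rm-state-store.default-pool-size</name>
  <value>8</value>
</property>
<property>
  <name>yarn.dispatcher.multi-thread.rm-state-store.max-pool-size</name>
  <value>16</value>
</property>
<property>
  <name>yarn.dispatcher.multi-thread.rm-state-store.monitor-seconds</name>
  <value>30</value>
</property>
{code}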



h2. Testing

> RMStateStore event queue blocked
> --------------------------------
>
>                 Key: YARN-11656
>                 URL: https://issues.apache.org/jira/browse/YARN-11656
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: yarn
>    Affects Versions: 3.4.1
>            Reporter: Bence Kosztolnik
>            Priority: Major
>         Attachments: issue.png, log.png
>
>


