[ https://issues.apache.org/jira/browse/YARN-9979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16973932#comment-16973932 ]
zhoukang commented on YARN-9979:
--------------------------------
I think we can add throttling logic to ContainerAllocationExpirer, so that a burst of expired containers does not flood the scheduler event queue all at once.
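
As a rough illustration only (this is not a patch, and none of these names exist in YARN), throttling could mean capping how many entries one pass of the checker is allowed to expire and leaving the rest for the next pass. The class `ThrottledExpirer` and the limit `maxExpiredPerPass` below are hypothetical, and the sketch returns the expired keys instead of calling `expire(key)` so that it stays self-contained:

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Sketch of a liveliness monitor whose expiry pass is capped.
 * ThrottledExpirer and maxExpiredPerPass are hypothetical names,
 * not existing YARN classes or configuration keys.
 */
class ThrottledExpirer<O> {
  // Registration time per monitored object, in insertion order.
  private final Map<O, Long> running = new LinkedHashMap<>();
  private final long expireIntervalMs;
  private final int maxExpiredPerPass; // hypothetical throttle limit

  ThrottledExpirer(long expireIntervalMs, int maxExpiredPerPass) {
    this.expireIntervalMs = expireIntervalMs;
    this.maxExpiredPerPass = maxExpiredPerPass;
  }

  synchronized void register(O key, long nowMs) {
    running.put(key, nowMs);
  }

  synchronized int pending() {
    return running.size();
  }

  /**
   * One pass of the checker. Expires at most maxExpiredPerPass entries
   * and returns them; entries beyond the cap stay registered and are
   * picked up by a later pass.
   */
  synchronized List<O> checkOnce(long nowMs) {
    List<O> expired = new ArrayList<>();
    Iterator<Map.Entry<O, Long>> it = running.entrySet().iterator();
    while (it.hasNext() && expired.size() < maxExpiredPerPass) {
      Map.Entry<O, Long> entry = it.next();
      O key = entry.getKey();
      if (nowMs > entry.getValue() + expireIntervalMs) {
        it.remove();
        // The real monitor would call expire(key) here, which is what
        // ends up as an event on the scheduler queue.
        expired.add(key);
      }
    }
    return expired;
  }

  public static void main(String[] args) {
    ThrottledExpirer<String> monitor = new ThrottledExpirer<>(1000L, 2);
    for (int i = 0; i < 5; i++) {
      monitor.register("container_" + i, 0L);
    }
    // All five entries are overdue at t=2000, but each pass expires at most two.
    while (monitor.pending() > 0) {
      List<String> batch = monitor.checkOnce(2000L);
      System.out.println("expired " + batch.size()
          + ", still pending " + monitor.pending());
    }
  }
}
```

The trade-off is that some containers would be expired one or more monitor intervals later than they are today, so the cap would need to be large enough that the backlog still drains.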
> When an app expires with many containers, the scheduler event size will be huge
> -------------------------------------------------------------------------------
>
> Key: YARN-9979
> URL: https://issues.apache.org/jira/browse/YARN-9979
> Project: Hadoop YARN
> Issue Type: Bug
> Components: resourcemanager, scheduler
> Reporter: zhoukang
> Assignee: zhoukang
> Priority: Major
>
> When an app with many containers expires, the scheduler event queue
> grows very large.
> {code:java}
> 2019-11-11,21:39:49,690 INFO
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Size of
> scheduler event-queue is 9000
> 2019-11-11,21:39:49,695 INFO
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Size of
> scheduler event-queue is 10000
> 2019-11-11,21:39:49,700 INFO
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Size of
> scheduler event-queue is 11000
> 2019-11-11,21:39:49,705 INFO
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Size of
> scheduler event-queue is 12000
> 2019-11-11,21:39:49,710 INFO
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Size of
> scheduler event-queue is 13000
> 2019-11-11,21:39:49,715 INFO
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Size of
> scheduler event-queue is 14000
> 2019-11-11,21:39:49,720 INFO
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Discarded 1
> messages due to full event buffer including: Size of scheduler event-queue is
> 15000
> 2019-11-11,21:39:49,724 INFO
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Size of
> scheduler event-queue is 16000
> 2019-11-11,21:39:49,729 INFO
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Size of
> scheduler event-queue is 17000
> 2019-11-11,21:39:49,733 INFO
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Size of
> scheduler event-queue is 18000
> 2019-11-11,21:40:14,953 INFO
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Size of
> scheduler event-queue is 19000
> 2019-11-11,21:43:09,743 INFO
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Size of
> scheduler event-queue is 19000
> 2019-11-11,21:43:09,750 INFO
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Size of
> scheduler event-queue is 20000
> 2019-11-11,21:43:09,758 INFO
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Size of
> scheduler event-queue is 21000
> 2019-11-11,21:43:09,766 INFO
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Size of
> scheduler event-queue is 22000
> 2019-11-11,21:43:09,775 INFO
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Size of
> scheduler event-queue is 23000
> 2019-11-11,21:43:09,783 INFO
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Size of
> scheduler event-queue is 24000
> 2019-11-11,21:43:09,792 INFO
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Size of
> scheduler event-queue is 25000
> 2019-11-11,21:43:09,800 INFO
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Size of
> scheduler event-queue is 26000
> 2019-11-11,21:43:09,807 INFO
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Size of
> scheduler event-queue is 27000
> 2019-11-11,21:43:09,814 INFO
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Size of
> scheduler event-queue is 28000
> 2019-11-11,21:46:29,830 INFO
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Size of
> scheduler event-queue is 29000
> 2019-11-11,21:46:29,841 INFO
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Size of
> scheduler event-queue is 30000
> 2019-11-11,21:46:29,850 INFO
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Size of
> scheduler event-queue is 31000
> 2019-11-11,21:46:29,862 INFO
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Size of
> scheduler event-queue is 32000
> 2019-11-11,21:49:49,875 INFO
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Size of
> scheduler event-queue is 33000
> 2019-11-11,21:49:49,875 INFO
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Size of
> scheduler event-queue is 34000
> 2019-11-11,21:49:49,876 INFO
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Size of
> scheduler event-queue is 35000
> 2019-11-11,21:49:49,882 INFO
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Size of
> scheduler event-queue is 36000
> 2019-11-11,21:49:49,887 INFO
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Size of
> scheduler event-queue is 37000
> 2019-11-11,21:49:49,891 INFO
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Size of
> scheduler event-queue is 38000
> 2019-11-11,21:49:49,896 INFO
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Size of
> scheduler event-queue is 39000
> 2019-11-11,21:49:49,900 INFO
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Size of
> scheduler event-queue is 40000
> 2019-11-11,21:49:49,905 INFO
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Size of
> scheduler event-queue is 41000
> 2019-11-11,21:49:49,910 INFO
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Size of
> scheduler event-queue is 42000
> 2019-11-11,21:49:49,914 INFO
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Size of
> scheduler event-queue is 43000
> 2019-11-11,21:49:49,919 INFO
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Size of
> scheduler event-queue is 44000
> 2019-11-11,21:49:49,923 INFO
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Size of
> scheduler event-queue is 45000
> 2019-11-11,21:49:49,927 INFO
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Size of
> scheduler event-queue is 46000
> 2019-11-11,21:49:49,932 INFO
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Size of
> scheduler event-queue is 47000
> 2019-11-11,21:49:49,938 INFO
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Size of
> scheduler event-queue is 48000
> 2019-11-11,21:49:49,943 INFO
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Size of
> scheduler event-queue is 49000
> 2019-11-11,21:49:49,947 INFO
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Size of
> scheduler event-queue is 50000
> 2019-11-11,21:49:49,951 INFO
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Size of
> scheduler event-queue is 51000
> 2019-11-11,21:49:49,956 INFO
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Size of
> scheduler event-queue is 52000
> 2019-11-11,21:49:49,961 INFO
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Size of
> scheduler event-queue is 53000
> 2019-11-11,21:49:49,967 INFO
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Size of
> scheduler event-queue is 54000
> 2019-11-11,21:49:49,972 INFO
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Size of
> scheduler event-queue is 55000
> 2019-11-11,21:49:49,976 INFO
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Size of
> scheduler event-queue is 56000
> 2019-11-11,21:49:49,980 INFO
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Size of
> scheduler event-queue is 57000
> 2019-11-11,21:49:49,983 INFO
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Size of
> scheduler event-queue is 58000
> 2019-11-11,21:49:49,988 INFO
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Size of
> scheduler event-queue is 59000
> 2019-11-11,21:49:49,991 INFO
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Size of
> scheduler event-queue is 60000
> 2019-11-11,21:49:49,996 INFO
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Size of
> scheduler event-queue is 61000
> 2019-11-11,21:53:10,004 INFO
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Size of
> scheduler event-queue is 61000
> 2019-11-11,21:53:10,014 INFO
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Size of
> scheduler event-queue is 62000
> 2019-11-11,21:53:10,022 INFO
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Size of
> scheduler event-queue is 63000
> 2019-11-11,21:53:10,032 INFO
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Size of
> scheduler event-queue is 64000
> 2019-11-11,21:53:10,034 INFO
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Size of
> scheduler event-queue is 65000
> 2019-11-11,21:53:10,040 INFO
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Size of
> scheduler event-queue is 66000
> 2019-11-11,21:53:10,046 INFO
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Size of
> scheduler event-queue is 67000
> 2019-11-11,21:56:30,056 INFO
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Size of
> scheduler event-queue is 68000
> 2019-11-11,21:56:30,067 INFO
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Size of
> scheduler event-queue is 69000
> 2019-11-11,21:56:30,077 INFO
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Size of
> scheduler event-queue is 70000
> 2019-11-11,21:56:30,086 INFO
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Size of
> scheduler event-queue is 71000
> 2019-11-11,21:56:30,094 INFO
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Size of
> scheduler event-queue is 72000
> 2019-11-11,21:56:30,102 INFO
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Size of
> scheduler event-queue is 73000
> 2019-11-11,21:56:30,107 INFO
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Size of
> scheduler event-queue is 74000
> 2019-11-11,21:56:30,111 INFO
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Size of
> scheduler event-queue is 75000
> 2019-11-11,21:56:30,116 INFO
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Size of
> scheduler event-queue is 76000
> 2019-11-11,21:56:30,122 INFO
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Size of
> scheduler event-queue is 77000
> 2019-11-11,21:59:50,128 INFO
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Size of
> scheduler event-queue is 78000
> 2019-11-11,21:59:50,135 INFO
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Size of
> scheduler event-queue is 79000
> 2019-11-11,21:59:50,140 INFO
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Size of
> scheduler event-queue is 80000
> 2019-11-11,21:59:50,145 INFO
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Size of
> scheduler event-queue is 81000
> 2019-11-11,21:59:50,149 INFO
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Size of
> scheduler event-queue is 82000
> 2019-11-11,21:59:50,154 INFO
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Size of
> scheduler event-queue is 83000
> 2019-11-11,21:59:50,159 INFO
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Size of
> scheduler event-queue is 84000
> 2019-11-11,21:59:50,164 INFO
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Size of
> scheduler event-queue is 85000
> 2019-11-11,21:59:50,168 INFO
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Size of
> scheduler event-queue is 86000
> 2019-11-11,21:59:52,305 INFO
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Size of
> scheduler event-queue is 87000
> 2019-11-11,22:03:10,175 INFO
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Size of
> scheduler event-queue is 87000
> 2019-11-11,22:03:10,181 INFO
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Size of
> scheduler event-queue is 88000
> 2019-11-11,22:03:10,186 INFO
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Size of
> scheduler event-queue is 89000
> 2019-11-11,22:03:10,191 INFO
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Size of
> scheduler event-queue is 90000
> 2019-11-11,22:03:10,196 INFO
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Size of
> scheduler event-queue is 91000
> 2019-11-11,22:03:10,201 INFO
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Size of
> scheduler event-queue is 92000
> 2019-11-11,22:03:10,206 INFO
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Discarded 1
> messages due to full event buffer including: Size of scheduler event-queue is
> 93000
> 2019-11-11,22:03:10,211 INFO
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Size of
> scheduler event-queue is 94000
> 2019-11-11,22:03:10,215 INFO
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Discarded 1
> messages due to full event buffer including: Size of scheduler event-queue is
> 95000
> 2019-11-11,22:06:30,221 INFO
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Size of
> scheduler event-queue is 96000
> 2019-11-11,22:06:30,227 INFO
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Size of
> scheduler event-queue is 97000
> 2019-11-11,22:06:30,234 INFO
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Size of
> scheduler event-queue is 98000
> 2019-11-11,22:06:30,240 INFO
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Size of
> scheduler event-queue is 99000
> 2019-11-11,22:06:30,245 INFO
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Size of
> scheduler event-queue is 100000
> 2019-11-11,22:06:30,250 INFO
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Size of
> scheduler event-queue is 101000
> 2019-11-11,22:07:40,962 INFO
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Size of
> scheduler event-queue is 102000
> 2019-11-11,22:09:50,259 INFO
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Size of
> scheduler event-queue is 91000
> 2019-11-11,22:09:50,269 INFO
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Size of
> scheduler event-queue is 92000
> 2019-11-11,22:09:50,278 INFO
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Size of
> scheduler event-queue is 93000
> 2019-11-11,22:09:50,287 INFO
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Size of
> scheduler event-queue is 94000
> 2019-11-11,22:09:50,295 INFO
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Size of
> scheduler event-queue is 95000
> 2019-11-11,22:09:50,302 INFO
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Size of
> scheduler event-queue is 96000
> 2019-11-11,22:09:50,310 INFO
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Size of
> scheduler event-queue is 97000
> 2019-11-11,22:13:03,082 INFO
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Size of
> scheduler event-queue is 53000
> 2019-11-11,22:13:10,318 INFO
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Size of
> scheduler event-queue is 54000
> 2019-11-11,22:13:10,324 INFO
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Size of
> scheduler event-queue is 55000
> 2019-11-11,22:13:10,330 INFO
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Size of
> scheduler event-queue is 56000
> 2019-11-11,22:13:10,338 INFO
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Size of
> scheduler event-queue is 57000
> 2019-11-11,22:13:10,347 INFO
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Size of
> scheduler event-queue is 58000
> 2019-11-11,22:13:10,354 INFO
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Size of
> scheduler event-queue is 59000
> {code}
> Number of containers expired in each of the given minutes:
> {code:java}
> [work@xxx zhoukang-yarn]$ grep "21:39:" expired.1 | wc -l
> 11377
> [work@xxx zhoukang-yarn]$ grep "21:43:" expired.1 | wc -l
> 10508
> [work@xxx zhoukang-yarn]$ grep "21:49:" expired.1 | wc -l
> 29269
> {code}
> The PingChecker in AbstractLivelinessMonitor expires every timed-out entry in a single pass:
> {code:java}
> private class PingChecker implements Runnable {
>   @Override
>   public void run() {
>     while (!stopped && !Thread.currentThread().isInterrupted()) {
>       synchronized (AbstractLivelinessMonitor.this) {
>         Iterator<Map.Entry<O, Long>> iterator =
>             running.entrySet().iterator();
>         // avoid calculating current time everytime in loop
>         long currentTime = clock.getTime();
>         while (iterator.hasNext()) {
>           Map.Entry<O, Long> entry = iterator.next();
>           O key = entry.getKey();
>           long interval = getExpireInterval(key);
>           if (currentTime > entry.getValue() + interval) {
>             iterator.remove();
>             expire(key);
>             LOG.info("Expired:" + entry.getKey().toString()
>                 + " Timed out after " + interval / 1000 + " secs");
>           }
>         }
>       }
>       try {
>         Thread.sleep(monitorInterval);
>       } catch (InterruptedException e) {
>         LOG.info(getName() + " thread interrupted");
>         break;
>       }
>     }
>   }
> }
> {code}