[https://issues.apache.org/jira/browse/YARN-8479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16528543#comment-16528543]
Wangda Tan commented on YARN-8479:
----------------------------------
Thanks [~daemon], [~cheersyang],
Basically, inside the scheduling logic, I suggest moving all scheduler logs that
are not about allocation, reservation or release to DEBUG level. Otherwise, when
async scheduling is enabled, such logs can be very annoying.
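A minimal sketch of what demoting such a log could look like. It assumes java.util.logging as a stand-in for the logger the scheduler actually uses (FINE plays the role of DEBUG), and the class and method names are hypothetical. The point is the level guard: the message string is only built when debug logging is enabled, which is the same idea as wrapping the call in `if (LOG.isDebugEnabled())`.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical sketch: java.util.logging stands in for the scheduler's
// real logging facade; Level.FINE plays the role of DEBUG.
class SchedulingLogSketch {
    private static final Logger LOG =
        Logger.getLogger("capacity.SchedulingLogSketch");

    // Counts how often the message was actually built, so the effect of
    // the level guard is observable.
    static int messagesBuilt = 0;

    static String describe(String appId, String node) {
        messagesBuilt++;
        return "Trying to fulfill reservation for application " + appId
            + " on node: " + node;
    }

    // A non-allocation/reservation/release message, demoted to the debug
    // level and guarded so no string is built when that level is off.
    static void onReservationAttempt(String appId, String node) {
        if (LOG.isLoggable(Level.FINE)) {
            LOG.fine(describe(appId, node));
        }
    }
}
```

With the default INFO level nothing is built or printed; only after the logger is switched to FINE does the message get constructed.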
The most annoying log to me is the following, which is printed every time a
re-reservation happens. The scheduler re-reserves the same reserved container
many times; ideally we should log it only once.
{code}
2018-06-14 21:49:33,918 INFO capacity.CapacityScheduler (CapacityScheduler.java:allocateContainerOnSingleNode(1431)) - Trying to fulfill reservation for application application_1527807533249_0089 on node: ctr-e138-1518143905142-92974-01-000009.hwx.site:45454
2018-06-14 21:49:33,918 INFO capacity.CapacityScheduler (CapacityScheduler.java:tryCommit(2794)) - Allocation proposal accepted
2018-06-14 21:49:33,918 INFO allocator.AbstractContainerAllocator (AbstractContainerAllocator.java:getCSAssignmentFromAllocateResult(97)) - Reserved container application=application_1527807533249_0089 resource=<memory:2
{code}
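One possible way to log a re-reservation only once, sketched as a hypothetical helper (not an existing YARN class): remember which node each reserved container was last logged for, and log again only when that changes. A ConcurrentHashMap is assumed because, with async scheduling, several scheduling threads may re-reserve at the same time.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical helper: tracks which (container, node) reservation has
// already been logged so repeated re-reservations stay quiet.
class ReservationLogOnce {
    private final Map<String, String> loggedReservations =
        new ConcurrentHashMap<>();

    // Returns true only the first time this container is seen reserved on
    // this node. put() is atomic, so concurrent callers get one "true".
    boolean shouldLog(String containerId, String nodeId) {
        String previousNode = loggedReservations.put(containerId, nodeId);
        return !nodeId.equals(previousNode);
    }

    // To be called once the reservation is fulfilled or unreserved, so the
    // map does not grow without bound.
    void forget(String containerId) {
        loggedReservations.remove(containerId);
    }
}
```

The caller would wrap the existing INFO call, e.g. `if (reservationLog.shouldLog(containerId, nodeId)) { LOG.info(...); }`, and call `forget` when the reservation goes away.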
Also, along the scheduling code path, please check whether you see any
duplicated logs. By duplicated log info I mean: if we print the container size /
queue / node / application name in methodA, we should not print the same
information for the same container again in methodB.
For the classes you listed:
1) RMContainerImpl logs are actually very helpful for troubleshooting, especially
for container state changes from ALLOCATED to ACQUIRED and from RUNNING to a
final state.
2) RMAuditLogger: I think we could do something about it.
3) For the rest of them, let's check what we can do to optimize them.
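If RMContainerImpl logging were ever trimmed, the transitions called out in 1) could still be kept at INFO with a small policy like the one below. This is a hypothetical sketch: the enum is a simplified subset of container states, and which states count as final is an assumption.

```java
import java.util.EnumSet;
import java.util.Set;

// Simplified, hypothetical subset of container states.
enum ContainerState { ALLOCATED, ACQUIRED, RUNNING, COMPLETED, KILLED, RELEASED }

// Hypothetical policy: decides whether a state transition is worth an INFO log.
class TransitionLogPolicy {
    // Assumed set of final states for this sketch.
    private static final Set<ContainerState> FINAL_STATES = EnumSet.of(
        ContainerState.COMPLETED, ContainerState.KILLED, ContainerState.RELEASED);

    // INFO for ALLOCATED -> ACQUIRED and RUNNING -> any final state;
    // every other transition would go to DEBUG.
    static boolean logAtInfo(ContainerState from, ContainerState to) {
        if (from == ContainerState.ALLOCATED && to == ContainerState.ACQUIRED) {
            return true;
        }
        return from == ContainerState.RUNNING && FINAL_STATES.contains(to);
    }
}
```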
[~daemon], if you have any thoughts about log optimization, please share a
patch so we can give some suggestions.
> The capacity scheduler logs too frequently seriously affecting performance
> --------------------------------------------------------------------------
>
> Key: YARN-8479
> URL: https://issues.apache.org/jira/browse/YARN-8479
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: capacity scheduler, capacityscheduler
> Reporter: YunFan Zhou
> Assignee: YunFan Zhou
> Priority: Critical
> Attachments: image-2018-06-29-14-16-06-332.png
>
>
> The capacity scheduler logs too frequently, seriously affecting performance.
> Our tests show that the scheduling speed of the capacity scheduler can hardly
> reach 5000/s in the production scenario, and it soon hits the logging
> bottleneck.
> My current work is to change many log levels from INFO to DEBUG.
> [~wangda] [~leftnoteasy] Any suggestions?
>
> !image-2018-06-29-14-16-06-332.png!