[jira] [Commented] (YARN-8479) The capacity scheduler logs too frequently seriously affecting performance

Wangda Tan (JIRA) Fri, 29 Jun 2018 21:52:49 -0700


    [ 
https://issues.apache.org/jira/browse/YARN-8479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16528543#comment-16528543
 ]


Wangda Tan commented on YARN-8479:
----------------------------------

Thanks [~daemon], [~cheersyang], 

Basically, inside the scheduling logic, I suggest moving every scheduler log 
that is not about an allocation, reservation, or release to DEBUG level. 
Otherwise, when async scheduling is enabled, these logs become very noisy.
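
To illustrate the split, here is a minimal sketch in plain Java. It uses 
java.util.logging only so that it runs standalone, with FINE standing in for 
DEBUG. The class name, the Event enum, and its values are hypothetical and are 
not taken from the CapacityScheduler code.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

/** Hypothetical sketch; not the actual CapacityScheduler code. */
final class SchedulerLogLevels {
  /** Hypothetical kinds of events seen on the scheduling path. */
  enum Event { ALLOCATED, RESERVED, RELEASED, NODE_SKIPPED, QUEUE_CHECKED }

  private static final Logger LOG = Logger.getLogger("SchedulerLogLevels");

  /** INFO for allocation/reservation/release, FINE (debug) for the rest. */
  static Level levelFor(Event e) {
    switch (e) {
      case ALLOCATED:
      case RESERVED:
      case RELEASED:
        return Level.INFO;
      default:
        return Level.FINE;
    }
  }

  /** Logs the event at the level chosen by levelFor. */
  static void log(Event e, String detail) {
    Level level = levelFor(e);
    // Guard so the message string is only built when the level is enabled.
    if (LOG.isLoggable(level)) {
      LOG.log(level, e + ": " + detail);
    }
  }
}
```

With the default logger configuration, only the three INFO events are printed; 
the others cost a level check and nothing more.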

The most annoying log to me is the one below, which is printed whenever a 
re-reservation happens. The scheduler does many re-reservations for the same 
reserved container; ideally we should log only once.

{code} 
2018-06-14 21:49:33,918 INFO  capacity.CapacityScheduler 
(CapacityScheduler.java:allocateContainerOnSingleNode(1431)) - Trying to 
fulfill reservation for application application_1527807533249_0089 on node: 
ctr-e138-1518143905142-92974-01-000009.hwx.site:45454
2018-06-14 21:49:33,918 INFO  capacity.CapacityScheduler 
(CapacityScheduler.java:tryCommit(2794)) - Allocation proposal accepted
2018-06-14 21:49:33,918 INFO  allocator.AbstractContainerAllocator 
(AbstractContainerAllocator.java:getCSAssignmentFromAllocateResult(97)) - 
Reserved container  application=application_1527807533249_0089 
resource=<memory:2
{code}
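
One possible way to log a re-reservation only once is to remember which 
reserved containers have already been logged. The sketch below is only an 
illustration under that assumption: the ReservationLogOnce class, its method 
names, and the use of a container ID string as the key are hypothetical and do 
not exist in YARN.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical helper; not part of YARN. */
final class ReservationLogOnce {
  // IDs of reserved containers that were already logged. The set is
  // concurrent because async scheduling may use several threads.
  private final Set<String> logged = ConcurrentHashMap.newKeySet();

  /** Returns true only the first time this reserved container is seen. */
  boolean shouldLog(String reservedContainerId) {
    return logged.add(reservedContainerId);
  }

  /** Forget the container once its reservation is fulfilled or cancelled. */
  void clear(String reservedContainerId) {
    logged.remove(reservedContainerId);
  }
}
```

The caller would check shouldLog(...) before printing the INFO line and call 
clear(...) when the reservation goes away, so the set does not grow without 
bound.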

As for the scheduling code path, I'm not sure whether you see any duplicated 
logs. By duplicated log info I mean: if we print the container size / queue / 
node / application name in methodA, we should not print the same information 
for the same container in methodB.

For the classes you listed: 
1) RMContainerImpl is actually very helpful for troubleshooting, especially for 
container state changes from ALLOCATED to ACQUIRED and from RUNNING to a final 
state.
2) RMAuditLogger: I think we could do something for it.
3) For the rest of them, let's check how much we can do to optimize them.

[~daemon], if you have any thoughts about log optimization, you can share a 
patch so we can give some suggestions.

> The capacity scheduler logs too frequently seriously affecting performance
> --------------------------------------------------------------------------
>
>                 Key: YARN-8479
>                 URL: https://issues.apache.org/jira/browse/YARN-8479
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: capacity scheduler, capacityscheduler
>            Reporter: YunFan Zhou
>            Assignee: YunFan Zhou
>            Priority: Critical
>         Attachments: image-2018-06-29-14-16-06-332.png
>
>
> The capacity scheduler logs too frequently, seriously affecting performance.
> Our tests show that the scheduling speed of the capacity scheduler struggles 
> to reach 5000/s in the production scenario,
> and it soon hits the logging bottleneck.
> My current work is to change many log levels from INFO to DEBUG.
> [~wangda] [~leftnoteasy] Any suggestion?
>  
> !image-2018-06-29-14-16-06-332.png!



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

