[ 
https://issues.apache.org/jira/browse/YARN-10353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17160121#comment-17160121
 ] 

Jim Brennan commented on YARN-10353:
------------------------------------

Thanks for the comments [~ebadger]!  I am definitely open to changing the 
wording to make things clearer.

The reason I did it this way was to match the style of the other CPU fields, 
and to keep it compact and parsable by scripts.  I was thinking spaces delimit 
fields, and colons delimit label vs value.   But it's definitely parsable 
either way, and memory components use the x of y format, so I can live with 
either.  I agree that 2 of 10 vCores is easier to read.  Was just trying for a 
more compact format.  Would {{Vcores: 2 of 10}} be better?
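
To illustrate the "spaces delimit fields, colons delimit label vs value" idea, here is a minimal Python sketch of how a script might pull the compact fields out of the proposed log line. The function names and the choice to start parsing after the text "virtual memory used" are assumptions made for this example; they are not part of the patch.

```python
def parse_compact_fields(line):
    """Return the trailing label:value tokens of an audit line as a dict.

    Assumption for this sketch: the compact fields follow the text
    "virtual memory used", so earlier colons (timestamp, source location,
    container id) are never inspected.
    """
    _, found, tail = line.partition("virtual memory used")
    if not found:
        return {}
    fields = {}
    for token in tail.split():                    # spaces delimit fields
        label, sep, value = token.partition(":")  # colon delimits label vs value
        if sep and value:
            fields[label] = value
    return fields


def parse_vcores(value):
    """Split a 'used/assigned' value such as '2/1' into two integers."""
    used, _, assigned = value.partition("/")
    return int(used), int(assigned)


# The proposed log line from the issue description, joined into one string.
sample = (
    "2020-07-16 20:33:51,550 DEBUG [Container Monitor] ContainersMonitorImpl.audit "
    "(ContainersMonitorImpl.java:recordUsage(651)) - Resource usage of ProcessTree "
    "809 for container-id container_1594931466123_0002_01_000007: 309.5 MB of 2 GB "
    "physical memory used; 2.8 GB of 4.2 GB virtual memory used CPU:143.0905 "
    "CPU/core:35.772625 vCores:2/1 CPU-ms:4180"
)

fields = parse_compact_fields(sample)
vcores_used, vcores_assigned = parse_vcores(fields["vCores"])
```

A format with embedded spaces, such as {{Vcores: 2 of 10}}, would not survive the whitespace split above and would need a regular expression instead. That is the trade-off between the two styles.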

I used {{CPU-ms}} to make the units clear (one complaint I had about 
the 2.8 format).  It represents the total CPU time for the process tree since it 
started.  Maybe {{Accumulated-CPU-ms}}?

Speaking of which, I've always thought the existing CPU fields in the log line 
are misnamed.  {{CPU}} shows percent CPU per processor (like what you would see in 
{{top}}), while {{CPU/core}} shows percent of CPU across all processors 
used by yarn.  Naming is hard.
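
As a rough check on how the two existing fields relate, the sample line in the issue description has {{CPU:143.0905}} and {{CPU/core:35.772625}}. The sketch below assumes the second value is the first divided by the number of processors on the node; that relationship is inferred from the sample values, and the function name is made up for this example.

```python
def implied_processor_count(cpu_percent, cpu_percent_all_processors):
    """Ratio of the two logged CPU values.

    Assumption: the second value is the first one divided by the number
    of processors on the node, so the ratio recovers that count.
    """
    return cpu_percent / cpu_percent_all_processors


# Values taken from the sample log line in the issue description.
ratio = implied_processor_count(143.0905, 35.772625)
print(round(ratio, 6))
```

For the sample values the ratio comes out at about 4, which would be consistent with a 4-processor node under this assumption.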

 

> Log vcores used and cumulative cpu in containers monitor
> --------------------------------------------------------
>
>                 Key: YARN-10353
>                 URL: https://issues.apache.org/jira/browse/YARN-10353
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: yarn
>    Affects Versions: 3.4.0
>            Reporter: Jim Brennan
>            Assignee: Jim Brennan
>            Priority: Minor
>         Attachments: YARN-10353.001.patch
>
>
> We currently log the percentage/cpu and percentage/cpus-used-by-yarn in the 
> Containers Monitor log. It would be useful to also log vcores used vs vcores 
> assigned, and total accumulated CPU time.
> For example, currently we have an audit log that looks like this:
> {noformat}
> 2020-07-16 20:33:51,550 DEBUG [Container Monitor] ContainersMonitorImpl.audit 
> (ContainersMonitorImpl.java:recordUsage(651)) - Resource usage of ProcessTree 
> 809 for container-id container_1594931466123_0002_01_000007: 309.5 MB of 2 GB 
> physical memory used; 2.8 GB of 4.2 GB virtual memory used CPU:143.0905 
> CPU/core:35.772625
> {noformat}
> The proposal is to add two more fields to show vCores and Cumulative CPU ms:
> {noformat}
> 2020-07-16 20:33:51,550 DEBUG [Container Monitor] ContainersMonitorImpl.audit 
> (ContainersMonitorImpl.java:recordUsage(651)) - Resource usage of ProcessTree 
> 809 for container-id container_1594931466123_0002_01_000007: 309.5 MB of 2 GB 
> physical memory used; 2.8 GB of 4.2 GB virtual memory used CPU:143.0905 
> CPU/core:35.772625 vCores:2/1 CPU-ms:4180
> {noformat}
> This is a snippet of a log from one of our clusters running branch-2.8 with a 
> similar change.
> {noformat}
> 2020-07-16 21:00:02,240 [Container Monitor] DEBUG 
> ContainersMonitorImpl.audit: Memory usage of ProcessTree 5267 for 
> container-id container_e04_1594079801456_1397450_01_001992: 1.6 GB of 2.5 GB 
> physical memory used; 3.8 GB of 5.3 GB virtual memory used. CPU usage: 18 of 
> 10 CPU vCores used. Cumulative CPU time: 157410
> 2020-07-16 21:00:02,269 [Container Monitor] DEBUG 
> ContainersMonitorImpl.audit: Memory usage of ProcessTree 18801 for 
> container-id container_e04_1594079801456_1390375_01_000019: 413.2 MB of 2.5 
> GB physical memory used; 3.8 GB of 5.3 GB virtual memory used. CPU usage: 0 
> of 10 CPU vCores used. Cumulative CPU time: 113830
> 2020-07-16 21:00:02,298 [Container Monitor] DEBUG 
> ContainersMonitorImpl.audit: Memory usage of ProcessTree 5279 for 
> container-id container_e04_1594079801456_1397450_01_001991: 2.2 GB of 2.5 GB 
> physical memory used; 3.8 GB of 5.3 GB virtual memory used. CPU usage: 17 of 
> 10 CPU vCores used. Cumulative CPU time: 128630
> 2020-07-16 21:00:02,339 [Container Monitor] DEBUG 
> ContainersMonitorImpl.audit: Memory usage of ProcessTree 24189 for 
> container-id container_e04_1594079801456_1390430_01_000415: 392.7 MB of 2.5 
> GB physical memory used; 3.8 GB of 5.3 GB virtual memory used. CPU usage: 0 
> of 10 CPU vCores used. Cumulative CPU time: 96060
> 2020-07-16 21:00:02,367 [Container Monitor] DEBUG 
> ContainersMonitorImpl.audit: Memory usage of ProcessTree 6751 for 
> container-id container_e04_1594079801456_1397923_01_003248: 1.3 GB of 3 GB 
> physical memory used; 4.3 GB of 6.3 GB virtual memory used. CPU usage: 12 of 
> 10 CPU vCores used. Cumulative CPU time: 116820
> 2020-07-16 21:00:02,396 [Container Monitor] DEBUG 
> ContainersMonitorImpl.audit: Memory usage of ProcessTree 12138 for 
> container-id container_e04_1594079801456_1397760_01_000044: 4.4 GB of 6 GB 
> physical memory used; 6.9 GB of 12.6 GB virtual memory used. CPU usage: 15 of 
> 10 CPU vCores used. Cumulative CPU time: 45900
> 2020-07-16 21:00:02,424 [Container Monitor] DEBUG 
> ContainersMonitorImpl.audit: Memory usage of ProcessTree 101918 for 
> container-id container_e04_1594079801456_1391130_01_002378: 2.4 GB of 4 GB 
> physical memory used; 5.8 GB of 8.4 GB virtual memory used. CPU usage: 13 of 
> 10 CPU vCores used. Cumulative CPU time: 2572390
> 2020-07-16 21:00:02,456 [Container Monitor] DEBUG 
> ContainersMonitorImpl.audit: Memory usage of ProcessTree 26596 for 
> container-id container_e04_1594079801456_1390446_01_000665: 418.6 MB of 2.5 
> GB physical memory used; 3.8 GB of 5.3 GB virtual memory used. CPU usage: 0 
> of 10 CPU vCores used. Cumulative CPU time: 101210
> {noformat}



