Hey Kevin,

Kevin Traynor, Sep 07, 2023 at 15:37:
> This came up in conversation with other maintainers as I mentioned I was
> reviewing and the question raised was - Why add this ? if you want these
> values exposed, wouldn't it be better to add to ovsdb ?
That's a good point. I had considered using ovsdb but it seemed to me less
suitable for a few reasons:

* I had understood that ovsdb is a configuration database, not a state
  reporting database.

* To have reliable and up-to-date numbers, ovs would need to push them to
  the database at a high rate so that clients do not get outdated cpu
  usage. The DPDK telemetry socket is real-time: the current numbers are
  returned on every request.

* I would need to define a custom schema / table to store structured
  information in the db. The DPDK telemetry socket already has a schema
  defined for this.

* Accessing ovsdb requires a library, making it more complex to use for
  telemetry scrapers. The DPDK telemetry socket can be accessed with a
  standalone python script with no external dependencies [1].

[1]: https://github.com/rjarry/dpdk/blob/main/usertools/prometheus-dpdk-exporter.py#L135-L143

Maybe my observations are wrong, please do correct me if they are.

> Are you looking for individual lcore usage with identification of that
> pmd? or overall aggregate usage ?
>
> I ask because it will report lcore id's which would need to be mapped to
> pmd core id's for anything regarding individual pmds.
>
> That can be found in ovs-vswitchd.log or checked locally with
> 'ovs-appctl dpdk/lcore-list' but assuming if they were available, then
> user would not be using dpdk telemetry anyway.

I would assume that the important data is the aggregate usage for overall
monitoring and resource planning. Individual pmd usage can be accessed for
fine tuning and debugging via appctl.

> These stats are cumulative so in the absence of 'ovs-appctl
> dpif-netdev/pmd-stats-clear' that would need to be taken care of with
> some post-processing by whatever is pulling these stats - otherwise
> you'll get cumulative stats for an unknown time period and unknown
> traffic profile (e.g. it would be counting before any traffic started).
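As an aside, the "standalone python script" style of access mentioned above can be sketched with only the standard library. This is a hedged sketch, not the exporter itself: the socket path assumes the default DPDK runtime directory, and the banner / reply handling follows the telemetry v2 protocol (the server announces `max_output_len` on connect, then answers one JSON reply per command).

```python
import json
import socket

# Default DPDK telemetry v2 socket path; differs if --file-prefix is set.
SOCK_PATH = "/var/run/dpdk/rte/dpdk_telemetry.v2"

def telemetry_query(cmd, sock_path=SOCK_PATH):
    """Send one command to the DPDK telemetry socket, return parsed JSON."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.connect(sock_path)
        # On connect, the server sends a banner announcing the maximum
        # reply size; use it to size the read buffer for the real reply.
        banner = json.loads(sock.recv(4096))
        sock.sendall(cmd.encode())
        return json.loads(sock.recv(banner["max_output_len"]))

def lcore_busy_pct(reply):
    """Turn an /eal/lcore/usage reply into per-lcore busy percentages."""
    usage = reply["/eal/lcore/usage"]
    return {
        lcore: 100.0 * busy / total
        for lcore, total, busy in zip(
            usage["lcore_ids"], usage["total_cycles"], usage["busy_cycles"]
        )
    }

# Example (requires a running DPDK application exposing the socket):
# print(lcore_busy_pct(telemetry_query("/eal/lcore/usage")))
```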
> These might also be reset with pmd-stats-clear independently, so that
> would need to be accounted for too.

The only important data point that we need is the ratio between
busy / (busy + idle) over a specified delta, which any scraper can
compute. I consider these numbers like any other counter that can
eventually be reset. See this reply from Morten Brørup on dpdk-dev for
more context:

https://lore.kernel.org/dpdk-dev/[email protected]/

> Another thing I noticed is that without the pmd-sleep info the stats in
> isolation can be misleading. Example below:
>
> With low rate traffic and clearing stats between 10 sec runs
>
> 2023-09-07T13:14:56Z|00158|dpif_netdev|INFO|PMD max sleep request is 0
> usecs.
> 2023-09-07T13:14:56Z|00159|dpif_netdev|INFO|PMD load based sleeps are
> disabled.
>
> Time: 13:15:06.842
> Measurement duration: 10.009 s
>
> pmd thread numa_id 0 core_id 8:
>
> Iterations: 51712564 (0.19 us/it)
> - Used TSC cycles: 26021354654 (100.0 % of total cycles)
> - idle iterations: 51710963 ( 99.9 % of used cycles)
> - busy iterations: 1601 ( 0.1 % of used cycles)
> - sleep iterations: 0 ( 0.0 % of iterations)
>
> ^^^ can see here that pmd does not sleep and is 0.1% busy
>
> Sleep time (us): 0 ( 0 us/iteration avg.)
> Rx packets: 37250 (4 Kpps, 866 cycles/pkt)
> Datapath passes: 37250 (1.00 passes/pkt)
> - PHWOL hits: 0 ( 0.0 %)
> - MFEX Opt hits: 0 ( 0.0 %)
> - Simple Match hits: 37250 (100.0 %)
> - EMC hits: 0 ( 0.0 %)
> - SMC hits: 0 ( 0.0 %)
> - Megaflow hits: 0 ( 0.0 %, 0.00 subtbl lookups/hit)
> - Upcalls: 0 ( 0.0 %, 0.0 us/upcall)
> - Lost upcalls: 0 ( 0.0 %)
> Tx packets: 0
>
> {
>   "/eal/lcore/usage": {
>     "lcore_ids": [
>       1
>     ],
>     "total_cycles": [
>       26127284389
>     ],
>     "busy_cycles": [
>       32370313
>     ]
>   }
> }
>
> ^^^ This in isolation implies pmd is 32370313/26127284389 0.12% busy
> which is true
>
> 2023-09-07T13:15:06Z|00160|dpif_netdev|INFO|PMD max sleep request is 500
> usecs.
> 2023-09-07T13:15:06Z|00161|dpif_netdev|INFO|PMD load based sleeps are
> enabled.
>
> Time: 13:15:16.908
> Measurement duration: 10.008 s
>
> pmd thread numa_id 0 core_id 8:
>
> Iterations: 75197 (133.09 us/it)
> - Used TSC cycles: 237910969 ( 0.9 % of total cycles)
> - idle iterations: 73782 ( 74.4 % of used cycles)
> - busy iterations: 1415 ( 25.6 % of used cycles)
> - sleep iterations: 74033 ( 98.5 % of iterations)
>
> ^^^ can see here that pmd spends most of the time sleeping and is 25%
> busy when it is not sleeping
>
> Sleep time (us): 9916314 (134 us/iteration avg.)
> Rx packets: 37249 (4 Kpps, 1637 cycles/pkt)
> Datapath passes: 37249 (1.00 passes/pkt)
> - PHWOL hits: 0 ( 0.0 %)
> - MFEX Opt hits: 0 ( 0.0 %)
> - Simple Match hits: 37249 (100.0 %)
> - EMC hits: 0 ( 0.0 %)
> - SMC hits: 0 ( 0.0 %)
> - Megaflow hits: 0 ( 0.0 %, 0.00 subtbl lookups/hit)
> - Upcalls: 0 ( 0.0 %, 0.0 us/upcall)
> - Lost upcalls: 0 ( 0.0 %)
> Tx packets: 0
>
> {
>   "/eal/lcore/usage": {
>     "lcore_ids": [
>       1
>     ],
>     "total_cycles": [
>       238786638
>     ],
>     "busy_cycles": [
>       61268951
>     ]
>   }
> }
>
> ^^^ this in isolation implies that pmd is 61268951/238786638 25% busy
> but it's misleading because missing sleep info

Hmm, I should add the sleep cycles to the total_cycles counter. I thought
it was part of idle. Good catch.

Thanks for the review and testing!

_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
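For completeness, the busy/(busy + idle) ratio "over a specified delta" that a scraper would compute from two cumulative samples could look like the sketch below. The reset guard is an assumption about how a typical scraper handles pmd-stats-clear, not something the patch itself defines; the figures in the final line are taken from the second 10 s run quoted earlier in the thread.

```python
def busy_ratio(prev, cur):
    """Busy ratio over the interval between two cumulative samples,
    each a (total_cycles, busy_cycles) pair.

    If a counter went backwards (e.g. after pmd-stats-clear), treat the
    current sample as covering the whole interval, as scrapers commonly
    do for resettable counters."""
    d_total = cur[0] - prev[0]
    d_busy = cur[1] - prev[1]
    if d_total <= 0 or d_busy < 0:  # counter reset detected
        d_total, d_busy = cur
    return d_busy / d_total if d_total else 0.0

# Numbers from the sleeps-enabled run above: 61268951 busy out of
# 238786638 total cycles, i.e. the share of *awake* cycles only, which
# is what makes the value misleading while slept time is in neither
# bucket.
print(f"{100 * busy_ratio((0, 0), (238786638, 61268951)):.1f}% busy")
```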
