On 09/01/2023 16:00, Robin Jarry wrote:
Kevin Traynor, Jan 06, 2023 at 15:59:
Sleep for an incremental amount of time if none of the Rx queues
assigned to a PMD have at least half a batch of packets (i.e. 16 pkts)
on a polling iteration of the PMD.
Upon detecting the threshold of >= 16 pkts on an Rxq, reset the
sleep time to zero (i.e. no sleep).
Sleep time will be increased on each iteration where the low load
conditions remain, up to the max sleep time, which is set by the
user, e.g.:
ovs-vsctl set Open_vSwitch . other_config:pmd-maxsleep=500
The default pmd-maxsleep value is 0, which means that no sleeps
will occur and the default behaviour is unchanged from previously.
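For readers following the thread, the behaviour described above can be sketched roughly as below. This is only an illustration, not the code from the patch: the function names and the 10 us step per low-load iteration (SLEEP_STEP_US) are assumptions made for the example. The 16 packet threshold and the rounding of the max value up to a multiple of 10 us come from the commit message and the documentation change in this patch.

```c
#include <assert.h>
#include <stdint.h>

#define SLEEP_THRESH_PKTS 16  /* Half a batch of packets, per the commit message. */
#define SLEEP_STEP_US     10  /* Assumed increment per low-load iteration. */

/* Round a user-supplied max sleep up to the next multiple of 10 us.
 * Zero stays zero, which keeps sleeping disabled. */
static uint64_t
round_up_max_sleep(uint64_t max_sleep_us)
{
    return (max_sleep_us + 9) / 10 * 10;
}

/* Return the sleep time to request after one polling iteration.
 * 'max_rxq_pkts' is the largest number of packets received from any
 * single Rx queue during that iteration. */
static uint64_t
next_sleep_us(uint64_t cur_sleep_us, uint64_t max_sleep_us, int max_rxq_pkts)
{
    assert(max_rxq_pkts >= 0);

    /* Feature disabled, or some Rxq reached the threshold: no sleep. */
    if (max_sleep_us == 0 || max_rxq_pkts >= SLEEP_THRESH_PKTS) {
        return 0;
    }

    /* Low load persists: sleep a little longer, capped at the max. */
    cur_sleep_us += SLEEP_STEP_US;
    return cur_sleep_us > max_sleep_us ? max_sleep_us : cur_sleep_us;
}
```

With pmd-maxsleep=500, a run of low-load iterations would request 10, 20, ... up to 500 us under these assumptions, and a single iteration with 16 or more packets on any Rxq would bring the request back to 0.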
Also add new stats to pmd-perf-show to get visibility of operation
e.g.
...
- sleep iterations: 153994 ( 76.8 % of iterations)
Sleep time: 9159399 us ( 46 us/iteration avg.)
...
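As a rough cross-check of the example figures, the sketch below redoes the arithmetic. The total of 200513 iterations is not part of the output shown; it is an assumption derived from 153994 being 76.8 % of all iterations. The helper name div_round is hypothetical, and whether pmd-perf-show rounds or truncates is not stated here.

```c
#include <assert.h>
#include <stdint.h>

/* Integer division rounded to the nearest whole number. */
static uint64_t
div_round(uint64_t num, uint64_t den)
{
    assert(den != 0);
    return (num + den / 2) / den;
}

/* Figures from the example output. */
static const uint64_t sleep_iters = 153994;    /* Iterations that slept. */
static const uint64_t sleep_time_us = 9159399; /* Total sleep time. */

/* Assumed total iteration count, derived from the 76.8 % figure. */
static const uint64_t total_iters = 200513;
```

On these numbers, div_round(sleep_time_us, total_iters) gives 46, matching the reported "us/iteration avg.", which suggests the average is taken over all iterations. Averaged over only the iterations that slept, the same sleep time would be about 59 us.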
Signed-off-by: Kevin Traynor <[email protected]>
Hi Kevin,
Hi Robin,
For the record, here are a few numbers that were gathered on a HP DL360
Gen9 server (Intel(R) Xeon(R) CPU E5-2620 v3 @ 2.40GHz) with and without
this patch series applied.
Single socket, Physical to physical test, 2 cores in pmd-cpu-mask, power
measurement with pcm-power:
+------------+------------+------------+--------------+-----------------+
|            | Reference: | Powersave: | pmd-maxsleep | Power off       |
|            | disabled   |            | 500us        | unused cores    |
|            | c-states   | C6 enabled | C6 enabled   | (X remaining)   |
+------------+------------+------------+--------------+-----------------+
| No OvS     | 33 W       | 11.30W     | N/A          | 2 cores online  |
|            |            |            |              | All OFF: 11.30W |
+------------+------------+------------+--------------+-----------------+
| No traffic | 37W        | 26.5W      | 12W          | 12W             |
| 0 PPS      |            |            |              |                 |
+------------+------------+------------+--------------+-----------------+
| Idle       | 37W        | 26.5W      | 12W          | 12W             |
| 1k pps     |            |            |              |                 |
+------------+------------+------------+--------------+-----------------+
| Medium     | 37W        | 27W        | 15-20W       | 15-20W          |
| 1 Mpps     |            |            |              |                 |
+------------+------------+------------+--------------+-----------------+
| High       | 38W        | 28W        | 28W          | 28W             |
| 14 Mpps    |            |            |              |                 |
+------------+------------+------------+--------------+-----------------+
Interesting, thanks for trying it out. This is a good test showing that
system configuration changes are also needed to save power.
One thing to note is that probably the rest of the cores and most things
in the package are not doing much else now that the pmd threads are
sleeping in your test.
Sleeping these 2 cores alone when there are workloads on other cores may
result in much less power saving for the package overall. So YMMV
depending on the system config and workloads.
diff --git a/Documentation/topics/dpdk/pmd.rst b/Documentation/topics/dpdk/pmd.rst
index 9006fd40f..89f6b3052 100644
--- a/Documentation/topics/dpdk/pmd.rst
+++ b/Documentation/topics/dpdk/pmd.rst
@@ -325,4 +325,55 @@ reassignment due to PMD Auto Load Balance. For example, this could be set
(in min) such that a reassignment is triggered at most every few hours.
+PMD Power Saving (Experimental)
+-------------------------------
+
+PMD threads constantly poll Rx queues which are assigned to them. In order to
+reduce the CPU cycles they use, they can sleep for small periods of time
+when there is no load or very-low load on all the Rx queues they poll.
+
+This can be enabled by setting the max requested sleep time (in microseconds)
+for a PMD thread::
+
+ $ ovs-vsctl set open_vswitch . other_config:pmd-maxsleep=500
+
+Non-zero values will be rounded up to the nearest 10 microseconds to avoid
+requesting very small sleep times.
+
+With a non-zero max value a PMD may request to sleep by an incrementing amount
+of time up to the maximum time. If at any point at least half a batch of
+packets (i.e. 16) is received from an Rx queue that the PMD is polling, the
+requested sleep time will be reset to 0. At that point no sleeps will occur
+until the no/low load conditions return.
+
+Sleeping in a PMD thread will mean there is a period of time when the PMD
+thread will not process packets. Sleep times requested are not guaranteed
+and can differ significantly depending on system configuration. The actual
+time not processing packets will be determined by the sleep and processor
+wake-up times and should be tested with each system configuration.
+
+Sleep time statistics for 10 secs can be seen with::
+
+ $ ovs-appctl dpif-netdev/pmd-stats-clear \
+ && sleep 10 && ovs-appctl dpif-netdev/pmd-perf-show
+
+Example output, showing that during the last 10 seconds, 76.8% of iterations
+had a sleep of some length. The total amount of sleep time was 9.15 seconds and
+the average sleep time per iteration was 46 microseconds::
+
+ - sleep iterations: 153994 ( 76.8 % of iterations)
+ Sleep time: 9159399 us ( 46 us/iteration avg.)
+
+.. note::
+
+ If there is a sudden spike of packets while the PMD thread is sleeping and
+ the processor is in a low-power state, it may result in some lost packets or
+ extra latency before the PMD thread returns to processing packets at full
+ rate.
+
+.. note::
+
+ Default Linux kernel hrtimer resolution is set to 50 microseconds so this
+ will add overhead to requested sleep time.
I wonder if it would make sense to round up to the nearest hrtimer
resolution (if such info can be retrieved at runtime).
Hmm, I think I used the wrong word describing it as 'resolution'. iiuc, the
kernel groups timer expirations so the timer expires later than
expected. In this case, it manifests more like a fixed overhead and
changing the resolution in OVS will not reduce the overhead.
David showed me that the slack timer could be changed to reduce
overhead, but it's not something I would be comfortable doing at the
moment as it could have some unintended consequences.
I have changed the text to:
"By default Linux kernel groups timer expirations and this can add an
overhead of up to 50 microseconds to a requested timer expiration."
Hope it's a bit clearer. Thanks for reviewing and your tests.
Cheers,
Reviewed-by: Robin Jarry <[email protected]>