On 5/11/2025 10:10 PM, Nicholas Piggin wrote:
From: Glenn Miles <mil...@linux.ibm.com>
The current XIVE algorithm for finding a matching group vCPU
target always uses the first vCPU found. And, since it always
starts the search with thread 0 of a core, thread 0 is almost
always used to handle group interrupts. This can lead to additional
interrupt latency and poor performance for interrupt-intensive
workloads.
Change this to use a simple round-robin algorithm for deciding which
thread number to use when starting a search. This leads to a more
distributed use of threads for handling group interrupts.
Reviewed-by: Michael Kowal <ko...@linux.ibm.com>
Thanks MAK
[npiggin: Also round-robin among threads, not just cores]
Signed-off-by: Glenn Miles <mil...@linux.ibm.com>
---
hw/intc/pnv_xive2.c | 18 ++++++++++++++++--
1 file changed, 16 insertions(+), 2 deletions(-)
diff --git a/hw/intc/pnv_xive2.c b/hw/intc/pnv_xive2.c
index 72cdf0f20c..d7ca97ecbb 100644
--- a/hw/intc/pnv_xive2.c
+++ b/hw/intc/pnv_xive2.c
@@ -643,13 +643,18 @@ static int pnv_xive2_match_nvt(XivePresenter *xptr, uint8_t format,
     int i, j;
     bool gen1_tima_os =
         xive->cq_regs[CQ_XIVE_CFG >> 3] & CQ_XIVE_CFG_GEN1_TIMA_OS;
+    static int next_start_core;
+    static int next_start_thread;
+    int start_core = next_start_core;
+    int start_thread = next_start_thread;
 
     for (i = 0; i < chip->nr_cores; i++) {
-        PnvCore *pc = chip->cores[i];
+        PnvCore *pc = chip->cores[(i + start_core) % chip->nr_cores];
         CPUCore *cc = CPU_CORE(pc);
 
         for (j = 0; j < cc->nr_threads; j++) {
-            PowerPCCPU *cpu = pc->threads[j];
+            /* Start search for match with different thread each call */
+            PowerPCCPU *cpu = pc->threads[(j + start_thread) % cc->nr_threads];
             XiveTCTX *tctx;
             int ring;
@@ -694,6 +699,15 @@ static int pnv_xive2_match_nvt(XivePresenter *xptr, uint8_t format,
                 if (!match->tctx) {
                     match->ring = ring;
                     match->tctx = tctx;
+
+                    next_start_thread = j + start_thread + 1;
+                    if (next_start_thread >= cc->nr_threads) {
+                        next_start_thread = 0;
+                        next_start_core = i + start_core + 1;
+                        if (next_start_core >= chip->nr_cores) {
+                            next_start_core = 0;
+                        }
+                    }
                 }
                 count++;
             }