Looks good.

Reviewed-by: Caleb Schlossin <cal...@linux.ibm.com>

On 5/11/25 10:10 PM, Nicholas Piggin wrote:
> From: Glenn Miles <mil...@linux.ibm.com>
>
> The current xive algorithm for finding a matching group vCPU
> target always uses the first vCPU found. And, since it always
> starts the search with thread 0 of a core, thread 0 is almost
> always used to handle group interrupts. This can lead to additional
> interrupt latency and poor performance for interrupt intensive
> work loads.
>
> Changing this to use a simple round-robin algorithm for deciding which
> thread number to use when starting a search, which leads to a more
> distributed use of threads for handling group interrupts.
>
> [npiggin: Also round-robin among threads, not just cores]
> Signed-off-by: Glenn Miles <mil...@linux.ibm.com>
> ---
>  hw/intc/pnv_xive2.c | 18 ++++++++++++++++--
>  1 file changed, 16 insertions(+), 2 deletions(-)
>
> diff --git a/hw/intc/pnv_xive2.c b/hw/intc/pnv_xive2.c
> index 72cdf0f20c..d7ca97ecbb 100644
> --- a/hw/intc/pnv_xive2.c
> +++ b/hw/intc/pnv_xive2.c
> @@ -643,13 +643,18 @@ static int pnv_xive2_match_nvt(XivePresenter *xptr, uint8_t format,
>      int i, j;
>      bool gen1_tima_os =
>          xive->cq_regs[CQ_XIVE_CFG >> 3] & CQ_XIVE_CFG_GEN1_TIMA_OS;
> +    static int next_start_core;
> +    static int next_start_thread;
> +    int start_core = next_start_core;
> +    int start_thread = next_start_thread;
>
>      for (i = 0; i < chip->nr_cores; i++) {
> -        PnvCore *pc = chip->cores[i];
> +        PnvCore *pc = chip->cores[(i + start_core) % chip->nr_cores];
>          CPUCore *cc = CPU_CORE(pc);
>
>          for (j = 0; j < cc->nr_threads; j++) {
> -            PowerPCCPU *cpu = pc->threads[j];
> +            /* Start search for match with different thread each call */
> +            PowerPCCPU *cpu = pc->threads[(j + start_thread) % cc->nr_threads];
>              XiveTCTX *tctx;
>              int ring;
>
> @@ -694,6 +699,15 @@ static int pnv_xive2_match_nvt(XivePresenter *xptr, uint8_t format,
>                      if (!match->tctx) {
>                          match->ring = ring;
>                          match->tctx = tctx;
> +
> +                        next_start_thread = j + start_thread + 1;
> +                        if (next_start_thread >= cc->nr_threads) {
> +                            next_start_thread = 0;
> +                            next_start_core = i + start_core + 1;
> +                            if (next_start_core >= chip->nr_cores) {
> +                                next_start_core = 0;
> +                            }
> +                        }
>                      }
>                      count++;
>                  }