On Mon, Jun 02, 2025 at 11:26:36AM +0200, Markus Armbruster wrote:
> Peter Xu <pet...@redhat.com> writes:
>
> > Blocktime so far only cares about the time one vcpu (or the whole system)
> > got blocked. It would be also be helpful if it can also report the latency
> > of page requests, which could be very sensitive during postcopy.
> >
> > Blocktime itself is sometimes not very important, especially when one
> > thinks about KVM async PF support, which means vCPUs are literally almost
> > not blocked at all because the guest OS is smart enough to switch to
> > another task when a remote fault is needed.
> >
> > However, latency is still sensitive and important because even if the guest
> > vCPU is running on threads that do not need a remote fault, the workload
> > that accesses some missing page is still affected.
> >
> > Add two entries to the report, showing how long it takes to resolve a
> > remote fault. Mention in the QAPI doc that this is not the real average
> > fault latency, but only the ones that was requested for a remote fault.
> >
> > Unwrap get_vcpu_blocktime_list() so we don't need to walk the list twice,
> > meanwhile add the entry checks in qtests for all postcopy tests.
> >
> > Cc: Markus Armbruster <arm...@redhat.com>
> > Cc: Dr. David Alan Gilbert <d...@treblig.org>
> > Signed-off-by: Peter Xu <pet...@redhat.com>
> > ---
> > qapi/migration.json | 13 +++++
> > migration/migration-hmp-cmds.c | 70 ++++++++++++++++++---------
> > migration/postcopy-ram.c | 48 ++++++++++++------
> > tests/qtest/migration/migration-qmp.c | 3 ++
> > 4 files changed, 97 insertions(+), 37 deletions(-)
> >
> > diff --git a/qapi/migration.json b/qapi/migration.json
> > index 8b9c53595c..8b13cea169 100644
> > --- a/qapi/migration.json
> > +++ b/qapi/migration.json
> > @@ -236,6 +236,17 @@
> > # This is only present when the postcopy-blocktime migration
> > # capability is enabled. (Since 3.0)
> > #
> > +# @postcopy-latency: average remote page fault latency (in us). Note that
> > +# this doesn't include all faults, but only the ones that require a
> > +# remote page request. So it should be always bigger than the real
> > +# average page fault latency. This is only present when the
> > +# postcopy-blocktime migration capability is enabled. (Since 10.1)
> > +#
> > +# @postcopy-vcpu-latency: average remote page fault latency per vCPU (in
> > +# us). It has the same definition of @postcopy-latency, but instead
> > +# this is the per-vCPU statistics. This is only present when the
>
> Two spaces between sentences for consistency, please.

Fixed. There's another similar occurrence in the last patch; I'll fix that
too.

>
> > +# postcopy-blocktime migration capability is enabled. (Since 10.1)
>
> I figure the the @i-th array element is for vCPU with index @i. Correct?
>
> This is also only present when @postcopy-blocktime is enabled. Correct?

Correct on both.

>
> Could a QMP client compute @postcopy-latency from
> @postcopy-vcpu-latency?

Not with the current API. Right now, the reported values are the per-vCPU
average latencies and the global average latency; per-vCPU fault counts are
not reported yet, and they would be needed to do the calculation.

I chose to export the global average latency simply because it is the most
important one to me as of now. The per-vCPU results are pretty much a side
effect of how the blocktime feature does its accounting so far (which is
per-vCPU), so they are very low-hanging fruit.

Thanks,

-- 
Peter Xu
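
To illustrate why the per-vCPU averages alone are not enough, here is a minimal sketch of the calculation a client would have to do. It assumes a hypothetical list of per-vCPU fault counts, which the current API does not report; the function and parameter names are made up for illustration and are not part of QEMU or QMP.

```python
from typing import Sequence


def global_average_latency(vcpu_avg_us: Sequence[float],
                           vcpu_fault_counts: Sequence[int]) -> float:
    """Combine per-vCPU average latencies into one global average.

    vcpu_fault_counts is hypothetical: it stands for the number of remote
    faults each vCPU took, which is NOT reported today.
    """
    if len(vcpu_avg_us) != len(vcpu_fault_counts):
        raise ValueError("need one fault count per vCPU")
    total_faults = sum(vcpu_fault_counts)
    if total_faults == 0:
        return 0.0
    # Each vCPU's average must be weighted by how many faults it covers.
    total_latency = sum(avg * n for avg, n in zip(vcpu_avg_us, vcpu_fault_counts))
    return total_latency / total_faults


# Same per-vCPU averages, different fault counts, different global result.
vcpu_avg = [100.0, 300.0]
print(global_average_latency(vcpu_avg, [1, 1]))  # 200.0
print(global_average_latency(vcpu_avg, [3, 1]))  # 150.0
```

The two calls use identical per-vCPU averages but give different global averages, so the global value cannot be recovered without the counts.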