On Thu, Oct 26, 2023 at 05:06:37PM +0100, Joao Martins wrote:
> On 26/10/2023 16:53, Peter Xu wrote:
> > This small series (actually only the last patch; the first two are
> > cleanups) wants to improve the ability to analyze QEMU downtime,
> > similarly to what Joao used to propose here:
> >
> >   https://lore.kernel.org/r/20230926161841.98464-1-joao.m.mart...@oracle.com
>
> Thanks for following up on the idea; it's been hard to have enough
> bandwidth for everything over the past few weeks :(
Yeah, totally understood.  I think our QE team pushed me towards some series
like this, while my plan was to wait for your new version. :)

Then when I started I decided to go per-device.  I was thinking of also
persisting that information, but then I remembered some ppc guests can have
~40,000 vmstates.. and the memory to maintain that may or may not regress a
ppc user.  So I figured I should first keep it simple with tracepoints.

> > But with a few differences:
> >
> >   - Nothing exported yet to qapi, all tracepoints so far
> >
> >   - Instead of major checkpoints (stop, iterable, non-iterable,
> >     resume-rp), finer granularity, by providing downtime measurements
> >     for each vmstate (I made microseconds the unit, to be accurate).
> >     So far it seems iterable / non-iterable is the core of the problem,
> >     and I want to nail it down to per-device.
> >
> >   - Trace dest QEMU too
> >
> > For the last bullet: consider the case where a device save() can be
> > super fast, while load() can actually be super slow.  Both of them will
> > contribute to the ultimate downtime, but it's not a simple sum: while
> > src QEMU is save()ing device1, dst QEMU can be load()ing device2.  So
> > they can run in parallel.  Hence the only way to figure out all
> > components of the downtime is to record both sides.
> >
> > Please have a look, thanks.
>
> I like your series, as it allows a user to pinpoint one particular bad
> device, while covering the load side too.  The checkpoints of migration,
> on the other hand, were useful -- while also a bit ugly -- for the sort of
> big picture of how downtime breaks down.  Perhaps we could add those /also/
> as tracepoints, without specifically committing to exposing them in QAPI.
>
> More fundamentally, how can one capture the 'stop' part?  There's also time
> spent there, e.g. quiescing/stopping vhost-net workers, or suspending the
> VF device.  All of it is likely as bad as the device-state/ram related
> stuff those tracepoints cover (the iterable and non-iterable portions).

Yeah, that's a good point.

I didn't cover "stop" yet because I think it's just trickier and I haven't
thought it all through yet.

The first question is: when stopping some backends, the vCPUs are still
running, so it's not 100% clear to me which part of that should count as
real downtime.

Meanwhile, that'll be another angle besides vmstates: we'd need to keep an
eye on the state change handlers, and those can be a device, or something
else.

Did you measure the stop process in some way before?  Do you have rough
numbers, or anything surprising you already observed?

Thanks,

-- 
Peter Xu
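For illustration only, below is a minimal standalone C sketch (not the code
from this series) of the per-device timing idea discussed above: wrap each
device's save()/load() hook in a pair of microsecond timestamps and emit one
record per device, so a single slow device on either the source or the
destination side stands out.  All names here (DeviceOps,
trace_vmstate_downtime(), the sample device) are made up for the example and
are not QEMU's actual API.

  /* Hypothetical sketch; none of these names are QEMU's real interfaces. */
  #include <inttypes.h>
  #include <stdint.h>
  #include <stdio.h>
  #include <time.h>
  #include <unistd.h>

  typedef struct DeviceOps {
      const char *name;
      void (*save)(void);   /* runs on the source during the blackout */
      void (*load)(void);   /* runs on the destination, possibly in parallel */
  } DeviceOps;

  static int64_t now_us(void)
  {
      struct timespec ts;
      clock_gettime(CLOCK_MONOTONIC, &ts);
      return (int64_t)ts.tv_sec * 1000000 + ts.tv_nsec / 1000;
  }

  /* Stand-in for a real tracepoint declared in a trace-events file. */
  static void trace_vmstate_downtime(const char *name, const char *phase,
                                     int64_t us)
  {
      fprintf(stderr, "vmstate_downtime %s %s %" PRId64 "us\n",
              name, phase, us);
  }

  static void timed_save(const DeviceOps *dev)
  {
      int64_t start = now_us();
      dev->save();
      trace_vmstate_downtime(dev->name, "save", now_us() - start);
  }

  static void timed_load(const DeviceOps *dev)
  {
      int64_t start = now_us();
      dev->load();
      trace_vmstate_downtime(dev->name, "load", now_us() - start);
  }

  /* Fake device whose load() is much slower than its save(); this is the
   * case used above to argue for tracing both sides. */
  static void fast_save(void) { usleep(100); }
  static void slow_load(void) { usleep(20000); }

  int main(void)
  {
      DeviceOps dev = { .name = "device1",
                        .save = fast_save, .load = slow_load };

      timed_save(&dev);   /* cheap on the source... */
      timed_load(&dev);   /* ...but dominates downtime on the destination */
      return 0;
  }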