On Mon, Jun 13, 2022 at 09:03:24PM +0530, manish.mishra wrote:
> On 13/06/22 8:03 pm, Peter Xu wrote:
> > On Mon, Jun 13, 2022 at 03:28:34PM +0530, manish.mishra wrote:
> > > On 26/05/22 8:21 am, Jason Wang wrote:
> > > > On Wed, May 25, 2022 at 11:56 PM Peter Xu <pet...@redhat.com> wrote:
> > > > > On Wed, May 25, 2022 at 11:38:26PM +0800, Hyman Huang wrote:
> > > > > > > 2. Also this algorithm only controls or limits the dirty rate
> > > > > > >    by guest writes.  There can be some memory dirtying done by
> > > > > > >    virtio based devices which is accounted only at the qemu
> > > > > > >    level, so it may not be accounted through dirty rings.  Do
> > > > > > >    we have a plan for that in the future?  Those are not an
> > > > > > >    issue for auto-converge as it slows the full VM, but dirty
> > > > > > >    rate limit only slows guest writes.
> > > > > >
> > > > > > From the migration point of view, time spent on migrating memory
> > > > > > is far greater than migrating devices emulated by qemu.  I think
> > > > > > we can do that when migrating devices costs the same magnitude of
> > > > > > time as migrating memory.
> > > > > >
> > > > > > As to auto-converge, it throttles the vcpu by kicking it and
> > > > > > forcing it to sleep periodically.  The two seem to have little
> > > > > > difference from the perspective of the internal method, but
> > > > > > auto-converge is kind of "offensive" when doing restraint.  I'll
> > > > > > read the auto-converge implementation code and figure out the
> > > > > > problem you pointed out.
> > > > >
> > > > > This seems to be not virtio-specific, but can be applied to any
> > > > > device DMA writing to guest mem (if not including vfio).  But
> > > > > indeed virtio can be normally faster.
> > > > >
> > > > > I'm also curious how fast a device DMA could dirty memories.  This
> > > > > could be a question to answer to all vcpu-based throttling
> > > > > approaches (including the quota based approach that was proposed on
> > > > > the KVM list).  Maybe for kernel virtio drivers we can have some
> > > > > easier estimation?
> > > >
> > > > As you said below, it really depends on the speed of the backend.
> > > >
> > > > > My guess is it'll be much harder for DPDK-in-guest (aka userspace
> > > > > drivers) because IIUC that could use a large chunk of guest mem.
> > > >
> > > > Probably, for the vhost-user backend, it could be ~20Mpps or even
> > > > higher.
> > >
> > > Sorry for the late response on this.  We did experiment with IO on a
> > > virtio-scsi based disk.
> >
> > Thanks for trying this and sharing it out.
> >
> > > We could see a dirty rate of ~500MBps on my system and most of that
> > > was not tracked as kvm_dirty_log.  Also for reference I am attaching
> > > the test we used to avoid tracking in KVM (as attached file).
> > The number looks sane as it seems to be the sequential bandwidth for a
> > disk, though I'm not 100% sure it'll work as expected since you
> > mmap()ed the region with private pages rather than shared, so I'm
> > wondering whether the below will happen (also based on the fact that
> > you mapped twice the size of guest mem as you mentioned in the
> > comment):
> >
> >   (1) Swap out will start to trigger after you read a lot of data into
> >       the mem already, then old-read pages will be swapped out to disk
> >       (and hopefully the swap device does not reside on the same
> >       virtio-scsi disk, or it'll be an even more complicated scenario
> >       of mixed IOs..), meanwhile when you finish reading a round and
> >       start to read from offset 0, swap-in will start to happen too.
> >       Swapping can slow things down already, and I'm wondering whether
> >       the 500MB/s was really caused by the swapout rather than backend
> >       disk reads.  More below.
> >
> >   (2) Another attribute of private pages AFAICT is that after you read
> >       a page once it does not need to be read again from the
> >       virtio-scsi disk.  In other words, I'm thinking whether starting
> >       from the 2nd iteration your program won't trigger any DMA at all
> >       but purely torture the swap device.
> >
> > Maybe changing MAP_PRIVATE to MAP_SHARED can emulate better what we
> > want to measure, but I'm also not 100% sure on whether it could be
> > accurate..
> >
> > Thanks,
>
> Thanks Peter, yes, agreed MAP_SHARED should be used here, sorry I missed
> that 😁.
>
> Yes, my purpose of taking a file size larger than RAM_SIZE was to cause
> frequent page cache flushes and re-populating of page-cache pages, not
> to trigger swaps.  I checked on my VM: I had swapping disabled; maybe
> MAP_PRIVATE did not make a difference because it was read-only.
Makes sense.  And yeah, I overlooked the RO part - indeed the page cache
will be used for RO pages as long as they are never written, so it'll
behave like shared.  Otherwise with swap all off you should have hit OOM
anyway and the process probably would get killed sooner or later. :)

> I tested again with MAP_SHARED and it comes around ~500MBps.

Makes sense.  I'd guess that's the limitation of the virtio-scsi
backend; IOW the logical limitation of a device dirtying memory could be
unlimited (e.g., when we put the virtio backend onto a ramdisk).

-- 
Peter Xu
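
For reference, below is a minimal sketch of the kind of MAP_SHARED read
loop discussed above.  This is not the actual test that was attached to
the thread; the file path, file size and stride are placeholder
assumptions, and a real run would size the file relative to the guest's
RAM and point it at a file on the virtio-scsi backed disk.

/*
 * Sketch only: sequentially fault in a MAP_SHARED mapping of a file
 * larger than guest RAM, so every page has to be (re)read from the
 * virtio-scsi disk, i.e. DMA'ed into guest memory without any guest
 * vcpu write that KVM dirty logging would see.
 */
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

#define FILE_SIZE (4ULL << 30)          /* assumed ~2x guest RAM */
#define STEP      4096ULL               /* assumed page size */

int main(void)
{
    int fd = open("/mnt/testfile", O_RDONLY);   /* placeholder path */
    if (fd < 0) {
        perror("open");
        return 1;
    }

    char *buf = mmap(NULL, FILE_SIZE, PROT_READ, MAP_SHARED, fd, 0);
    if (buf == MAP_FAILED) {
        perror("mmap");
        return 1;
    }

    volatile uint64_t sum = 0;

    /*
     * Because the file is larger than RAM, the page cache keeps being
     * evicted and re-populated, so each pass triggers fresh disk reads.
     */
    for (int pass = 0; pass < 16; pass++) {
        for (uint64_t off = 0; off < FILE_SIZE; off += STEP) {
            sum += buf[off];            /* touch one byte per page */
        }
    }

    munmap(buf, FILE_SIZE);
    close(fd);
    return (int)(sum & 1);              /* keep the loads from being optimized out */
}

The one-byte-per-page stride is enough to fault every page in; the
interesting dirtying here is done by the device filling the page cache,
not by the loads themselves.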