On Mon, Jun 13, 2022 at 09:03:24PM +0530, manish.mishra wrote:
> 
> On 13/06/22 8:03 pm, Peter Xu wrote:
> > On Mon, Jun 13, 2022 at 03:28:34PM +0530, manish.mishra wrote:
> > > On 26/05/22 8:21 am, Jason Wang wrote:
> > > > On Wed, May 25, 2022 at 11:56 PM Peter Xu <pet...@redhat.com> wrote:
> > > > > On Wed, May 25, 2022 at 11:38:26PM +0800, Hyman Huang wrote:
> > > > > > > 2. Also, this algorithm only controls or limits the dirty rate
> > > > > > > of guest writes. There can be some memory dirtying done by
> > > > > > > virtio-based devices which is accounted only at the QEMU level,
> > > > > > > so it may not be accounted through the dirty rings; do we have a
> > > > > > > plan for that in the future? That is not an issue for
> > > > > > > auto-converge, as it slows the whole VM, but the dirty rate
> > > > > > > limit only slows guest writes.
> > > > > > > 
> > > > > > From the migration point of view, the time spent migrating memory
> > > > > > is far greater than the time spent migrating devices emulated by
> > > > > > qemu. I think we can do that when migrating devices costs the same
> > > > > > order of magnitude of time as migrating memory.
> > > > > > 
> > > > > > As to auto-converge, it throttles a vcpu by kicking it and forcing
> > > > > > it to sleep periodically. The two do not seem to differ much in
> > > > > > terms of the internal method, but auto-converge is kind of
> > > > > > "offensive" when applying the restraint. I'll read the
> > > > > > auto-converge implementation code and figure out the problem you
> > > > > > point out.
> > > > > This seems to be not virtio-specific, but applies to any device DMA
> > > > > writing to guest mem (if not including vfio).  But indeed virtio can
> > > > > normally be faster.
> > > > > 
> > > > > I'm also curious how fast a device DMA could dirty memory.  This is
> > > > > a question to answer for all vcpu-based throttling approaches
> > > > > (including the quota-based approach that was proposed on the KVM
> > > > > list).  Maybe for kernel virtio drivers we can have some easier
> > > > > estimation?
> > > > As you said below, it really depends on the speed of the backend.
> > > > 
> > > > >    My guess is it'll be much harder for DPDK-in-guest (aka
> > > > > userspace drivers) because IIUC that could use a large chunk of
> > > > > guest mem.
> > > > Probably, for vhost-user backend, it could be ~20Mpps or even higher.
> > > Sorry for the late response on this. We did an experiment with IO on a
> > > virtio-scsi based disk.
> > Thanks for trying this and sharing it out.
> > 
> > > We could see a dirty rate of ~500MBps on my system, and most of that
> > > was not tracked as kvm_dirty_log. Also, for reference, I am attaching
> > > the test we used to avoid tracking in KVM (as attached file).
> > The number looks sane as it seems to be the sequential bandwidth for a
> > disk, though I'm not 100% sure it'll work as expected since you mmap()ed
> > the region with private pages rather than shared, so after you did that
> > I'm wondering whether the below will happen (also based on the fact that
> > you mapped twice the size of guest mem, as you mentioned in the comment):
> > 
> >    (1) Swap-out will start to trigger after you have already read a lot
> >        of data into memory; then the pages read earlier will be swapped
> >        out to disk (and hopefully the swap device does not reside on the
> >        same virtio-scsi disk, or it'll be an even more complicated
> >        scenario of mixed IOs..), meanwhile when you finish reading a
> >        round and start to read from offset 0 again, swap-in will start to
> >        happen too.  Swapping can already slow things down, and I'm
> >        wondering whether the 500MB/s was really caused by the swap-out
> >        rather than by backend disk reads.  More below.
> > 
> >    (2) Another attribute of private pages AFAICT is that after you read
> >        a page once it does not need to be read again from the virtio-scsi
> >        disk.  In other words, I'm wondering whether, starting from the
> >        2nd iteration, your program won't trigger any DMA at all but will
> >        purely be torturing the swap device.
> > 
> > Maybe changing MAP_PRIVATE to MAP_SHARED can better emulate what we want
> > to measure, but I'm also not 100% sure whether it would be accurate..
> > 
> > Thanks,
> > 
> Thanks Peter. Yes, agreed, MAP_SHARED should be used here; sorry I missed
> that 😁.
> 
> Yes, my purpose in taking a file size larger than RAM_SIZE was to cause
> frequent page-cache flushes and re-population of page-cache pages, not to
> trigger swapping. I checked that on my VM I had swapping disabled; maybe
> MAP_PRIVATE did not make a difference because the mapping was read-only.

Makes sense.  And yeah, I overlooked the RO part - indeed the page cache will
be used for RO pages as long as they're never written, so it'll behave like
shared.

Otherwise, with swap entirely off, you should have hit OOM anyway and the
process probably would get killed sooner or later. :)

> 
> I tested again with MAP_SHARED; it comes to around ~500MBps.

Makes sense.  I'd guess that's the limitation of the virtio-scsi backend;
IOW, the logical limit on how fast a device can dirty memory could be
unbounded (e.g., when we put the virtio backend onto a ramdisk).
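
For reference, here's a minimal sketch of the kind of reader we've been
discussing - not the actual attached test; TEST_FILE, the file size and the
pass count are placeholders.  It maps a file on the virtio-scsi disk (larger
than guest RAM) read-only with MAP_SHARED and touches one byte per page in a
loop, so the page cache keeps cycling and the backend keeps writing into
guest memory:

/*
 * Sketch of a device-dirtying workload: fault in a read-only MAP_SHARED
 * mapping of a file that lives on the virtio-scsi disk and is larger than
 * guest RAM.  The page cache keeps getting evicted and repopulated, so
 * every pass is serviced by the backend (device writes into guest memory)
 * rather than by guest vCPU stores.  TEST_FILE is a placeholder path.
 */
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

#define TEST_FILE "/mnt/vscsi/bigfile"  /* file on the virtio-scsi disk */
#define PASSES    10

int main(void)
{
    int fd = open(TEST_FILE, O_RDONLY);
    if (fd < 0) {
        perror("open");
        return 1;
    }

    struct stat st;
    if (fstat(fd, &st) < 0) {
        perror("fstat");
        return 1;
    }
    size_t size = st.st_size;
    long page = sysconf(_SC_PAGESIZE);

    char *map = mmap(NULL, size, PROT_READ, MAP_SHARED, fd, 0);
    if (map == MAP_FAILED) {
        perror("mmap");
        return 1;
    }

    /* volatile sink so the compiler can't drop the reads */
    volatile unsigned long sum = 0;

    for (int pass = 0; pass < PASSES; pass++) {
        /* Touch one byte per page; each fault pulls the page in through
         * the virtio-scsi backend once it has been evicted from cache. */
        for (size_t off = 0; off < size; off += page) {
            sum += (unsigned char)map[off];
        }
    }

    munmap(map, size);
    close(fd);
    return 0;
}

With the file at roughly 2x guest RAM, each pass should keep evicting the
previous pass's pages, so the backend stays busy for the whole run.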

-- 
Peter Xu

