On 28.04.2023 at 11:39, zhoushl wrote:
> > On Apr 27, 2023, at 22:02, Kevin Wolf <[email protected]> wrote:
> > On 27.04.2023 at 15:22, zhoushl wrote:
> >> Hi Kevin:
> >> I’m sorry for missing the commit message, next time I will be careful.
> >> When the application in the guest VM executes fsync, QEMU will execute
> >> fsync too. But when aio + dio is enabled, the page cache is bypassed
> > 
> > As far as I can tell, you don't need AIO for that, only DIO.
> > 
> >> and we could assure the data is on disk
> > 
> > No.
> > 
> >> (at least on the disk cache),
> > 
> > In some cases, for a local file system on a physical disk, yes. But
> > this is not enough. The promise when a guest application calls fsync()
> > is not that the data is in a potentially volatile disk cache, but on
> > disk.
> > 
> > If the image is on a network file system, there are other options where
> > the data could still be cached, like the page cache of the server.
> 
> Just as you mentioned, when the image is on a network file system, the fsync
> operation still can’t ensure the data is really flushed to disk.

Why? Every serious network protocol supports a flush command (NFS has
COMMIT, for example) that makes sure the data is really flushed to disk.

But O_DIRECT is a concept that has very little to do with flushing to
disk. This is why cache=none in QEMU is still a writeback mode, not a
writethrough mode, and relies on guests issuing flush commands in the
right places.
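
To make the distinction concrete, here is a rough sketch (assuming Linux
and a local filesystem that supports O_DIRECT; the file name is made up):

    import os, mmap

    # O_DIRECT bypasses the page cache, but the data can still sit in the
    # disk's volatile write cache afterwards.
    fd = os.open("test.txt", os.O_RDWR | os.O_CREAT | os.O_DIRECT, 0o644)

    # O_DIRECT needs block-aligned buffers and I/O sizes; mmap gives us
    # page-aligned memory. The rest of the 4 KiB buffer stays zeroed.
    buf = mmap.mmap(-1, 4096)
    buf.write(b"123\n")

    os.write(fd, buf)   # no page cache involved...
    os.fsync(fd)        # ...but only this flushes the volatile disk cache
    os.close(fd)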

> >> so there is no need to sync anymore. For example, we could execute the
> >> following python script in vm:
> >>    
> >>    #!/usr/bin/python3
> >>    import os
> >> 
> >>    fo = os.open("test.txt", os.O_RDWR | os.O_CREAT)
> >>    while True:
> >>        os.write(fo, b"123\n")
> >>        os.fsync(fo)
> >> 
> >>    os.close(fo)
> >> 
> >> In this case, each write will be followed by an fsync operation, which
> >> will search for dirty pages in the page cache and force the metadata and
> >> data to be flushed to disk. This is often useless, wastes IO resources
> >> and may cause write amplification in the filesystem.
> > 
> > Yes, if you request an fsync(), you get an fsync(). This is necessary to
> > fulfill the guarantees that fsync() makes. If a guest application doesn't
> > want fsync() semantics, it shouldn't call it.
> 
> In this extreme scenario (the fsync Python script), could we do
> something to avoid the write amplification in the filesystem? Sometimes
> the VM user doesn’t have a clear understanding of the backend storage
> and we don’t know what kind of application will be run in the VM, but in
> QEMU we could filter or ignore some improper operations.

The guest application doesn't have to know what storage it runs on. It
only has to be aware of whether or not it needs to call fsync. If it
doesn't need it (like your Python script), it shouldn't call it. If it
does need it, then QEMU can't ignore the request without breaking the
correctness of the application.

> > QEMU has an option cache.no-flush=on for block backends (cache=unsafe
> > contains this), which will skip flushes. This is unsafe and if your host
> > crashes, you may get a corrupted file system in the guest. But at the
> > risk of losing your filesystem, it does save the overhead of these
> > operations that you want to avoid.
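
(For reference, on the command line that would look something like this;
the image name here is just an example:

    -drive file=test.qcow2,format=qcow2,cache=unsafe

or, if you want to keep O_DIRECT and only drop the flushes, the individual
cache.direct=on,cache.no-flush=on options instead of cache=unsafe.)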
> 
> When AIO is enabled, the cache mode should be set to none or directsync.

This is true in so far as Linux AIO only works with O_DIRECT.
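
Something like this, for example (again, the image name is made up):

    -drive file=test.qcow2,format=qcow2,cache=none,aio=native

aio=native needs cache.direct=on, which is what cache=none and
cache=directsync set.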

> Even if we call fsync() after each IO, the data in the disk cache will
> still be lost when the host crashes.

This is wrong: fsync() makes sure that the data in the disk cache is
written to the disk.

Kevin

