> On Apr 27, 2023, at 22:02, Kevin Wolf <[email protected]> wrote:
> 
> Am 27.04.2023 um 15:22 hat zhoushl geschrieben:
>> Hi Kevin:
>> I’m sorry for the missing commit message; I will be more careful next
>> time. When an application in the guest VM calls fsync(), QEMU executes
>> an fsync too. But when AIO + DIO is enabled, the page cache is bypassed
> 
> As far as I can tell, you don't need AIO for that, only DIO.
> 
>> and we could assure the data is on disk
> 
> No.
> 
>> (at least on the disk cache),
> 
> In some cases, for a local file system on a physical disk, yes. But
> this is not enough. The promise when a guest application calls fsync()
> is not that the data is in a potentially volatile disk cache, but on
> disk.
> 
> If the image is on a network file system, there are other options where
> the data could still be cached, like the page cache of the server.

As you mentioned, when the image is on a network file system, the fsync
operation still can’t guarantee that the data has really been flushed to disk.

> 
>> so there is no need to sync anymore. For example, we could execute the
>> following Python script in the VM:
>>      
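>>      #!/usr/bin/python
>>      import os
>> 
>>      # os.open returns a raw file descriptor, not a file object.
>>      fo = os.open("test.txt", os.O_RDWR | os.O_CREAT)
>>      try:
>>          while True:
>>              os.write(fo, b"123\n")  # os.write expects bytes
>>              os.fsync(fo)            # one flush per write
>>      finally:
>>          # Runs when the loop is interrupted (e.g. Ctrl-C).
>>          os.close(fo)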
>> 
>> In this case, each write is followed by an fsync, which searches the
>> page cache for dirty pages and forces the metadata and data to disk.
>> This is often useless, wastes I/O resources, and may cause write
>> amplification in the file system.
> 
> Yes, if you request an fsync(), you get an fsync(). This is necessary to
> fulfill the guarantees that fsync() makes. If a guest application doesn't
> want fsync() semantics, it shouldn't call it.
> 

In this extreme scenario (the fsync Python script), could we do something
to avoid the write amplification in the file system? VM users sometimes don’t
have a clear understanding of the backend storage, and we don’t know
what kind of application will run in the VM, but in QEMU we could filter or
ignore some improper operations.
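
For comparison, the guest-side way to reduce the number of flushes is to
batch writes. Below is a minimal sketch, assuming the application can
tolerate losing the records written since the last fsync. The function name
write_batched and the batch size are hypothetical and only for illustration;
nothing here is part of the patch.

```python
import os
import tempfile


def write_batched(path, records, batch_size):
    """Write records to path, calling fsync once per batch_size writes.

    Returns the number of fsync calls issued. Records written after the
    last fsync are not guaranteed to survive a crash.
    """
    fsync_calls = 0
    fd = os.open(path, os.O_RDWR | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        pending = 0
        for record in records:
            os.write(fd, record)
            pending += 1
            if pending == batch_size:
                os.fsync(fd)
                fsync_calls += 1
                pending = 0
        # Flush whatever is left in the final, partial batch.
        if pending:
            os.fsync(fd)
            fsync_calls += 1
    finally:
        os.close(fd)
    return fsync_calls


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "test.txt")
        print(write_batched(target, [b"123\n"] * 1000, 100))
```

The trade-off is explicit: the guest decides how many writes it is willing
to lose, rather than QEMU silently dropping flushes it was asked to perform.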

> QEMU has an option cache.no-flush=on for block backends (cache=unsafe
> contains this), which will skip flushes. This is unsafe and if your host
> crashes, you may get a corrupted file system in the guest. But at the
> risk of losing your filesystem, it does save the overhead of these
> operations that you want to avoid.

When AIO is enabled, the cache mode should be set to none or directsync.
Even if fsync() is called after each I/O, the data in the disk cache will
still be lost when the host crashes.
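
For reference, here is how I understand the cache= shorthands to expand into
the three separate options, based on the QEMU documentation. This is only a
sketch for discussion: the dictionary CACHE_MODES and the helper modes_where
are names I made up, and the table should be checked against the docs for
the QEMU version in use.

```python
# Each cache= shorthand and the options it sets (per the QEMU documentation).
CACHE_MODES = {
    "writeback":    {"cache.writeback": True,  "cache.direct": False, "cache.no-flush": False},
    "none":         {"cache.writeback": True,  "cache.direct": True,  "cache.no-flush": False},
    "writethrough": {"cache.writeback": False, "cache.direct": False, "cache.no-flush": False},
    "directsync":   {"cache.writeback": False, "cache.direct": True,  "cache.no-flush": False},
    "unsafe":       {"cache.writeback": True,  "cache.direct": False, "cache.no-flush": True},
}


def modes_where(option, value):
    """Return the sorted cache modes in which `option` has the given value."""
    return sorted(mode for mode, opts in CACHE_MODES.items() if opts[option] == value)


if __name__ == "__main__":
    # Modes that bypass the host page cache (O_DIRECT).
    print(modes_where("cache.direct", True))
    # Modes that skip flush requests from the guest.
    print(modes_where("cache.no-flush", True))
```

So only none and directsync bypass the host page cache, and only unsafe
skips flushes; none of the direct modes drops the guest's flush requests.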
