On Thu, Dec 24, 2020 at 12:15:08PM -0500, Michael A. Smith wrote:

> With all the buffering that modern disks and filesystems do, a
> specific question has come up a few times with respect to whether or
> not data was actually written after flush. I think it would be pretty
> useful for the standard library to have a variant in the io module
> that would explicitly fsync on close.

One argument against this idea is that "disks and file systems buffer 
for a reason, you should trust them, and explicitly calling sync after 
every file you write is just going to slow I/O down".

Personally, I don't believe this argument: I was bitten many, many 
times before I learned to explicitly sync files. But it's an argument 
you should counter.

Another argument is that even syncing your data doesn't mean that the 
data is actually written to disk, since the hardware can lie. On the 
other hand, I don't know what anyone, not even the kernel, can do in 
the face of deceitful hardware.


> You might be tempted to argue that this can be done very easily in
> Python already, so why include it in the standard io module?
> 
> 1. It seems to me that it would be better to do this in C, so for the
> folks who need to make a consistency > performance kind of choice,
> they don't have to sacrifice any additional performance.

The actual I/O is surely going to outweigh the cost of calling sync from 
Python.

This sounds like a trivial micro-optimization for small files, and an 
undetectable one for large files on slow media. Syncing a two-gigabyte 
file written to a USB 2 stick might take four or five minutes; if doing 
the call in C rather than Python saves you a dozen microseconds, are 
you even going to notice the difference?

I think you need to show benchmarks before claiming that this needs to 
be in C.


> 2. Having it in the io library will call attention to this issue,
> which I think is something a lot of folks don't consider. Assuming
> that `close` or `flush` are sufficient for consistency has always been
> wrong (at its limits), but it was less likely to be a stumbling block
> in the past, when buffering was less aggressive and less layered and
> the peak size and continuous-ness of data streams was a more niche
> concern.

I don't know; I wonder whether burying it in the io library would just 
make it disappear.

Perhaps a "sync on close" keyword argument to open? At least then it is 
always available and easily discoverable.
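Something with the same semantics can be sketched in a few lines today 
(a rough illustration, not a proposal for the exact API: the class name 
is made up, and io.FileIO is used because, being unbuffered, it has 
nothing left to flush before the fsync):

    import io
    import os

    class SyncOnCloseFile(io.FileIO):
        # Hypothetical sketch: do the fsync as part of close(), so
        # the caller can't forget it. FileIO is unbuffered, so the
        # data has already reached the OS by the time close() runs.
        def close(self):
            if not self.closed:
                os.fsync(self.fileno())
            super().close()

    with SyncOnCloseFile("out.bin", "w") as f:
        f.write(b"stuff")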


> 3. There are many ways to do this, and I think several of them could
> be subtly incorrect.

Can you elaborate?

I mean, the obvious way is:

    import os

    try:
        with open(..., 'w') as f:
            f.write("stuff")
    finally:
        # Note: os.sync() flushes *every* mounted filesystem's
        # buffers, not just this one file's.
        os.sync()

so maybe all we really need is a "sync file" context manager.
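A minimal sketch of that, assuming "sync the file itself before closing 
it" is the semantics we want (the name synced_open is made up):

    import contextlib
    import os

    @contextlib.contextmanager
    def synced_open(path, mode="w", **kwargs):
        f = open(path, mode, **kwargs)
        try:
            yield f
            f.flush()              # Python's buffers -> the OS
            os.fsync(f.fileno())   # the OS's buffers -> the disk
        finally:
            # If the body raised, we still close the file, but we
            # make no durability promises about the failed write.
            f.close()

    with synced_open("stuff.txt") as f:
        f.write("stuff")

Unlike os.sync() above, os.fsync() targets only the one file 
descriptor, so nothing else on the system is slowed down.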



-- 
Steve