On 25Dec2020 09:29, Steven D'Aprano <st...@pearwood.info> wrote:
>On Thu, Dec 24, 2020 at 12:15:08PM -0500, Michael A. Smith wrote:
>
>> With all the buffering that modern disks and filesystems do, a
>> specific question has come up a few times with respect to whether or
>> not data was actually written after flush. I think it would be pretty
>> useful for the standard library to have a variant in the io module
>> that would explicitly fsync on close.
>
>One argument against this idea is that "disks and file systems buffer
>for a reason, you should trust them, explicitly calling sync after every
>written file is just going to slow I/O down".
>
>Personally I don't believe this argument, I've been bitten many, many
>times until I learned to explicitly sync files, but it's an argument you
>should counter.

By contrast, I support this argument. The _vast_ majority of things 
don't need to sync their data all the way to the hardware base substrate 
(eg magnetic fields on spinning rust).

And on the whole, if I do care, I issue a single sync() call at the end 
of a large task (typically interactively, at a prompt!) rather than 
forcing a heap of performance impairing stutters all the way through 
some process because many per-file syncs force that.

IMO, per-file syncs fall into the "policy" arena: aside from low level 
tools (example: fdisk, a disc partition editor), to my mind the purpose 
of the kernel is to accept responsibility for my data when I hand it 
off.

Perhaps for you that isn't enough; for me it normally is. And when it 
isn't, I'll take steps myself, _outside_ the programme, to ensure the 
sync or commit or off site backup is complete when it matters. Thus the 
policy is in my hands.

A tool which forces a per-file sync on every close, or even after 
every write, is a performance killer. The faster our hardware, the less 
that may seem to matter (and, conversely, the less the risk as the 
ordinary kernel I/O flushing will catch up faster). But when the 
hardware slowness _is_ relevant, if I can't turn that off I have a 
needlessly unperformant task.

The example which stands out in my own mind is when I was using firefox 
on a laptop with a spinning rust hard drive (and, being laptop 
hardware, a low power, physically slow piece of spinning rust). There was 
once a setting to turn off sqlite's synchronous writes (used for 
history and bookmarks). The cost of those writes was _visibly obvious_ 
in the user experience, so I turned them off. As a matter of policy, 
those data didn't need such care.

So I'm resistant to this kind of thing because IMO it leads to an 
attractive nuisance: overuse of sync or fsync for everything. And it 
will usually not be exposed as policy the user can adjust/disable.

My rule of thumb:

    If it can't be turned off, it's not a feature. - Karl Heuer

>Another argument is that even syncing your data doesn't mean that the
>data is actually written to disk, since the hardware can lie. On the
>other hand, I don't know what anyone can do, not even the kernel, in the
>face of deceitful hardware.

Aye.

But in principle, after a sync() or fsync() the kernel at least believes 
the data are on disk. Hardware which lies, or which claims saved data 
without having the resources to guarantee it (eg a small battery to 
complete the writes if there's a power outage) is indeed nasty.

>> You might be tempted to argue that this can be done very easily in
>> Python already, so why include it in the standard io module?

I would indeed. There _should_ be a small bar which at least causes the 
programmer to think "do I really need this here"? I suppose a 
"fsync=False" default parameter is a visible bar.
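
A minimal sketch of what such an opt-in flag could look like. The helper 
name write_text and its fsync parameter are hypothetical; nothing like 
this exists in the io module today. The point is only that the default 
leaves flushing to the kernel, and the caller has to ask for the sync.

```python
import os


def write_text(path, text, fsync=False):
    """Hypothetical helper: write text to path, syncing only on request."""
    with open(path, "w") as f:
        f.write(text)
        if fsync:
            f.flush()             # push Python's buffer into the kernel
            os.fsync(f.fileno())  # ask the kernel to push this file to the device
```

A caller who does care would write write_text(path, data, fsync=True); 
everyone else gets the ordinary buffered behaviour.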

[...]
>I mean, the obvious way is:
>
>    try:
>        with open(..., 'w') as f:
>            f.write("stuff")
>    finally:
>        os.sync()

An os.fsync(f.fileno()) is lower impact - os.sync() requests a sync of 
all filesystems.
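
As a sketch, the narrower form of the quoted snippet might look like 
this. The path is a throwaway temp file chosen for illustration. Note 
the f.flush() first: os.fsync() only covers what has already reached the 
kernel, so Python's own buffer has to be emptied before it.

```python
import os
import tempfile

# Illustrative target path in a fresh temp directory.
path = os.path.join(tempfile.mkdtemp(), "example.txt")

with open(path, "w") as f:
    f.write("stuff")
    f.flush()             # move Python's userspace buffer into the kernel
    os.fsync(f.fileno())  # sync just this file, not every filesystem
```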

>so maybe all we really need is a "sync file" context manager.

Aye. Fully agree here, and frankly think this is a "write your own" 
situation. Except, of course, that like all "write your own" one/few 
liners there will be suboptimal or buggy ones released. Such as the 
"overly wide sync" from your os.sync() above.

Personally I'm -1 on this. A context manager which does f.flush() then 
os.fsync(f.fileno()) seems plenty, and is easy to roll your own.
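
One possible roll-your-own version, as a sketch only. The name 
synced_open is made up for illustration. It assumes you only want the 
sync when the body finishes without raising; on an exception the file is 
simply closed.

```python
import os
from contextlib import contextmanager


@contextmanager
def synced_open(path, mode="w", **kwargs):
    """Hypothetical helper: open a file, fsync it on a clean exit."""
    f = open(path, mode, **kwargs)
    try:
        yield f
        # Reached only if the with-body did not raise.
        f.flush()             # empty Python's buffer into the kernel
        os.fsync(f.fileno())  # then sync this one file to the device
    finally:
        f.close()
```

Usage is the same shape as a plain open():

    with synced_open("data.txt") as f:
        f.write("stuff")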

Cheers,
Cameron Simpson <c...@cskk.id.au>
_______________________________________________
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/QSDOU4NA2YIZSOKM6OJKCBSEVMMXMRVZ/
Code of Conduct: http://python.org/psf/codeofconduct/