On Thu, Dec 24, 2020 at 6:18 PM Cameron Simpson <c...@cskk.id.au> wrote:
>
> On 25Dec2020 09:29, Steven D'Aprano <st...@pearwood.info> wrote:
> >On Thu, Dec 24, 2020 at 12:15:08PM -0500, Michael A. Smith wrote:
> >
> >> With all the buffering that modern disks and filesystems do, a
> >> specific question has come up a few times with respect to whether or
> >> not data was actually written after flush. I think it would be pretty
> >> useful for the standard library to have a variant in the io module
> >> that would explicitly fsync on close.
> >
> >One argument against this idea is that "disks and file systems buffer
> >for a reason, you should trust them, explicitly calling sync after every
> >written file is just going to slow I/O down".
> >
> >Personally I don't believe this argument; I've been bitten many, many
> >times until I learned to explicitly sync files, but it's an argument you
> >should counter.
>
> By contrast, I support this argument. The _vast_ majority of things
> don't need to sync their data all the way to the hardware base substrate
> (eg magnetic fields on spinning rust).
>
> And on the whole, if I do care, I issue a single sync() call at the end
> of a large task (typically interactively, at a prompt!) rather than
> forcing a heap of performance-impairing stutters all the way through
> some process, which is what many per-file syncs do.
>
> IMO, per-file syncs fall into the "policy" arena: aside from low level
> tools (example: fdisk, a disc partition editor), to my mind the purpose
> of the kernel is to accept responsibility for my data when I hand it
> off.
>
> Perhaps for you that isn't enough; for me it normally is. And when it
> isn't, I'll take steps myself, _outside_ the programme, to ensure the
> sync or commit or off site backup is complete when it matters. Thus the
> policy is in my hands.
>
> The tool which causes a per-file sync on every close, or even after
> every write, is a performance killer. The faster our hardware, the less
> that may seem to matter (and, conversely, the less the risk as the
> ordinary kernel I/O flushing will catch up faster). But when the
> hardware slowness _is_ relevant, if I can't turn that off I have a
> needlessly unperformant task.
>
>
> The example which stands out in my own mind is when I was using firefox
> on a laptop with a spinning rust hard drive (and, being laptop
> hardware, a low power and physically slow piece of spinning rust).
> There was once a preference to turn off sqlite's synchronous writes
> (used for history and bookmarks). The difference was _visibly obvious_
> in the user experience, and I turned it off. As a matter of policy,
> those data didn't need such care.
>
> So I'm resistant to this kind of thing because IMO it leads to an
> attractive nuisance: overuse of sync or fsync for everything. And it
> will usually not be exposed as policy the user can adjust/disable.
>
> My rule of thumb:
>
>     If it can't be turned off, it's not a feature. - Karl Heuer


Are you arguing that if something is a bad idea to overuse, even if
it's a good idea sometimes, then it shouldn't be allowed into Python,
because someone might write a program that abuses that feature, you
might end up with that program, and it would be irksome to deal with
it?

I'm not trying to present a straw man, but that is my genuine
impression of what you said. If I got it wrong, I apologize and please
help me understand what you meant.

> >Another argument is that even syncing your data doesn't mean that the
> >data is actually written to disk, since the hardware can lie. On the
> >other hand, I don't know what anyone can do, not even the kernel, in the
> >face of deceitful hardware.
>
> Aye.
>
> But in principle, after a sync() or fsync() the kernel at least believes
> the data has been written. Hardware which lies, or which claims to have
> saved data without having the resources to guarantee it (eg a small
> battery to complete the writes if there's a power outage) is indeed
> nasty.
>
> >> You might be tempted to argue that this can be done very easily in
> >> Python already, so why include it in the standard io module?
>
> I would indeed. There _should_ be a small bar which at least causes the
> programmer to think "do I really need this here?" I suppose a
> "fsync=False" default parameter is a visible bar.
>
> [...]
> >I mean, the obvious way is:
> >
> >    try:
> >        with open(..., 'w') as f:
> >            f.write("stuff")
> >    finally:
> >        os.sync()
>
> An os.fsync(f.fileno()) is lower impact - os.sync() requests a sync of
> all filesystems.
>
> >so maybe all we really need is a "sync file" context manager.
>
> Aye. Fully agree here, and frankly think this is a "write your own"
> situation. Except, of course, that like all "write your own" one/few
> liners there will be suboptimal or buggy ones released. Such as the
> "overly wide sync" from your os.sync() above.
>
> Personally I'm -1 on this. A context manager which does f.flush() and
> os.fsync(f.fileno()) seems plenty, and easy to roll your own.

There are very smart people on this list who have already demonstrated
that there is more than one way to do it, and that it's not obvious.
So, it's not easy to roll your own correctly.
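
For the record, here is one shape a correct roll-your-own could take (a
minimal sketch; synced_open is a name I'm inventing for illustration):

    import contextlib
    import os

    @contextlib.contextmanager
    def synced_open(path, mode="w", **kwargs):
        """Open a file; fsync after the body runs, then close."""
        f = open(path, mode, **kwargs)
        try:
            yield f
            f.flush()             # flush Python's buffer to the OS
            os.fsync(f.fileno())  # ask the kernel to commit to stable storage
        finally:
            f.close()

    with synced_open("data.txt") as f:
        f.write("stuff")

Even this embeds judgment calls, such as whether to fsync when the body
raises, which is exactly the sort of subtlety that makes "roll your own"
risky.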

I love context managers when they're alone, but I dislike stacking
them. With a keyword argument to open, it is clear that the fsync
happens exactly between flush and close; with a separate context
manager, that guarantee is harder to see. That is, if open is the only
context manager, everything is great. But if it is up to users to stack
context managers, including open and some fsync wrapper, I think
getting the ordering correct will be a problem.
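
To make the comparison concrete, the keyword form would read something
like this (hypothetical: fsync is not a real parameter of today's
open()):

    # Hypothetical API, for illustration only:
    with open("data.txt", "w", fsync=True) as f:
        f.write("stuff")
    # close() would flush the buffer, fsync the file descriptor, and
    # then close it, in that order.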

Thank you for engaging on this topic.