On 25Dec2020 09:29, Steven D'Aprano <st...@pearwood.info> wrote:
>On Thu, Dec 24, 2020 at 12:15:08PM -0500, Michael A. Smith wrote:
>
>> With all the buffering that modern disks and filesystems do, a
>> specific question has come up a few times with respect to whether or
>> not data was actually written after flush. I think it would be pretty
>> useful for the standard library to have a variant in the io module
>> that would explicitly fsync on close.
>
>One argument against this idea is that "disks and file systems buffer
>for a reason, you should trust them, explicitly calling sync after every
>written file is just going to slow I/O down".
>
>Personally I don't believe this argument, I've been bitten many, many
>times until I learned to explicitly sync files, but it's an argument you
>should counter.

By contrast, I support this argument. The _vast_ majority of things don't need to sync their data all the way to the hardware base substrate (eg magnetic fields on spinning rust). On the whole, if I do care, I issue a single sync() call at the end of a large task (typically interactively, at a prompt!) rather than forcing a heap of performance-impairing stutters all the way through some process with many per-file syncs.

IMO, per-file syncs fall into the "policy" arena. Aside from low-level tools (for example fdisk, a disk partition editor), to my mind the purpose of the kernel is to accept responsibility for my data when I hand it off. Perhaps for you that isn't enough; for me it normally is. And when it isn't, I'll take steps myself, _outside_ the programme, to ensure the sync or commit or off-site backup is complete when it matters. Thus the policy is in my hands.

A tool which forces a per-file sync on every close, or even after every write, is a performance killer. The faster our hardware, the less that may seem to matter (and, equally, the lower the risk of not syncing, since the ordinary kernel I/O flushing catches up sooner). But when hardware slowness _is_ relevant and I can't turn the syncing off, I'm stuck with a needlessly slow task.

The example which stands out in my own mind is using Firefox on a laptop with a spinning-rust hard drive (and, being laptop hardware, a low-power, physically slow piece of spinning rust). There was once a setting to turn off SQLite's synchronous writes (used for history and bookmarks). The difference was _visibly obvious_ in the user experience, and I turned the synchronous writes off. As a matter of policy, those data didn't need such care.

So I'm resistant to this kind of thing because IMO it becomes an attractive nuisance: overuse of sync or fsync for everything. And it will usually not be exposed as a policy the user can adjust or disable.

My rule of thumb: If it can't be turned off, it's not a feature.
- Karl Heuer

>Another argument is that even syncing your data doesn't mean that the
>data is actually written to disk, since the hardware can lie. On the
>other hand, I don't know what anyone can do, not even the kernel, in the
>face of deceitful hardware.

Aye. But in principle, after a sync() or fsync() the kernel at least believes the data has been written. Hardware which lies, or which claims to have saved data without having the resources to guarantee it (eg a small battery to complete the writes if there's a power outage), is indeed nasty.

>> You might be tempted to argue that this can be done very easily in
>> Python already, so why include it in the standard io module?

I would indeed. There _should_ be a small bar which at least causes the programmer to think "do I really need this here?" I suppose a "fsync=False" default parameter is a visible bar.

[...]

>I mean, the obvious way is:
>
>    try:
>        with open(..., 'w') as f:
>            f.write("stuff")
>    finally:
>        os.sync()

An os.fsync(f.fileno()) is lower impact: os.sync() requests a sync of all filesystems.

>so maybe all we really need is a "sync file" context manager.

Aye. I fully agree here, and frankly think this is a "write your own" situation. Except, of course, that as with all "write your own" one- or few-liners, there will be suboptimal or buggy ones released, such as the "overly wide sync" from your os.sync() above.

Personally I'm -1 on this. A context manager which goes:

    f.flush()
    os.fsync(f.fileno())

seems plenty, and easy to roll your own.

Cheers,
Cameron Simpson <c...@cskk.id.au>
_______________________________________________
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/QSDOU4NA2YIZSOKM6OJKCBSEVMMXMRVZ/
Code of Conduct: http://python.org/psf/codeofconduct/
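For illustration, here is a minimal sketch of the roll-your-own context manager suggested in the thread. It flushes, then calls os.fsync(f.fileno()) on that one file, instead of making a system-wide os.sync() call. The name fsync_on_close and its fsync=True opt-out flag are hypothetical. They are not part of the io module or of any concrete proposal in this thread. The flag reflects the point that syncing should be a policy the caller can turn off. The fsync has to happen before the file is closed, because calling f.fileno() on a closed file raises ValueError.

```python
import os
from contextlib import contextmanager


@contextmanager
def fsync_on_close(path, mode="w", *, fsync=True, **open_kwargs):
    """Open `path`; on a clean exit, flush and fsync it before closing.

    Hypothetical helper for illustration only, not a stdlib API.
    Pass fsync=False to skip the sync (the "policy" switch).
    """
    with open(path, mode, **open_kwargs) as f:
        yield f
        # Only reached if the with-body did not raise.
        if fsync:
            # Push Python's userspace buffer down to the OS...
            f.flush()
            # ...then ask the kernel to commit just this file to storage.
            # This must run while f is still open.
            os.fsync(f.fileno())
        # Leaving the inner "with" closes the file in all cases.
```

Used as, for example, with fsync_on_close("out.txt") as f: f.write("stuff"), this syncs only that one file. If the body raises, the file is still closed, but it is not synced.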