At Wed, 19 Oct 2022 10:21:12 +0530, Bharath Rupireddy <bharath.rupireddyforpostg...@gmail.com> wrote in
> On Wed, Oct 19, 2022 at 8:58 AM Kyotaro Horiguchi
> <horikyota....@gmail.com> wrote:
> > An archive module could clean up them at startup or at the first call
> > but that might be dangerous (since archive directory is I think
> > thought as external resource).
>
> The archive module must be responsible for cleaning up the temp file
> that it creates. One way to do it is in the archive module's shutdown
> callback, this covers most of the cases, but not all.
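For concreteness, such a shutdown callback could look roughly like the
following. This is only a sketch, not the actual basic_archive code; it
assumes a module-level tempfilepath buffer (like the one the v2 patch
uses, discussed below) that is kept empty while no copy is in progress.

    #include <unistd.h>

    #define MAXPGPATH 1024      /* PG's assumed maximum path length */

    /* path of the temp file currently being written; empty when idle */
    static char tempfilepath[MAXPGPATH];

    static void
    basic_archive_shutdown(void)
    {
        /* best-effort removal of the in-progress temp file, if any */
        if (tempfilepath[0] != '\0')
        {
            (void) unlink(tempfilepath);
            tempfilepath[0] = '\0';
        }
    }

Whether tempfilepath is accurate at the moment the callback runs is
exactly the weak point discussed further down.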
True: a shutdown callback covers most cases, but not all of them.
Still, I agree with Robert that such temporary files should be
removable without relying on transient knowledge (the current file
name, in this case). A common strategy is to give those files names
that can be identified as garbage.

> > Honestly, I'm not sure about a reasonable scenario where simultaneous
> > archivings of a same file is acceptable, though. I feel that we should
> > not allow simultaneous archiving of the same segment by some kind of
> > interlocking. In other words, we might should serialize duplicate
> > archiving of asame file.
>
> In a typical production environment, there's some kind of locking (for
> instance lease files) that allow/disallow file
> creation/deletion/writes/reads which guarantees that the same file
> isn't put into a directory (can be archive location) many times. And
> as you rightly said archive_directory is something external to
> postgres and we really can't deal with concurrent writers
> writing/creating the same files. Even if we somehow try to do it, it
> makes things complicated. This is true for any PGDATA directories.
> However, the archive module implementers can choose to define such a
> locking strategy.

flock() on NFS is a classic trouble spot for exactly that kind of
locking strategy, though..

> > In another direction, the current code allows duplicate simultaneous
> > copying to temporary files with different names then the latest
> > renaming wins. We reach the almost same result (on Linuxen (or
> > POSIX?)) by unlinking the existing tempfile first then create a new
> > one with the same name then continue. Even if the tempfile were left
> > alone after a crash, that file would be unlinked at the next trial of
> > archiving. But I'm not sure how this can be done on Windows.. In the
> > first place I'm not sure that the latest-unconditionally-wins policy
> > is appropriate or not, though.
>
> We really can't just unlink the temp file because it has pid and
> timestamp in the filename and it's hard to determine the temp file
> that we created earlier.

But since a power cut is a typical crash source, we need a way to
identify ruined temporary files, and the current naming convention is
incomplete in that regard. The worst case I can come up with,
regardless of feasibility, is a multi-standby physical replication
setup in which all hosts share one archive directory. There, live and
dead temporary files can indeed coexist. However, I think we can
identify truly rotten temp files even in that case by embedding the
host name or cluster name (meaning cluster_name in postgresql.conf) in
the file name. This presumes that the DBA names every cluster
differently, but DBAs who set up such a system are expected to be
cautious about exactly that kind of detail.

> As far as the basic_archive module is concerned, we ought to keep it
> simple. I still think the simplest we can do is to use the
> basic_archive's shutdown_cb to delete (in most of the cases, but not

(Sorry, my memory was confused at the time. That callback feature
already existed.)

> all) the left-over temp file that the module is dealing with
> as-of-the-moment and add a note about the users dealing with
> concurrent writers to the basic_archive.archive_directory like the
> attached v2 patch.
>
> > > A simple server restart while the basic_archive module is copying
> > > to/from temp file would leave the file behind, see[1]. I think we can
> > > fix this by defining shutdown_cb for the basic_archive module, like
> > > the attached patch. While this helps in most of the crashes, but not
> > > all. However, this is better than what we have right now.
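To make the naming idea concrete (it comes up again below), here is a
rough, self-contained sketch. archivedir, cluster_name and segname are
assumed names for illustration, not basic_archive's actual variables,
and the exact file-name layout is just one possibility.

    #include <stdio.h>
    #include <string.h>
    #include <dirent.h>
    #include <unistd.h>
    #include <sys/time.h>

    /* Build "ARCHTEMP_<cluster>_<pid>_<msec>_<segname>" under archivedir. */
    static void
    make_temp_name(char *buf, size_t buflen, const char *archivedir,
                   const char *cluster_name, const char *segname)
    {
        struct timeval tv;

        gettimeofday(&tv, NULL);
        snprintf(buf, buflen, "%s/ARCHTEMP_%s_%d_%lld_%s",
                 archivedir, cluster_name, (int) getpid(),
                 (long long) tv.tv_sec * 1000 + (long long) (tv.tv_usec / 1000),
                 segname);
    }

    /*
     * At startup, remove temp files left over by a previous crash of this
     * cluster; files of other clusters sharing the directory are untouched.
     */
    static void
    sweep_stale_temp_files(const char *archivedir, const char *cluster_name)
    {
        char        prefix[256];
        DIR        *dir;
        struct dirent *de;

        snprintf(prefix, sizeof(prefix), "ARCHTEMP_%s_", cluster_name);

        dir = opendir(archivedir);
        if (dir == NULL)
            return;

        while ((de = readdir(dir)) != NULL)
        {
            if (strncmp(de->d_name, prefix, strlen(prefix)) == 0)
            {
                char        path[4096];

                snprintf(path, sizeof(path), "%s/%s", archivedir, de->d_name);
                (void) unlink(path);    /* best effort */
            }
        }
        closedir(dir);
    }

The startup sweep matches only the stable "ARCHTEMP_<cluster>_" prefix,
so it needs no knowledge of the pid or timestamp of the process that
died. As noted below, though, the delimiter must be chosen so that one
cluster's prefix cannot accidentally match another cluster's name.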
> >
> > ShutdownWalRecovery() does the similar thing, but as you say this one
> > covers rather narrow cases than that since RestoreArchiveFile()
> > finally overwrites the left-alone files at the next call for that
> > file.
>
> We're using unique temp file names in the basic_archive module so we
> can't overwrite the same upon restart.

Of course, that assumed a cluster uses the same temporary name for a
given segment. If we embed cluster_name in the temporary name, a
starting cluster can identify garbage files to clean up. For example,
if we name them as follows:

  ARCHTEMP_cluster1_pid_time_<lsn>

a starting cluster can clean up all files starting with
"ARCHTEMP_cluster1_*", as in the sweep sketched above. (We need to
choose the delimiter carefully, though..)

> > # The patch seems forgetting to clear the tmepfilepath *after* a
> > # successful renaming.
>
> It does so at the beginning of basic_archive_file() which is sufficient.

No, I didn't mean that. If the server stops after a successful
durable_rename() but before the next call to
basic_archive_file_internal(), the shutdown callback makes a false
complaint, since that temporary file is actually gone. The fix is to
clear tempfilepath immediately after the rename succeeds.

> > And I don't see how the callback is called.
>
> call_archive_module_shutdown_callback()->basic_archive_shutdown().

Yeah, sorry for the noise.

> Please see the attached v2 patch.

+static char tempfilepath[MAXPGPATH + 256];

MAXPGPATH is the maximum length of a path name that PG assumes it can
handle, so extending the buffer beyond that length seems wrong.

--
Kyotaro Horiguchi
NTT Open Source Software Center