On Wed, Feb 2, 2022 at 5:25 AM Nathan Bossart <nathandboss...@gmail.com> wrote: > > On Mon, Jan 31, 2022 at 10:42:54AM +0530, Bharath Rupireddy wrote: > > After an off-list discussion with Andreas, proposing here a patch that > > basically replaces ReadDir call with ReadDirExtended and gets rid of > > lstat entirely. With this chance, the checkpoint will only care about > > the snapshot and mapping files and not fail if it finds other files in > > the directories. Removing lstat enables us to make things faster as we > > avoid a bunch of extra system calls - one lstat call per each mapping > > or snapshot file. > > I think removing the lstat() is probably reasonable. We currently aren't > doing proper error checking, and the chances of a non-regular file matching > the prefix are likely pretty low. In the worst case, we'll LOG or ERROR > when unlinking or fsyncing fails. > > However, I'm not sure about the change to ReadDirExtended(). That might be > okay for CheckPointSnapBuild(), which is just trying to remove old files, > but CheckPointLogicalRewriteHeap() is responsible for ensuring that files > are flushed to disk for the checkpoint. If we stop reading the directory > after an error and let the checkpoint continue, isn't it possible that some > mappings files won't be persisted to disk?
Unless I mis-read your above statement, with LOG level in ReadDirExtended, I don't think we stop reading the files in CheckPointLogicalRewriteHeap. Am I missing something here? Since, we also continue in CheckPointLogicalRewriteHeap if we can't parse/delete some files with the change of "could not parse filename"/"could not remove file" messages to LOG level I'm attaching v6, just changed elog(LOG, to ereport(LOG in CheckPointLogicalRewriteHeap, other things remain the same. Regards, Bharath Rupireddy.
v6-0001-Replace-ReadDir-with-ReadDirExtended.patch
Description: Binary data