Re: Avoid erroring out when unable to remove or parse logical rewrite files to save checkpoint work

2022-09-03 Thread Michael Paquier
On Sat, Sep 03, 2022 at 10:47:32AM +0500, Ibrar Ahmed wrote: > Your patch require a rebase. Please provide the latest rebase patch. I was looking for a CF entry yesterday when I looked at this patch, missing https://commitfest.postgresql.org/39/3496/. This has been applied as of bfb9dfd. --

Re: Avoid erroring out when unable to remove or parse logical rewrite files to save checkpoint work

2022-09-02 Thread Ibrar Ahmed
On Sat, Sep 3, 2022 at 3:09 AM Nathan Bossart wrote: > On Fri, Sep 02, 2022 at 05:09:14PM +0900, Michael Paquier wrote: > > I had a few hours and I've spent them looking at what you had here in > > details, and there were a few things I have tweaked before applying > > the patch. > > Thanks! > >

Re: Avoid erroring out when unable to remove or parse logical rewrite files to save checkpoint work

2022-09-02 Thread Nathan Bossart
On Fri, Sep 02, 2022 at 05:09:14PM +0900, Michael Paquier wrote: > I had a few hours and I've spent them looking at what you had here in > details, and there were a few things I have tweaked before applying > the patch. Thanks! -- Nathan Bossart Amazon Web Services: https://aws.amazon.com

Re: Avoid erroring out when unable to remove or parse logical rewrite files to save checkpoint work

2022-09-02 Thread Michael Paquier
On Wed, Aug 31, 2022 at 04:10:59PM +0900, Michael Paquier wrote: > Yeah, what you have here looks pretty fine to me, so this is IMO > committable. Let's wait a bit to see if there are any objections from > the others, in case. I had a few hours and I've spent them looking at what you had here in

Re: Avoid erroring out when unable to remove or parse logical rewrite files to save checkpoint work

2022-08-31 Thread Andres Freund
Hi, On 2022-08-31 12:24:28 -0700, Nathan Bossart wrote: > On Wed, Aug 31, 2022 at 12:14:33PM -0700, Andres Freund wrote: > > On 2022-08-27 14:06:32 +0530, Bharath Rupireddy wrote: > > It continues to make no sense to me to add behaviour changes around > > error-handling as part of a conversion to

Re: Avoid erroring out when unable to remove or parse logical rewrite files to save checkpoint work

2022-08-31 Thread Nathan Bossart
On Wed, Aug 31, 2022 at 12:14:33PM -0700, Andres Freund wrote: > On 2022-08-27 14:06:32 +0530, Bharath Rupireddy wrote: >> -if (lstat(fromfile, ) < 0) >> -ereport(ERROR, >> -(errcode_for_file_access(), >> -

Re: Avoid erroring out when unable to remove or parse logical rewrite files to save checkpoint work

2022-08-31 Thread Andres Freund
Hi, On 2022-08-27 14:06:32 +0530, Bharath Rupireddy wrote: > @@ -50,7 +51,7 @@ copydir(char *fromdir, char *todir, bool recurse) > > while ((xlde = ReadDir(xldir, fromdir)) != NULL) > { > - struct stat fst; > + PGFileType xlde_type; > >

Re: Avoid erroring out when unable to remove or parse logical rewrite files to save checkpoint work

2022-08-31 Thread Michael Paquier
On Sun, Aug 28, 2022 at 02:52:43PM -0700, Nathan Bossart wrote: > On Sat, Aug 27, 2022 at 02:06:32PM +0530, Bharath Rupireddy wrote: >> PSA v17 patch with reorderbuffer.c changes reverted. > > LGTM Yeah, what you have here looks pretty fine to me, so this is IMO committable. Let's wait a bit to

Re: Avoid erroring out when unable to remove or parse logical rewrite files to save checkpoint work

2022-08-28 Thread Nathan Bossart
On Sat, Aug 27, 2022 at 02:06:32PM +0530, Bharath Rupireddy wrote: > PSA v17 patch with reorderbuffer.c changes reverted. LGTM -- Nathan Bossart Amazon Web Services: https://aws.amazon.com

Re: Avoid erroring out when unable to remove or parse logical rewrite files to save checkpoint work

2022-08-27 Thread Bharath Rupireddy
On Thu, Aug 25, 2022 at 5:51 PM Michael Paquier wrote: > > FWIW, the changes in reorderbuffer.c for > ReorderBufferCleanupSerializedTXNs() reduce the code readability, in > my opinion, so that's one less argument in favor of this change. Agreed. I reverted the changes. > The gain in

Re: Avoid erroring out when unable to remove or parse logical rewrite files to save checkpoint work

2022-08-25 Thread Michael Paquier
On Thu, Aug 25, 2022 at 10:48:08AM +0530, Bharath Rupireddy wrote: > On Tue, Aug 23, 2022 at 11:37 PM Nathan Bossart > wrote: >> IIUC an error in get_dirent_type() could cause slots to be skipped here, >> which is a behavior change. > > That behaviour hasn't changed, no? Currently, if lstat()

Re: Avoid erroring out when unable to remove or parse logical rewrite files to save checkpoint work

2022-08-24 Thread Bharath Rupireddy
On Tue, Aug 23, 2022 at 11:37 PM Nathan Bossart wrote: > > On Wed, Aug 17, 2022 at 12:39:26PM +0530, Bharath Rupireddy wrote: > > + /* > > + * We're only handling directories here, skip if it's not ours. Also, > > skip > > + * if the caller has already performed this check. > > +

Re: Avoid erroring out when unable to remove or parse logical rewrite files to save checkpoint work

2022-08-23 Thread Nathan Bossart
On Wed, Aug 17, 2022 at 12:39:26PM +0530, Bharath Rupireddy wrote: > + /* > + * We're only handling directories here, skip if it's not ours. Also, > skip > + * if the caller has already performed this check. > + */ > + if (!slot_dir_checked && > + lstat(path, )

Re: Avoid erroring out when unable to remove or parse logical rewrite files to save checkpoint work

2022-08-17 Thread Bharath Rupireddy
On Wed, Aug 17, 2022 at 2:02 AM Nathan Bossart wrote: > > On Wed, Aug 10, 2022 at 03:28:25PM +0530, Bharath Rupireddy wrote: > > snprintf(path, sizeof(path), "pg_logical/mappings/%s", > > mapping_de->d_name); > > - if (lstat(path, ) == 0 && !S_ISREG(statbuf.st_mode)) >

Re: Avoid erroring out when unable to remove or parse logical rewrite files to save checkpoint work

2022-08-16 Thread Nathan Bossart
On Wed, Aug 10, 2022 at 03:28:25PM +0530, Bharath Rupireddy wrote: > snprintf(path, sizeof(path), "pg_logical/mappings/%s", > mapping_de->d_name); > - if (lstat(path, ) == 0 && !S_ISREG(statbuf.st_mode)) > + if (get_dirent_type(path, mapping_de, false, LOG)

Re: Avoid erroring out when unable to remove or parse logical rewrite files to save checkpoint work

2022-08-10 Thread Bharath Rupireddy
On Wed, Aug 10, 2022 at 2:11 PM Thomas Munro wrote: > > On Wed, Aug 10, 2022 at 7:15 PM Bharath Rupireddy > wrote: > > The main idea of using get_dirent_type() instead of lstat or stat is > > to benefit from avoiding system calls on platforms where the directory > > entry type is stored in

Re: Avoid erroring out when unable to remove or parse logical rewrite files to save checkpoint work

2022-08-10 Thread Thomas Munro
On Wed, Aug 10, 2022 at 7:15 PM Bharath Rupireddy wrote: > The main idea of using get_dirent_type() instead of lstat or stat is > to benefit from avoiding system calls on platforms where the directory > entry type is stored in dirent structure, but not to cause errors for > lstat or stat system

Re: Avoid erroring out when unable to remove or parse logical rewrite files to save checkpoint work

2022-08-10 Thread Bharath Rupireddy
On Tue, Aug 9, 2022 at 10:57 AM Thomas Munro wrote: > > OK, so there aren't many places in 0001 that error out where we > previously did not. Well, that's not true I believe. The 0001 patch introduces get_dirent_type() with elevel ERROR which means that lstat failures are now reported as ERRORs

Re: Avoid erroring out when unable to remove or parse logical rewrite files to save checkpoint work

2022-08-09 Thread Nathan Bossart
On Tue, Aug 09, 2022 at 05:26:39PM +1200, Thomas Munro wrote: > The changes were not from elog-LOG to elog-ERROR, they were changes > from silently eating errors, but I take your (plural) general point. I think this was in reference to 0002. My comment was, at least. > OK, so there aren't many

Re: Avoid erroring out when unable to remove or parse logical rewrite files to save checkpoint work

2022-08-08 Thread Thomas Munro
On Tue, Aug 9, 2022 at 4:28 PM Nathan Bossart wrote: > On Tue, Aug 09, 2022 at 09:29:16AM +0530, Bharath Rupireddy wrote: > > On Tue, Aug 9, 2022 at 9:20 AM Tom Lane wrote: > >> Hmmm ... I'll grant that ignoring lstat errors altogether isn't great. > >> But should the replacement behavior be

Re: Avoid erroring out when unable to remove or parse logical rewrite files to save checkpoint work

2022-08-08 Thread Nathan Bossart
On Tue, Aug 09, 2022 at 09:29:16AM +0530, Bharath Rupireddy wrote: > On Tue, Aug 9, 2022 at 9:20 AM Tom Lane wrote: >> Hmmm ... I'll grant that ignoring lstat errors altogether isn't great. >> But should the replacement behavior be elog-LOG-and-press-on, >> or

Re: Avoid erroring out when unable to remove or parse logical rewrite files to save checkpoint work

2022-08-08 Thread Andres Freund
On 2022-08-09 15:10:41 +1200, Thomas Munro wrote: > I was going to commit the first of these patches, but then I noticed Andres > said he was planning to, so I'll wait another day. I'd be happy for you to take this on...

Re: Avoid erroring out when unable to remove or parse logical rewrite files to save checkpoint work

2022-08-08 Thread Bharath Rupireddy
On Tue, Aug 9, 2022 at 9:20 AM Tom Lane wrote: > > Thomas Munro writes: > > On Tue, Aug 9, 2022 at 3:27 PM Tom Lane wrote: > >> I have not tried to analyze the error-handling properties of 0001, > >> but if it's being equally cavalier then it shouldn't be committed > >> either. > > > 0001 does

Re: Avoid erroring out when unable to remove or parse logical rewrite files to save checkpoint work

2022-08-08 Thread Tom Lane
Thomas Munro writes: > On Tue, Aug 9, 2022 at 3:27 PM Tom Lane wrote: >> I have not tried to analyze the error-handling properties of 0001, >> but if it's being equally cavalier then it shouldn't be committed >> either. > 0001 does introduce new errors, as mentioned in the commit message, in >

Re: Avoid erroring out when unable to remove or parse logical rewrite files to save checkpoint work

2022-08-08 Thread Thomas Munro
On Tue, Aug 9, 2022 at 3:27 PM Tom Lane wrote: > Actually, having now read the patch, I don't think there is any > part of 0002 that is a good idea. It's blithely removing the > comments that explain why the existing coding is the way it is, > and not providing a shred of justification for

Re: Avoid erroring out when unable to remove or parse logical rewrite files to save checkpoint work

2022-08-08 Thread Tom Lane
I wrote: > Thomas Munro writes: >> The only hunk I'm having second thoughts about is the following, which >> makes unexpected stray files break checkpoints: > Sounds like a pretty bad idea. What's the upside? Actually, having now read the patch, I don't think there is any part of 0002 that is

Re: Avoid erroring out when unable to remove or parse logical rewrite files to save checkpoint work

2022-08-08 Thread Tom Lane
Thomas Munro writes: > The only hunk I'm having second thoughts about is the following, which > makes unexpected stray files break checkpoints: Sounds like a pretty bad idea. What's the upside? regards, tom lane

Re: Avoid erroring out when unable to remove or parse logical rewrite files to save checkpoint work

2022-08-08 Thread Thomas Munro
Working on get_dirent_type() reminded me of this thread. I was going to commit the first of these patches, but then I noticed Andres said he was planning to, so I'll wait another day. Here they are, with commit messages but otherwise unchanged from Nathan's v12 except for a slight comment tweak:

Re: Avoid erroring out when unable to remove or parse logical rewrite files to save checkpoint work

2022-07-18 Thread Nathan Bossart
On Mon, Jul 18, 2022 at 04:53:18PM +0530, Bharath Rupireddy wrote: > Just wondering - do we ever have a problem if we can't remove the > snapshot or mapping file? Besides running out of disk space, there appears to be a transaction ID wraparound risk with the mappings files. -- Nathan Bossart

Re: Avoid erroring out when unable to remove or parse logical rewrite files to save checkpoint work

2022-07-18 Thread Bharath Rupireddy
On Fri, Jul 8, 2022 at 10:44 PM Nathan Bossart wrote: > > > 0002 - I'm not quite happy with this patch, with the change, > > checkpoint errors out, if it can't remove just a file - the comments > > there says it all. Is there any strong reason for this change? > > Andres noted several concerns

Re: Avoid erroring out when unable to remove or parse logical rewrite files to save checkpoint work

2022-07-11 Thread Nathan Bossart
On Mon, Jul 11, 2022 at 01:25:33PM -0700, Andres Freund wrote: > On 2022-04-08 13:18:57 -0700, Nathan Bossart wrote: >> @@ -1035,32 +1036,9 @@ ParseConfigDirectory(const char *includedir, >> >> join_path_components(filename, directory, de->d_name); >>

Re: Avoid erroring out when unable to remove or parse logical rewrite files to save checkpoint work

2022-07-11 Thread Andres Freund
Hi, On 2022-04-08 13:18:57 -0700, Nathan Bossart wrote: > @@ -1035,32 +1036,9 @@ ParseConfigDirectory(const char *includedir, > > join_path_components(filename, directory, de->d_name); > canonicalize_path(filename); > - if (stat(filename, ) == 0) > +

Re: Avoid erroring out when unable to remove or parse logical rewrite files to save checkpoint work

2022-07-08 Thread Nathan Bossart
On Fri, Jul 08, 2022 at 09:39:10PM +0530, Bharath Rupireddy wrote: > 0001 - there are many places where lstat/stat is being used - don't we > need to replace all or most of them with get_dirent_type? It's been a while since I wrote this one, but I believe my intent was to replace as many

Re: Avoid erroring out when unable to remove or parse logical rewrite files to save checkpoint work

2022-07-08 Thread Bharath Rupireddy
On Sat, Apr 9, 2022 at 1:49 AM Nathan Bossart wrote: > > On Wed, Mar 30, 2022 at 09:21:30AM -0700, Nathan Bossart wrote: > > Here is an updated patch set. > > rebased Thanks. 0001 - there are many places where lstat/stat is being used - don't we need to replace all or most of them with

Re: Avoid erroring out when unable to remove or parse logical rewrite files to save checkpoint work

2022-04-08 Thread Nathan Bossart
On Wed, Mar 30, 2022 at 09:21:30AM -0700, Nathan Bossart wrote: > Here is an updated patch set. rebased -- Nathan Bossart Amazon Web Services: https://aws.amazon.com >From a92ccf47c9c334eea5b8d07b8fcab7031181c37e Mon Sep 17 00:00:00 2001 From: Nathan Bossart Date: Tue, 15 Feb 2022 09:40:53

Re: Avoid erroring out when unable to remove or parse logical rewrite files to save checkpoint work

2022-03-30 Thread Nathan Bossart
On Tue, Mar 29, 2022 at 03:48:32PM -0700, Nathan Bossart wrote: > On Thu, Mar 24, 2022 at 01:17:01PM +1300, Thomas Munro wrote: >> /* we're only handling directories here, skip if it's not ours */ >> -if (lstat(path, ) == 0 && !S_ISDIR(statbuf.st_mode)) >> +if (lstat(path, ) != 0) >>

Re: Avoid erroring out when unable to remove or parse logical rewrite files to save checkpoint work

2022-03-29 Thread Nathan Bossart
Thanks for taking a look! On Thu, Mar 24, 2022 at 01:17:01PM +1300, Thomas Munro wrote: > /* we're only handling directories here, skip if it's not ours */ > -if (lstat(path, ) == 0 && !S_ISDIR(statbuf.st_mode)) > +if (lstat(path, ) != 0) > +ereport(ERROR, > +

Re: Avoid erroring out when unable to remove or parse logical rewrite files to save checkpoint work

2022-03-23 Thread Thomas Munro
On Wed, Feb 16, 2022 at 12:11 PM Nathan Bossart wrote: > On Tue, Feb 15, 2022 at 10:37:58AM -0800, Nathan Bossart wrote: > > On Tue, Feb 15, 2022 at 10:10:34AM -0800, Andres Freund wrote: > >> I generally think it'd be a good exercise to go through an use > >> get_dirent_type() in nearly all

Re: Avoid erroring out when unable to remove or parse logical rewrite files to save checkpoint work

2022-02-15 Thread Nathan Bossart
On Tue, Feb 15, 2022 at 10:37:58AM -0800, Nathan Bossart wrote: > On Tue, Feb 15, 2022 at 10:10:34AM -0800, Andres Freund wrote: >> I generally think it'd be a good exercise to go through an use >> get_dirent_type() in nearly all ReadDir() style loops - it's a nice >> efficiency >> win in

Re: Avoid erroring out when unable to remove or parse logical rewrite files to save checkpoint work

2022-02-15 Thread Nathan Bossart
On Tue, Feb 15, 2022 at 10:10:34AM -0800, Andres Freund wrote: > I generally think it'd be a good exercise to go through an use > get_dirent_type() in nearly all ReadDir() style loops - it's a nice efficiency > win in general, and IIRC a particularly big one on windows. > > It'd probably be good

Re: Avoid erroring out when unable to remove or parse logical rewrite files to save checkpoint work

2022-02-15 Thread Andres Freund
Hi, On 2022-02-15 09:57:53 -0800, Nathan Bossart wrote: > IIUC you are advocating for something more like the attached patches. At least for patch 1 yes. Don't have the cycles just now to look at the others. I generally think it'd be a good exercise to go through an use get_dirent_type() in

Re: Avoid erroring out when unable to remove or parse logical rewrite files to save checkpoint work

2022-02-15 Thread Nathan Bossart
On Tue, Feb 15, 2022 at 09:09:52AM -0800, Andres Freund wrote: > On 2022-02-10 21:30:45 +0530, Bharath Rupireddy wrote: >> Replace ReadDir with ReadDirExtended (in CheckPointSnapBuild) and >> get rid of lstat entirely. > > I think this might be based on a slight misunderstanding / bad phrasing on

Re: Avoid erroring out when unable to remove or parse logical rewrite files to save checkpoint work

2022-02-15 Thread Andres Freund
Hi, On 2022-02-10 21:30:45 +0530, Bharath Rupireddy wrote: > From 4801ff2c3b1e7bc7076205b676d4e3bc4a4ed308 Mon Sep 17 00:00:00 2001 > From: Bharath Rupireddy > Date: Thu, 10 Feb 2022 15:58:58 + > Subject: [PATCH v8] Replace ReadDir with ReadDirExtended > > Replace ReadDir with

Re: Avoid erroring out when unable to remove or parse logical rewrite files to save checkpoint work

2022-02-10 Thread Nathan Bossart
On Thu, Feb 10, 2022 at 09:30:45PM +0530, Bharath Rupireddy wrote: > In any case, let's remove the editor's lock/state file from those > comments and have just only "We just log a message if a file doesn't > fit the pattern". Attached v8 patch with that change. I've moved this one to

Re: Avoid erroring out when unable to remove or parse logical rewrite files to save checkpoint work

2022-02-10 Thread Bharath Rupireddy
On Fri, Feb 4, 2022 at 5:33 AM Nathan Bossart wrote: > > Thanks. I get it. For syncing map files, we don't want to tolerate any > > errors, whereas removal of the old map files (lesser than cutoff LSN) > > can be tolerated in CheckPointLogicalRewriteHeap. > > LGTM. Andres noted upthread [0] that

Re: Avoid erroring out when unable to remove or parse logical rewrite files to save checkpoint work

2022-02-03 Thread Nathan Bossart
On Thu, Feb 03, 2022 at 09:45:08AM +0530, Bharath Rupireddy wrote: > On Thu, Feb 3, 2022 at 12:07 AM Nathan Bossart > wrote: >> If there is a problem reading the directory, we will LOG and then exit the >> loop. If we didn't scan through all the entries in the directory, there is >> a chance

Re: Avoid erroring out when unable to remove or parse logical rewrite files to save checkpoint work

2022-02-02 Thread Bharath Rupireddy
On Thu, Feb 3, 2022 at 12:07 AM Nathan Bossart wrote: > > On Wed, Feb 02, 2022 at 05:19:26PM +0530, Bharath Rupireddy wrote: > > On Wed, Feb 2, 2022 at 5:25 AM Nathan Bossart > > wrote: > >> However, I'm not sure about the change to ReadDirExtended(). That might be > >> okay for

Re: Avoid erroring out when unable to remove or parse logical rewrite files to save checkpoint work

2022-02-02 Thread Nathan Bossart
On Wed, Feb 02, 2022 at 05:19:26PM +0530, Bharath Rupireddy wrote: > On Wed, Feb 2, 2022 at 5:25 AM Nathan Bossart > wrote: >> However, I'm not sure about the change to ReadDirExtended(). That might be >> okay for CheckPointSnapBuild(), which is just trying to remove old files, >> but

Re: Avoid erroring out when unable to remove or parse logical rewrite files to save checkpoint work

2022-02-02 Thread Bharath Rupireddy
On Wed, Feb 2, 2022 at 5:25 AM Nathan Bossart wrote: > > On Mon, Jan 31, 2022 at 10:42:54AM +0530, Bharath Rupireddy wrote: > > After an off-list discussion with Andreas, proposing here a patch that > > basically replaces ReadDir call with ReadDirExtended and gets rid of > > lstat entirely. With

Re: Avoid erroring out when unable to remove or parse logical rewrite files to save checkpoint work

2022-02-01 Thread Nathan Bossart
On Mon, Jan 31, 2022 at 10:42:54AM +0530, Bharath Rupireddy wrote: > After an off-list discussion with Andreas, proposing here a patch that > basically replaces ReadDir call with ReadDirExtended and gets rid of > lstat entirely. With this chance, the checkpoint will only care about > the snapshot

Re: Avoid erroring out when unable to remove or parse logical rewrite files to save checkpoint work

2022-01-30 Thread Bharath Rupireddy
On Thu, Jan 27, 2022 at 6:31 AM Nathan Bossart wrote: > > I spent some time thinking about the right way to proceed here, and I came > up with the attached patches. The first patch just adds error checking for > various lstat() calls in the replication code. If lstat() fails, then it > probably

Re: Avoid erroring out when unable to remove or parse logical rewrite files to save checkpoint work

2022-01-26 Thread Nathan Bossart
On Fri, Jan 21, 2022 at 11:49:56AM -0800, Andres Freund wrote: > On 2022-01-20 20:41:16 +, Bossart, Nathan wrote: >> Here's this part. > > And pushed to all branches. Thanks. Thanks! I spent some time thinking about the right way to proceed here, and I came up with the attached patches.

Re: Avoid erroring out when unable to remove or parse logical rewrite files to save checkpoint work

2022-01-21 Thread Andres Freund
On 2022-01-20 20:41:16 +, Bossart, Nathan wrote: > Here's this part. And pushed to all branches. Thanks.

Re: Avoid erroring out when unable to remove or parse logical rewrite files to save checkpoint work

2022-01-20 Thread Bossart, Nathan
Thanks for your feedback. On 1/20/22, 11:47 AM, "Andres Freund" wrote: > It seems odd to change a bunch of things to not be errors anymore, but then > add new sources of errors. If we try to deal with concurrent deletions or > permission issues - otherwise what's the point of making unlink() not

Re: Avoid erroring out when unable to remove or parse logical rewrite files to save checkpoint work

2022-01-20 Thread Andres Freund
Hi, On 2022-01-20 19:15:08 +, Bossart, Nathan wrote: > > * Why is it okay to ignore lstat failure? Seems like we might > > as well not even have the lstat. > > I added error checking for lstat(). It seems odd to change a bunch of things to not be errors anymore, but then add new sources of

Re: Avoid erroring out when unable to remove or parse logical rewrite files to save checkpoint work

2022-01-19 Thread Tom Lane
"Bossart, Nathan" writes: > I think the other side of this is that we don't want checkpointing to > continually fail because of a noncritical failure. That could also > lead to problems down the road. Yeah, a persistent failure to complete checkpoints is very nasty. Your disk will soon fill

Re: Avoid erroring out when unable to remove or parse logical rewrite files to save checkpoint work

2022-01-19 Thread Bossart, Nathan
On 1/19/22, 11:08 AM, "Andres Freund" wrote: > On 2022-01-19 13:34:21 -0500, Tom Lane wrote: >> As far as the patch itself goes, I agree that failure to unlink >> is noncritical, because such a file would have no further effect >> and we can just ignore it. > > I don't agree. We iterate through

Re: Avoid erroring out when unable to remove or parse logical rewrite files to save checkpoint work

2022-01-19 Thread Andres Freund
Hi, On 2022-01-19 13:34:21 -0500, Tom Lane wrote: > Bharath Rupireddy writes: > > On Sat, Jan 15, 2022 at 2:59 PM Julien Rouhaud wrote: > >> Maybe I'm missing something but wouldn't > >> https://commitfest.postgresql.org/36/3448/ better solve the problem? > > > The error can cause the new

Re: Avoid erroring out when unable to remove or parse logical rewrite files to save checkpoint work

2022-01-19 Thread Tom Lane
I wrote: > Anyway, I think possibly we can drop the bottom half of the loop > (the part trying to fsync non-removed files) in favor of fsync'ing > the directory afterwards. Thoughts? Oh, scratch that --- *this* loop doesn't care about the file contents, but other code does. However, don't we

Re: Avoid erroring out when unable to remove or parse logical rewrite files to save checkpoint work

2022-01-19 Thread Tom Lane
Bharath Rupireddy writes: > On Sat, Jan 15, 2022 at 2:59 PM Julien Rouhaud wrote: >> Maybe I'm missing something but wouldn't >> https://commitfest.postgresql.org/36/3448/ better solve the problem? > The error can cause the new background process proposed there in that > thread to restart,

Re: Avoid erroring out when unable to remove or parse logical rewrite files to save checkpoint work

2022-01-15 Thread Bharath Rupireddy
On Sat, Jan 15, 2022 at 2:59 PM Julien Rouhaud wrote: > > Hi, > > On Sat, Jan 15, 2022 at 02:04:12PM +0530, Bharath Rupireddy wrote: > > > > We had an issue where there were many mapping files generated during > > the crash recovery and end-of-recovery checkpoint was taking a lot of > > time. We

Re: Avoid erroring out when unable to remove or parse logical rewrite files to save checkpoint work

2022-01-15 Thread Julien Rouhaud
Hi, On Sat, Jan 15, 2022 at 02:04:12PM +0530, Bharath Rupireddy wrote: > > We had an issue where there were many mapping files generated during > the crash recovery and end-of-recovery checkpoint was taking a lot of > time. We had to manually intervene and delete some of the mapping > files

Re: Avoid erroring out when unable to remove or parse logical rewrite files to save checkpoint work

2022-01-15 Thread Bharath Rupireddy
On Fri, Jan 14, 2022 at 1:08 AM Andres Freund wrote: > > Hi, > > On 2021-12-31 18:12:37 +0530, Bharath Rupireddy wrote: > > Currently the server is erroring out when unable to remove/parse a > > logical rewrite file in CheckPointLogicalRewriteHeap wasting the > > amount of work the checkpoint has

Re: Avoid erroring out when unable to remove or parse logical rewrite files to save checkpoint work

2022-01-13 Thread Andres Freund
Hi, On 2021-12-31 18:12:37 +0530, Bharath Rupireddy wrote: > Currently the server is erroring out when unable to remove/parse a > logical rewrite file in CheckPointLogicalRewriteHeap wasting the > amount of work the checkpoint has done and preventing the checkpoint > from finishing. This seems

Re: Avoid erroring out when unable to remove or parse logical rewrite files to save checkpoint work

2022-01-13 Thread Bossart, Nathan
On 1/12/22, 10:03 PM, "Bharath Rupireddy" wrote: > On Thu, Jan 13, 2022 at 3:47 AM Bossart, Nathan wrote: >> The only feedback I have for the patch is that I don't think the new >> comments are necessary. > > I borrowed the comments as-is from the CheckPointSnapBuild introduced > by the commit

Re: Avoid erroring out when unable to remove or parse logical rewrite files to save checkpoint work

2022-01-12 Thread Bharath Rupireddy
On Thu, Jan 13, 2022 at 3:47 AM Bossart, Nathan wrote: > > On 12/31/21, 4:44 AM, "Bharath Rupireddy" > wrote: > > Currently the server is erroring out when unable to remove/parse a > > logical rewrite file in CheckPointLogicalRewriteHeap wasting the > > amount of work the checkpoint has done

Re: Avoid erroring out when unable to remove or parse logical rewrite files to save checkpoint work

2022-01-12 Thread Bossart, Nathan
On 12/31/21, 4:44 AM, "Bharath Rupireddy" wrote: > Currently the server is erroring out when unable to remove/parse a > logical rewrite file in CheckPointLogicalRewriteHeap wasting the > amount of work the checkpoint has done and preventing the checkpoint > from finishing. This is unlike

Avoid erroring out when unable to remove or parse logical rewrite files to save checkpoint work

2021-12-31 Thread Bharath Rupireddy
Hi, Currently the server is erroring out when unable to remove/parse a logical rewrite file in CheckPointLogicalRewriteHeap wasting the amount of work the checkpoint has done and preventing the checkpoint from finishing. This is unlike CheckPointSnapBuild does for snapshot files i.e. it just