Re: New committers: Melanie Plageman, Richard Guo

2024-04-26 Thread Amul Sul
Many congratulations to both of you!

Regards,
Amul

On Friday 26 April 2024, Jonathan S. Katz  wrote:

> The Core Team would like to extend our congratulations to Melanie Plageman
> and Richard Guo, who have accepted invitations to become our newest
> PostgreSQL committers.
>
> Please join us in wishing them much success and few reverts!
>
> Thanks,
>
> Jonathan
>


-- 
Regards,
Amul Sul
EDB: http://www.enterprisedb.com


Re: Add bump memory context type and use it for tuplesorts

2024-04-16 Thread Amul Sul
On Tue, Apr 16, 2024 at 3:44 PM David Rowley  wrote:

> On Tue, 16 Apr 2024 at 17:13, Amul Sul  wrote:
> > Attached is a small patch adding the missing BumpContext description to
> > the README.
>
> Thanks for noticing and working on the patch.
>
> There were a few things that were not quite accurate or are misleading:
>
> 1.
>
> > +These three memory contexts aim to free memory back to the operating
> > system
>
> That's not true for bump. It's the worst of the 4.  Worse than aset.
> It only returns memory when the context is reset/deleted.
>
> 2.
>
> "These memory contexts were initially developed for ReorderBuffer, but
> may be useful elsewhere as long as the allocation patterns match."
>
> The above isn't true for bump.  It was written for tuplesort.  I think
> we can just remove that part now.  Slab and generation are both old
> enough not to care why they were conceived.
>
> Also since adding bump, I think the choice of which memory context to
> use is about 33% harder than it used to be when there were only 3
> context types.  I think this warrants giving more detail on what these
> 3 special-purpose memory allocators are good for.  I've added more
> details in the attached patch.  This includes more details about
> freeing malloc'd blocks.
>
> I've tried to detail out enough of the specialities of the context
> type without going into extensive detail. My hope is that there will
> be enough detail for someone to choose the most suitable looking one
> and head over to the corresponding .c file to find out more.
>
> Is that about the right level of detail?
>

Yes, it looks much better now, thank you.
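
For reference, the behaviour described above can be sketched like this
(assuming the BumpContextCreate() API added by the bump patch; illustrative
only, not the committed README wording):

    MemoryContext bump;
    char       *p;

    bump = BumpContextCreate(CurrentMemoryContext, "bump demo",
                             ALLOCSET_DEFAULT_SIZES);
    p = MemoryContextAlloc(bump, 64);   /* cheap pointer-bump allocation */

    /* pfree(p) is not supported: bump chunks carry no per-chunk header */
    MemoryContextReset(bump);   /* the only point memory is handed back */
    MemoryContextDelete(bump);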

Regards,
Amul


Re: Add bump memory context type and use it for tuplesorts

2024-04-15 Thread Amul Sul
Attached is a small patch adding the missing BumpContext description to the
README.

Regards,
Amul


0001-Add-BumpContext-description-to-mmgr-README.patch
Description: Binary data


Re: Add system identifier to backup manifest

2024-03-14 Thread Amul Sul
On Thu, Mar 14, 2024 at 12:48 AM Robert Haas  wrote:

> On Fri, Mar 8, 2024 at 12:14 AM Amul Sul  wrote:
> > Thank you for the improvement.
> >
> > The caller of verify_control_file() has the full path of the control file
> > and can pass it along to avoid recomputing it. With this change, it
> > doesn't really need the verifier_context argument -- the manifest's system
> > identifier alone is enough, along with the control file path.  I did the
> > same in the attached delta patch for the v11-0002 patch; please have a
> > look, thanks.
>
> Those seem like sensible changes. I incorporated them and committed. I
> also:
>
> * ran pgindent, which changed a bit of your formatting
> * changed some BAIL_OUT calls to die; I think we should hardly ever be
> using BAIL_OUT, as that terminates the entire TAP test run, not just
> the current file
>

Thank you, Robert.

Regards,
Amul


Re: Dump-restore loosing 'attnotnull' bit for DEFERRABLE PRIMARY KEY column(s).

2024-03-07 Thread Amul Sul
On Thu, Mar 7, 2024 at 11:02 PM Alvaro Herrera 
wrote:

> On 2024-Mar-07, Alvaro Herrera wrote:
>
> > Maybe we can add a flag RelationData->rd_ispkdeferred, so that
> > RelationGetPrimaryKeyIndex returned InvalidOid for deferrable PKs; then
> > logical replication would continue to not know about this PK, which
> > perhaps is what we want.  I'll do some testing with this.
>
> This seems to work okay.
>

Thank you for working on this, the patch works nicely.

Regards,
Amul


Re: Add system identifier to backup manifest

2024-03-07 Thread Amul Sul
On Fri, Mar 8, 2024 at 1:22 AM Robert Haas  wrote:

> On Thu, Mar 7, 2024 at 9:16 AM Robert Haas  wrote:
> > It could. I just thought this was clearer. I agree that it's a bit
> > long, but I don't think this is worth bikeshedding very much. If at a
> > later time somebody feels strongly that it needs to be changed, so be
> > it. Right now, getting on with the business at hand is more important,
> > IMHO.
>
> Here's a new version of the patch set, rebased over my version of 0001
> and with various other corrections:
>
> * Tidy up grammar in documentation.
> * In manifest_process_version, the test checked whether the manifest
> version == 1, but the comment talked about versions >= 2. Make the
> comment match the code.
> * In load_backup_manifest, avoid changing the existing placement of a
> variable declaration.
> * Rename verify_system_identifier to verify_control_file because if we
> were verifying multiple things about the control file we'd still want
> to only read it once.
> * Tweak the coding of verify_backup_file and verify_control_file to
> avoid repeated path construction.
> * Remove saw_system_identifier_field. This looks like it's trying to
> enforce a rule that the system identifier must immediately follow the
> version, but we don't insist on anything like that for files or wal
> ranges, so there seems to be no reason to do it here.
> * Remove bogus "unrecognized top-level field" test from
> 005_bad_manifest.pl. The JSON included here doesn't include any
> unrecognized top-level field, so the fact that we were getting that
> error message was wrong. After removing saw_system_identifier_field,
> we no longer get the wrong error message any more, so the test started
> failing.
> * Remove "expected system identifier" test from 005_bad_manifest.pl.
> This was basically a test that saw_system_identifier_field was
> working.
> * Header comment adjustment for
> json_manifest_finalize_system_identifier. The last sentence was
> cut-and-pasted from somewhere that it made sense to here, where it
> doesn't. There's only ever one system identifier.
>
>
Thank you for the improvement.

The caller of verify_control_file() has the full path of the control file and
can pass it along to avoid recomputing it. With this change, it doesn't
really need the verifier_context argument -- the manifest's system identifier
alone is enough, along with the control file path.  I did the same in the
attached delta patch for the v11-0002 patch; please have a look, thanks.
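
For context, the simplified helper would look roughly like this (a sketch
under the assumptions above; the committed function may differ in detail):

    static void
    verify_control_file(const char *controlpath, uint64 manifest_sys_id)
    {
        ControlFileData *control_file;
        bool        crc_ok;

        control_file = get_controlfile_by_exact_path(controlpath, &crc_ok);
        if (!crc_ok)
            report_fatal_error("%s: CRC is incorrect", controlpath);
        if (control_file->system_identifier != manifest_sys_id)
            report_fatal_error("manifest system identifier is %llu, but control file has %llu",
                               (unsigned long long) manifest_sys_id,
                               (unsigned long long) control_file->system_identifier);
        pfree(control_file);
    }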

Regards,
Amul


v11-0002-delta.patch
Description: Binary data


Re: Add system identifier to backup manifest

2024-03-06 Thread Amul Sul
On Thu, Mar 7, 2024 at 9:37 AM Michael Paquier  wrote:

> On Wed, Mar 06, 2024 at 11:05:36AM -0500, Robert Haas wrote:
> > So with that in mind, here's my proposal. This is an adjustment of
> > Amit's previous refactoring patch. He renamed the existing
> > get_controlfile() to get_dir_controlfile() and made a new
> > get_controlfile() that accepted the path to the control file itself. I
> > chose to instead leave the existing get_controlfile() alone and add a
> > new get_controlfile_by_exact_path(). I think this is better, because
> > most of the existing callers find it more convenient to pass the path
> > to the data directory rather than the path to the controlfile, so the
> > patch is smaller this way, and less prone to cause headaches for
> > people back-patching or maintaining out-of-core code. But it still
> > gives us a way to avoid repeatedly constructing the same pathname.
>
> Yes, that was my primary concern with the previous versions of the
> patch.
>
> > If nobody objects, I plan to commit this version.
>
> You are not changing silently the internals of get_controlfile(), so
> no objections here.  The name of the new routine could be shorter, but
> being short of ideas what you are proposing looks fine by me.
>

Could it be get_controlfile_by_path()?

Regards,
Amul


Re: Add system identifier to backup manifest

2024-03-03 Thread Amul Sul
On Fri, Mar 1, 2024 at 11:28 AM Michael Paquier  wrote:

> On Mon, Feb 19, 2024 at 12:06:19PM +0530, Amul Sul wrote:
> > Agreed, now they will have an error as _could not read file "":
> Is
> > a directory_. But, IIUC, that what usually happens with the dev version,
> and
> > the extension needs to be updated for compatibility with the newer
> version for
> > the same reason.
>
> I was reading through the remaining pieces of the patch set, and are
> you sure that there is a need for 0002 at all?  The only reason why
> get_dir_controlfile() is introduced is to be able to get the contents
> of a control file with a full path to it, and not a data folder.  Now,
> if I look closely, with 0002~0004 applied, the only two callers of
> get_controlfile() are pg_combinebackup.c and pg_verifybackup.c.  Both
> of them have an access to the backup directories, which point to the
> root of the data folder.  pg_combinebackup can continue to use
> backup_dirs[i].  pg_verifybackup has an access to the backup directory
> in the context data, if I'm reading this right, so you could just use
> that in verify_system_identifier().
>

Yes, you are correct. Both of the current callers of get_controlfile() have
access to the root directory.

I have dropped the 0002 patch -- I don't have a very strong opinion on
refactoring get_controlfile(), apart from saying that it might be good to
have both versions :).

Regards,
Amul
From 6c54ab995d08d1843acaad71b2f8161019c72295 Mon Sep 17 00:00:00 2001
From: Amul Sul 
Date: Mon, 22 Jan 2024 10:45:19 +0530
Subject: [PATCH v9 1/2] pg_verifybackup: code refactor

Rename parser_context to manifest_data and make parse_manifest_file
return that instead of using out-arguments.
---
 src/bin/pg_verifybackup/pg_verifybackup.c | 75 ++-
 1 file changed, 34 insertions(+), 41 deletions(-)

diff --git a/src/bin/pg_verifybackup/pg_verifybackup.c b/src/bin/pg_verifybackup/pg_verifybackup.c
index ae8c18f3737..8561678a7d0 100644
--- a/src/bin/pg_verifybackup/pg_verifybackup.c
+++ b/src/bin/pg_verifybackup/pg_verifybackup.c
@@ -94,31 +94,28 @@ typedef struct manifest_wal_range
 } manifest_wal_range;
 
 /*
- * Details we need in callbacks that occur while parsing a backup manifest.
+ * All the data parsed from a backup_manifest file.
  */
-typedef struct parser_context
+typedef struct manifest_data
 {
-	manifest_files_hash *ht;
+	manifest_files_hash *files;
 	manifest_wal_range *first_wal_range;
 	manifest_wal_range *last_wal_range;
-} parser_context;
+} manifest_data;
 
 /*
  * All of the context information we need while checking a backup manifest.
  */
 typedef struct verifier_context
 {
-	manifest_files_hash *ht;
+	manifest_data *manifest;
 	char	   *backup_directory;
 	SimpleStringList ignore_list;
 	bool		exit_on_error;
 	bool		saw_any_error;
 } verifier_context;
 
-static void parse_manifest_file(char *manifest_path,
-manifest_files_hash **ht_p,
-manifest_wal_range **first_wal_range_p);
-
+static manifest_data *parse_manifest_file(char *manifest_path);
 static void verifybackup_per_file_cb(JsonManifestParseContext *context,
 	 char *pathname, size_t size,
 	 pg_checksum_type checksum_type,
@@ -142,8 +139,7 @@ static void verify_file_checksum(verifier_context *context,
  manifest_file *m, char *fullpath);
 static void parse_required_wal(verifier_context *context,
 			   char *pg_waldump_path,
-			   char *wal_directory,
-			   manifest_wal_range *first_wal_range);
+			   char *wal_directory);
 
 static void report_backup_error(verifier_context *context,
 const char *pg_restrict fmt,...)
@@ -185,7 +181,6 @@ main(int argc, char **argv)
 
 	int			c;
 	verifier_context context;
-	manifest_wal_range *first_wal_range;
 	char	   *manifest_path = NULL;
 	bool		no_parse_wal = false;
 	bool		quiet = false;
@@ -338,7 +333,7 @@ main(int argc, char **argv)
 	 * the manifest as fatal; there doesn't seem to be much point in trying to
 	 * verify the backup directory against a corrupted manifest.
 	 */
-	parse_manifest_file(manifest_path, &context.ht, &first_wal_range);
+	context.manifest = parse_manifest_file(manifest_path);
 
 	/*
 	 * Now scan the files in the backup directory. At this stage, we verify
@@ -367,8 +362,7 @@ main(int argc, char **argv)
 	 * not to do so.
 	 */
 	if (!no_parse_wal)
-		parse_required_wal(&context, pg_waldump_path,
-						   wal_directory, first_wal_range);
+		parse_required_wal(&context, pg_waldump_path, wal_directory);
 
 	/*
 	 * If everything looks OK, tell the user this, unless we were asked to
@@ -385,9 +379,8 @@ main(int argc, char **argv)
  * all the files it mentions, and a linked list of all the WAL ranges it
  * mentions.
  */
-static void
-parse_manifest_file(char *manifest_path, manifest_files_hash **ht_p,
-	manifest_wal_range **first_wal_range_p)
+static manifest_data *
+parse_manifest_file(char *manifest_path)
 {
 	int			fd;
 	struct stat statbuf;
@@

Re: Add system identifier to backup manifest

2024-03-03 Thread Amul Sul
On Wed, Feb 21, 2024 at 10:01 AM Michael Paquier 
wrote:

> On Mon, Feb 19, 2024 at 12:06:19PM +0530, Amul Sul wrote:
> > On Mon, Feb 19, 2024 at 4:22 AM Michael Paquier 
> wrote:
> >> And the new option should be documented at the top of the init()
> >> routine for perldoc.
> >
> > Added in the attached version.
>
> I've done some wordsmithing on 0001 and it is OK, so I've applied it
> to move the needle.  Hope that helps.
>

Thank you very much.

Regards,
Amul


Re: PostgreSQL Contributors Updates

2024-03-03 Thread Amul Sul
On Sun, Mar 3, 2024 at 9:28 PM Joe Conway  wrote:

> All,
>
> The PostgreSQL Contributor Page
> (https://www.postgresql.org/community/contributors/) includes people who
> have made substantial, long-term contributions of time and effort to the
> PostgreSQL project. The PostgreSQL Contributors Team recognizes the
> following people for their contributions.
>
> New PostgreSQL Contributors:
>
> * Bertrand Drouvot
> * Gabriele Bartolini
> * Richard Guo
>
> New PostgreSQL Major Contributors:
>
> * Alexander Lakhin
> * Daniel Gustafsson
> * Dean Rasheed
> * John Naylor
> * Melanie Plageman
> * Nathan Bossart
>

Thank you and many congratulations to all.

Regards,
Amul


Re: Add system identifier to backup manifest

2024-02-18 Thread Amul Sul
On Mon, Feb 19, 2024 at 4:22 AM Michael Paquier  wrote:

> On Thu, Feb 15, 2024 at 05:41:46PM +0530, Robert Haas wrote:
> > On Thu, Feb 15, 2024 at 3:05 PM Amul Sul  wrote:
> > > Kindly have a look at the attached version.
> >
> > IMHO, 0001 looks fine, except probably the comment could be phrased a
> > bit more nicely.
>
> And the new option should be documented at the top of the init()
> routine for perldoc.
>

Added in the attached version.


> > That can be left for whoever commits this to
> > wordsmith. Michael, what are your plans?
>
> Not much, so feel free to not wait for me.  I've just read through the
> patch because I like the idea/feature :)
>

Thank you, that helped a lot.

> 0002 seems like a reasonable approach, but there's a hunk in the wrong
> > patch: 0004 modifies pg_combinebackup's check_control_files to use
> > get_dir_controlfile() rather than git_controlfile(), but it looks to
> > me like that should be part of 0002.
>

Fixed in the attached version.


> I'm slightly concerned about 0002 that silently changes the meaning of
> get_controlfile().  That would impact extension code without people
> knowing about it when compiling, just when they run their stuff under
> 17~.
>

Agreed; now they will get an error like _could not read file "": Is a
directory_. But, IIUC, that is what usually happens with a dev version, and
the extension needs to be updated for compatibility with the newer version
for the same reason.

Regards,
Amul
From 354014538b16579f005bd6f4ce771b1aa22b5e02 Mon Sep 17 00:00:00 2001
From: Amul Sul 
Date: Thu, 15 Feb 2024 12:10:23 +0530
Subject: [PATCH v8 1/4] Add option to force initdb in
 PostgreSQL::Test::Cluster::init()

---
 src/bin/pg_combinebackup/t/005_integrity.pl | 19 +++
 src/test/perl/PostgreSQL/Test/Cluster.pm| 15 ++-
 2 files changed, 17 insertions(+), 17 deletions(-)

diff --git a/src/bin/pg_combinebackup/t/005_integrity.pl b/src/bin/pg_combinebackup/t/005_integrity.pl
index 3b445d0e30f..5d425209211 100644
--- a/src/bin/pg_combinebackup/t/005_integrity.pl
+++ b/src/bin/pg_combinebackup/t/005_integrity.pl
@@ -18,18 +18,13 @@ $node1->init(has_archiving => 1, allows_streaming => 1);
 $node1->append_conf('postgresql.conf', 'summarize_wal = on');
 $node1->start;
 
-# Set up another new database instance. We don't want to use the cached
-# INITDB_TEMPLATE for this, because we want it to be a separate cluster
-# with a different system ID.
-my $node2;
-{
-	local $ENV{'INITDB_TEMPLATE'} = undef;
-
-	$node2 = PostgreSQL::Test::Cluster->new('node2');
-	$node2->init(has_archiving => 1, allows_streaming => 1);
-	$node2->append_conf('postgresql.conf', 'summarize_wal = on');
-	$node2->start;
-}
+# Set up another new database instance with the force_initdb option. We don't
+# want to initialize the database system by copying the initdb template here,
+# because we want it to be a separate cluster with a different system ID.
+my $node2 = PostgreSQL::Test::Cluster->new('node2');
+$node2->init(force_initdb => 1, has_archiving => 1, allows_streaming => 1);
+$node2->append_conf('postgresql.conf', 'summarize_wal = on');
+$node2->start;
 
 # Take a full backup from node1.
 my $backup1path = $node1->backup_dir . '/backup1';
diff --git a/src/test/perl/PostgreSQL/Test/Cluster.pm b/src/test/perl/PostgreSQL/Test/Cluster.pm
index 07da74cf562..2b4f9a48365 100644
--- a/src/test/perl/PostgreSQL/Test/Cluster.pm
+++ b/src/test/perl/PostgreSQL/Test/Cluster.pm
@@ -495,6 +495,10 @@ a directory that's only accessible to the current user to ensure that.
 On Windows, we use SSPI authentication to ensure the same (by pg_regress
 --config-auth).
 
+force_initdb => 1 will force the cluster to be initialized by running initdb.
+Otherwise, if available and if there aren't any parameters, a previously
+initdb'd cluster is used as a template by copying it.
+
 WAL archiving can be enabled on this node by passing the keyword parameter
 has_archiving => 1. This is disabled by default.
 
@@ -517,6 +521,7 @@ sub init
 
 	local %ENV = $self->_get_env();
 
+	$params{force_initdb} = 0 unless defined $params{force_initdb};
 	$params{allows_streaming} = 0 unless defined $params{allows_streaming};
 	$params{has_archiving} = 0 unless defined $params{has_archiving};
 
@@ -529,14 +534,14 @@ sub init
 	mkdir $self->backup_dir;
 	mkdir $self->archive_dir;
 
-	# If available and if there aren't any parameters, use a previously
-	# initdb'd cluster as a template by copying it. For a lot of tests, that's
-	# substantially cheaper. Do so only if there aren't parameters, it doesn't
-	# seem worth figuring out whether they affect compatibility.
+	# For a lot of tests, it's substantially cheaper to copy a previously
+	# initdb'd cluster as a template. Do so only if force_initdb => 0 and there
+	# aren't parameters; it doesn't seem worth figuring out whether they affect

Re: Add system identifier to backup manifest

2024-02-15 Thread Amul Sul
On Thu, Feb 15, 2024 at 7:18 AM Michael Paquier  wrote:

> On Wed, Feb 14, 2024 at 12:29:07PM +0530, Amul Sul wrote:
> > Ok, I did it that way in the attached version: I have passed the control
> > file's full path, which we get in verify_backup_file(), as a second
> > argument to verify_system_identifier(). But it doesn't do anything useful
> > with it yet, since we are still using get_controlfile() to open the
> > control file, which takes the directory as input and computes the full
> > path on its own.
>
> I've read through the patch, and that's pretty cool.
>

Thank you for looking into this.


> -static void
> -parse_manifest_file(char *manifest_path, manifest_files_hash **ht_p,
> -   manifest_wal_range
> **first_wal_range_p)
> +static manifest_data *
> +parse_manifest_file(char *manifest_path)
>
> In 0001, should the comment describing this routine be updated as
> well?
>

Ok, updated in the attached version.


>
> +   identifier with pg_control of the backup directory or fails verification
>
> This is missing a  markup here.
>

Done, in the attached version.


>
> +  PostgreSQL 17, it is 2; in older versions,
> +  it is 1.
>
> Perhaps a couple of s here.
>
Done.


> +   if (strcmp(parse->manifest_version, "1") != 0 &&
> +       strcmp(parse->manifest_version, "2") != 0)
> +       json_manifest_parse_failure(parse->context,
> +                                   "unexpected manifest version");
> +
> +   /* Parse version. */
> +   version = strtoi64(parse->manifest_version, &ep, 10);
> +   if (*ep)
> +       json_manifest_parse_failure(parse->context,
> +                                   "manifest version not an integer");
> +
> +   /* Invoke the callback for version */
> +   context->version_cb(context, version);
>
> Shouldn't these two checks be reversed?  And is there actually a need
> for the first check at all knowing that the version callback should be
> in charge of performing the validation vased on the version received?
>

Makes sense; reversed the order.

I think the allowed versions should be listed in one central place, and each
callback can check and react to the versions suitable to it, IMHO.
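
Roughly, the reversed order would look like this (a sketch of the idea, not
the exact patch hunk):

    /* Parse the version number first ... */
    version = strtoi64(parse->manifest_version, &ep, 10);
    if (*ep)
        json_manifest_parse_failure(parse->context,
                                    "manifest version not an integer");

    /* ... then validate it against the centrally listed versions. */
    if (version != 1 && version != 2)
        json_manifest_parse_failure(parse->context,
                                    "unexpected manifest version");

    /* Invoke the callback for the version. */
    context->version_cb(context, version);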


> +my $node2;
> +{
> +   local $ENV{'INITDB_TEMPLATE'} = undef;
>
> Not sure that it is a good idea to duplicate this pattern twice.
> Shouldn't this be something we'd want to control with an option in the
> init() method instead?
>

Yes, I did that in a separate patch; see the 0001 patch.


> +static void
> +verify_system_identifier(verifier_context *context, char *controlpath)
>
> Relying both on controlpath, being a full path to the control file
> including the data directory, and context->backup_directory to read
> the contents of the control file looks a bit weird.  Wouldn't it be
> cleaner to just use one of them?
>

Well, yes, I had the same feeling. How about having another function that
accepts the full path of pg_control?

I tried that in the 0002 patch, where the original function is renamed to
get_dir_controlfile(), which accepts the data directory path as before, and
get_controlfile() now accepts the full path of the pg_control file.

Kindly have a look at the attached version.

Regards,
Amul


v7-0004-Add-system-identifier-to-backup-manifest.patch
Description: Binary data


v7-0003-pg_verifybackup-code-refactor.patch
Description: Binary data


v7-0001-Add-option-to-force-initdb-in-PostgreSQL-Test-Clu.patch
Description: Binary data


v7-0002-Code-refactor-get_controlfile-to-accept-full-path.patch
Description: Binary data


Re: Add system identifier to backup manifest

2024-02-13 Thread Amul Sul
On Fri, Feb 2, 2024 at 12:03 AM Robert Haas  wrote:

> On Thu, Feb 1, 2024 at 2:18 AM Amul Sul  wrote:
> > I intended to minimize the out parameters of parse_manifest_file(), which
> > currently returns manifest_files_hash and manifest_wal_range; I need two
> > more -- the manifest version and the system identifier.
>
> Sure, but you could do context.ht = manifest_data->files instead of
> context.manifest = manifest_data. The question isn't whether you
> should return the whole manifest_data from parse_manifest_file -- I
> agree with that decision -- but rather whether you should feed the
> whole thing through into the context, or just the file hash.
>
> > Yeah, we can do that, but I think it is a bit inefficient to have a
> > strcmp() check for the pg_control file on each verify_backup_file() call,
> > despite the fact that we know that path.  Also, I think we need additional
> > handling to ensure that the system identifier has been verified in
> > verify_backup_file() -- what if the pg_control file itself is missing from
> > the backup?  That might be a rare case, but possible.
> >
> > For now, we can do the system identifier validation after
> > verify_backup_directory().
>
> Yes, that's another option, but I don't think it's as good.
>
> Suppose you do it that way. Then what will happen when the file is
> altogether missing or inaccessible? I think verify_backup_file() will
> complain, and then you'll have to do something ugly to avoid having
> verify_system_identifier() emit the same complaint all over again.
> Remember, unless you find some complicated way of passing data around,
> it won't know whether verify_backup_file() emitted a warning or not --
> it can redo the stat() and see what happens, but it's not absolutely
> guaranteed to be the same as what happened before. Making sure that
> you always emit any given complaint once rather than twice or zero
> times is going to be tricky.
>
> It seems more natural to me to just accept the (small) cost of a
> strcmp() in verify_backup_file(). If the initial stat() fails, it
> emits whatever complaint is appropriate and returns and the logic to
> check the system identifier is never reached. If it succeeds, you can
> proceed to try to open the file and do what you need to do.
>

Ok, I did it that way in the attached version: I have passed the control
file's full path, which we get in verify_backup_file(), as a second argument
to verify_system_identifier(). But it doesn't do anything useful with it yet,
since we are still using get_controlfile() to open the control file, which
takes the directory as input and computes the full path on its own.
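
To make that concrete, the check Robert suggested amounts to something like
this inside verify_backup_file(), once the stat() checks have passed (a
sketch; names follow the patch, not necessarily the final commit):

    /* Check the manifest system identifier against the backup's pg_control. */
    if (strcmp(relpath, "global/pg_control") == 0)
        verify_system_identifier(context, fullpath);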

Regards,
Amul


v6-0002-Add-system-identifier-to-backup-manifest.patch
Description: Binary data


v6-0001-pg_verifybackup-code-refactor.patch
Description: Binary data


Re: Add system identifier to backup manifest

2024-01-31 Thread Amul Sul
On Thu, Feb 1, 2024 at 3:06 AM Robert Haas  wrote:

> On Thu, Jan 25, 2024 at 2:52 AM Amul Sul  wrote:
> > Thank you for the review-comments, updated version attached.
>
> I generally agree with 0001. I spent a long time thinking about your
> decision to make verifier_context contain a pointer to manifest_data
> instead of, as it does currently, a pointer to manifest_files_hash. I
> don't think that's a horrible idea, but it also doesn't seem to be
> used anywhere currently. One advantage of the current approach is that
> we know that none of the code downstream of verify_backup_directory()
> or verify_backup_checksums() actually cares about anything other than
> the manifest_files_hash. That's kind of nice. If we didn't change this
> as you have done here, then we would need to continue passing the WAL
> ranges to parse_required_walI() and the system identifier would have
> to be passed explicitly to the code that checks the system identifier,
> but that's not such a bad thing, either. It makes it clear which
> functions are using which information.
>

I intended to minimize the out parameters of parse_manifest_file(), which
currently returns manifest_files_hash and manifest_wal_range; I need two more
-- the manifest version and the system identifier.

But before you go change anything there, exactly when should 0002 be
> checking the system identifier in the control file? What happens now
> is that we first walk over the directory tree and make sure we have
> the files (verify_backup_directory) and then go through and verify
> checksums in a second pass (verify_backup_checksums). We do this
> because it lets us report problems that can be detected cheaply --
> like missing files -- relatively quickly, and problems that are more
> expensive to detect -- like mismatching checksums -- only after we've
> reported all the cheap-to-detect problems. At what stage should we
> verify the control file? I don't really like verifying it first, as
> you've done, because I think the error message change in
> 004_options.pl is a clear regression. When the whole directory is
> missing, it's much more pleasant to complain about the directory being
> missing than some file inside the directory being missing.
>
> What I'd be inclined to suggest is that you have verify_backup_file()
> notice when the file it's being asked to verify is the control file,
> and have it check the system identifier at that stage. I think if you
> do that, then the error message change in 004_options.pl goes away.
> Now, to do that, you'd need to have the whole manifest_data available
> from the context, not just the manifest_files_hash, so that you can
> see the expected system identifier. And, interestingly, if you take
> this approach, then it appears to me that 0001 is correct as-is and
> doesn't need any changes.
>

Yeah, we can do that, but I think it is a bit inefficient to have a strcmp()
check for the pg_control file on each verify_backup_file() call, despite the
fact that we know that path.  Also, I think we need additional handling to
ensure that the system identifier has been verified in verify_backup_file()
-- what if the pg_control file itself is missing from the backup?  That might
be a rare case, but possible.

For now, we can do the system identifier validation after
verify_backup_directory().

Regards,
Amul


Re: Add system identifier to backup manifest

2024-01-24 Thread Amul Sul
On Wed, Jan 24, 2024 at 10:53 PM Robert Haas  wrote:

> On Mon, Jan 22, 2024 at 2:22 AM Amul Sul  wrote:
> > Thinking a bit more on this, I realized parse_manifest_file() has many out
> > parameters. Instead, parse_manifest_file() should simply return manifest
> > data like load_backup_manifest() does.  The attached 0001 patch does the
> > same; it removes the parser_context structure, adds manifest_data, and
> > makes the required adjustments to the pg_verifybackup code.
>
>     InitializeBackupManifest(&manifest, opt->manifest,
> -                            opt->manifest_checksum_type);
> +                            opt->manifest_checksum_type,
> +                            GetSystemIdentifier());
>
> InitializeBackupManifest() can just call GetSystemIdentifier() itself,
> instead of passing another parameter, I think.
>

Ok.
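
(I.e., inside InitializeBackupManifest() itself -- a sketch of the idea,
assuming the existing GetSystemIdentifier() helper from xlog.c:)

    if (want_manifest != MANIFEST_OPTION_NO)
        AppendToManifest(manifest,
                         "{ \"PostgreSQL-Backup-Manifest-Version\": 2,\n"
                         "\"System-Identifier\": " UINT64_FORMAT ",\n"
                         "\"Files\": [",
                         GetSystemIdentifier());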


>
> +   if (manifest_version == 1)
> +       context->error_cb(context,
> +                         "%s: backup manifest doesn't support incremental changes",
> +                         private_context->backup_directory);
>
> I think this is weird. First, there doesn't seem to be any reason to
> bounce through error_cb() here. You could just call pg_fatal(), as we
> do elsewhere in this file. Second, there doesn't seem to be any need
> to include the backup directory in this error message. We include the
> file name (not the directory name) in errors that pertain to the file
> itself, like if we can't open or read it. But we don't do that for
> semantic errors about the manifest contents (cf.
> combinebackup_per_file_cb). This file would need a lot fewer changes
> if you didn't feed the backup directory name through here. Third, the
> error message is not well-chosen, because a user who looks at it won't
> understand WHY the manifest doesn't support incremental changes. I
> suggest "backup manifest version 1 does not support incremental
> backup".
>
> +   /* Incremental backups supported on manifest version 2 or later */
> +   if (manifest_version == 1)
> +       context->error_cb(context,
> +                         "incremental backups cannot be taken for this backup");
>
> Let's use the same error message as in the previous case here also.
>

Ok.


> +   for (i = 0; i < n_backups; i++)
> +   {
> +       if (manifests[i]->system_identifier != system_identifier)
> +       {
> +           char       *controlpath;
> +
> +           controlpath = psprintf("%s/%s", prior_backup_dirs[i],
> +                                  "global/pg_control");
> +
> +           pg_fatal("manifest is from different database system: manifest database system identifier is %llu, %s system identifier is %llu",
> +                    (unsigned long long) manifests[i]->system_identifier,
> +                    controlpath,
> +                    (unsigned long long) system_identifier);
> +       }
> +   }
>
> check_control_files() already verifies that all of the control files
> contain the same system identifier as each other, so what we're really
> checking here is that the backup manifest in each directory has the
> same system identifier as the control file in that same directory. One
> problem is that backup manifests are optional here, as per the comment
> in load_backup_manifests(), so you need to skip over NULL entries
> cleanly to avoid seg faulting if one is missing. I also think the
> error message should be changed. How about "%s: manifest system
> identifier is %llu, but control file has %llu"?
>

Ok.


> +   context->error_cb(context,
> +                     "manifest is from different database system: manifest database system identifier is %llu, pg_control database system identifier is %llu",
> +                     (unsigned long long) manifest_system_identifier,
> +                     (unsigned long long) system_identifier);
>
> And here, while I'm kibitzing, how about "manifest system identifier
> is %llu, but this system's identifier is %llu"?
>

I used "database system identifier" instead of "this system's identifier",
like we are using in WalReceiverMain() and libpqrcv_identify_system().


> -   qr/could not open directory/,
> +   qr/could not open file/,
>
> I don't think that the expected error message here should be changing.
> Does it really, with the latest patch version? Why? Can we fix that?
>

Because, we were trying to access pg_control t

Re: Dump-restore loosing 'attnotnull' bit for DEFERRABLE PRIMARY KEY column(s).

2024-01-23 Thread Amul Sul
On Sat, Jan 20, 2024 at 7:55 AM vignesh C  wrote:

> On Fri, 22 Sept 2023 at 18:45, Amul Sul  wrote:
> >
> >
> >
> > On Wed, Sep 20, 2023 at 8:29 PM Alvaro Herrera 
> wrote:
> >>
> >> On 2023-Sep-20, Amul Sul wrote:
> >>
> >> > On the latest master head, I can see a $subject bug that seems to be
> >> > related to commit b0e96f311985:
> >> >
> >> > Here is the table definition:
> >> > create table foo(i int, j int, CONSTRAINT pk PRIMARY KEY(i) DEFERRABLE);
> >>
> >> Interesting, thanks for the report.  Your attribution to that commit is
> >> correct.  The table is dumped like this:
> >>
> >> CREATE TABLE public.foo (
> >> i integer CONSTRAINT pgdump_throwaway_notnull_0 NOT NULL NO INHERIT,
> >> j integer
> >> );
> >> ALTER TABLE ONLY public.foo
> >> ADD CONSTRAINT pk PRIMARY KEY (i) DEFERRABLE;
> >> ALTER TABLE ONLY public.foo DROP CONSTRAINT pgdump_throwaway_notnull_0;
> >>
> >> so the problem here is that the deferrable PK is not considered a reason
> >> to keep attnotnull set, so we produce a throwaway constraint that we
> >> then drop.  This is already bogus, but what is more bogus is the fact
> >> that the backend accepts the DROP CONSTRAINT at all.
> >>
> >> The pg_dump failing should be easy to fix, but fixing the backend to
> >> error out sounds more critical.  So, the reason for this behavior is
> >> that RelationGetIndexList doesn't want to list an index that isn't
> >> marked indimmediate as a primary key.  I can easily hack around that by
> >> doing
> >>
> >> diff --git a/src/backend/utils/cache/relcache.c b/src/backend/utils/cache/relcache.c
> >> index 7234cb3da6..971d9c8738 100644
> >> --- a/src/backend/utils/cache/relcache.c
> >> +++ b/src/backend/utils/cache/relcache.c
> >> @@ -4794,7 +4794,6 @@ RelationGetIndexList(Relation relation)
> >>  		 * check them.
> >>  		 */
> >>  		if (!index->indisunique ||
> >> -			!index->indimmediate ||
> >>  			!heap_attisnull(htup, Anum_pg_index_indpred, NULL))
> >>  			continue;
> >>
> >> @@ -4821,6 +4820,9 @@ RelationGetIndexList(Relation relation)
> >>  			 relation->rd_rel->relkind == RELKIND_PARTITIONED_TABLE))
> >>  			pkeyIndex = index->indexrelid;
> >>
> >> +		if (!index->indimmediate)
> >> +			continue;
> >> +
> >>  		if (!index->indisvalid)
> >>  			continue;
> >>
> >>
> >> But of course this is not great, since it impacts unrelated bits of code
> >> that are relying on relation->pkindex or RelationGetIndexAttrBitmap
> >> having their current behavior with non-immediate index.
> >
> >
> > True, but I'm still wondering why relation->rd_pkattr would be skipped for
> > a deferrable primary key; that seems a bit incorrect to me, since it
> > implies the relation doesn't have a PK at all.  Anyway, that is unrelated.
> >
> >>
> >> I think a real solution is to stop relying on RelationGetIndexAttrBitmap
> >> in ATExecDropNotNull().  (And, again, pg_dump needs some patch as well
> >> to avoid printing a throwaway NOT NULL constraint at this point.)
> >
> >
> > I might not have understood this, but I think that if it is ok to skip the
> > throwaway NOT NULL for a deferrable PK, then that would be enough for the
> > reported issue to be fixed.  I quickly tried that with the attached patch,
> > which looks sufficient to skip it, but, TBH, I haven't thought carefully
> > about this change.
>
> I did not see any test addition for this, can we add it?
>

Ok, added it in the attached version.

This was an experimental patch; I was looking for comments on the proposed
approach -- whether we could simply skip the throwaway NOT NULL constraint
for a deferred PK constraint.  Moreover, we could skip it for any PK
constraint.

Regards,
Amul


trial_skip_throwaway_non_null_v2.patch
Description: Binary data


Re: Add system identifier to backup manifest

2024-01-21 Thread Amul Sul
On Mon, Jan 22, 2024 at 10:08 AM Amul Sul  wrote:

>
>
> On Fri, Jan 19, 2024 at 10:36 PM Amul Sul  wrote:
>
>> On Wed, Jan 17, 2024 at 8:40 PM Robert Haas 
>> wrote:
>>
>>>
>>>
>> Updated version is attached.
>>
>
> Another updated version attached -- it fixes a missing manifest version
> check in pg_verifybackup before the system identifier validation.
>

Thinking a bit more on this, I realized parse_manifest_file() has many out
parameters. Instead, parse_manifest_file() should simply return manifest data
like load_backup_manifest() does.  The attached 0001 patch does the same; it
removes the parser_context structure, adds manifest_data, and makes the
required adjustments to the pg_verifybackup code.

Regards,
Amul


v4-0001-pg_verifybackup-code-refactor.patch
Description: Binary data


v4-0002-Add-system-identifier-to-backup-manifest.patch
Description: Binary data


Re: Add system identifier to backup manifest

2024-01-21 Thread Amul Sul
On Fri, Jan 19, 2024 at 10:36 PM Amul Sul  wrote:

> On Wed, Jan 17, 2024 at 8:40 PM Robert Haas  wrote:
>
>>
>>
> Updated version is attached.
>

Another updated version attached -- it fixes a missing manifest version check
in pg_verifybackup before the system identifier validation.

Regards,
Amul
From 7ff9e85acbf0789d16d29dc316ce2bd9382ac879 Mon Sep 17 00:00:00 2001
From: Amul Sul 
Date: Mon, 22 Jan 2024 10:05:20 +0530
Subject: [PATCH v3] Add system identifier to backup manifest

The backup manifest will have a new item, "System-Identifier", whose
value comes from "ControlFileData.system_identifier" when the manifest
is generated.  This helps identify the correct database server and/or
backup while taking a subsequent backup.
---
 doc/src/sgml/backup-manifest.sgml | 15 ++-
 doc/src/sgml/ref/pg_verifybackup.sgml |  3 +-
 src/backend/backup/backup_manifest.c  |  9 +-
 src/backend/backup/basebackup.c   |  3 +-
 src/backend/backup/basebackup_incremental.c   | 39 
 src/bin/pg_basebackup/t/010_pg_basebackup.pl  | 20 
 src/bin/pg_combinebackup/load_manifest.c  | 62 ++--
 src/bin/pg_combinebackup/load_manifest.h  |  1 +
 src/bin/pg_combinebackup/pg_combinebackup.c   | 33 ++-
 src/bin/pg_combinebackup/t/005_integrity.pl   | 14 +++
 src/bin/pg_combinebackup/write_manifest.c |  8 +-
 src/bin/pg_combinebackup/write_manifest.h |  3 +-
 src/bin/pg_verifybackup/pg_verifybackup.c | 99 ++-
 src/bin/pg_verifybackup/t/003_corruption.pl   | 31 +-
 src/bin/pg_verifybackup/t/004_options.pl  |  2 +-
 src/bin/pg_verifybackup/t/005_bad_manifest.pl | 11 +++
 src/common/parse_manifest.c   | 83 +++-
 src/include/backup/backup_manifest.h  |  3 +-
 src/include/common/parse_manifest.h   |  6 ++
 19 files changed, 413 insertions(+), 32 deletions(-)

diff --git a/doc/src/sgml/backup-manifest.sgml b/doc/src/sgml/backup-manifest.sgml
index 771be1310a..a67462e3eb 100644
--- a/doc/src/sgml/backup-manifest.sgml
+++ b/doc/src/sgml/backup-manifest.sgml
@@ -37,7 +37,20 @@
 PostgreSQL-Backup-Manifest-Version
 
  
-  The associated value is always the integer 1.
+  The associated value is an integer. Beginning in
+  PostgreSQL 17, it is 2; in older versions,
+  it is 1.
+ 
+
+   
+
+   
+System-Identifier
+
+ 
+  The associated integer value is a unique system identifier that ensures
+  files match up with the installation that produced them. Available only
+  when PostgreSQL-Backup-Manifest-Version is 2 or later.
  
 

diff --git a/doc/src/sgml/ref/pg_verifybackup.sgml b/doc/src/sgml/ref/pg_verifybackup.sgml
index 36335e0a18..3051517d92 100644
--- a/doc/src/sgml/ref/pg_verifybackup.sgml
+++ b/doc/src/sgml/ref/pg_verifybackup.sgml
@@ -53,7 +53,8 @@ PostgreSQL documentation
Backup verification proceeds in four stages. First,
pg_verifybackup reads the
backup_manifest file. If that file
-   does not exist, cannot be read, is malformed, or fails verification
+   does not exist, cannot be read, is malformed, fails to match the system
+   identifier with pg_control of the backup directory or fails verification
against its own internal checksum, pg_verifybackup
will terminate with a fatal error.
   
diff --git a/src/backend/backup/backup_manifest.c b/src/backend/backup/backup_manifest.c
index 2c34e59752..612ff3add2 100644
--- a/src/backend/backup/backup_manifest.c
+++ b/src/backend/backup/backup_manifest.c
@@ -56,7 +56,8 @@ IsManifestEnabled(backup_manifest_info *manifest)
 void
 InitializeBackupManifest(backup_manifest_info *manifest,
 		 backup_manifest_option want_manifest,
-		 pg_checksum_type manifest_checksum_type)
+		 pg_checksum_type manifest_checksum_type,
+		 uint64 system_identifier)
 {
 	memset(manifest, 0, sizeof(backup_manifest_info));
 	manifest->checksum_type = manifest_checksum_type;
@@ -79,8 +80,10 @@ InitializeBackupManifest(backup_manifest_info *manifest,
 
 	if (want_manifest != MANIFEST_OPTION_NO)
 		AppendToManifest(manifest,
-		 "{ \"PostgreSQL-Backup-Manifest-Version\": 1,\n"
-		 "\"Files\": [");
+		 "{ \"PostgreSQL-Backup-Manifest-Version\": 2,\n"
+		 "\"System-Identifier\": " UINT64_FORMAT ",\n"
+		 "\"Files\": [",
+		 system_identifier);
 }
 
 /*
diff --git a/src/backend/backup/basebackup.c b/src/backend/backup/basebackup.c
index d5b8ca21b7..315efc7536 100644
--- a/src/backend/backup/basebackup.c
+++ b/src/backend/backup/basebackup.c
@@ -256,7 +256,8 @@ perform_base_backup(basebackup_options *opt, bbsink *sink,
 	backup_started_in_recovery = RecoveryInProgress();
 
 	InitializeBackupManifest(&manifest, opt->manifest,
-			 opt->manifest_checksum_type);
+			 opt->manifest_checksum_type,
+			 GetSystemIdentifier());

Re: Add system identifier to backup manifest

2024-01-19 Thread Amul Sul
On Thu, Jan 18, 2024 at 6:39 AM Sravan Kumar 
wrote:

> I have also done a review of the patch and some testing. The patch looks
> good, and I agree with Robert's comments.
>

Thank you for your review, testing and the offline discussion.

Regards,
Amul


Re: Add system identifier to backup manifest

2024-01-19 Thread Amul Sul
On Wed, Jan 17, 2024 at 8:40 PM Robert Haas  wrote:

> On Wed, Jan 17, 2024 at 6:31 AM Amul Sul  wrote:
> > With the attached patch, the backup manifest will have a new key,
> > "System-Identifier", a 64-bit integer whose value is derived from
> > pg_control while generating the manifest, and the manifest version is
> > bumped to 2.
> >
> > This helps to identify the correct database server and/or backup for
> > subsequent backup operations.  pg_verifybackup validates the manifest
> > system identifier against the backup's control file and fails if they
> > don't match.  Similarly, pg_basebackup incremental backup will fail if the
> > manifest system identifier does not match the server's system identifier.
> > pg_combinebackup is already a bit smarter -- it checks the system
> > identifier from the pg_control of all the backups; with this patch the
> > manifest system identifier is also validated.
>
> Thanks for working on this. Without this, I think what happens is that
> you can potentially take an incremental backup from the "wrong"
> server, if the states of the systems are such that all of the other
> sanity checks pass. When you run pg_combinebackup, it'll discover the
> problem and tell you, but you ideally want to discover such errors at
> backup time rather than at restore time. This addresses that. And,
> overall, I think it's a pretty good patch. But I nonetheless have a
> bunch of comments.
>

Thank you for the review.


>
> -  The associated value is always the integer 1.
> +  The associated value is the integer, either 1 or 2.
>
> is an integer. Beginning in PostgreSQL 17,
> it is 2; in older versions, it is 1.
>

Ok.


> + context.identity_cb = manifest_process_identity;
>
> I'm not really on board with calling the system identifier "identity"
> throughout the patch. I think it should just say system_identifier. If
> we were going to abbreviate, I'd prefer something like "sysident" that
> looks like it's derived from "system identifier" rather than
> "identity" which is a different word altogether. But I don't think we
> should abbreviate unless doing so creates *ridiculously* long
> identifier names.
>

Ok, I used "system identifier" in all the places.


> +static void
> +manifest_process_identity(JsonManifestParseContext *context,
> +   int manifest_version,
> +   uint64 manifest_system_identifier)
> +{
> + uint64 system_identifier;
> +
> + /* Manifest system identifier available in version 2 or later */
> + if (manifest_version == 1)
> + return;
>
> I think you've got the wrong idea here. I think this function would
> only get called if System-Identifier is present in the manifest, so if
> it's a v1 manifest, this would never get called, so this if-statement
> would not ever do anything useful. I think what you should do is (1)
> if the client supplies a v1 manifest, reject it, because surely that's
> from an older server version that doesn't support incremental backup;
> but do that when the version is parsed rather than here; and (2) also
> detect and reject the case when it's supposedly a v2 manifest but this
> is absent.
>
> (1) should really be done when the version number is parsed, so I
> suspect you may need to add manifest_version_cb.
>
> +static void
> +combinebackup_identity_cb(JsonManifestParseContext *context,
> +   int manifest_version,
> +   uint64 manifest_system_identifier)
> +{
> + parser_context *private_context = context->private_data;
> + uint64 system_identifier = private_context->system_identifier;
> +
> + /* Manifest system identifier available in version 2 or later */
> + if (manifest_version == 1)
> + return;
>
> Very similar to the above case. Just reject a version 1 manifest as
> soon as we see the version number. In this function, set a flag
> indicating we saw the system identifier; if at the end of parsing that
> flag is not set, kaboom.
>

Ok, I added a version_cb callback. With this, pg_combinebackup and
pg_basebackup will report an error for manifest version 1, whereas
pg_verifybackup doesn't (it isn't needed there, IIUC).


>
> - parse_manifest_file(manifest_path, &context.ht, &first_wal_range);
> + parse_manifest_file(manifest_path, &context.ht, &first_wal_range,
> +                     context.backup_directory);
>
> Don't do this! parse_manifest_file() should just record everything
> found in the manifest in the context object. Syntax validation should
> happen while parsing the manifest (e.g. "CAT/DOG" is not a valid LSN
> and we should reject that at this stage) but semantic validation
> should happen later (e.g. "0/0" can't be the correct backup end LSN,
> but we don't figure that out while parsing).

Re: Add system identifier to backup manifest

2024-01-17 Thread Amul Sul
On Wed, Jan 17, 2024 at 5:15 PM Alvaro Herrera 
wrote:

> On 2024-Jan-17, Amul Sul wrote:
>
> > This helps to identify the correct database server and/or backup for
> > subsequent backup operations.  pg_verifybackup validates the manifest
> > system identifier against the backup's control file and fails if they
> > don't match.  Similarly, pg_basebackup incremental backup will fail if the
> > manifest system identifier does not match the server's system identifier.
> > pg_combinebackup is already a bit smarter -- it checks the system
> > identifier from the pg_control of all the backups; with this patch the
> > manifest system identifier is also validated.
>
> Hmm, okay, but what if I take a full backup from a primary server and
> later I want an incremental from a standby, or the other way around?
> Will this prevent me from using such a combination?
>

Yes, that worked for me, since the system identifier is the same on the
primary as well as on a standby.

Regards,
Amul


Add system identifier to backup manifest

2024-01-17 Thread Amul Sul
Hi All,

With the attached patch, the backup manifest will have a new key,
"System-Identifier", a 64-bit integer whose value is derived from pg_control
while generating the manifest, and the manifest version is bumped to 2.

This helps to identify the correct database server and/or backup for
subsequent backup operations.  pg_verifybackup validates the manifest system
identifier against the backup's control file and fails if they don't match.
Similarly, pg_basebackup incremental backup will fail if the manifest system
identifier does not match the server's system identifier.  pg_combinebackup
is already a bit smarter -- it checks the system identifier from the
pg_control of all the backups; with this patch the manifest system identifier
is also validated.

For backward compatibility, the manifest system identifier validation will be
skipped for version 1.
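
With the patch, the manifest header looks like this (the identifier value
below is made up for illustration):

    { "PostgreSQL-Backup-Manifest-Version": 2,
    "System-Identifier": 7291021156621080966,
    "Files": [ ... ]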

-- 
Regards,
Amul Sul
EDB: http://www.enterprisedb.com
From 03d21efd7b03d17a55fc1dd1159e0838777f548a Mon Sep 17 00:00:00 2001
From: Amul Sul 
Date: Wed, 17 Jan 2024 16:12:19 +0530
Subject: [PATCH v1] Add system identifier to backup manifest

The backup manifest will have a new item, "System-Identifier", whose
value comes from "ControlFileData.system_identifier" when the manifest
is generated.  This helps identify the correct database server and/or
backup while taking a subsequent backup.
---
 doc/src/sgml/backup-manifest.sgml | 13 +++-
 doc/src/sgml/ref/pg_verifybackup.sgml |  3 +-
 src/backend/backup/backup_manifest.c  |  9 ++-
 src/backend/backup/basebackup.c   |  3 +-
 src/backend/backup/basebackup_incremental.c   | 29 
 src/bin/pg_basebackup/t/010_pg_basebackup.pl  | 20 +
 src/bin/pg_combinebackup/load_manifest.c  | 73 ---
 src/bin/pg_combinebackup/load_manifest.h  |  6 +-
 src/bin/pg_combinebackup/pg_combinebackup.c   | 16 ++--
 src/bin/pg_combinebackup/t/005_integrity.pl   | 14 
 src/bin/pg_combinebackup/write_manifest.c |  8 +-
 src/bin/pg_combinebackup/write_manifest.h |  3 +-
 src/bin/pg_verifybackup/pg_verifybackup.c | 59 ++-
 src/bin/pg_verifybackup/t/003_corruption.pl   | 16 +++-
 src/bin/pg_verifybackup/t/004_options.pl  |  2 +-
 src/bin/pg_verifybackup/t/005_bad_manifest.pl |  7 +-
 src/common/parse_manifest.c   | 36 -
 src/include/backup/backup_manifest.h  |  3 +-
 src/include/common/parse_manifest.h   |  4 +
 19 files changed, 286 insertions(+), 38 deletions(-)

diff --git a/doc/src/sgml/backup-manifest.sgml b/doc/src/sgml/backup-manifest.sgml
index 771be1310a..d0fc20ed0f 100644
--- a/doc/src/sgml/backup-manifest.sgml
+++ b/doc/src/sgml/backup-manifest.sgml
@@ -37,7 +37,18 @@
 PostgreSQL-Backup-Manifest-Version
 
  
-  The associated value is always the integer 1.
+  The associated value is the integer, either 1 or 2.
+ 
+
+   
+
+   
+System-Identifier
+
+ 
+  The associated integer value is a unique system identifier that ensures
+  files match up with the installation that produced them. Available only
+  when PostgreSQL-Backup-Manifest-Version is 2 or later.
  
 

diff --git a/doc/src/sgml/ref/pg_verifybackup.sgml b/doc/src/sgml/ref/pg_verifybackup.sgml
index 36335e0a18..3051517d92 100644
--- a/doc/src/sgml/ref/pg_verifybackup.sgml
+++ b/doc/src/sgml/ref/pg_verifybackup.sgml
@@ -53,7 +53,8 @@ PostgreSQL documentation
Backup verification proceeds in four stages. First,
pg_verifybackup reads the
backup_manifest file. If that file
-   does not exist, cannot be read, is malformed, or fails verification
+   does not exist, cannot be read, is malformed, fails to match the system
+   identifier with pg_control of the backup directory or fails verification
against its own internal checksum, pg_verifybackup
will terminate with a fatal error.
   
diff --git a/src/backend/backup/backup_manifest.c b/src/backend/backup/backup_manifest.c
index 2c34e59752..612ff3add2 100644
--- a/src/backend/backup/backup_manifest.c
+++ b/src/backend/backup/backup_manifest.c
@@ -56,7 +56,8 @@ IsManifestEnabled(backup_manifest_info *manifest)
 void
 InitializeBackupManifest(backup_manifest_info *manifest,
 		 backup_manifest_option want_manifest,
-		 pg_checksum_type manifest_checksum_type)
+		 pg_checksum_type manifest_checksum_type,
+		 uint64 system_identifier)
 {
 	memset(manifest, 0, sizeof(backup_manifest_info));
 	manifest->checksum_type = manifest_checksum_type;
@@ -79,8 +80,10 @@ InitializeBackupManifest(backup_manifest_info *manifest,
 
 	if (want_manifest != MANIFEST_OPTION_NO)
 		AppendToManifest(manifest,
-		 "{ \"PostgreSQL-Backup-Manifest-Version\": 1,\n"
-		 "\"Files\": [");
+		 "{ \"PostgreSQL-Backup-Manifest-Version\": 2,\n"
+		 "\"System-Identifier\": " UINT64_FORMAT ",\n"
+		

Re: introduce dynamic shared memory registry

2024-01-08 Thread Amul Sul
On Mon, Jan 8, 2024 at 10:48 PM Nathan Bossart 
wrote:

> On Mon, Jan 08, 2024 at 11:13:42AM +0530, Amul Sul wrote:
> > +void *
> > +dsm_registry_init_or_attach(const char *key, size_t size,
> >
> > I think the name could be as simple as dsm_registry_init(), like we use
> > elsewhere, e.g. ShmemInitHash(), which doesn't say "attach" explicitly.
>
> That seems reasonable to me.
>
> > Similarly, I think dshash_find_or_insert() could be as simple as
> > dshash_search() and accept a HASHACTION like hash_search().
>
> I'm not totally sure what you mean here.  If you mean changing the dshash
> API, I'd argue that's a topic for another thread.
>

Yes, you are correct. I hadn't looked at the existing code -- now I'm not
sure why it wasn't implemented like dynahash. Sorry for the noise.

Regards,
Amul


Re: introduce dynamic shared memory registry

2024-01-07 Thread Amul Sul
On Mon, Jan 8, 2024 at 10:53 AM Bharath Rupireddy <
bharath.rupireddyforpostg...@gmail.com> wrote:

> On Sat, Jan 6, 2024 at 10:05 PM Nathan Bossart 
> wrote:
> >
> > I kept this the same, as I didn't see any need to tie the key size to
> > NAMEDATALEN.
>
> Thanks. A fresh look at the v5 patches left me with the following thoughts:
>
> 1. I think we need to add some notes about this new way of getting
> shared memory for external modules in the Shared Memory and
> LWLocks section in xfunc.sgml? This will at least tell there's
> another way for external modules to get shared memory, not just with
> the shmem_request_hook and shmem_startup_hook. What do you think?
>
> 2. FWIW, I'd like to call this whole feature "Support for named DSM
> segments in Postgres". Do you see anything wrong with this?
>
> 3. IIUC, this feature eventually makes both shmem_request_hook and
> shmem_startup_hook pointless, no? Or put another way, what's the
> significance of shmem request and startup hooks in lieu of this new
> feature? I think it's quite possible to get rid of the shmem request
> and startup hooks (of course, not now but at some point in future to
> not break the external modules), because all the external modules can
> allocate and initialize the same shared memory via
> dsm_registry_init_or_attach and its init_callback. All the external
> modules will then need to call dsm_registry_init_or_attach in their
> _PG_init callbacks and/or in their bg worker's main functions in case
> the modules intend to start up bg workers. Am I right?
>
> 4. With the understanding in comment #3, can pg_stat_statements and
> test_slru.c get rid of its shmem_request_hook and shmem_startup_hook
> and use dsm_registry_init_or_attach? It's not that this patch need to
> remove them now, but just asking if there's something in there that
> makes this new feature unusable.
>

+1; since we are doing this for pg_prewarm, it would be better to do it for
these modules as well.

A minor comment for v5:

+void *
+dsm_registry_init_or_attach(const char *key, size_t size,

I think the name could be as simple as dsm_registry_init(), like we use
elsewhere, e.g. ShmemInitHash(), which doesn't say "attach" explicitly.

Similarly, I think dshash_find_or_insert() could be as simple as
dshash_search() and accept a HASHACTION like hash_search().
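
For illustration, usage of the proposed API would look something like this
(hypothetical module state and init callback; the init_callback parameter is
assumed here, since the signature quoted above is truncated):

    typedef struct MyModuleState
    {
        pg_atomic_uint64 counter;
    } MyModuleState;

    static void
    my_module_init_state(void *ptr)
    {
        MyModuleState *state = (MyModuleState *) ptr;

        pg_atomic_init_u64(&state->counter, 0);
    }

    ...
    state = dsm_registry_init_or_attach("my_module", sizeof(MyModuleState),
                                        my_module_init_state);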

Regards,
Amul


Re: ALTER COLUMN ... SET EXPRESSION to alter stored generated column's expression

2024-01-07 Thread Amul Sul
On Fri, Jan 5, 2024 at 12:28 AM Peter Eisentraut 
wrote:

> On 25.12.23 13:10, Amul Sul wrote:
> >
> I have committed this patch set.
>
> I couple of notes:
>
> You had included the moving of the AT_PASS_ADD_COL enum in the first
> patch.  This is not a good style.  Refactoring patches should not
> include semantic changes.  I have moved that change the final patch that
> introduced the new feature.
>
> I did not commit the 0002 patch that renamed some functions.  I think
> names like AlterColumn are too general, which makes this renaming
> possibly counterproductive.  I don't know a better name, maybe
> AlterTypeOrSimilar, but that's obviously silly.  I think leaving it at
> AlterType isn't too bad, since most of the code is indeed for ALTER TYPE
> support.  We can reconsider this if we have a better idea.
>
> In RememberAllDependentForRebuilding(), I dropped some of the additional
> errors that you introduced for the AT_SetExpression cases.  These didn't
> seem useful.  For example, it is not possible for a generated column to
> depend on another generated column, so there is no point checking for
> it.  Also, there were no test cases that covered any of these
> situations.  If we do want any of these, we should have tests and
> documentation for them.
>
> For the tests that examine the EXPLAIN plans, I had to add an ANALYZE
> after the SET EXPRESSION.  Otherwise, I got unstable test results,
> presumably because in some cases the analyze happened in the background.
>
>
Understood.

Thank you for your guidance and the commit.

Regards,
Amul


Re: ALTER COLUMN ... SET EXPRESSION to alter stored generated column's expression

2023-12-25 Thread Amul Sul
On Mon, Dec 18, 2023 at 3:01 PM Peter Eisentraut 
wrote:

> On 11.12.23 13:22, Amul Sul wrote:
> >
> > create table t1 (a int, b int generated always as (a + 1) stored);
> > alter table t1 add column c int, alter column b set expression as (a
> > + c);
> > ERROR:  42703: column "c" does not exist
> >
> > I think intuitively, this ought to work.  Maybe just moving the new
> > pass
> > after AT_PASS_ADD_COL would do it.
> >
> >
> > I think we can't support that (like alter type) since we need to place
> > this new
> > pass before AT_PASS_OLD_INDEX & AT_PASS_OLD_CONSTR to re-add indexes and
> > constraints for the validation.
>
> Could we have AT_PASS_ADD_COL before AT_PASS_OLD_*?  So overall it would be
>
> ...
> AT_PASS_ALTER_TYPE,
> AT_PASS_ADD_COL, // moved
> AT_PASS_SET_EXPRESSION,  // new
> AT_PASS_OLD_INDEX,
> AT_PASS_OLD_CONSTR,
> AT_PASS_ADD_CONSTR,
> ...
>
> This appears to not break any existing tests.
>

(Sorry for the delay.)

Agreed. I made that change in the 0001 patch.

Regards,
Amul


v7-0001-Code-refactor-convert-macro-listing-to-enum.patch
Description: Binary data


v7-0002-Code-refactor-separate-function-to-find-all-depen.patch
Description: Binary data


v7-0003-Allow-to-change-generated-column-expression.patch
Description: Binary data


Re: SLRU optimization - configurable buffer pool and partitioning the SLRU lock

2023-12-13 Thread Amul Sul
On Mon, Dec 11, 2023 at 10:42 AM Dilip Kumar  wrote:

> On Thu, Nov 30, 2023 at 3:30 PM Dilip Kumar  wrote:
> >
> > On Wed, Nov 29, 2023 at 4:58 PM Dilip Kumar 
> wrote:
>
> Here is the updated patch based on some comments by tender wang (those
> comments were sent only to me)
>

A few nitpicks:

+
+   /*
+* Mask for slotno banks, considering 1GB SLRU buffer pool size and the
+* SLRU_BANK_SIZE bits16 should be sufficient for the bank mask.
+*/
+   bits16  bank_mask;
 } SlruCtlData;

...
...

+ int bankno = pageno & ctl->bank_mask;

I am a bit uncomfortable seeing it as a mask. Why can't it simply be a
number of banks (num_banks), with the bank number computed via a modulus
operation (pageno % num_banks) instead of the bitwise & operation
(pageno & ctl->bank_mask)? The latter is a bit harder to read, whereas the
modulus form is simple, straightforward, and much more common practice in
hashing.

Are there any advantages of using & over %?
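
(For what it's worth, the two forms pick the same bank whenever the bank
count is a power of two; here is a small standalone check, purely to
illustrate the point:)

#include <assert.h>

#define NUM_BANKS 16                    /* must be a power of two */

int
main(void)
{
    int         bank_mask = NUM_BANKS - 1;

    /* pageno & mask and pageno % NUM_BANKS agree for every page number */
    for (long pageno = 0; pageno < 100000; pageno++)
        assert((pageno & bank_mask) == (pageno % NUM_BANKS));
    return 0;
}

So readability seems to be the only thing at stake here, not correctness.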

Also, a few places in the 0002 and 0003 patches need the bank number; it
would be better to have a macro for that.
---

 extern bool SlruScanDirCbDeleteAll(SlruCtl ctl, char *filename, int64
segpage,
   void *data);
-
+extern bool check_slru_buffers(const char *name, int *newval);
 #endif /* SLRU_H */


Add an empty line after the declaration in the 0002 patch.
---

-TransactionIdSetStatusBit(TransactionId xid, XidStatus status, XLogRecPtr
lsn, int slotno)
+TransactionIdSetStatusBit(TransactionId xid, XidStatus status, XLogRecPtr
lsn,
+ int slotno)

An unrelated change in the 0003 patch.
---

Regards,
Amul


Re: ALTER COLUMN ... SET EXPRESSION to alter stored generated column's expression

2023-12-11 Thread Amul Sul
On Tue, Nov 28, 2023 at 5:40 PM Peter Eisentraut 
wrote:

> On 23.11.23 15:13, Amul Sul wrote:
> > The exact sequencing of this seems to be tricky.  It's clear that we
> > need to do it earlier than at the end.  I also think it should be
> > strictly after AT_PASS_ALTER_TYPE so that the new expression can
> refer
> > to the new type of a column.  It should also be after
> AT_PASS_ADD_COL,
> > so that the new expression can refer to any newly added column.  But
> > then it's after AT_PASS_OLD_INDEX and AT_PASS_OLD_CONSTR, is that a
> > problem?
> >
> > AT_PASS_ALTER_TYPE and AT_PASS_ADD_COL cannot be together, the ALTER TYPE
> > cannot see that column, I think we can adopt the samebehaviour.
>
> With your v5 patch, I see the following behavior:
>
> create table t1 (a int, b int generated always as (a + 1) stored);
> alter table t1 add column c int, alter column b set expression as (a + c);
> ERROR:  42703: column "c" does not exist
>
> I think intuitively, this ought to work.  Maybe just moving the new pass
> after AT_PASS_ADD_COL would do it.
>
>
I think we can't support that (just like ALTER TYPE), since we need to place
this new pass before AT_PASS_OLD_INDEX & AT_PASS_OLD_CONSTR to re-add
indexes and constraints for validation.

> While looking at this, I figured that converting the AT_PASS_* macros to
> an enum would make this code simpler and easier to follow.  For patches
> like yours you wouldn't have to renumber the whole list.  We could put
> this patch before yours if people agree with it.
>

Ok, the 0001 patch does that.
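
For reference, the conversion in 0001 turns the macro list into an enum
along these lines (an abridged sketch of the idea, not the exact patch
contents):

typedef enum AlterTablePass
{
    AT_PASS_UNSET = -1,         /* initial, not-yet-assigned value */
    AT_PASS_DROP,               /* DROP (all flavors) */
    AT_PASS_ALTER_TYPE,         /* ALTER COLUMN TYPE */
    AT_PASS_ADD_COL,            /* ADD COLUMN */
    AT_PASS_SET_EXPRESSION,     /* ALTER SET EXPRESSION */
    AT_PASS_OLD_INDEX,          /* re-add existing indexes */
    AT_PASS_OLD_CONSTR,         /* re-add existing constraints */
    /* ... */
    AT_PASS_MISC                /* other stuff */
} AlterTablePass;

With an enum, inserting a new pass no longer requires renumbering the whole
list by hand, which was the point raised upthread.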


>
> > I tried to reuse the code by borrowing code from ALTER TYPE, see if that
> > looks good to you.
> >
> > But I have concerns, with that code reuse where we drop and re-add the
> > indexes
> > and constraints which seems unnecessary for SET EXPRESSION where column
> > attributes will stay the same. I don't know why ATLER TYPE does that for
> > index
> > since finish_heap_swap() anyway does reindexing. We could skip re-adding
> > index for SET EXPRESSION which would be fine but we could not skip the
> > re-addition of constraints, since rebuilding constraints for checking
> might
> > need a good amount of code copy especially for foreign key constraints.
> >
> > Please have a look at the attached version, 0001 patch does the code
> > refactoring, and 0002 is the implementation, using the newly refactored
> > code to
> > re-add indexes and constraints for the validation. Added tests for the
> same.
>
> This looks reasonable after a first reading.  (I have found that using
> git diff --patience makes the 0001 patch look more readable.  Maybe
> that's helpful.)


Yeah, the attached version was generated with a better git-diff algorithm
for readability.

Regards,
Amul


v6-0002-Code-refactor-separate-function-to-find-all-depen.patch
Description: Binary data


v6-0001-Code-refactor-convert-macro-listing-to-enum.patch
Description: Binary data


v6-0003-Allow-to-change-generated-column-expression.patch
Description: Binary data


Re: ALTER COLUMN ... SET EXPRESSION to alter stored generated column's expression

2023-11-23 Thread Amul Sul
On Mon, Nov 20, 2023 at 1:12 PM Peter Eisentraut 
wrote:

> On 17.11.23 13:25, Amul Sul wrote:
> > To fix this we should be doing something like ALTER COLUMN TYPE and the
> pass
> > should be AT_PASS_ALTER_TYPE (rename it or invent a new one near to
> that) so
> > that in ATRewriteCatalogs(), we would execute ATPostAlterTypeCleanup().
> >
> > I simply tried that by doing blind copy of code from
> > ATExecAlterColumnType() in
> > 0002 patch.  We don't really need to do all the stuff such as re-adding
> > indexes, constraints etc, but I am out of time for today to figure out
> the
> > optimum code and I will be away from work in the first half of the coming
> > week and the week after that. Therefore, I thought of sharing an
> approach to
> > get comments/thoughts on the direction, thanks.
>
> The exact sequencing of this seems to be tricky.  It's clear that we
> need to do it earlier than at the end.  I also think it should be
> strictly after AT_PASS_ALTER_TYPE so that the new expression can refer
> to the new type of a column.  It should also be after AT_PASS_ADD_COL,
> so that the new expression can refer to any newly added column.  But
> then it's after AT_PASS_OLD_INDEX and AT_PASS_OLD_CONSTR, is that a
> problem?
>

AT_PASS_ALTER_TYPE and AT_PASS_ADD_COL cannot be used together, since ALTER
TYPE cannot see the newly added column; I think we can adopt the same
behaviour.

But we need to have ALTER ... SET EXPRESSION run after ALTER TYPE: if we
added the new generation expression for the current type (e.g. int) and then
altered the type (e.g. to text or numeric), that would be problematic in
ATRewriteTable(), where the new generation expression would generate values
for the old type while the actual type is something else. Therefore I have
added AT_PASS_SET_EXPRESSION to execute after AT_PASS_ALTER_TYPE.

(It might be an option for the first version of this feature to not
> support altering columns that have constraints on them.  But we do need
> to support columns with indexes on them.  Does that work ok?  Does that
> depend on the relative order of AT_PASS_OLD_INDEX?)
>

I tried to reuse the code by borrowing from ALTER TYPE; see if that looks
good to you.

But I have concerns with that code reuse: we drop and re-add the indexes and
constraints, which seems unnecessary for SET EXPRESSION, where the column
attributes stay the same. I don't know why ALTER TYPE does that for indexes,
since finish_heap_swap() does the reindexing anyway. We could skip re-adding
indexes for SET EXPRESSION, which would be fine, but we cannot skip the
re-addition of constraints, since rebuilding constraints for checking might
require copying a good amount of code, especially for foreign key
constraints.

Please have a look at the attached version: the 0001 patch does the code
refactoring, and 0002 is the implementation, using the newly refactored code
to re-add indexes and constraints for validation. Tests are added for the
same.

Regards,
Amul
From e6418c3f36618c517b160ab71895975773d16f6c Mon Sep 17 00:00:00 2001
From: Amul Sul 
Date: Wed, 22 Nov 2023 18:23:56 +0530
Subject: [PATCH v5 1/2] Code refactor: separate function to find all dependent
 object on column

Move code from ATExecAlterColumnType() that finds the all the object
that depends on the column to a separate function.

Also, renamed ATPostAlterTypeCleanup() and ATPostAlterTypeParse()
function for the general use.
---
 src/backend/commands/tablecmds.c | 474 ---
 1 file changed, 248 insertions(+), 226 deletions(-)

diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 323d9bf8702..ccc152f54e9 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -557,14 +557,16 @@ static void ATPrepAlterColumnType(List **wqueue,
 static bool ATColumnChangeRequiresRewrite(Node *expr, AttrNumber varattno);
 static ObjectAddress ATExecAlterColumnType(AlteredTableInfo *tab, Relation rel,
 		   AlterTableCmd *cmd, LOCKMODE lockmode);
+static void RememberAllDependentForRebuilding(AlteredTableInfo *tab,
+			  Relation rel, AttrNumber attnum);
 static void RememberConstraintForRebuilding(Oid conoid, AlteredTableInfo *tab);
 static void RememberIndexForRebuilding(Oid indoid, AlteredTableInfo *tab);
 static void RememberStatisticsForRebuilding(Oid stxoid, AlteredTableInfo *tab);
-static void ATPostAlterTypeCleanup(List **wqueue, AlteredTableInfo *tab,
-   LOCKMODE lockmode);
-static void ATPostAlterTypeParse(Oid oldId, Oid oldRelId, Oid refRelId,
- char *cmd, List **wqueue, LOCKMODE lockmode,
- bool rewrite);
+static void ATPostAlterColumnCleanup(List **wqueue, AlteredTableInfo *tab,
+	 LOCKMODE lockmode);
+static void ATPostAlterColumnParse(Oid oldId, Oid oldRelId, Oid refRelId,
+   char *cmd, List **wqueue, LOCKMODE lockmode,
+   bool r

Re: ALTER COLUMN ... SET EXPRESSION to alter stored generated column's expression

2023-11-17 Thread Amul Sul
On Thu, Nov 16, 2023 at 7:05 PM Amul Sul  wrote:

> On Thu, Nov 16, 2023 at 2:50 AM Peter Eisentraut 
> wrote:
>
>> On 15.11.23 13:26, Amul Sul wrote:
>> > Question: Why are you using AT_PASS_ADD_OTHERCONSTR?  I don't know
>> if
>> > it's right or wrong, but if you have a specific reason, it would be
>> > good
>> > to know.
>> >
>> > I referred to ALTER COLUMN DEFAULT and used that.
>>
>> Hmm, I'm not sure if that is a good comparison.  For ALTER TABLE, SET
>> DEFAULT is just a catalog manipulation, it doesn't change any data, so
>> it's pretty easy.  SET EXPRESSION changes data, which other phases might
>> want to inspect?  For example, if you do SET EXPRESSION and add a
>> constraint in the same ALTER TABLE statement, do those run in the
>> correct order?
>>
>
> I think, you are correct, but currently AT_PASS_ADD_OTHERCONSTR is for
> AT_CookedColumnDefault, AT_ColumnDefault, and AT_AddIdentity.
> AT_CookedColumnDefault is only supported for CREATE TABLE.
> AT_ColumnDefault
> and AT_AddIdentity will be having errors while operating on the generated
> column. So
> that anomaly does not exist, but could be in future addition. I think it
> is better to
> use AT_PASS_MISC to keep this operation at last.
>
> While testing this, I found a serious problem with the patch that CHECK and
> FOREIGN KEY constraint check does not happens at rewrite, see this:
>
> create table a (y int primary key);
> insert into a values(1),(2);
> create table b (x int, y int generated always as(x) stored, foreign
> key(y) references a(y));
> insert into b values(1),(2);
> insert into b values(3);<-- an error, expected one
>
> alter table b alter column y set expression as (x*100);  <-- no
> error, NOT expected
>
> select * from b;
>  x |  y
> ---+-
>  1 | 100
>  2 | 200
> (2 rows)
>
> Also,
>
> delete from a;   <-- no error, NOT expected.
> select * from a;
>  y
> ---
> (0 rows)
>
> Shouldn't that have been handled by the ATRewriteTables() facility
> implicitly
> like NOT NULL constraints?  Or should we prepare a list of CHECK and FK
> constraints and pass it through tab->constraints?
>

To fix this, we should be doing something like ALTER COLUMN TYPE, and the
pass should be AT_PASS_ALTER_TYPE (renamed, or a new one invented near it),
so that in ATRewriteCatalogs() we would execute ATPostAlterTypeCleanup().

I simply tried that by blindly copying code from ATExecAlterColumnType() in
the 0002 patch. We don't really need to do all the stuff such as re-adding
indexes, constraints, etc., but I am out of time for today to figure out the
optimal code, and I will be away from work in the first half of the coming
week and the week after that. Therefore, I thought of sharing an approach to
get comments/thoughts on the direction. Thanks.

Regards,
Amul


v4-0001-Allow-to-change-generated-column-expression.patch
Description: Binary data


v4-0002-POC-FK-and-CHECK-constraint-check.patch
Description: Binary data


Re: ALTER COLUMN ... SET EXPRESSION to alter stored generated column's expression

2023-11-16 Thread Amul Sul
On Thu, Nov 16, 2023 at 2:50 AM Peter Eisentraut 
wrote:

> On 15.11.23 13:26, Amul Sul wrote:
> > Question: Why are you using AT_PASS_ADD_OTHERCONSTR?  I don't know if
> > it's right or wrong, but if you have a specific reason, it would be
> > good
> > to know.
> >
> > I referred to ALTER COLUMN DEFAULT and used that.
>
> Hmm, I'm not sure if that is a good comparison.  For ALTER TABLE, SET
> DEFAULT is just a catalog manipulation, it doesn't change any data, so
> it's pretty easy.  SET EXPRESSION changes data, which other phases might
> want to inspect?  For example, if you do SET EXPRESSION and add a
> constraint in the same ALTER TABLE statement, do those run in the
> correct order?
>

I think you are correct, but currently AT_PASS_ADD_OTHERCONSTR is used for
AT_CookedColumnDefault, AT_ColumnDefault, and AT_AddIdentity.
AT_CookedColumnDefault is only supported for CREATE TABLE, and
AT_ColumnDefault and AT_AddIdentity raise errors when operating on a
generated column. So that anomaly does not exist today, though it could
appear with a future addition. I think it is better to use AT_PASS_MISC to
keep this operation last.

While testing this, I found a serious problem with the patch: CHECK and
FOREIGN KEY constraint checks do not happen at rewrite. See this:

create table a (y int primary key);
insert into a values(1),(2);
create table b (x int, y int generated always as(x) stored, foreign key(y)
references a(y));
insert into b values(1),(2);
insert into b values(3);    <-- an error, expected one

alter table b alter column y set expression as (x*100);  <-- no error,
NOT expected

select * from b;
 x |  y
---+-
 1 | 100
 2 | 200
(2 rows)

Also,

delete from a;   <-- no error, NOT expected.
select * from a;
 y
---
(0 rows)

Shouldn't that have been handled implicitly by the ATRewriteTables()
facility, like NOT NULL constraints?  Or should we prepare a list of CHECK
and FK constraints and pass it through tab->constraints?


> > Tiny comment: The error message in ATExecSetExpression() does not
> need
> > to mention "stored", since it would be also applicable to virtual
> > generated columns in the future.
> >
> > I had to have the same thought, but later decided when we do that
> > virtual column thing, we could simply change that. I am fine to do that
> > change
> > now as well, let me know your thought.
>
> Not a big deal, but I would change it now.
>
> Another small thing I found:  In ATExecColumnDefault(), there is an
> errhint() that suggests DROP EXPRESSION instead of DROP DEFAULT.  You
> could now add another hint that suggests SET EXPRESSION instead of SET
> DEFAULT.
>

Ok.

Regards,
Amul Sul


Re: retire MemoryContextResetAndDeleteChildren backwards compatibility macro

2023-11-15 Thread Amul Sul
On Wed, Nov 15, 2023 at 9:26 PM Nathan Bossart 
wrote:

> On Wed, Nov 15, 2023 at 09:27:18AM +0530, Amul Sul wrote:
> > Nevermind, I usually use git apply or git am, here are those errors:
> >
> > PG/ - (master) $ git apply
> ~/Downloads/retire_compatibility_macro_v1.patch
> > error: patch failed: src/backend/access/brin/brin.c:297
> > error: src/backend/access/brin/brin.c: patch does not apply
>
> I wonder if your mail client is modifying the patch.  Do you have the same
> issue if you download it from the archives?
>

Yes, you are correct. Surprisingly, the archive version applied cleanly.

Gmail is doing something; I usually use the web login in the Chrome browser,
and I have never faced such issues with others' patches.  Anyway, I will try
both versions next time for this kind of issue; sorry for the noise.

Regards,
Amul


Re: ALTER COLUMN ... SET EXPRESSION to alter stored generated column's expression

2023-11-15 Thread Amul Sul
On Wed, Nov 15, 2023 at 5:09 PM Peter Eisentraut 
wrote:

> On 14.11.23 11:40, Amul Sul wrote:
> > Please have a look at the attached version, updating the syntax to have
> "AS"
> > after EXPRESSION and other changes suggested previously.
>
> The code structure looks good to me now.
>

Thank you for your review.


>
> Question: Why are you using AT_PASS_ADD_OTHERCONSTR?  I don't know if
> it's right or wrong, but if you have a specific reason, it would be good
> to know.
>

I referred to ALTER COLUMN DEFAULT and used that.



> I think ATExecSetExpression() needs to lock pg_attribute?  Did you lose
> that during the refactoring?
>

I removed that intentionally, since we are not updating anything in
pg_attribute the way ALTER ... DROP EXPRESSION does.


>
> Tiny comment: The error message in ATExecSetExpression() does not need
> to mention "stored", since it would be also applicable to virtual
> generated columns in the future.
>

I had the same thought, but later decided that when we do the virtual column
work, we could simply change it then. I am fine with making that change now
as well; let me know your thoughts.


> Documentation additions in alter_table.sgml should use one-space indent
> consistently.  Also, "This form replaces expression" is missing a "the"?
>

Ok, will fix that.

Regards,
Amul


Re: retire MemoryContextResetAndDeleteChildren backwards compatibility macro

2023-11-14 Thread Amul Sul
On Tue, Nov 14, 2023 at 9:21 PM Nathan Bossart 
wrote:

> On Tue, Nov 14, 2023 at 04:25:24PM +0530, Amul Sul wrote:
> > Changes looks pretty much straight forward, but patch failed to apply on
> the
> > latest master head(b41b1a7f490) at me.
>
> Thanks for taking a look.  Would you mind sharing the error(s) you are
> seeing?  The patch applies fine on cfbot and my machine, and check-world
> continues to pass.
>

Never mind. I usually use git apply or git am; here are those errors:

PG/ - (master) $ git apply ~/Downloads/retire_compatibility_macro_v1.patch
error: patch failed: src/backend/access/brin/brin.c:297
error: src/backend/access/brin/brin.c: patch does not apply
error: patch failed: src/backend/access/gin/ginscan.c:251
error: src/backend/access/gin/ginscan.c: patch does not apply
error: patch failed: src/backend/access/transam/xact.c:1933
error: src/backend/access/transam/xact.c: patch does not apply
error: patch failed: src/backend/commands/analyze.c:583
error: src/backend/commands/analyze.c: patch does not apply
error: patch failed: src/backend/executor/nodeRecursiveunion.c:317
error: src/backend/executor/nodeRecursiveunion.c: patch does not apply
error: patch failed: src/backend/executor/nodeSetOp.c:631
error: src/backend/executor/nodeSetOp.c: patch does not apply
error: patch failed: src/backend/executor/nodeWindowAgg.c:216
error: src/backend/executor/nodeWindowAgg.c: patch does not apply
error: patch failed: src/backend/executor/spi.c:547
error: src/backend/executor/spi.c: patch does not apply
error: patch failed: src/backend/postmaster/autovacuum.c:555
error: src/backend/postmaster/autovacuum.c: patch does not apply
error: patch failed: src/backend/postmaster/bgwriter.c:182
error: src/backend/postmaster/bgwriter.c: patch does not apply
error: patch failed: src/backend/postmaster/checkpointer.c:290
error: src/backend/postmaster/checkpointer.c: patch does not apply
error: patch failed: src/backend/postmaster/walwriter.c:178
error: src/backend/postmaster/walwriter.c: patch does not apply
error: patch failed: src/backend/replication/logical/worker.c:3647
error: src/backend/replication/logical/worker.c: patch does not apply
error: patch failed: src/backend/statistics/extended_stats.c:2237
error: src/backend/statistics/extended_stats.c: patch does not apply
error: patch failed: src/backend/tcop/postgres.c:4457
error: src/backend/tcop/postgres.c: patch does not apply
error: patch failed: src/backend/utils/cache/evtcache.c:91
error: src/backend/utils/cache/evtcache.c: patch does not apply
error: patch failed: src/backend/utils/error/elog.c:1833
error: src/backend/utils/error/elog.c: patch does not apply
error: patch failed: src/include/utils/memutils.h:66
error: src/include/utils/memutils.h: patch does not apply
PG/ - (master) $

Regards,
Amul


Re: retire MemoryContextResetAndDeleteChildren backwards compatibility macro

2023-11-14 Thread Amul Sul
On Tue, Nov 14, 2023 at 12:30 AM Nathan Bossart 
wrote:

> I just found myself researching the difference between MemoryContextReset()
> and MemoryContextResetAndDeleteChildren(), and it turns out that as of
> commit eaa5808 (2015), there is none.
> MemoryContextResetAndDeleteChildren() is just a backwards compatibility
> macro for MemoryContextReset().  I found this surprising because it sounds
> like they do very different things.
>
> Shall we retire this backwards compatibility macro at this point?  A search
> of https://codesearch.debian.net/ does reveal a few external uses, so we
> could alternatively leave it around and just update Postgres to stop using
> it, but I don't think it would be too burdensome for extension authors to
> fix if we removed it completely.
>

+1

Patch attached.
>

The changes look pretty straightforward, but the patch failed to apply on
the latest master head (b41b1a7f490) for me.

Regards,
Amul


Re: ALTER COLUMN ... SET EXPRESSION to alter stored generated column's expression

2023-11-14 Thread Amul Sul
On Mon, Nov 13, 2023 at 9:09 PM Peter Eisentraut 
wrote:

> On 13.11.23 14:07, Amul Sul wrote:
> > Also, it seems to me that the SET EXPRESSION variant should just do
> an
> > update of the catalog table instead of a drop and re-insert.
> >
> > I am not sure if that is sufficient; we need to get rid of the
> > dependencies of
> > existing expressions on other columns and/or objects that need to be
> > removed.
> > The drop and re-insert does that easily.
>
> Ok, good point.
>
> > The documentation needs some improvements:
> >
> > +ALTER [ COLUMN ]  > class="parameter">column_name SET EXPRESSION
>  > class="parameter">expression STORED
> >
> > If we're going to follow the Db2 syntax, there should be an "AS"
> after
> > EXPRESSION.  And the implemented syntax requires parentheses, so they
> > should appear in the documentation.
> >
> > Also, the keyword STORED shouldn't be there.  (The same command
> should
> > be applicable to virtual generated columns in the future.)
> >
> > I have omitted "AS" intentionally, to keep syntax similar to our existing
> > ALTERCOLUMN... SET DEFAULT .  Let me know if you want
> > me to add that.
>
> Well, my idea was to follow the Db2 syntax.  Otherwise, we are adding
> yet another slightly different syntax to the world.  Even if we think
> our idea is slightly better, it doesn't seem worth it.
>

Ok.

Please have a look at the attached version, which updates the syntax to have
"AS" after EXPRESSION, along with the other changes suggested previously.

Regards,
Amul


v4-0001-Allow-to-change-generated-column-expression.patch
Description: Binary data


Re: ALTER COLUMN ... SET EXPRESSION to alter stored generated column's expression

2023-11-13 Thread Amul Sul
On Mon, Nov 13, 2023 at 1:40 PM Peter Eisentraut 
wrote:

> On 09.11.23 13:00, Amul Sul wrote:
> > On Tue, Nov 7, 2023 at 8:21 PM Peter Eisentraut  > <mailto:pe...@eisentraut.org>> wrote:
> >
> > On 25.10.23 08:12, Amul Sul wrote:
> >  > Here is the rebase version for the latest master
> head(673a17e3120).
> >  >
> >  > I haven't done any other changes related to the ON UPDATE trigger
> > since that
> >  > seems non-trivial; need a bit of work to add trigger support in
> >  > ATRewriteTable().
> >  > Also, I am not sure yet, if we were doing these changes, and the
> > correct
> >  > direction
> >  > for that.
> >
> > I did some detailed testing on Db2, Oracle, and MariaDB (the three
> > existing implementations of this feature that I'm tracking), and
> > none of
> > them fire any row or statement triggers when the respective
> > statement to
> > alter the generation expression is run.  So I'm withdrawing my
> comment
> > that this should fire triggers.  (I suppose event triggers are
> > available
> > if anyone really needs the functionality.)
> >
> >
> > Thank you for the confirmation.
> >
> > Here is the updated version patch. Did minor changes to documents and
> tests.
>
> I don't like the renaming in the 0001 patch.  I think it would be better
> to keep the two subcommands (DROP and SET) separate.  There is some
> overlap now, but for example I'm thinking about virtual generated
> columns, then there will be even more conditionals in there.  Let's keep
> it separate for clarity.
>

Understood. Will do the same.


> Also, it seems to me that the SET EXPRESSION variant should just do an
> update of the catalog table instead of a drop and re-insert.
>

I am not sure that is sufficient; we need to get rid of the existing
expression's dependencies on other columns and/or objects. The drop and
re-insert does that easily.


> The documentation needs some improvements:
>
> +ALTER [ COLUMN ]  class="parameter">column_name SET EXPRESSION  class="parameter">expression STORED
>
> If we're going to follow the Db2 syntax, there should be an "AS" after
> EXPRESSION.  And the implemented syntax requires parentheses, so they
> should appear in the documentation.
>
> Also, the keyword STORED shouldn't be there.  (The same command should
> be applicable to virtual generated columns in the future.)
>

I omitted "AS" intentionally, to keep the syntax similar to our existing
ALTER COLUMN ... SET DEFAULT expression.  Let me know if you want me to add
it.

The STORED keyword was suggested by Vik [1]. I think we could skip it if
there is no need to differentiate between stored and virtual columns at
ALTER time.

Regards,
Amul

[1] postgr.es/m/d15cf691-55d0-e405-44ec-6448986c3...@postgresfriends.org


Re: ALTER COLUMN ... SET EXPRESSION to alter stored generated column's expression

2023-11-09 Thread Amul Sul
On Tue, Nov 7, 2023 at 8:21 PM Peter Eisentraut 
wrote:

> On 25.10.23 08:12, Amul Sul wrote:
> > Here is the rebase version for the latest master head(673a17e3120).
> >
> > I haven't done any other changes related to the ON UPDATE trigger since
> that
> > seems non-trivial; need a bit of work to add trigger support in
> > ATRewriteTable().
> > Also, I am not sure yet, if we were doing these changes, and the correct
> > direction
> > for that.
>
> I did some detailed testing on Db2, Oracle, and MariaDB (the three
> existing implementations of this feature that I'm tracking), and none of
> them fire any row or statement triggers when the respective statement to
> alter the generation expression is run.  So I'm withdrawing my comment
> that this should fire triggers.  (I suppose event triggers are available
> if anyone really needs the functionality.)
>

Thank you for the confirmation.

Here is the updated version of the patch, with minor changes to the
documentation and tests.

Regards,
Amul


v3-0001-Prerequisite-changes-rename-functions-enum.patch
Description: Binary data


v3-0002-Allow-to-change-generated-column-expression.patch
Description: Binary data


Re: SLRU optimization - configurable buffer pool and partitioning the SLRU lock

2023-11-07 Thread Amul Sul
On Fri, Nov 3, 2023 at 10:59 AM Dilip Kumar  wrote:

> On Mon, Oct 30, 2023 at 11:50 AM Dilip Kumar 
> wrote:
> > [...]
> [1] 0001-Make-all-SLRU-buffer-sizes-configurable: This is the same
> patch as the previous patch set
> [2] 0002-Add-a-buffer-mapping-table-for-SLRUs: Patch to introduce
> buffer mapping hash table
> [3] 0003-Partition-wise-slru-locks: Partition the hash table and also
> introduce partition-wise locks: this is a merge of 0003 and 0004 from
> the previous patch set but instead of bank-wise locks it has
> partition-wise locks and LRU counter.
> [4] 0004-Merge-partition-locks-array-with-buffer-locks-array: merging
> buffer locks and bank locks in the same array so that the bank-wise
> LRU counter does not fetch the next cache line in a hot function
> SlruRecentlyUsed()(same as 0005 from the previous patch set)
> [5] 0005-Ensure-slru-buffer-slots-are-in-multiple-of-number-of: Ensure
> that the number of slots is in multiple of the number of banks
> [...]


Here are some minor comments:

+ * By default, we'll use 1MB of for every 1GB of shared buffers, up to the
+ * maximum value that slru.c will allow, but always at least 16 buffers.
  */
 Size
 CommitTsShmemBuffers(void)
 {
-   return Min(256, Max(4, NBuffers / 256));
+   /* Use configured value if provided. */
+   if (commit_ts_buffers > 0)
+   return Max(16, commit_ts_buffers);
+   return Min(256, Max(16, NBuffers / 256));

Do you mean "4MB for every 1GB" in the comment?  (NBuffers / 256 slots of
BLCKSZ bytes works out to shared_buffers / 256, i.e. 4MB per 1GB of shared
buffers, assuming the default 8kB block size.)
--

diff --git a/src/include/access/commit_ts.h b/src/include/access/commit_ts.h
index 5087cdce51..78d017ad85 100644
--- a/src/include/access/commit_ts.h
+++ b/src/include/access/commit_ts.h
@@ -16,7 +16,6 @@
 #include "replication/origin.h"
 #include "storage/sync.h"

-
 extern PGDLLIMPORT bool track_commit_timestamp;

A spurious change.
--

@@ -168,10 +180,19 @@ SimpleLruShmemSize(int nslots, int nlsns)

if (nlsns > 0)
sz += MAXALIGN(nslots * nlsns * sizeof(XLogRecPtr));/*
group_lsn[] */
-
return BUFFERALIGN(sz) + BLCKSZ * nslots;
 }

Another spurious change, in the 0002 patch.
--

+/*
+ * The slru buffer mapping table is partitioned to reduce contention. To
+ * determine which partition lock a given pageno requires, compute the
pageno's
+ * hash code with SlruBufTableHashCode(), then apply SlruPartitionLock().
+ */

I didn't see the SlruBufTableHashCode() and SlruPartitionLock() functions
anywhere in your patches; is that comment outdated?
--

-   sz += MAXALIGN(nslots * sizeof(LWLockPadded));  /* buffer_locks[] */
-   sz += MAXALIGN(SLRU_NUM_PARTITIONS * sizeof(LWLockPadded)); /*
part_locks[] */
+   sz += MAXALIGN((nslots + SLRU_NUM_PARTITIONS) * sizeof(LWLockPadded));
 /* locks[] */

I am a bit uncomfortable with these changes; merging the partition locks and
buffer locks makes the code harder to understand. What do we gain from this?
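
(If I read the diff right, the merged array would be laid out something like
the following -- my reading of it, not necessarily the patch's exact
indexing:

/*
 * locks[0 .. nslots-1]                            per-buffer locks
 * locks[nslots .. nslots+SLRU_NUM_PARTITIONS-1]   partition locks
 */

Two separately named arrays document themselves; with a merged array, every
reader has to re-derive those offsets.)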
--

Subject: [PATCH v4 5/5] Ensure slru buffer slots are in multiple of numbe of
 partitions

I think the 0005 patch can be merged into 0001.

Regards,
Amul


Re: SLRU optimization - configurable buffer pool and partitioning the SLRU lock

2023-11-07 Thread Amul Sul
On Mon, Nov 6, 2023 at 4:44 PM Andrey M. Borodin 
wrote:

>
>
> > On 6 Nov 2023, at 14:31, Alvaro Herrera  wrote:
> >
> > dynahash is notoriously slow, which is why we have simplehash.h since
> > commit b30d3ea824c5.  Maybe we could use that instead.
>
> Dynahash has lock partitioning. Simplehash has not, AFAIK.
> The thing is we do not really need a hash function - pageno is already a
> best hash function itself. And we do not need to cope with collisions much
> - we can evict a collided buffer.
>
> Given this we do not need a hashtable at all. That’s exact reasoning how
> banks emerged, I started implementing dynahsh patch in April 2021 and found
> out that “banks” approach is cleaner. However the term “bank” is not common
> in software, it’s taken from hardware cache.
>

I agree that we don't need a hash function to generate a hash value out of
pageno, which is itself sufficient, but I don't understand how we can get
rid of the hash table itself -- how would we map a pageno to a slot number?
Is that mapping not needed at all?
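
To check my understanding, here is a toy sketch of what I believe the bank
mapping would look like -- my own illustration with made-up sizes, not code
from the patch:

#include <stdio.h>

#define NUM_SLOTS  128
#define BANK_SIZE  16
#define NUM_BANKS  (NUM_SLOTS / BANK_SIZE)

static long page_number[NUM_SLOTS]; /* pageno cached in each slot, -1 if empty */

static int
lookup(long pageno)
{
    int         bankstart = (int) (pageno % NUM_BANKS) * BANK_SIZE;

    /* a page can live only in its own bank, so scan just those few slots */
    for (int slotno = bankstart; slotno < bankstart + BANK_SIZE; slotno++)
        if (page_number[slotno] == pageno)
            return slotno;
    return -1;                  /* miss: evict a victim within the same bank */
}

int
main(void)
{
    for (int i = 0; i < NUM_SLOTS; i++)
        page_number[i] = -1;
    page_number[83] = 13;       /* page 13 maps to bank 13 % 8 = 5 (slots 80..95) */
    printf("slot for page 13: %d\n", lookup(13));
    return 0;
}

If that is the idea, then indeed no separate mapping table is needed: the
pageno itself picks the bank, and collisions are handled by evicting within
that bank.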

Regards,
Amul


Re: ALTER COLUMN ... SET EXPRESSION to alter stored generated column's expression

2023-10-25 Thread Amul Sul
Here is the rebased version for the latest master head (673a17e3120).

I haven't made any other changes related to the ON UPDATE trigger, since
that seems non-trivial; it needs a bit of work to add trigger support in
ATRewriteTable(). Also, I am not sure yet whether we should be making those
changes, or what the correct direction for them would be.

Regards,
Amul
From 0b6ca9d74ecb7debfe02af340843fa80c937684f Mon Sep 17 00:00:00 2001
From: Amul Sul 
Date: Mon, 9 Oct 2023 12:00:04 +0530
Subject: [PATCH v2 2/2] Allow to change generated column expression

---
 doc/src/sgml/ref/alter_table.sgml   |  14 +-
 src/backend/commands/tablecmds.c|  91 +
 src/backend/parser/gram.y   |  10 ++
 src/bin/psql/tab-complete.c |   2 +-
 src/include/nodes/parsenodes.h  |   2 +-
 src/test/regress/expected/generated.out | 167 
 src/test/regress/sql/generated.sql  |  36 -
 7 files changed, 265 insertions(+), 57 deletions(-)

diff --git a/doc/src/sgml/ref/alter_table.sgml b/doc/src/sgml/ref/alter_table.sgml
index 2c4138e4e9f..84bf8fa6ef3 100644
--- a/doc/src/sgml/ref/alter_table.sgml
+++ b/doc/src/sgml/ref/alter_table.sgml
@@ -46,6 +46,7 @@ ALTER TABLE [ IF EXISTS ] name
 ALTER [ COLUMN ] column_name SET DEFAULT expression
 ALTER [ COLUMN ] column_name DROP DEFAULT
 ALTER [ COLUMN ] column_name { SET | DROP } NOT NULL
+ALTER [ COLUMN ] column_name SET EXPRESSION expression STORED
 ALTER [ COLUMN ] column_name DROP EXPRESSION [ IF EXISTS ]
 ALTER [ COLUMN ] column_name ADD GENERATED { ALWAYS | BY DEFAULT } AS IDENTITY [ ( sequence_options ) ]
 ALTER [ COLUMN ] column_name { SET GENERATED { ALWAYS | BY DEFAULT } | SET sequence_option | RESTART [ [ WITH ] restart ] } [...]
@@ -256,13 +257,18 @@ WITH ( MODULUS numeric_literal, REM
 

 
-   
+   
 DROP EXPRESSION [ IF EXISTS ]
 
  
-  This form turns a stored generated column into a normal base column.
-  Existing data in the columns is retained, but future changes will no
-  longer apply the generation expression.
+  The SET form replaces stored generated value for a
+  column.  Existing data in the columns is rewritten and all the future
+  changes will apply the new generation expression.
+ 
+ 
+  The DROP form turns a stored generated column into a
+  normal base column.  Existing data in the columns is retained, but future
+  changes will no longer apply the generation expression.
  
 
  
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index e56f3af8e84..2f1d7d3531d 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -457,7 +457,8 @@ static ObjectAddress ATExecSetIdentity(Relation rel, const char *colName,
 static ObjectAddress ATExecDropIdentity(Relation rel, const char *colName, bool missing_ok, LOCKMODE lockmode);
 static void ATPrepColumnExpression(Relation rel, AlterTableCmd *cmd,
    bool recurse, bool recursing, LOCKMODE lockmode);
-static ObjectAddress ATExecColumnExpression(Relation rel, const char *colName,
+static ObjectAddress ATExecColumnExpression(AlteredTableInfo *tab, Relation rel,
+			const char *colName, Node *newDefault,
 			bool missing_ok, LOCKMODE lockmode);
 static ObjectAddress ATExecSetStatistics(Relation rel, const char *colName, int16 colNum,
 		 Node *newValue, LOCKMODE lockmode);
@@ -4851,7 +4852,7 @@ ATPrepCmd(List **wqueue, Relation rel, AlterTableCmd *cmd,
 			ATSimpleRecursion(wqueue, rel, cmd, recurse, lockmode, context);
 			pass = AT_PASS_COL_ATTRS;
 			break;
-		case AT_ColumnExpression: /* ALTER COLUMN EXPRESSION */
+		case AT_ColumnExpression:	/* ALTER COLUMN SET/DROP EXPRESSION */
 			ATSimplePermissions(cmd->subtype, rel, ATT_TABLE | ATT_FOREIGN_TABLE);
 			ATSimpleRecursion(wqueue, rel, cmd, recurse, lockmode, context);
 			ATPrepColumnExpression(rel, cmd, recurse, recursing, lockmode);
@@ -5237,7 +5238,8 @@ ATExecCmd(List **wqueue, AlteredTableInfo *tab,
 			address = ATExecSetAttNotNull(wqueue, rel, cmd->name, lockmode);
 			break;
 		case AT_ColumnExpression:
-			address = ATExecColumnExpression(rel, cmd->name, cmd->missing_ok, lockmode);
+			address = ATExecColumnExpression(tab, rel, cmd->name, cmd->def,
+			 cmd->missing_ok, lockmode);
 			break;
 		case AT_SetStatistics:	/* ALTER COLUMN SET STATISTICS */
 			address = ATExecSetStatistics(rel, cmd->name, cmd->num, cmd->def, lockmode);
@@ -8314,16 +8316,22 @@ static void
 ATPrepColumnExpression(Relation rel, AlterTableCmd *cmd, bool recurse, bool recursing, LOCKMODE lockmode)
 {
 	/*
-	 * Reject ONLY if there are child tables.  We could implement this, but it
-	 * is a bit complicated.  GENERATED clauses must be attached to the column
-	 * definition and cannot be added later like DEFAULT, so if a child table
-	 * has a generation expression that the parent does not have, the chil

Re: Adding facility for injection points (or probe points?) for more advanced tests

2023-10-24 Thread Amul Sul
On Wed, Oct 25, 2023 at 9:43 AM Michael Paquier  wrote:

> Hi all,
>
> I don't remember how many times in the last few years when I've had to
> hack the backend to produce a test case that involves a weird race
> condition across multiple processes running in the backend, to be able
> to prove a point or just test a fix (one recent case: 2b8e5273e949).
> Usually, I come to hardcoding stuff for the following situations:
> - Trigger a PANIC, to force recovery.
> - A FATAL, to take down a session, or just an ERROR.
> - palloc() failure injection.
> - Sleep to slow down a code path.
> - Pause and release with condition variable.


+1 for the feature.

FWIW, here [1] is an interesting talk from pgconf.in 2020 on a similar
topic.

[1] https://pgconf.in/conferences/pgconfin2020/program/proposals/101

Regards,
Amul Sul


Re: ALTER COLUMN ... SET EXPRESSION to alter stored generated column's expression

2023-10-09 Thread Amul Sul
On Fri, Oct 6, 2023 at 6:03 PM Peter Eisentraut 
wrote:

> On 28.08.23 11:54, Amul Sul wrote:
> > Thanks for the review comments, I have fixed those in the attached
> > version. In
> > addition to that, extended syntax to have the STORE keyword as suggested
> by
> > Vik.
>
> An additional comment: When you change the generation expression, you
> need to run ON UPDATE triggers on all rows, if there are any triggers
> defined.  That is because someone could have triggers defined on the
> column to either check for valid values or propagate values somewhere
> else, and if the expression changes, that is kind of like an UPDATE.
>
> Similarly, I think we should consider how logical decoding should handle
> this operation.  I'd imagine it should generate UPDATE events on all
> rows.  A test case in test_decoding would be useful.
>

If I am not mistaken, the existing table rewrite facilities for ALTER TABLE
don't have support for running triggers or generating an event for each
row, right?

Do you expect new code to be written to handle this rewriting?

Regards,
Amul


Re: Dump-restore losing 'attnotnull' bit for DEFERRABLE PRIMARY KEY column(s).

2023-09-22 Thread Amul Sul
On Wed, Sep 20, 2023 at 8:29 PM Alvaro Herrera 
wrote:

> On 2023-Sep-20, Amul Sul wrote:
>
> > On the latest master head, I can see a $subject bug that seems to be
> related
> > commit #b0e96f311985:
> >
> > Here is the table definition:
> > create table foo(i int, j int, CONSTRAINT pk PRIMARY KEY(i) DEFERRABLE);
>
> Interesting, thanks for the report.  Your attribution to that commit is
> correct.  The table is dumped like this:
>
> CREATE TABLE public.foo (
> i integer CONSTRAINT pgdump_throwaway_notnull_0 NOT NULL NO INHERIT,
> j integer
> );
> ALTER TABLE ONLY public.foo
> ADD CONSTRAINT pk PRIMARY KEY (i) DEFERRABLE;
> ALTER TABLE ONLY public.foo DROP CONSTRAINT pgdump_throwaway_notnull_0;
>
> so the problem here is that the deferrable PK is not considered a reason
> to keep attnotnull set, so we produce a throwaway constraint that we
> then drop.  This is already bogus, but what is more bogus is the fact
> that the backend accepts the DROP CONSTRAINT at all.
>
> The pg_dump failing should be easy to fix, but fixing the backend to
> error out sounds more critical.  So, the reason for this behavior is
> that RelationGetIndexList doesn't want to list an index that isn't
> marked indimmediate as a primary key.  I can easily hack around that by
> doing
>
> diff --git a/src/backend/utils/cache/relcache.c
> b/src/backend/utils/cache/relcache.c
> index 7234cb3da6..971d9c8738 100644
> --- a/src/backend/utils/cache/relcache.c
> +++ b/src/backend/utils/cache/relcache.c
> @@ -4794,7 +4794,6 @@ RelationGetIndexList(Relation relation)
>  * check them.
>  */
> if (!index->indisunique ||
> -   !index->indimmediate ||
> !heap_attisnull(htup,
> Anum_pg_index_indpred, NULL))
> continue;
>
> @@ -4821,6 +4820,9 @@ RelationGetIndexList(Relation relation)
>  relation->rd_rel->relkind ==
> RELKIND_PARTITIONED_TABLE))
> pkeyIndex = index->indexrelid;
>
> +   if (!index->indimmediate)
> +   continue;
> +
> if (!index->indisvalid)
> continue;
>
>
> But of course this is not great, since it impacts unrelated bits of code
> that are relying on relation->pkindex or RelationGetIndexAttrBitmap
> having their current behavior with non-immediate index.
>

True, but I am still wondering why relation->rd_pkattr would be skipped for
a deferrable primary key; that seems a bit incorrect to me, since it
suggests the relation doesn't have a PK at all.  Anyway, that is unrelated.


> I think a real solution is to stop relying on RelationGetIndexAttrBitmap
> in ATExecDropNotNull().  (And, again, pg_dump needs some patch as well
> to avoid printing a throwaway NOT NULL constraint at this point.)
>

I might not have understood this, but I think if it is OK to skip the
throwaway NOT NULL for a deferrable PK, that would be enough to fix the
reported issue.  I quickly tried the attached patch, which looks sufficient
to skip that; but, TBH, I haven't thought carefully about this change.

Regards,
Amul
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index f7b61766921..5a4e915dc62 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -8543,7 +8543,7 @@ getTableAttrs(Archive *fout, TableInfo *tblinfo, int numTables)
 	appendPQExpBufferStr(q,
 		 "LEFT JOIN pg_catalog.pg_constraint copk ON "
 		 "(copk.conrelid = src.tbloid\n"
-		 "   AND copk.contype = 'p' AND "
+		 "   AND copk.contype = 'p' AND copk.condeferrable = false AND "
 		 "copk.conkey @> array[a.attnum])\n"
 		 "WHERE a.attnum > 0::pg_catalog.int2\n"
 		 "ORDER BY a.attrelid, a.attnum");


Dump-restore losing 'attnotnull' bit for DEFERRABLE PRIMARY KEY column(s).

2023-09-20 Thread Amul Sul
Hi,

On the latest master head, I can see a $subject bug that seems to be related
to commit b0e96f311985:

Here is the table definition:
create table foo(i int, j int, CONSTRAINT pk PRIMARY KEY(i) DEFERRABLE);

And after restoring from the dump, the table descriptor shows column 'i' not
marked NOT NULL:

=# \d foo
Table "public.foo"
 Column |  Type   | Collation | Nullable | Default
+-+---+--+-
 i  | integer |   |  |
 j  | integer |   |  |
Indexes:
"pk" PRIMARY KEY, btree (i) DEFERRABLE

The pg_attribute entry:

=# select attname, attnotnull from pg_attribute
where attrelid  = 'foo'::regclass and attnum > 0;

 attname | attnotnull
-+
 i   | f
 j   | f
(2 rows)

-- 
Regards,
Amul Sul
EDB: http://www.enterprisedb.com


Re: ALTER COLUMN ... SET EXPRESSION to alter stored generated column's expression

2023-09-18 Thread Amul Sul
On Thu, Sep 14, 2023 at 7:23 PM Ashutosh Bapat 
wrote:

> Hi Amul,
> I share others opinion that this feature is useful.
>
> >> On Fri, 25 Aug 2023 at 03:06, Vik Fearing 
> wrote:
> >>>
> >>>
> >>> I don't like this part of the patch at all.  Not only is the
> >>> documentation only half baked, but the entire concept of the two
> >>> commands is different.  Especially since I believe the command should
> >>> also create a generated column from a non-generated one.
> >>
> >>
> >> But I have to agree with Vik Fearing, we can make this patch better,
> should we?
> >> I totally understand your intentions to keep the code flow simple and
> reuse existing code as much
> >> as possible. But in terms of semantics of these commands, they are
> quite different from each other.
> >> And in terms of reading of the code, this makes it even harder to
> understand what is going on here.
> >> So, in my view, consider split these commands.
> >
> >
> > Ok, probably, I would work in that direction. I did the same thing that
> > SET/DROP DEFAULT does, despite semantic differences, and also, if I am
> not
> > missing anything, the code complexity should be the same as that.
>
> If we allow SET EXPRESSION to convert a non-generated column to a
> generated one, the current way of handling ONLY would yield mismatch
> between parent and child. That's not allowed as per the documentation
> [1]. In that sense not allowing SET to change the GENERATED status is
> better. I think that can be added as a V2 feature, if it overly
> complicates the patch Or at least till a point that becomes part of
> SQL standard.
>

Yes, that is going to be a bit complicated, including the case of trying to
convert a non-generated column of a child table, where we would need to find
all the ancestors and siblings and make the same changes.

Regards,
Amul


Re: ALTER COLUMN ... SET EXPRESSION to alter stored generated column's expression

2023-09-14 Thread Amul Sul
On Wed, Sep 13, 2023 at 2:28 PM Maxim Orlov  wrote:

> Hi!
>
> I'm pretty much like the idea of the patch. Looks like an overlook in SQL
> standard for me.
> Anyway, patch apply with no conflicts and implements described
> functionality.
>
>
Thank you for looking at this.


> On Fri, 25 Aug 2023 at 03:06, Vik Fearing  wrote:
>
>>
>> I don't like this part of the patch at all.  Not only is the
>> documentation only half baked, but the entire concept of the two
>> commands is different.  Especially since I believe the command should
>> also create a generated column from a non-generated one.
>
>
> But I have to agree with Vik Fearing, we can make this patch better,
> should we?
> I totally understand your intentions to keep the code flow simple and reuse
> existing code as much
> as possible. But in terms of semantics of these commands, they are quite
> different from each other.
> And in terms of reading of the code, this makes it even harder to
> understand what is going on here.
> So, in my view, consider split these commands.
>

Ok, I will probably work in that direction. I did the same thing that
SET/DROP DEFAULT does, despite the semantic differences, and, if I am not
missing anything, the code complexity should be about the same.

Regards,
Amul


Re: ALTER COLUMN ... SET EXPRESSION to alter stored generated column's expression

2023-08-28 Thread Amul Sul
On Fri, Aug 25, 2023 at 5:35 AM Vik Fearing  wrote:

> On 8/2/23 12:35, Amul Sul wrote:
> > Hi,
> >
> > Currently, we have an option to drop the expression of stored generated
> > columns
> > as:
> >
> > ALTER [ COLUMN ] column_name DROP EXPRESSION [ IF EXISTS ]
> >
> > But don't have support to update that expression. The attached patch
> > provides
> > that as:
> >
> > ALTER [ COLUMN ] column_name SET EXPRESSION expression
>
> I love this idea.  It is something that the standard SQL language is
> lacking and I am submitting a paper to correct that based on this.  I
> will know in October what the committee thinks of it.  Thanks!
>
>
Great, thank you so much.


> > Note that this form of ALTER is meant to work for the column which is
> > already generated.
>
> Why?  SQL does not have a way to convert a non-generated column into a
> generated column, and this seems like as good a way as any.
>

Well, I had the same thought, but Peter Eisentraut thinks that we should
have that in a separate patch, and I am fine with that.

> To keep the code flow simple, I have renamed the existing function that
> was
> > in use for DROP EXPRESSION so that it can be used for SET EXPRESSION as
> well,
> > which is a similar design as SET/DROP DEFAULT. I kept this renaming code
> > changes in a separate patch to minimize the diff in the main patch.
>
> I don't like this part of the patch at all.  Not only is the
> documentation only half baked, but the entire concept of the two
> commands is different.  Especially since I believe the command should
> also create a generated column from a non-generated one.
>

I am not sure I understood this: why would that break the documentation,
even if we later allow non-generated columns to become generated? This
approach keeps the code flow simple and poses no issue for a future
extension that allows non-generated columns too.


>
> Is is possible to compare the old and new expressions and no-op if they
> are the same?
>
>
> psql (17devel)
> Type "help" for help.
>
> postgres=# create table t (c integer generated always as (null) stored);
> CREATE TABLE
> postgres=# select relfilenode from pg_class where oid = 't'::regclass;
>   relfilenode
> -
> 16384
> (1 row)
>
> postgres=# alter table t alter column c set expression (null);
> ALTER TABLE
> postgres=# select relfilenode from pg_class where oid = 't'::regclass;
>   relfilenode
> -
> 16393
> (1 row)
>
>
> I am not saying we should make every useless case avoid rewriting the
> table, but if there are simple wins, we should take them.  (I don't know
> how feasible this is.)
>

I think that is feasible, but I am not sure we want to do that and add extra
code for a case that doesn't really break anything, beyond making the system
do extra work for a user's thoughtless action.


>
> I think repeating the STORED keyword should be required here to
> future-proof virtual generated columns.
>

Agreed; added in the v2 version posted a few minutes ago.

Regards,
Amul


Re: ALTER COLUMN ... SET EXPRESSION to alter stored generated column's expression

2023-08-28 Thread Amul Sul
On Thu, Aug 24, 2023 at 9:36 AM Vaibhav Dalvi <
vaibhav.da...@enterprisedb.com> wrote:

> Hi Amul,
>
>
> On Wed, Aug 2, 2023 at 4:06 PM Amul Sul  wrote:
>
>> Hi,
>>
>> Currently, we have an option to drop the expression of stored generated
>> columns
>> as:
>>
>> ALTER [ COLUMN ] column_name DROP EXPRESSION [ IF EXISTS ]
>>
>> But don't have support to update that expression. The attached patch
>> provides
>> that as:
>>
>> ALTER [ COLUMN ] column_name SET EXPRESSION expression
>>
>> +1 to the idea.
>

Thank you.


> 3. The AlteredTableInfo structure has member Relation, So need to pass
> parameter Relation separately?
>
>> static ObjectAddress ATExecColumnExpression(AlteredTableInfo *tab,
>> Relation rel,
>>  const char *colName, Node *newDefault,
>>  bool missing_ok, LOCKMODE lockmode);
>
>
Yeah, but I think we can let it be, since other AT routines do the same.

Thanks for the review comments; I have fixed those in the attached version.
In addition, I extended the syntax to have the STORED keyword, as suggested
by Vik.

Regards,
Amul


v2-0001-Prerequisite-changes-rename-functions-enum.patch
Description: Binary data


v2-0002-Allow-to-change-generated-column-expression.patch
Description: Binary data


Re: ALTER COLUMN ... SET EXPRESSION to alter stored generated column's expression

2023-08-02 Thread Amul Sul
On Wed, Aug 2, 2023 at 9:16 PM jian he  wrote:

> On Wed, Aug 2, 2023 at 6:36 PM Amul Sul  wrote:
> >
> > Hi,
> >
> > Currently, we have an option to drop the expression of stored generated
> columns
> > as:
> >
> > ALTER [ COLUMN ] column_name DROP EXPRESSION [ IF EXISTS ]
> >
> > But don't have support to update that expression. The attached patch
> provides
> > that as:
> >
> > ALTER [ COLUMN ] column_name SET EXPRESSION expression
> >
> > Note that this form of ALTER is meant to work for the column which is
> already
> > generated. It then changes the generation expression in the catalog and
> rewrite
> > the table, using the existing table rewrite facilities for ALTER TABLE.
> > Otherwise, an error will be reported.
> >
> > To keep the code flow simple, I have renamed the existing function that
> was in
> > use for DROP EXPRESSION so that it can be used for SET EXPRESSION as
> well,
> > which is a similar design as SET/DROP DEFAULT. I kept this renaming code
> > changes in a separate patch to minimize the diff in the main patch.
> >
> > Demo:
> > -- Create table
> > CREATE TABLE t1 (x int, y int GENERATED ALWAYS AS (x * 2) STORED);
> > INSERT INTO t1 VALUES(generate_series(1,3));
> >
> > -- Check the generated data
> > SELECT * FROM t1;
> >  x | y
> > ---+---
> >  1 | 2
> >  2 | 4
> >  3 | 6
> > (3 rows)
> >
> > -- Alter the expression
> > ALTER TABLE t1 ALTER COLUMN y SET EXPRESSION (x * 4);
> >
> > -- Check the new data
> > SELECT * FROM t1;
> >  x | y
> > ---+
> >  1 |  4
> >  2 |  8
> >  3 | 12
> > (3 rows)
> >
> > Thank you.
> > --
> > Regards,
> > Amul Sul
> > EDB: http://www.enterprisedb.com
> -
> setup.
>
> BEGIN;
> set search_path = test;
> DROP TABLE if exists gtest_parent, gtest_child;
>
> CREATE TABLE gtest_parent (f1 date NOT NULL, f2 bigint, f3 bigint
> GENERATED ALWAYS AS (f2 * 2) STORED) PARTITION BY RANGE (f1);
>
> CREATE TABLE gtest_child PARTITION OF gtest_parent
>   FOR VALUES FROM ('2016-07-01') TO ('2016-08-01');  -- inherits gen expr
>
> CREATE TABLE gtest_child2 PARTITION OF gtest_parent (
> f3 WITH OPTIONS GENERATED ALWAYS AS (f2 * 22) STORED  -- overrides gen
> expr
> ) FOR VALUES FROM ('2016-08-01') TO ('2016-09-01');
>
> CREATE TABLE gtest_child3 (f1 date NOT NULL, f2 bigint, f3 bigint
> GENERATED ALWAYS AS (f2 * 33) STORED);
> ALTER TABLE gtest_parent ATTACH PARTITION gtest_child3 FOR VALUES FROM
> ('2016-09-01') TO ('2016-10-01');
>
> INSERT INTO gtest_parent (f1, f2) VALUES ('2016-07-15', 1);
> INSERT INTO gtest_parent (f1, f2) VALUES ('2016-07-15', 2);
> INSERT INTO gtest_parent (f1, f2) VALUES ('2016-08-15', 3);
> UPDATE gtest_parent SET f1 = f1 + 60 WHERE f2 = 1;
>
> ALTER TABLE ONLY gtest_parent ALTER COLUMN f3 SET EXPRESSION (f2 * 4);
> ALTER TABLE gtest_child ALTER COLUMN f3 SET EXPRESSION (f2 * 10);
> COMMIT;
>
> set search_path = test;
> SELECT table_name, column_name, is_generated, generation_expression
> FROM information_schema.columns
> WHERE table_name  in ('gtest_child','gtest_child1',
> 'gtest_child2','gtest_child3')
> order by 1,2;
> result:
>   table_name  | column_name | is_generated | generation_expression
> --------------+-------------+--------------+-----------------------
>  gtest_child  | f1          | NEVER        |
>  gtest_child  | f1          | NEVER        |
>  gtest_child  | f2          | NEVER        |
>  gtest_child  | f2          | NEVER        |
>  gtest_child  | f3          | ALWAYS       | (f2 * 2)
>  gtest_child  | f3          | ALWAYS       | (f2 * 10)
>  gtest_child2 | f1          | NEVER        |
>  gtest_child2 | f1          | NEVER        |
>  gtest_child2 | f2          | NEVER        |
>  gtest_child2 | f2          | NEVER        |
>  gtest_child2 | f3          | ALWAYS       | (f2 * 22)
>  gtest_child2 | f3          | ALWAYS       | (f2 * 2)
>  gtest_child3 | f1          | NEVER        |
>  gtest_child3 | f1          | NEVER        |
>  gtest_child3 | f2          | NEVER        |
>  gtest_child3 | f2          | NEVER        |
>  gtest_child3 | f3          | ALWAYS       | (f2 * 2)
>  gtest_child3 | f3          | ALWAYS       | (f2 * 33)
> (18 rows)
>
> one partition, one column 2 generated expression. Is this the expected
> behavior?


That is neither expected nor acceptable. But somehow, I am not able to
reproduce this behavior. Could you please retry this experiment by adding
"table_schema" to your output query?
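
For instance, something like this (the same query as above, with
table_schema added to the target list and the ordering):

SELECT table_schema, table_name, column_name, is_generated, generation_expression
FROM information_schema.columns
WHERE table_name in ('gtest_child','gtest_child1', 'gtest_child2','gtest_child3')
order by 1,2,3;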

Thank you.

Regards,
Amul


ALTER COLUMN ... SET EXPRESSION to alter stored generated column's expression

2023-08-02 Thread Amul Sul
Hi,

Currently, we have an option to drop the expression of stored generated
columns
as:

ALTER [ COLUMN ] column_name DROP EXPRESSION [ IF EXISTS ]

But we don't have support to update that expression. The attached patch
provides that as:

ALTER [ COLUMN ] column_name SET EXPRESSION expression

Note that this form of ALTER is meant to work on a column which is already
generated. It then changes the generation expression in the catalog and
rewrites the table, using the existing table rewrite facilities for ALTER TABLE.
Otherwise, an error will be reported.

To keep the code flow simple, I have renamed the existing function that was in
use for DROP EXPRESSION so that it can be used for SET EXPRESSION as well,
which is a design similar to SET/DROP DEFAULT. I kept these renaming code
changes in a separate patch to minimize the diff in the main patch.

Demo:
-- Create table
CREATE TABLE t1 (x int, y int GENERATED ALWAYS AS (x * 2) STORED);
INSERT INTO t1 VALUES(generate_series(1,3));

-- Check the generated data
SELECT * FROM t1;
 x | y
---+---
 1 | 2
 2 | 4
 3 | 6
(3 rows)

-- Alter the expression
ALTER TABLE t1 ALTER COLUMN y SET EXPRESSION (x * 4);

-- Check the new data
SELECT * FROM t1;
 x | y
---+----
 1 |  4
 2 |  8
 3 | 12
(3 rows)

Thank you.
-- 
Regards,
Amul Sul
EDB: http://www.enterprisedb.com
From ef1448f7852000d5b701f9e3c7fe88737670740a Mon Sep 17 00:00:00 2001
From: Amul Sul 
Date: Mon, 31 Jul 2023 15:43:51 +0530
Subject: [PATCH v1 2/2] Allow to change generated column expression

---
 doc/src/sgml/ref/alter_table.sgml   |  14 +-
 src/backend/commands/tablecmds.c|  88 +
 src/backend/parser/gram.y   |  10 ++
 src/bin/psql/tab-complete.c |   2 +-
 src/test/regress/expected/generated.out | 167 
 src/test/regress/sql/generated.sql  |  36 -
 6 files changed, 262 insertions(+), 55 deletions(-)

diff --git a/doc/src/sgml/ref/alter_table.sgml b/doc/src/sgml/ref/alter_table.sgml
index d4d93eeb7c6..1b68dea8d9b 100644
--- a/doc/src/sgml/ref/alter_table.sgml
+++ b/doc/src/sgml/ref/alter_table.sgml
@@ -46,6 +46,7 @@ ALTER TABLE [ IF EXISTS ] name
 ALTER [ COLUMN ] column_name SET DEFAULT expression
 ALTER [ COLUMN ] column_name DROP DEFAULT
 ALTER [ COLUMN ] column_name { SET | DROP } NOT NULL
+ALTER [ COLUMN ] column_name SET EXPRESSION expression
 ALTER [ COLUMN ] column_name DROP EXPRESSION [ IF EXISTS ]
 ALTER [ COLUMN ] column_name ADD GENERATED { ALWAYS | BY DEFAULT } AS IDENTITY [ ( sequence_options ) ]
 ALTER [ COLUMN ] column_name { SET GENERATED { ALWAYS | BY DEFAULT } | SET sequence_option | RESTART [ [ WITH ] restart ] } [...]
@@ -255,13 +256,18 @@ WITH ( MODULUS numeric_literal, REM
 

 
-   
+   
 DROP EXPRESSION [ IF EXISTS ]
 
  
-  This form turns a stored generated column into a normal base column.
-  Existing data in the columns is retained, but future changes will no
-  longer apply the generation expression.
+  The SET form replaces the generation expression of a stored
+  generated column.  Existing data in the column is rewritten and all future
+  changes will apply the new generation expression.
+ 
+ 
+  The DROP form turns a stored generated column into a
+  normal base column.  Existing data in the column is retained, but future
+  changes will no longer apply the generation expression.
  
 
  
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 3b499fc0d8e..df26afe16cb 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -456,7 +456,8 @@ static ObjectAddress ATExecSetIdentity(Relation rel, const char *colName,
 static ObjectAddress ATExecDropIdentity(Relation rel, const char *colName, bool missing_ok, LOCKMODE lockmode);
 static void ATPrepColumnExpression(Relation rel, AlterTableCmd *cmd,
    bool recurse, bool recursing, LOCKMODE lockmode);
-static ObjectAddress ATExecColumnExpression(Relation rel, const char *colName,
+static ObjectAddress ATExecColumnExpression(AlteredTableInfo *tab, Relation rel,
+			const char *colName, Node *newDefault,
 			bool missing_ok, LOCKMODE lockmode);
 static ObjectAddress ATExecSetStatistics(Relation rel, const char *colName, int16 colNum,
 		 Node *newValue, LOCKMODE lockmode);
@@ -5056,7 +5057,8 @@ ATExecCmd(List **wqueue, AlteredTableInfo *tab,
 			ATExecCheckNotNull(tab, rel, cmd->name, lockmode);
 			break;
 		case AT_ColumnExpression:
-			address = ATExecColumnExpression(rel, cmd->name, cmd->missing_ok, lockmode);
+			address = ATExecColumnExpression(tab, rel, cmd->name, cmd->def,
+			 cmd->missing_ok, lockmode);
 			break;
 		case AT_SetStatistics:	/* ALTER COLUMN SET STATISTICS */
 			address = ATExecSetStatistics(rel, cmd->name, cmd->num, cmd->def, lockmode);
@@ -8015,16 +8017,21 @@ static void
 ATPrepColumnExpression(Relation rel, Alter

Re: New PostgreSQL Contributors

2023-07-30 Thread Amul Sul
On Fri, Jul 28, 2023 at 8:59 PM Christoph Berg  wrote:

> The PostgreSQL contributors team has been looking over the community
> activity and, over the first half of this year, has been recognizing
> new contributors to be listed on
>
> https://www.postgresql.org/community/contributors/
>
> New PostgreSQL Contributors:
>
>   Ajin Cherian
>   Alexander Kukushkin
>   Alexander Lakhin
>   Dmitry Dolgov
>   Hou Zhijie
>   Ilya Kosmodemiansky
>   Melanie Plageman
>   Michael Banck
>   Michael Brewer
>   Paul Jungwirth
>   Peter Smith
>   Vigneshwaran C
>
> New PostgreSQL Major Contributors:
>
>   Julien Rouhaud
>   Stacey Haysler
>   Steve Singer
>

Many congratulations !!

Regards,
Amul


Dumping policy on a table belonging to an extension is expected?

2023-07-05 Thread Amul Sul
Hi,

I have a ROW LEVEL SECURITY policy on a table that is part of an extension,
and while dumping the database where that extension is installed, pg_dump
dumps the policy of that table too, even though it does not dump that table.

Here is a quick test, where I have added the following SQL to the
adminpack--1.0.sql extension file:

CREATE TABLE public.foo (i int CHECK(i > 0));
ALTER TABLE public.foo ENABLE ROW LEVEL SECURITY;
CREATE POLICY foo_policy ON public.foo USING (true);

After installing and creating this extension, the dump output will have the
policy without that table:

--
-- Name: foo; Type: ROW SECURITY; Schema: public; Owner: amul
--

ALTER TABLE public.foo ENABLE ROW LEVEL SECURITY;

--
-- Name: foo foo_policy; Type: POLICY; Schema: public; Owner: amul
--

CREATE POLICY foo_policy ON public.foo USING (true);


I am not sure if that is expected behaviour. The code comment in
checkExtensionMembership() suggests it is intentional:

 * In 9.6 and above, mark the member object to have any non-initial ACL,
 * policies, and security labels dumped.

The question is: why are we doing this? Shouldn't we skip this policy if it is
part of the create-extension script?

Also, if you try to drop this policy, it gets dropped without any warning/error,
unlike tables or other objects, which are not allowed to be dropped at all.

-- 
Regards,
Amul Sul
EDB: http://www.enterprisedb.com


Re: New committers: Nathan Bossart, Amit Langote, Masahiko Sawada

2023-04-21 Thread Amul Sul
Many congratulations to all.

On Thursday, April 20, 2023, Tom Lane  wrote:

> The Core Team would like to extend our congratulations to
> Nathan Bossart, Amit Langote, and Masahiko Sawada, who have
> accepted invitations to become our newest Postgres committers.
>
> Please join me in wishing them much success and few reverts.
>
> regards, tom lane
>
>
>

-- 
Regards,
Amul Sul
EDB: http://www.enterprisedb.com


remove duplicate comment.

2023-02-15 Thread Amul Sul
Hi,

The attached patch removes a comment line that repeats the
previous paragraph of the ExecUpdateAct() prologue comment.

--
Regards,
Amul Sul
EDB: http://www.enterprisedb.com
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index f419c47065a..645e62f4a76 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -1948,9 +1948,6 @@ ExecUpdatePrepareSlot(ResultRelInfo *resultRelInfo,
  * caller is also in charge of doing EvalPlanQual if the tuple is found to
  * be concurrently updated.  However, in case of a cross-partition update,
  * this routine does it.
- *
- * Caller is in charge of doing EvalPlanQual as necessary, and of keeping
- * indexes current for the update.
  */
 static TM_Result
 ExecUpdateAct(ModifyTableContext *context, ResultRelInfo *resultRelInfo,


Re: Error-safe user functions

2022-12-27 Thread Amul Sul
On Tue, Dec 27, 2022 at 11:17 PM Tom Lane  wrote:
>
> Andrew Dunstan  writes:
> > Here's a patch that covers the ltree and intarray contrib modules.
>
> I would probably have done this a little differently --- I think
> the added "res" parameters aren't really necessary for most of
> these.  But it's not worth arguing over.
>

Also, it would be good if we could pass "escontext" through the "state"
argument of makepol(), like commit 78212f210114 did for makepol() of
tsquery.c. The attached patch is an updated version that does the same.

Regards,
Amul


v2-ltree-intarray-error-safe.patch
Description: Binary data


Re: Error-safe user functions

2022-12-14 Thread Amul Sul
On Thu, Dec 15, 2022 at 11:16 AM Tom Lane  wrote:
>
> Amul Sul  writes:
> > There are other a bunch of hard errors from get_multirange_io_data(),
> > get_range_io_data() and its subroutine can hit, shouldn't we care
> > about those?
>
> I think those are all "internal" errors, ie not reachable as a
> consequence of bad input data.  Do you see a reason to think
> differently?

Makes sense. I was worried about the internal errors, as well as an
error that the user can cause while declaring a multirange, e.g. on a shell
type, but realized that that case gets checked when creating the multirange
type.

Regards,
Amul




Re: Error-safe user functions

2022-12-14 Thread Amul Sul
On Thu, Dec 15, 2022 at 9:03 AM Tom Lane  wrote:
>
> Here are some proposed patches for converting range_in and multirange_in.
>
> 0001 tackles the straightforward part, which is trapping syntax errors
> and called-input-function errors.  The only thing that I think might
> be controversial here is that I chose to change the signatures of
> the exposed functions range_serialize and make_range rather than
> inventing xxx_safe variants.  I think this is all right, because
> AFAIK the only likely reason for extensions to call either of those
> is that custom types' canonical functions would need to call
> range_serialize --- and those will need to be touched anyway,
> see 0002.
>
> What 0001 does not cover is trapping errors occurring in range
> canonicalize functions.  I'd first thought maybe doing that wasn't
> worth the trouble, but it's not really very hard to fix the built-in
> canonicalize functions, as shown in 0002.  Probably extensions would
> not find it much harder, and in any case they're not really required
> to make their errors soft.
>
> Any objections?
>

There are a bunch of other hard errors that get_multirange_io_data(),
get_range_io_data() and their subroutines can hit; shouldn't we care
about those?


Regards,
Amul




Re: Error-safe user functions

2022-12-13 Thread Amul Sul
On Mon, Dec 12, 2022 at 12:00 AM Tom Lane  wrote:
>
> Andrew Dunstan  writes:
> > Maybe as we work through the remaining input functions (there are about
> > 60 core candidates left on my list) we should mark them with a comment
> > if no adjustment is needed.
>
> I did a quick pass through them last night.  Assuming that we don't
> need to touch the unimplemented input functions (eg for pseudotypes),
> I count these core functions as still needing work:
>
> aclitemin
> bit_in
> box_in
> bpcharin
> byteain
> cash_in
> cidin
> cidr_in
> circle_in
> inet_in
> int2vectorin
> jsonpath_in
> line_in
> lseg_in
> macaddr8_in
> macaddr_in

Attaching patches changing these functions, except bpcharin,
byteain, jsonpath_in, and cidin. I am continuing to work on the next
items below:

> multirange_in
> namein
> oidin
> oidvectorin
> path_in
> pg_lsn_in
> pg_snapshot_in
> point_in
> poly_in
> range_in
> regclassin
> regcollationin
> regconfigin
> regdictionaryin
> regnamespacein
> regoperatorin
> regoperin
> regprocedurein
> regprocin
> regrolein
> regtypein
> tidin
> tsqueryin
> tsvectorin
> uuid_in
> varbit_in
> varcharin
> xid8in
> xidin
> xml_in
>
> and these contrib functions:
>
> hstore:
> hstore_in
> intarray:
> bqarr_in
> isn:
> ean13_in
> isbn_in
> ismn_in
> issn_in
> upc_in
> ltree:
> ltree_in
> lquery_in
> ltxtq_in
> seg:
> seg_in
>
> Maybe we should have a conversation about which of these are
> highest priority to get to a credible feature.  We clearly need
> to fix the remaining SQL-spec types (varchar and bpchar, mainly).
> At the other extreme, likely nobody would weep if we never fixed
> int2vectorin, for instance.
>
> I'm a little concerned about the cost-benefit of fixing the reg* types.
> The ones that accept type names actually use the core grammar to parse
> those.  Now, we probably could fix the grammar to be non-throwing, but
> it'd be very invasive and I'm not sure about the performance impact.
> It might be best to content ourselves with soft reporting of lookup
> failures, as opposed to syntax problems.
>

Regards,
Amul
From 4c4c18bd8104114351ca58a73a9d92fbb0c85dd2 Mon Sep 17 00:00:00 2001
From: Amul Sul 
Date: Tue, 13 Dec 2022 17:29:04 +0530
Subject: [PATCH v1 13/14] Change macaddr8_in to allow non-throw error
 reporting

---
 src/backend/utils/adt/mac8.c   | 36 +-
 src/test/regress/expected/macaddr8.out | 25 ++
 src/test/regress/sql/macaddr8.sql  |  6 +
 3 files changed, 49 insertions(+), 18 deletions(-)

diff --git a/src/backend/utils/adt/mac8.c b/src/backend/utils/adt/mac8.c
index 24d219f6386..adb253a6ac0 100644
--- a/src/backend/utils/adt/mac8.c
+++ b/src/backend/utils/adt/mac8.c
@@ -35,7 +35,8 @@
 #define lobits(addr) \
   ((unsigned long)(((addr)->e<<24) | ((addr)->f<<16) | ((addr)->g<<8) | ((addr)->h)))
 
-static unsigned char hex2_to_uchar(const unsigned char *ptr, const unsigned char *str);
+static unsigned char hex2_to_uchar(const unsigned char *ptr, const unsigned char *str,
+   Node *escontext);
 
 static const signed char hexlookup[128] = {
 	-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
@@ -58,7 +59,8 @@ static const signed char hexlookup[128] = {
  * the entire string, which is used only for error reporting.
  */
 static inline unsigned char
-hex2_to_uchar(const unsigned char *ptr, const unsigned char *str)
+hex2_to_uchar(const unsigned char *ptr, const unsigned char *str,
+			  Node *escontext)
 {
 	unsigned char ret = 0;
 	signed char lookup;
@@ -88,13 +90,10 @@ hex2_to_uchar(const unsigned char *ptr, const unsigned char *str)
 	return ret;
 
 invalid_input:
-	ereport(ERROR,
+	ereturn(escontext, 0,
 			(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
 			 errmsg("invalid input syntax for type %s: \"%s\"", "macaddr8",
 	str)));
-
-	/* We do not actually reach here */
-	return 0;
 }
 
 /*
@@ -104,6 +103,7 @@ Datum
 macaddr8_in(PG_FUNCTION_ARGS)
 {
 	const unsigned char *str = (unsigned char *) PG_GETARG_CSTRING(0);
+	Node	   *escontext = fcinfo->context;
 	const unsigned char *ptr = str;
 	macaddr8   *result;
 	unsigned char a = 0,
@@ -136,32 +136,32 @@ macaddr8_in(PG_FUNCTION_ARGS)
 		switch (count)
 		{
 			case 1:
-a = hex2_to_uchar(ptr, str);
+a = hex2_to_uchar(ptr, str, escontext);
 break;
 			case 2:
-b = hex2_to_uchar(ptr, str);
+b = hex2_to_uchar(ptr, str, escontext);
 break;
 			case 3:
-c = hex2_to_uchar(ptr, str);
+c = hex2_to_uchar(ptr, str, escontext);
 break;
 			case 4:
-d = hex2_to_uchar(ptr, str);
+d = hex2_to_uchar(ptr, str, escontext);
 break;
 			case 5:
-e = hex2_to_uchar(ptr, str);
+

Re: Error-safe user functions

2022-12-09 Thread Amul Sul
On Fri, Dec 9, 2022 at 9:08 PM Andrew Dunstan  wrote:
>
>
> On 2022-12-09 Fr 10:16, Tom Lane wrote:
> > Andrew Dunstan  writes:
> >> On 2022-12-08 Th 21:59, Tom Lane wrote:
> >>> Yeah, I was planning to take a look at that before walking away from
> >>> this stuff.  (I'm sure not volunteering to convert ALL the input
> >>> functions, but I'll do the datetime code.)
> >> Awesome. Perhaps if there are no more comments you can commit what you
> >> currently have so people can start work on other input functions.
> > Pushed.
>
>
> Great!
>
>
> > As I said, I'll take a look at the datetime area.  Do we
> > have any volunteers for other input functions?
> >
> >
>
>
> I am currently looking at the json types. I think that will be enough to
> let us rework the sql/json patches as discussed a couple of months ago.
>

I will pick a few other input functions, thanks.

Regards,
Amul




Re: Tables not getting vacuumed in postgres

2022-11-08 Thread Amul Sul
On Tue, Nov 8, 2022 at 6:11 PM Karthik Jagadish (kjagadis)
 wrote:
>
> Hi,
>
>
>
> Thanks for the response.
>
>
>
> But what I understand that insert update and delete would still work and will 
> not interfere with vacuuming process. Yes we do perform a lot of updates on 
> that particular table which is not vacuuming. Does it mean that it waiting 
> for the lock to be released?
>

Well, yes, those won't interfere, but the primary job of autovacuum is
to remove bloat; if a dead tuple is still visible to any
transaction, autovacuum is not going to remove it.


>
>
> Regards,
>
> Karthik
>
>
>
> From: Amul Sul 
> Date: Tuesday, 8 November 2022 at 5:38 PM
> To: Karthik Jagadish (kjagadis) 
> Cc: pgsql-hack...@postgresql.org , Prasanna 
> Satyanarayanan (prassaty) , Chandruganth Ayyavoo Selvam 
> (chaayyav) , Jaganbabu M (jmunusam) 
> Subject: Re: Tables not getting vacuumed in postgres
>
> On Tue, Nov 8, 2022 at 5:00 PM Karthik Jagadish (kjagadis)
>  wrote:
> >
> > Hi,
> >
> > We have a NMS application where we are using postgres as database, what we 
> > are noticing is that vacuuming is not happening for certain tables for 2-3 
> > days and eventually the table bloats and disk space is running out.
> >
> > What could be the reason for auto vacuuming not happening for certain 
> > tables?
> >
>
> Check if there is any long-running or prepared transaction.
>
> Regards,
> Amul




Re: Tables not getting vacuumed in postgres

2022-11-08 Thread Amul Sul
On Tue, Nov 8, 2022 at 5:00 PM Karthik Jagadish (kjagadis)
 wrote:
>
> Hi,
>
> We have a NMS application where we are using postgres as database, what we 
> are noticing is that vacuuming is not happening for certain tables for 2-3 
> days and eventually the table bloats and disk space is running out.
>
> What could be the reason for auto vacuuming not happening for certain tables?
>

Check if there is any long-running or prepared transaction.
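
For example, something along these lines should show them (a rough sketch;
adjust the age threshold as needed):

-- transactions open for more than an hour
SELECT pid, state, xact_start, query
FROM pg_stat_activity
WHERE xact_start < now() - interval '1 hour';

-- prepared transactions that were never committed or rolled back
SELECT gid, prepared, owner FROM pg_prepared_xacts;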

Regards,
Amul




Re: [PROPOSAL] : Use of ORDER BY clause in insert.sql

2022-10-27 Thread Amul Sul
On Fri, Oct 28, 2022 at 10:43 AM Tom Lane  wrote:
>
> David Rowley  writes:
> > On Fri, 28 Oct 2022 at 16:51, Amul Sul  wrote:
> >> If we
> >> are too sure that the output usually comes in the same order then the
> >> ORDER BY clause that exists in other tests seems useless. I am a bit
> >> confused & what could be a possible bug?
>
> > You can't claim that if this test shouldn't get an ORDER BY that all
> > tests shouldn't have an ORDER BY. That's just crazy. What if the test
> > is doing something like testing sort?!
>
> The general policy is that we'll add ORDER BY when a test is demonstrated
> to have unstable output order for identifiable environmental reasons
> (e.g. locale dependency) or timing reasons (e.g. background autovacuum
> sometimes changing statistics).  But the key word there is "identifiable".
> Without some evidence as to what's causing this, it remains possible
> that it's a code bug not the fault of the test case.
>
> regress.sgml explains the policy further:
>
>   You might wonder why we don't order all the regression test queries 
> explicitly
>   to get rid of this issue once and for all.  The reason is that that would
>   make the regression tests less useful, not more, since they'd tend
>   to exercise query plan types that produce ordered results to the
>   exclusion of those that don't.
>

Understood. Thanks for the clarification.

Regards,
Amul




Re: [PROPOSAL] : Use of ORDER BY clause in insert.sql

2022-10-27 Thread Amul Sul
On Fri, Oct 28, 2022 at 10:28 AM David Rowley  wrote:
>
> On Fri, 28 Oct 2022 at 16:51, Amul Sul  wrote:
> >
> > On Thu, Oct 27, 2022 at 6:54 PM Tom Lane  wrote:
> > > Please be specific about the circumstances in which the output is
> > > unstable for you.  With zero information to go on, it seems about as
> > > likely that this change is masking a bug as that it's a good idea.
> > >
> >
> > At the first glance, I thought the patch is pretty much obvious, and
> > we usually add an ORDER BY clause to ensure stable output.
>
> Unfortunately, you'll need to do better than that. We're not in the
> business of accepting patches with zero justification for why they're
> required. If you're not willing to do the analysis on why the order
> changes sometimes, why should we accept your patch?
>

Unfortunately, the test is not failing for me; otherwise, I would have
done that analysis. When I saw the patch for the first time, somehow
I didn't suspect anything spurious, due to my misconception that we
usually add an ORDER BY clause to SELECT queries just to be
safe.

> If you can't find the problem then you should modify insert.sql to
> EXPLAIN the problem query to see if the plan has changed between the
> passing and failing run. The only thing that comes to mind about why
> this test might produce rows in a different order would be if a
> parallel Append was sorting the subpaths by cost (See
> create_append_path's call to list_sort) and the costs were for some
> reason coming out differently sometimes. It's hard to imagine why this
> query would be parallelised though. If you show us the EXPLAIN from a
> passing and failing run, it might help us see the problem.
>

Understood.

> > If we
> > are too sure that the output usually comes in the same order then the
> > ORDER BY clause that exists in other tests seems useless. I am a bit
> > confused & what could be a possible bug?
>
> You can't claim that if this test shouldn't get an ORDER BY that all
> tests shouldn't have an ORDER BY. That's just crazy. What if the test
> is doing something like testing sort?!
>

I can understand that sorted output doesn't need further
sorting. I am just referring to the simple SELECT queries that do not
have any sorting.

Thanks & Regards,
Amul




Re: [PROPOSAL] : Use of ORDER BY clause in insert.sql

2022-10-27 Thread Amul Sul
On Thu, Oct 27, 2022 at 6:54 PM Tom Lane  wrote:
>
> Nishant Sharma  writes:
> > We would like to share a proposal of a patch, where we have added order by
> > clause in two select statements in src/test/regress/sql/insert.sql file and
> > respective changes in src/test/regress/expected/insert.out output file.
>
> > This would help in generating output in consistent sequence, as sometimes
> > we have observed change in sequence in output.
>
> Please be specific about the circumstances in which the output is
> unstable for you.  With zero information to go on, it seems about as
> likely that this change is masking a bug as that it's a good idea.
>

At first glance, I thought the patch was pretty much obvious, since
we usually add an ORDER BY clause to ensure stable output.  If we
are so sure that the output always comes in the same order, then the
ORDER BY clauses that exist in other tests seem useless. I am a bit
confused: what could the possible bug be?

I have tested on my CentOS and macOS machines; the insert.sql test gives
stable output, and I didn't find any failure in subsequent runs either, but I
am not sure if that is enough evidence to skip the ORDER BY clause.

Regards,
Amul




Re: Simplify standby state machine a bit in WaitForWALToBecomeAvailable()

2022-10-18 Thread Amul Sul
On Tue, Oct 18, 2022 at 12:01 PM Bharath Rupireddy
 wrote:
>
> Hi,
>
> In standby mode, the state machine in WaitForWALToBecomeAvailable()
> reads WAL from pg_wal after failing to read from the archive. This is
> currently implemented in XLogFileReadAnyTLI() by calling
> XLogFileRead() with source XLOG_FROM_PG_WAL after it fails with source
> XLOG_FROM_PG_ARCHIVE and the current source isn't changed at all.
> Also, passing the source to XLogFileReadAnyTLI() in
> WaitForWALToBecomeAvailable() isn't straight i.e. it's not necessary
> to pass in XLOG_FROM_ANY at all. These things make the state machine a
> bit complicated and hard to understand.
>
> The attached patch attempts to simplify the code a bit by changing the
> current source to XLOG_FROM_PG_WAL after failing in
> XLOG_FROM_PG_ARCHIVE so that the state machine can move smoothly to
> read from pg_wal. And we can just pass the current source to
> XLogFileReadAnyTLI(). It also enables us to reduce a bit of extra
> XLogFileRead() code in XLogFileReadAnyTLI().
>
> Thoughts?

+1

Regards,
Amul




Re: Convert macros to static inline functions

2022-10-04 Thread Amul Sul
On Tue, Oct 4, 2022 at 12:00 PM Peter Eisentraut
 wrote:
>
> On 16.05.22 10:27, Peter Eisentraut wrote:
> > Inspired by [0], I looked to convert more macros to inline functions.
>
> Here is another one from the same batch of work that I somehow didn't
> send in last time.
>
I think the assertion can be placed outside of the if-block, and the braces
can be removed.

> (IMO it's questionable whether this one should be an inline function or
> macro at all, rather than a normal external function.)
IMO, it should be an inline function, like RelationGetSmgr().

Regards,
Amul




Re: making relfilenodes 56 bits

2022-09-20 Thread Amul Sul
On Fri, Sep 9, 2022 at 3:32 PM Dilip Kumar  wrote:
>
> On Thu, Sep 8, 2022 at 4:10 PM Dilip Kumar  wrote:
>
> > On a separate note, while reviewing the latest patch I see there is some 
> > risk of using the unflushed relfilenumber in GetNewRelFileNumber() 
> > function.  Basically, in the current code, the flushing logic is tightly 
> > coupled with the logging new relfilenumber logic and that might not work 
> > with all the values of the VAR_RELNUMBER_NEW_XLOG_THRESHOLD.  So the idea 
> > is we need to keep the flushing logic separate from the logging, I am 
> > working on the idea and I will post the patch soon.
>
> I have fixed the issue, so now we will track nextRelFileNumber,
> loggedRelFileNumber and flushedRelFileNumber.  So whenever
> nextRelFileNumber is just VAR_RELNUMBER_NEW_XLOG_THRESHOLD behind the
> loggedRelFileNumber we will log VAR_RELNUMBER_PER_XLOG more
> relfilenumbers.  And whenever nextRelFileNumber reaches the
> flushedRelFileNumber then we will do XlogFlush for WAL upto the last
> loggedRelFileNumber.  Ideally flushedRelFileNumber should always be
> VAR_RELNUMBER_PER_XLOG number behind the loggedRelFileNumber so we can
> avoid tracking the flushedRelFileNumber.  But I feel keeping track of
> the flushedRelFileNumber looks cleaner and easier to understand.  For
> more details refer to the code in GetNewRelFileNumber().
>

Here are a few minor suggestions I came across while reading this
patch; they might be useful:

+#ifdef USE_ASSERT_CHECKING
+
+   {

Unnecessary blank line after USE_ASSERT_CHECKING.
--

+   return InvalidRelFileNumber;/* placate compiler */

I don't think we need this after the error on the latest branches.
--

+   LWLockAcquire(RelFileNumberGenLock, LW_SHARED);
+   if (shutdown)
+   checkPoint.nextRelFileNumber = ShmemVariableCache->nextRelFileNumber;
+   else
+   checkPoint.nextRelFileNumber = ShmemVariableCache->loggedRelFileNumber;
+
+   LWLockRelease(RelFileNumberGenLock);

This is done for a good reason, I think; it should have a comment
describing why checkPoint.nextRelFileNumber needs to be assigned
differently, from a crash-recovery perspective.
--

+#define SizeOfRelFileLocatorBackend \
+   (offsetof(RelFileLocatorBackend, backend) + sizeof(BackendId))

Can we append empty parentheses "()" to the macro name, so that it looks
like a function call when used, or else change the macro name to uppercase?
--

 +   if (val < 0 || val > MAX_RELFILENUMBER)
..
 if ((relfilenumber) < 0 || (relfilenumber) > MAX_RELFILENUMBER) \

How about adding a macro for this condition as RelFileNumberIsValid()?
We can replace all the checks referring to MAX_RELFILENUMBER with this.
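
A rough sketch of what I have in mind (the exact form is just a suggestion):

#define RelFileNumberIsValid(relfilenumber) \
	((relfilenumber) >= 0 && (relfilenumber) <= MAX_RELFILENUMBER)

so that the checks above become "if (!RelFileNumberIsValid(val))".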

Regards,
Amul




Re: tweak to a few index tests to hits ambuildempty() routine.

2022-09-14 Thread Amul Sul
On Wed, Sep 14, 2022 at 12:16 PM  wrote:
>
> I still wonder, if assert doesn't catch why that place is marked as
> covered here?
> https://coverage.postgresql.org/src/backend/access/gin/ginvacuum.c.gcov.html
>

Probably other tests cover that.

Regards,
Amul




Re: Refactoring postgres_fdw/connection.c

2022-08-03 Thread Amul Sul
;changing_xact_state = false;
 }
 
+/*
+ * Commit all remote transactions or subtransactions during pre-commit.
+ *
+ * If parallel_commit is enabled at this connection cache entry and
+ * the result of "sql" needs to be gotten later, return true and append
+ * this entry to "pending_entries".
+ *
+ * "toplevel" should be set to true if toplevel (main) transaction is
+ * committed, false otherwise.
+ */
+static bool
+pgfdw_exec_pre_commit(ConnCacheEntry *entry, const char *sql,
+	  List **pending_entries, bool toplevel)
+{
+	PGresult   *res;
+
+	/*
+	 * If abort cleanup previously failed for this connection, we can't issue
+	 * any more commands against it.
+	 */
+	pgfdw_reject_incomplete_xact_state_change(entry);
+
+	entry->changing_xact_state = true;
+	if (entry->parallel_commit)
+	{
+		do_sql_command_begin(entry->conn, sql);
+		*pending_entries = lappend(*pending_entries, entry);
+		return true;
+	}
+	do_sql_command(entry->conn, sql);
+	entry->changing_xact_state = false;
+
+	if (!toplevel)
+		return false;
+
+	/*
+	 * If there were any errors in subtransactions, and we made prepared
+	 * statements, do a DEALLOCATE ALL to make sure we get rid of all prepared
+	 * statements. This is annoying and not terribly bulletproof, but it's
+	 * probably not worth trying harder.
+	 *
+	 * DEALLOCATE ALL only exists in 8.3 and later, so this constrains how old
+	 * a server postgres_fdw can communicate with.  We intentionally ignore
+	 * errors in the DEALLOCATE, so that we can hobble along to some extent
+	 * with older servers (leaking prepared statements as we go; but we don't
+	 * really support update operations pre-8.3 anyway).
+	 */
+	if (entry->have_prep_stmt && entry->have_error)
+	{
+		res = PQexec(entry->conn, "DEALLOCATE ALL");
+		PQclear(res);
+	}
+	entry->have_prep_stmt = false;
+	entry->have_error = false;
+
+	return false;
+}
+
 /*
  * Finish pre-commit cleanup of connections on each of which we've sent a
  * COMMIT command to the remote server.
-- 
2.18.0

From f8aaf91d0fdb88c6726c45f924d8dea19bf98cff Mon Sep 17 00:00:00 2001
From: Amul Sul 
Date: Wed, 3 Aug 2022 10:17:09 -0400
Subject: [PATCH v2 4/4] TRIAL: cleanup xact callback - WIP

---
 contrib/postgres_fdw/connection.c | 420 +++---
 1 file changed, 213 insertions(+), 207 deletions(-)

diff --git a/contrib/postgres_fdw/connection.c b/contrib/postgres_fdw/connection.c
index 6e23046ad69..181d2f83e32 100644
--- a/contrib/postgres_fdw/connection.c
+++ b/contrib/postgres_fdw/connection.c
@@ -110,9 +110,8 @@ static bool pgfdw_exec_cleanup_query(PGconn *conn, const char *query,
 	 bool ignore_errors);
 static bool pgfdw_get_result_timed(PGconn *conn, TimestampTz endtime,
    PGresult **result, bool *timed_out);
-static void pgfdw_abort_cleanup(ConnCacheEntry *entry, bool toplevel);
-static bool pgfdw_exec_pre_commit(ConnCacheEntry *entry, const char *sql,
-  List **pending_entries, bool toplevel);
+static void pgfdw_abort_cleanup(bool toplevel);
+static void pgfdw_exec_pre_commit(bool toplevel);
 static void pgfdw_finish_pre_commit(List *pending_entries, const char *sql,
 	bool toplevel);
 static bool UserMappingPasswordRequired(UserMapping *user);
@@ -880,80 +879,47 @@ pgfdw_report_error(int elevel, PGresult *res, PGconn *conn,
 static void
 pgfdw_xact_callback(XactEvent event, void *arg)
 {
-	HASH_SEQ_STATUS scan;
-	ConnCacheEntry *entry;
-	List	   *pending_entries = NIL;
-
 	/* Quick exit if no connections were touched in this transaction. */
 	if (!xact_got_connection)
 		return;
 
-	/*
-	 * Scan all connection cache entries to find open remote transactions, and
-	 * close them.
-	 */
-	hash_seq_init(&scan, ConnectionHash);
-	while ((entry = (ConnCacheEntry *) hash_seq_search(&scan)))
+	switch (event)
 	{
-		/* Ignore cache entry if no open connection right now */
-		if (entry->conn == NULL)
-			continue;
-
-		/* If it has an open remote transaction, try to close it */
-		if (entry->xact_depth > 0)
-		{
-			elog(DEBUG3, "closing remote transaction on connection %p",
- entry->conn);
+		case XACT_EVENT_PARALLEL_PRE_COMMIT:
+		case XACT_EVENT_PRE_COMMIT:
 
-			switch (event)
-			{
-case XACT_EVENT_PARALLEL_PRE_COMMIT:
-case XACT_EVENT_PRE_COMMIT:
-
-	/* Commit all remote transactions during pre-commit */
-	if (pgfdw_exec_pre_commit(entry, "COMMIT TRANSACTION",
-			  &pending_entries, true))
-		continue;
-	break;
-case XACT_EVENT_PRE_PREPARE:
-
-	/*
-	 * We disallow any remote transactions, since it's not
-	 * very reasonable to hold them open until the prepared
-	 * transaction is committed.  For the moment, throw error
-	 * unconditionally; later we might allow read-only cases.
-	 * Note that the error will cause us to come right back
-	 * here with event == XACT_EVENT_ABORT, so we'll clean up
-	 * the connection state at that poi

Re: [Patch] ALTER SYSTEM READ ONLY

2022-07-27 Thread Amul Sul
Hi,

On Thu, Jul 28, 2022 at 4:05 AM Jacob Champion  wrote:
>
> On Fri, Apr 8, 2022 at 7:27 AM Amul Sul  wrote:
> > Attached is rebase version for the latest maste head(#891624f0ec).
>
> Hi Amul,
>
> I'm going through past CF triage emails today; I noticed that this
> patch dropped out of the commitfest when you withdrew it in January,
> but it hasn't been added back with the most recent patchset you
> posted. Was that intended, or did you want to re-register it for
> review?
>

Yes, there is a plan to re-register it, but not anytime soon;
we will do that once we start to rework the design.

Regards,
Amul




Re: making relfilenodes 56 bits

2022-07-25 Thread Amul Sul
On Fri, Jul 22, 2022 at 4:21 PM vignesh C  wrote:
>
> On Wed, Jul 20, 2022 at 4:57 PM Dilip Kumar  wrote:
> >
> > On Mon, Jul 18, 2022 at 4:51 PM Dilip Kumar  wrote:
> > >
> > > I was doing some more testing by setting the FirstNormalRelFileNumber
> > > to a high value(more than 32 bits) I have noticed a couple of problems
> > > there e.g. relpath is still using OIDCHARS macro which says max
> > > relfilenumber file name can be only 10 character long which is no
> > > longer true.  So there we need to change this value to 20 and also
> > > need to carefully rename the macros and other variable names used for
> > > this purpose.
> > >
> > > Similarly there was some issue in macro in buf_internal.h while
> > > fetching the relfilenumber.  So I will relook into all those issues
> > > and repost the patch soon.
> >
> > I have fixed these existing issues and there was also some issue in
> > pg_dump.c which was creating problems in upgrading to the same version
> > while using a higher range of the relfilenumber.
> >
> > There was also an issue where the user table from the old cluster's
> > relfilenode could conflict with the system table of the new cluster.
> > As a solution currently for system table object (while creating
> > storage first time) we are keeping the low range of relfilenumber,
> > basically we are using the same relfilenumber as OID so that during
> > upgrade the normal user table from the old cluster will not conflict
> > with the system tables in the new cluster.  But with this solution
> > Robert told me (in off list chat) a problem that in future if we want
> > to make relfilenumber completely unique within a cluster by
> > implementing the CREATEDB differently then we can not do that as we
> > have created fixed relfilenodes for the system tables.
> >
> > I am not sure what exactly we can do to avoid that because even if we
> > do something  to avoid that in the new cluster the old cluster might
> > be already using the non-unique relfilenode so after upgrading the new
> > cluster will also get those non-unique relfilenode.
>
> Thanks for the patch, my comments from the initial review:
> 1) Since we have changed the macros to inline functions, should we
> change the function names similar to the other inline functions in the
> same file like: ClearBufferTag, InitBufferTag & BufferTagsEqual:
> -#define BUFFERTAGS_EQUAL(a,b) \
> -( \
> -   RelFileLocatorEquals((a).rlocator, (b).rlocator) && \
> -   (a).blockNum == (b).blockNum && \
> -   (a).forkNum == (b).forkNum \
> -)
> +static inline void
> +CLEAR_BUFFERTAG(BufferTag *tag)
> +{
> +   tag->rlocator.spcOid = InvalidOid;
> +   tag->rlocator.dbOid = InvalidOid;
> +   tag->rlocator.relNumber = InvalidRelFileNumber;
> +   tag->forkNum = InvalidForkNumber;
> +   tag->blockNum = InvalidBlockNumber;
> +}
>
> 2) We could move this macros along with the other macros at the top of the 
> file:
> +/*
> + * The freeNext field is either the index of the next freelist entry,
> + * or one of these special values:
> + */
> +#define FREENEXT_END_OF_LIST   (-1)
> +#define FREENEXT_NOT_IN_LIST   (-2)
>
> 3) typo thn should be then:
> + * can raise it as necessary if we end up with more mapped relations. For
> + * now, we just pick a round number that is modestly larger thn the expected
> + * number of mappings.
> + */
>

A few more typos in the 0004 patch as well:

the a value
interger
previosly
currenly

> 4) There is one whitespace issue:
> git am v10-0004-Widen-relfilenumber-from-32-bits-to-56-bits.patch
> Applying: Widen relfilenumber from 32 bits to 56 bits
> .git/rebase-apply/patch:1500: space before tab in indent.
> (relfilenumber; \
> warning: 1 line adds whitespace errors.
>
> Regards,
> Vignesh
>

Regards,
Amul




Re: GetStandbyFlushRecPtr() : OUT param is not optional like elsewhere.

2022-07-20 Thread Amul Sul
Thanks Aleksander and Álvaro for your inputs.

I understand this change is not making any improvement to the current
code. I was a bit concerned about the design consistency between
the function that exists for a server in recovery and the one for a server
that is not in recovery.  I was trying to write the following snippet,
where I am interested only in the XLogRecPtr:

recPtr = RecoveryInProgress() ? GetStandbyFlushRecPtr(NULL) :
GetFlushRecPtr(NULL);

But I can't write this, since I have to pass an argument to
GetStandbyFlushRecPtr(), while that is not compulsory for
GetFlushRecPtr().

I agree with rejecting the proposed change, since it is not immediately
useful and we have no rule that the argument must be optional in a
similar-looking function.

Regards,
Amul




GetStandbyFlushRecPtr() : OUT param is not optional like elsewhere.

2022-07-20 Thread Amul Sul
Hi,

If you look at the GetFlushRecPtr() function, the OUT parameter for
TimeLineID is optional, and it is not the only such case; see
GetWalRcvFlushRecPtr(), GetXLogReplayRecPtr(), etc.

I think we have missed that for GetStandbyFlushRecPtr(); to be in line
with those, we should change it as follows:

--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -3156,7 +3156,8 @@ GetStandbyFlushRecPtr(TimeLineID *tli)
receivePtr = GetWalRcvFlushRecPtr(NULL, &receiveTLI);
replayPtr = GetXLogReplayRecPtr(&replayTLI);

-   *tli = replayTLI;
+   if (tli)
+   *tli = replayTLI;

Thoughts?
--
Regards,
Amul Sul
EDB: http://www.enterprisedb.com




Re: PostgreSQL Limits: maximum number of columns in SELECT result

2022-05-31 Thread Amul Sul
On Tue, May 31, 2022 at 12:46 PM Vladimir Sitnikov
 wrote:
>
> Hi,
>
> Today I hit "ERROR: target lists can have at most 1664 entries", and I was 
> surprised the limit was not documented.
>
> I suggest that the limit of "1664 columns per tuple" (or whatever is the 
> right term) should be added
> to the list at https://www.postgresql.org/docs/current/limits.html e.g. after 
> "columns per table".
>

Rather, I think the "columns per table" limit needs to be updated to 1664.
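
For reference, one quick way to reproduce the reported error from psql
(just an illustration):

SELECT 'SELECT ' || string_agg('1', ',') FROM generate_series(1, 1665) \gexec
ERROR:  target lists can have at most 1664 entries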

Regards,
Amul




Re: Convert macros to static inline functions

2022-05-16 Thread Amul Sul
On Mon, May 16, 2022 at 1:58 PM Peter Eisentraut
 wrote:
>
>
> Inspired by [0], I looked to convert more macros to inline functions.
> The attached patches are organized "bottom up" in terms of their API
> layering; some of the later ones depend on some of the earlier ones.
>

All the patches look good to me, except for the following minor
things, which can be ignored if you want.

0002 patch:

+static inline OffsetNumber
+PageGetMaxOffsetNumber(Page page)
+{
+   if (((PageHeader) page)->pd_lower <= SizeOfPageHeaderData)
+   return 0;
+   else
+   return ((((PageHeader) page)->pd_lower - SizeOfPageHeaderData) / sizeof(ItemIdData));
+}

The "else" is not necessary; we can have the return statement directly,
which would save some indentation as well, e.g. as sketched below. A similar
pattern can be considered for the 0004 and 0007 patches as well.
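
Roughly (a sketch of the suggestion):

static inline OffsetNumber
PageGetMaxOffsetNumber(Page page)
{
	if (((PageHeader) page)->pd_lower <= SizeOfPageHeaderData)
		return 0;

	return ((((PageHeader) page)->pd_lower - SizeOfPageHeaderData) /
			sizeof(ItemIdData));
}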
--

0004 patch:

+static inline void
+XLogFromFileName(const char *fname, TimeLineID *tli, XLogSegNo
*logSegNo, int wal_segsz_bytes)
+{
+   uint32  log;
+   uint32  seg;
+   sscanf(fname, "%08X%08X%08X", tli, &log, &seg);
+   *logSegNo = (uint64) log * XLogSegmentsPerXLogId(wal_segsz_bytes) + seg;
+}

Can we have a blank line after the variable declarations, as we usually do?
--

0006 patch:
+static inline Datum
+fetch_att(const void *T, bool attbyval, int attlen)
+{
+   if (attbyval)
+   {
+#if SIZEOF_DATUM == 8
+   if (attlen == sizeof(Datum))
+   return *((const Datum *) T);
+   else
+#endif

Can we have a switch statement like store_att_byval() instead of if-else?
The code would look more symmetric, IMO; see the sketch below.
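
Roughly what I have in mind, mirroring store_att_byval() (just a sketch):

static inline Datum
fetch_att(const void *T, bool attbyval, int attlen)
{
	if (attbyval)
	{
		switch (attlen)
		{
			case sizeof(char):
				return CharGetDatum(*((const char *) T));
			case sizeof(int16):
				return Int16GetDatum(*((const int16 *) T));
			case sizeof(int32):
				return Int32GetDatum(*((const int32 *) T));
#if SIZEOF_DATUM == 8
			case sizeof(Datum):
				return *((const Datum *) T);
#endif
			default:
				elog(ERROR, "unsupported byval length: %d", attlen);
				return 0;
		}
	}
	return PointerGetDatum(T);
}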

Regards,
Amul




Re: Correct comment in ProcedureCreate() for pgstat_create_function() call.

2022-05-12 Thread Amul Sul
Sorry, hit the send button too early :|

Attached here !!

On Fri, May 13, 2022 at 10:20 AM Amul Sul  wrote:
>
> Hi,
>
> PFA, attached patch to $SUBJECT.
>
> --
> Regards,
> Amul Sul
> EDB: http://www.enterprisedb.com


code_comment.patch
Description: Binary data


Correct comment in ProcedureCreate() for pgstat_create_function() call.

2022-05-12 Thread Amul Sul
Hi,

PFA, attached patch to $SUBJECT.

-- 
Regards,
Amul Sul
EDB: http://www.enterprisedb.com




Re: Make relfile tombstone files conditional on WAL level

2022-05-12 Thread Amul Sul
Hi Dilip,

On Fri, Mar 4, 2022 at 11:07 AM Dilip Kumar  wrote:
>
> On Mon, Feb 21, 2022 at 1:21 PM Dilip Kumar  wrote:
> >
> > On Thu, Jan 6, 2022 at 1:43 PM Dilip Kumar  wrote:
> >
> >  2) GetNewRelFileNode() will not loop for checking the file existence
> > > and retry with other relfilenode.
> >
> >
> > Open Issues- there are currently 2 open issues in the patch 1) Issue
> > as discussed above about removing the loop, so currently in this patch
> > the loop is removed.  2) During upgrade from the previous version we
> > need to advance the nextrelfilenode to the current relfilenode we are
> > setting for the object in order to avoid the conflict.
>
>
> In this version I have fixed both of these issues.  Thanks Robert for
> suggesting the solution for both of these problems in our offlist
> discussion.  Basically, for the first problem we can flush the xlog
> immediately because we are actually logging the WAL every time after
> we allocate 64 relfilenode so this should not have much impact on the
> performance and I have added the same in the comments.  And during
> pg_upgrade, whenever we are assigning the relfilenode as part of the
> upgrade we will set that relfilenode + 1 as nextRelFileNode to be
> assigned so that we never generate the conflicting relfilenode.
>
> The only part I do not like in the patch is that before this patch we
> could directly access the buftag->rnode.  But since now we are not
> having directly relfilenode as part of the buffertag and instead of
> that we are keeping individual fields (i.e. dbOid, tbsOid and relNode)
> in the buffer tag.  So if we have to directly get the relfilenode we
> need to generate it.  However those changes are very limited to just 1
> or 2 file so maybe not that bad.
>

The v5 patch needs a rebase. Here are a few comments for 0002 that I found
while reading it; hope that helps:

+/* Number of RelFileNode to prefetch (preallocate) per XLOG write */
+#define VAR_RFN_PREFETCH   8192
+

Should it be 64, as per the comment in XLogPutNextRelFileNode() about XLogFlush()?
---

+   /*
+* Check for the wraparound for the relnode counter.
+*
+* XXX Actually the relnode is 56 bits wide so we don't need to worry about
+* the wraparound case.
+*/
+   if (ShmemVariableCache->nextRelNode > MAX_RELFILENODE)

This is a very rare case; should it use unlikely()?
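That is, wrapping the condition as (sketch):

   if (unlikely(ShmemVariableCache->nextRelNode > MAX_RELFILENODE))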
---

+/*
+ * Max value of the relfilnode.  Relfilenode will be of 56bits wide for more
+ * details refer comments atop BufferTag.
+ */
+#define MAX_RELFILENODE	((((uint64) 1) << 56) - 1)

Should there be 57-bit shifts here? Instead, I think we should use
INT64CONST(0xFFFFFFFFFFFFFF) to be consistent with the PG_*_MAX
declarations, thoughts?
---

+   /* If we run out of logged for use RelNode then we must log more */
+   if (ShmemVariableCache->relnodecount == 0)

relnodecount might never go below zero, but just to be safer we should check
<= 0 instead.
---

Few typos:
Simmialr
Simmilar
agains
idealy

Regards,
Amul




Re: Proposal for internal Numeric to Uint64 conversion function.

2022-05-04 Thread Amul Sul
On Tue, May 3, 2022 at 8:04 PM Peter Eisentraut
 wrote:
>
> On 03.05.22 08:50, Amul Sul wrote:
> >> Do you have any data that supports removing DirectionFunctionCall()
> >> invocations?  I suppose some performance benefit could be expected, or
> >> what do you have in mind?
> >>
> > Not really, the suggestion to avoid DirectionFunctionCall() is from Tom.
> >
> > For a trial, I have called the money(numeric) function 10M times to
> > see the difference with and without patch but there is not much.
> > (I don't have any knowledge of micro profiling/benchmarking).
>
> Ok.  I have lost track of what you are trying to achieve with this patch
> set.  It's apparently not for performance, and in terms of refactoring
> you end up with more lines of code than before, so that doesn't seem too
> appealing either.  So I'm not sure what the end goal is.
>

Well, that is still worth it, IMHO.

Refactoring allows us to place numeric_pg_lsn in the correct file,
where it should be. This refactoring gives a function that converts
numeric to uint64. The same function is used in other routines of
pg_lsn.c, which otherwise use DirectFunctionCall().

Refactoring numeric_int8() gives a function to convert numeric
to int64 which can be used in many places without using
DirectFunctionCall(), and that is not the sole reason for it -- we have
a function that converts int64 to numeric, but the opposite
conversion needs to use DirectFunctionCall(), which doesn't make
sense.

Along with this refactoring, there are various routines calling
functions for numeric arithmetic operations through
DirectFunctionCall(), which isn't necessary, since the required
arithmetic functions are already directly accessible.

The last benefit, which is not completely worthless, is avoiding
DirectFunctionCall() itself: one less function call on the stack, and a
little less work that the system needs to do for the same task.

Regards,
Amul




Re: Proposal for internal Numeric to Uint64 conversion function.

2022-05-03 Thread Amul Sul
On Mon, May 2, 2022 at 8:23 PM Peter Eisentraut
 wrote:
>
> On 22.04.22 14:26, Amul Sul wrote:
> > Yes, I think we can do cleanup to some extent.  Attaching the
> > following patches that first intend to remove DirectFunctionCall as
> > much as possible:
>
> Do you have any data that supports removing DirectionFunctionCall()
> invocations?  I suppose some performance benefit could be expected, or
> what do you have in mind?
>

Not really; the suggestion to avoid DirectFunctionCall() is from Tom.

For a trial, I called the money(numeric) function 10M times to
see the difference with and without the patch, but there is not much.
(I don't have much knowledge of micro profiling/benchmarking.)
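
For reference, a query along these lines can exercise the cast 10M times
(just a sketch; the numeric-to-money cast goes through numeric_cash()):

SELECT count(x::numeric::money) FROM generate_series(1, 10000000) AS s(x);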

Next, I looked at the profile stack for the numeric_mul_opt_error()
execution, which is reached from numeric_cash() via a
DirectFunctionCall() to numeric_mul(); see this:

0.99%  postgres  postgres[.] numeric_mul_opt_error
|
--- numeric_mul_opt_error
   |
   |--94.86%-- numeric_mul
   |  DirectFunctionCall2Coll
   |  numeric_cash
   |  ExecInterpExpr
   |  ExecScan
   |  ExecSeqScan
   |  standard_ExecutorRun
   |  ExecutorRun
   |  PortalRunSelect
   |  PortalRun
   |  exec_simple_query
   |  PostgresMain
   |  ServerLoop
   |  PostmasterMain
   |  main
   |  __libc_start_main
   |
--5.14%-- DirectFunctionCall2Coll
  numeric_cash
  ExecInterpExpr
  ExecScan
  ExecSeqScan
  standard_ExecutorRun
  ExecutorRun
  PortalRunSelect
  PortalRun
  exec_simple_query
  PostgresMain
  ServerLoop
  PostmasterMain
  main
  __libc_start_main

AFAIC, DirectFunctionCall() involves some cost, and the question is
why not call numeric_mul_opt_error() directly, if that is possible.

Regards,
Amul




Re: [Patch] ALTER SYSTEM READ ONLY

2022-04-26 Thread Amul Sul
On Sat, Apr 23, 2022 at 1:34 PM Bharath Rupireddy
 wrote:
>
> On Mon, Mar 15, 2021 at 12:56 PM Amul Sul  wrote:
> > >
> > > It is a very minor change, so I rebased the patch. Please take a look, if 
> > > that works for you.
> > >
> >
> > Thanks, I am getting one more failure for the vacuumlazy.c. on the
> > latest master head(d75288fb27b), I fixed that in attached version.
>
> Thanks Amul! I haven't looked at the whole thread, I may be repeating
> things here, please bear with me.
>

Np, thanks for looking into it.

> 1) Is the pg_prohibit_wal() only user sets the wal prohibit mode? Or
> do we still allow via 'ALTER SYSTEM READ ONLY/READ WRITE'? If not, I
> think the patches still have ALTER SYSTEM READ ONLY references.

Could you please point me to what those references are? I didn't find
any in the v45 version.

> 2) IIUC, the idea of this patch is not to generate any new WAL when
> set as default_transaction_read_only and transaction_read_only can't
> guarantee that?

No. WAL writes should be disabled completely; in other words, XLogInsert()
should be restricted.

> 3) IMO, the function name pg_prohibit_wal doesn't look good where it
> also allows one to set WAL writes, how about the following functions -
> pg_prohibit_wal or pg_disallow_wal_{generation, inserts} or
> pg_allow_wal or pg_allow_wal_{generation, inserts} without any
> arguments and if needed a common function
> pg_set_wal_generation_state(read-only/read-write) something like that?

There have been similar suggestions before too, but none of them has been
finalized yet; there are other, bigger challenges that need to be
handled first, so we can keep this work for last.

> 4) It looks like only the checkpointer is setting the WAL prohibit
> state? Is there a strong reason for that? Why can't the backend take a
> lock on prohibit state in shared memory and set it and let the
> checkpointer read it and block itself from writing WAL?

Once the WAL-prohibited state transition is initiated, it must be
completed; there is no fallback. What if the backend exits before the
transition completes? The checkpointer, in contrast, will be
restarted even if it exits, and will complete the state transition.

> 5) Is SIGUSR1 (which is multiplexed) being sent without a "reason" to
> checkpointer? Why?

We simply want to wake up the checkpointer process without asking for
specific work in the handler function. Another suitable choice would be
SIGINT; we can choose that too if needed.

> 6) What happens for long-running or in-progress transactions if
> someone prohibits WAL in the midst of them? Do these txns fail? Or do
> we say that we will allow them to run to completion? Or do we fail
> those txns at commit time? One might use this feature to say not let
> server go out of disk space, but if we allow in-progress txns to
> generate/write WAL, then how can one achieve that with this feature?
> Say, I monitor my server in such a way that at 90% of disk space,
> prohibit WAL to avoid server crash. But if this feature allows
> in-progress txns to generate WAL, then the server may still crash?

Read-only transactions will be allowed to continue; if such a
transaction tries to write, or if any other transaction has already
performed writes, then the session running that transaction will be
terminated -- the design is described in the first mail of this
thread.

> 7) What are the other use-cases (I can think of - to avoid out of disk
> crashes, block/freeze writes to database when the server is
> compromised) with this feature? Any usages during/before failover,
> promotion or after it?

The important use case is for failover to avoid split-brain situations.

> 8) Is there a strong reason that we've picked up conditional variable
> wal_prohibit_cv over mutex/lock for updating WALProhibit shared
> memory?

I am not sure how that can be done using a mutex or a lock.

> 9) Any tests that you are planning to add?

Yes, we can. I have added very sophisticated tests that cover most of
my code changes, but that is not enough for such critical code
changes; there is a lot of room for improvement and for adding more tests
for this module as well as other parts, e.g. some missing coverage of
gin, gist, brin, and core features where this patch is adding checks, etc.
Any help will be greatly appreciated.

Regards,
Amul




Re: tweak to a few index tests to hits ambuildempty() routine.

2022-04-25 Thread Amul Sul
On Mon, Apr 25, 2022 at 7:23 PM Alvaro Herrera  wrote:
>
> On 2022-Apr-25, Alvaro Herrera wrote:
>
> > On 2022-Apr-25, Alvaro Herrera wrote:
> >
> > > I added one change to include spgist too, which was uncovered, and
> > > pushed this.
> >

Thanks for the commit with the improvement.

Regards,
Amul




Re: Proposal for internal Numeric to Uint64 conversion function.

2022-04-22 Thread Amul Sul
On Fri, Mar 18, 2022 at 1:17 AM Greg Stark  wrote:
>
> On Fri, 11 Mar 2022 at 15:17, Tom Lane  wrote:
> >
> > Amul Sul  writes:
>
> > >  Yeah, that's true, I am too not sure if we really need to refactor
> > >  all those; If we want, I can give it a try.
> >
> > The patch as-presented isn't very compelling for
> > lack of callers of the new function
>
> Tom, are you saying you think we're not interested in just adding this
> function unless it's part of this refactoring?
>
> Amul, do you think if we did numeric_to_int64/numeric_to_uint64 as a
> refactored API and a second patch that made numeric_pg_lsn and other
> consumers use it it would clean up the code significantly?

Sorry for the long absence.

Yes, I think we can do cleanup to some extent.  Attaching the
following patches, which primarily intend to remove DirectFunctionCall() as
much as possible:

0001:
Refactors numeric_pg_lsn and moves it to the pg_lsn.c file. The function's
guts are in numeric.c as numeric_to_uint64_type(), a generalised function
that can be used elsewhere too.

0002:
Does a little more cleanup of the pg_lsn.c file -- replaces a few
DirectFunctionCall1() invocations with numeric_to_uint64_type().

0003:
Refactors the numeric_int8() function and replaces a few
DirectFunctionCall1() invocations of numeric_int8 with the newly added
numeric_to_int64() and numeric_to_int64_type().
The numeric_to_int64_type() version should be called only when you want to
refer to a specific type name in the error message, like
numeric_to_uint64_type(), e.g. for the MONEY type.

0004:
Adds float8_to_numeric and numeric_to_float8 by refactoring
float8_numeric and numeric_float8, respectively. I am a bit confused
about whether the type should be referred to as float8 or double.
Replaces a few DirectFunctionCall() calls with these C functions.

0005:
This one replaces all the DirectFunctionCall() invocations needed for the
numeric arithmetic operations.

Regards,
Amul
From fd5e1ccffe2c1aa1c619fb0f31389f35fe554e5b Mon Sep 17 00:00:00 2001
From: Amul Sul 
Date: Fri, 22 Apr 2022 06:17:51 -0400
Subject: [PATCH v2 5/5] Call numeric arithmetic functions directly

---
 contrib/btree_gist/btree_numeric.c  | 17 +++--
 src/backend/access/brin/brin_minmax_multi.c | 14 
 src/backend/utils/adt/cash.c| 15 
 src/backend/utils/adt/dbsize.c  | 33 +
 src/backend/utils/adt/formatting.c  | 15 
 src/backend/utils/adt/jsonb.c   |  7 ++--
 src/backend/utils/adt/numeric.c | 39 +++--
 src/backend/utils/adt/pg_lsn.c  | 16 -
 src/backend/utils/adt/rangetypes.c  | 10 +++---
 9 files changed, 80 insertions(+), 86 deletions(-)

diff --git a/contrib/btree_gist/btree_numeric.c b/contrib/btree_gist/btree_numeric.c
index 35e466cdd94..6c50cfa41f2 100644
--- a/contrib/btree_gist/btree_numeric.c
+++ b/contrib/btree_gist/btree_numeric.c
@@ -174,17 +174,10 @@ gbt_numeric_penalty(PG_FUNCTION_ARGS)
 	ok = gbt_var_key_readable(org);
 	uk = gbt_var_key_readable((GBT_VARKEY *) DatumGetPointer(uni));
 
-	us = DatumGetNumeric(DirectFunctionCall2(numeric_sub,
-			 PointerGetDatum(uk.upper),
-			 PointerGetDatum(uk.lower)));
+	us = numeric_sub_opt_error((Numeric) uk.upper, (Numeric) uk.lower, NULL);
+	os = numeric_sub_opt_error((Numeric) ok.upper, (Numeric) ok.lower, NULL);
 
-	os = DatumGetNumeric(DirectFunctionCall2(numeric_sub,
-			 PointerGetDatum(ok.upper),
-			 PointerGetDatum(ok.lower)));
-
-	ds = DatumGetNumeric(DirectFunctionCall2(numeric_sub,
-			 NumericGetDatum(us),
-			 NumericGetDatum(os)));
+	ds = numeric_sub_opt_error(us, os, NULL);
 
 	if (numeric_is_nan(us))
 	{
@@ -202,9 +195,7 @@ gbt_numeric_penalty(PG_FUNCTION_ARGS)
 		if (DirectFunctionCall2(numeric_gt, NumericGetDatum(ds), NumericGetDatum(nul)))
 		{
 			*result += FLT_MIN;
-			os = DatumGetNumeric(DirectFunctionCall2(numeric_div,
-	 NumericGetDatum(ds),
-	 NumericGetDatum(us)));
+			os = numeric_div_opt_error(ds, us, NULL);
 			*result += (float4) DatumGetFloat8(DirectFunctionCall1(numeric_float8_no_overflow, NumericGetDatum(os)));
 		}
 	}
diff --git a/src/backend/access/brin/brin_minmax_multi.c b/src/backend/access/brin/brin_minmax_multi.c
index 764efa381f2..c84a70d1ad9 100644
--- a/src/backend/access/brin/brin_minmax_multi.c
+++ b/src/backend/access/brin/brin_minmax_multi.c
@@ -2008,19 +2008,21 @@ brin_minmax_multi_distance_tid(PG_FUNCTION_ARGS)
 Datum
 brin_minmax_multi_distance_numeric(PG_FUNCTION_ARGS)
 {
-	Datum		d;
-	Datum		a1 = PG_GETARG_DATUM(0);
-	Datum		a2 = PG_GETARG_DATUM(1);
+	Numeric		res;
+	Numeric		a1 = PG_GETARG_NUMERIC(0);
+	Numeric		a2 = PG_GETARG_NUMERIC(1);
 
 	/*
 	 * We know the values are range boundaries, but the range may be collapsed
 	 * (i.e. single points), with equal values.
 	 */
-	Assert(DatumGetBool(DirectFunctionCall2(numeric_le, a1, a2)));
+	Assert(DatumGetBool(DirectFunctionCall2(numeric_

Re: using an end-of-recovery record in all cases

2022-04-20 Thread Amul Sul
 On Tue, Apr 19, 2022 at 2:14 AM Robert Haas  wrote:
>
> On Sat, Jan 15, 2022 at 11:52 PM Julien Rouhaud  wrote:
> > The cfbot reports that this version of the patch doesn't apply anymore:
>
> Here is a new version of the patch which, unlike v1, I think is
> something we could seriously consider applying (not before v16, of
> course). It now removes CHECKPOINT_END_OF_RECOVERY completely, and I
> attach a second patch as well which nukes checkPoint.PrevTimeLineID as
> well.
>
> I mentioned two problems with $SUBJECT in the first email with this
> subject line. One was a bug, which Noah has since fixed (thanks,
> Noah). The other problem is that LogStandbySnapshot() and a bunch of
> its friends expect latestCompletedXid to always be a normal XID, which
> is a problem because (1) we currently set nextXid to 3 and (2) at
> startup, we compute latestCompletedXid = nextXid - 1. The current code
> dodges this kind of accidentally: the checkpoint that happens at
> startup is a "shutdown checkpoint" and thus skips logging a standby
> snapshot, since a shutdown checkpoint is a sure indicator that there
> are no running transactions. With the changes, the checkpoint at
> startup happens after we've started allowing write transactions, and
> thus a standby snapshot needs to be logged also. In the cases where
> the end-of-recovery record was already being used, the problem could
> have happened already, except for the fact that those cases involve a
> standby promotion, which doesn't happen during initdb. I explored a
> few possible ways of solving this problem.
>
> The first thing I considered was replacing latestCompletedXid with a
> name like incompleteXidHorizon. The idea is that, where
> latestCompletedXid is the highest XID that is known to have committed
> or aborted, incompleteXidHorizon would be the lowest XID such that all
> known commits or aborts are for lower XIDs. In effect,
> incompleteXidHorizon would be latestCompletedXid + 1. Since
> latestCompletedXid is always normal except when it's 2,
> incompleteXidHorizon would be normal in all cases. Initially this
> seemed fairly promising, but it kind of fell down when I realized that
> we copy latestCompletedXid into
> ComputeXidHorizonsResult.latest_completed. That seemed to me to make
> the consequences of the change a bit more far-reaching than I liked.
> Also, it wasn't entirely clear to me that I wouldn't be introducing
> any off-by-one errors into various wraparound calculations with this
> approach.
>
> The second thing I considered was skipping LogStandbySnapshot() during
> initdb. There are two ways of doing this and neither of them seem that
> great to me. Something that does work is to skip LogStandbySnapshot()
> when in single user mode, but there is no particular reason why WAL
> generated in single user mode couldn't be replayed on a standby, so
> this doesn't seem great. It's too big a hammer for what we really
> want, which is just to suppress this during initdb. Another way of
> approaching it is to skip it in bootstrap mode, but that actually
> doesn't work: initdb then fails during the post-bootstrapping step
> rather than during bootstrapping. I thought about patching around that
> by forcing the code that generates checkpoint records to forcibly
> advance an XID of 3 to 4, but that seemed like working around the
> problem from the wrong end.
>
> So ... I decided that the best approach here seems to be to just skip
> FirstNormalTransactionId and use FirstNormalTransactionId + 1 for the
> first write transaction that the cluster ever processes. That's very
> simple and doesn't seem likely to break anything else. On the downside
> it seems a bit grotty, but I don't see anything better, and on the
> whole, I think with this approach we come out substantially ahead.

IIUC, the failure was something like this on initdb:

running bootstrap script ... TRAP:
FailedAssertion("TransactionIdIsNormal(CurrentRunningXacts->latestCompletedXid)",
File: "procarray.c", Line: 2892, PID: 60363)

/bin/postgres(ExceptionalCondition+0xb9)[0xb3917d]
/bin/postgres(GetRunningTransactionData+0x36c)[0x96aa26]
/bin/postgres(LogStandbySnapshot+0x64)[0x974393]
/bin/postgres(CreateCheckPoint+0x67f)[0x5928bf]
/bin/postgres(RequestCheckpoint+0x26)[0x8ca649]
/bin/postgres(StartupXLOG+0xf51)[0x591126]
/bin/postgres(InitPostgres+0x188)[0xb4f2ac]
/bin/postgres(BootstrapModeMain+0x4d3)[0x5ac6de]
/bin/postgres(main+0x275)[0x7ca72e]
/lib64/libc.so.6(__libc_start_main+0xf5)[0x7f71af82d445]
/bin/postgres[0x48aae9]
child process was terminated by signal 6: Aborted
initdb: removing data directory "/inst/data"

That was happening because RequestCheckpoint() was called from StartupXLOG()
unconditionally, but with the v5 patch that is not true.

If my understanding is correct, then we don't need any special
handling for latestCompletedXid, at least in this patch.

Regards,
Amul




Re: using an end-of-recovery record in all cases

2022-04-18 Thread Amul Sul
On Tue, Apr 19, 2022 at 2:14 AM Robert Haas  wrote:
>
> On Sat, Jan 15, 2022 at 11:52 PM Julien Rouhaud  wrote:
> > The cfbot reports that this version of the patch doesn't apply anymore:
>
> Here is a new version of the patch which, unlike v1, I think is
> something we could seriously consider applying (not before v16, of
> course). It now removes CHECKPOINT_END_OF_RECOVERY completely, and I
> attach a second patch as well which nukes checkPoint.PrevTimeLineID as
> well.
>
> I mentioned two problems with $SUBJECT in the first email with this
> subject line. One was a bug, which Noah has since fixed (thanks,
> Noah). The other problem is that LogStandbySnapshot() and a bunch of
> its friends expect latestCompletedXid to always be a normal XID, which
> is a problem because (1) we currently set nextXid to 3 and (2) at
> startup, we compute latestCompletedXid = nextXid - 1. The current code
> dodges this kind of accidentally: the checkpoint that happens at
> startup is a "shutdown checkpoint" and thus skips logging a standby
> snapshot, since a shutdown checkpoint is a sure indicator that there
> are no running transactions. With the changes, the checkpoint at
> startup happens after we've started allowing write transactions, and
> thus a standby snapshot needs to be logged also. In the cases where
> the end-of-recovery record was already being used, the problem could
> have happened already, except for the fact that those cases involve a
> standby promotion, which doesn't happen during initdb. I explored a
> few possible ways of solving this problem.
>
> The first thing I considered was replacing latestCompletedXid with a
> name like incompleteXidHorizon. The idea is that, where
> latestCompletedXid is the highest XID that is known to have committed
> or aborted, incompleteXidHorizon would be the lowest XID such that all
> known commits or aborts are for lower XIDs. In effect,
> incompleteXidHorizon would be latestCompletedXid + 1. Since
> latestCompletedXid is always normal except when it's 2,
> incompleteXidHorizon would be normal in all cases. Initially this
> seemed fairly promising, but it kind of fell down when I realized that
> we copy latestCompletedXid into
> ComputeXidHorizonsResult.latest_completed. That seemed to me to make
> the consequences of the change a bit more far-reaching than I liked.
> Also, it wasn't entirely clear to me that I wouldn't be introducing
> any off-by-one errors into various wraparound calculations with this
> approach.
>
> The second thing I considered was skipping LogStandbySnapshot() during
> initdb. There are two ways of doing this and neither of them seem that
> great to me. Something that does work is to skip LogStandbySnapshot()
> when in single user mode, but there is no particular reason why WAL
> generated in single user mode couldn't be replayed on a standby, so
> this doesn't seem great. It's too big a hammer for what we really
> want, which is just to suppress this during initdb. Another way of
> approaching it is to skip it in bootstrap mode, but that actually
> doesn't work: initdb then fails during the post-bootstrapping step
> rather than during bootstrapping. I thought about patching around that
> by forcing the code that generates checkpoint records to forcibly
> advance an XID of 3 to 4, but that seemed like working around the
> problem from the wrong end.
>
> So ... I decided that the best approach here seems to be to just skip
> FirstNormalTransactionId and use FirstNormalTransactionId + 1 for the
> first write transaction that the cluster ever processes. That's very
> simple and doesn't seem likely to break anything else. On the downside
> it seems a bit grotty, but I don't see anything better, and on the
> whole, I think with this approach we come out substantially ahead.
> 0001 removes 3 times as many lines as it inserts, and 0002 saves a few
> more lines of code.
>
> Now, I still don't really know that there isn't some theoretical
> difficulty here that makes this whole approach a non-starter, but I
> also can't think of what it might be. If the promotion case, which has
> used the end-of-recovery record for many years, is basically safe,
> despite the fact that it switches TLIs, then it seems to me that the
> crash recovery case, which doesn't have that complication, ought to be
> safe too. But I might well be missing something, so if you see a
> problem, please speak up!
>

  /*
- * If this was a promotion, request an (online) checkpoint now. This isn't
- * required for consistency, but the last restartpoint might be far back,
- * and in case of a crash, recovering from it might take a longer than is
- * appropriate now that we're not in standby mode anymore.
+ * Request an (online) checkpoint now. This isn't required for consistency,
+ * but the last restartpoint might be far back, and in case of a crash,
+ * recovering from it might take a longer than is appropriate now that
+ * we're not in standby mode anymore.
  */
- if (promoted)
- 

minor code correction in typecmds.c

2022-02-16 Thread Amul Sul
Hi,

The attached patch replaces the hard-coded type alignment value with
the defined macro for the same.

--
Regards,
Amul Sul
EDB: http://www.enterprisedb.com
diff --git a/src/backend/commands/typecmds.c b/src/backend/commands/typecmds.c
index ce8c1badb39..ef6f14785a0 100644
--- a/src/backend/commands/typecmds.c
+++ b/src/backend/commands/typecmds.c
@@ -1910,10 +1910,10 @@ makeMultirangeConstructors(const char *name, Oid namespace,
 	allParamTypes = ObjectIdGetDatum(rangeArrayOid);
 	allParameterTypes = construct_array(&allParamTypes,
 		1, OIDOID,
-		sizeof(Oid), true, 'i');
+		sizeof(Oid), true, TYPALIGN_INT);
 	paramModes = CharGetDatum(FUNC_PARAM_VARIADIC);
 	parameterModes = construct_array(&paramModes, 1, CHAROID,
-	 1, true, 'c');
+	 1, true, TYPALIGN_CHAR);
 	myself = ProcedureCreate(name,	/* name: same as multirange type */
 			 namespace,
 			 false, /* replace */


Re: Proposal for internal Numeric to Uint64 conversion function.

2022-02-16 Thread Amul Sul
On Wed, Feb 16, 2022 at 4:50 PM Peter Eisentraut
 wrote:
>
> On 16.02.22 06:00, Amul Sul wrote:
> > Currently, numeric_pg_lsn is the only one that accepts the Numeric
> > value and converts to uint64 and that is the reason all the type
> > conversion code is embedded into it.
>
> There are other functions such as numeric_int8() that work similarly.
> If you are going to refactor, then they should all be treated similarly.
>   I'm not sure if it's going to end up being beneficial.

Yeah, that's true; I too am not sure if we really need to refactor
all of those. If we want, I can give it a try.

The intention here is to add a function that converts numeric to
uint64 -- we don't have one as of now, if I am not wrong.

Regards,
Amul




Proposal for internal Numeric to Uint64 conversion function.

2022-02-15 Thread Amul Sul
Hi,

Currently, numeric_pg_lsn is the only function that accepts a Numeric
value and converts it to uint64, which is why all of the type
conversion code is embedded in it.

I think it would be helpful if that numeric-to-uint64 conversion
code were extracted into a separate function, so that any other SQL
function, like numeric_pg_lsn, that wants to accept values up to the
uint64 range could use it.

Thoughts? Comments?

--
Regards,
Amul Sul
EDB: http://www.enterprisedb.com
From 98feb2bb92d2534cccaea57cd9b9777193bacae4 Mon Sep 17 00:00:00 2001
From: Amul Sul 
Date: Tue, 15 Feb 2022 23:57:26 -0500
Subject: [PATCH v1] Refactor numeric_pg_lsn

---
 src/backend/utils/adt/numeric.c | 54 +
 src/include/utils/numeric.h |  1 +
 2 files changed, 35 insertions(+), 20 deletions(-)

diff --git a/src/backend/utils/adt/numeric.c b/src/backend/utils/adt/numeric.c
index 3208789f75e..7bb6c1d7f7c 100644
--- a/src/backend/utils/adt/numeric.c
+++ b/src/backend/utils/adt/numeric.c
@@ -4209,6 +4209,39 @@ int64_div_fast_to_numeric(int64 val1, int log10val2)
 	return res;
 }
 
+/*
+ * typeName is the user-visible type name of uint64 used for the error
+ * reporting.
+ */
+uint64
+numeric_to_uint64_type(Numeric num, char *typeName)
+{
+	NumericVar	x;
+	uint64		result;
+
+	if (NUMERIC_IS_SPECIAL(num))
+	{
+		if (NUMERIC_IS_NAN(num))
+			ereport(ERROR,
+	(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+	 errmsg("cannot convert NaN to %s", typeName)));
+		else
+			ereport(ERROR,
+	(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+	 errmsg("cannot convert infinity to %s", typeName)));
+	}
+
+	/* Convert to variable format and thence to uint64 */
+	init_var_from_num(num, &x);
+
+	if (!numericvar_to_uint64(&x, &result))
+		ereport(ERROR,
+(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("%s out of range", typeName)));
+
+	return result;
+}
+
 Datum
 int4_numeric(PG_FUNCTION_ARGS)
 {
@@ -4545,28 +4578,9 @@ Datum
 numeric_pg_lsn(PG_FUNCTION_ARGS)
 {
 	Numeric		num = PG_GETARG_NUMERIC(0);
-	NumericVar	x;
 	XLogRecPtr	result;
 
-	if (NUMERIC_IS_SPECIAL(num))
-	{
-		if (NUMERIC_IS_NAN(num))
-			ereport(ERROR,
-	(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-	 errmsg("cannot convert NaN to %s", "pg_lsn")));
-		else
-			ereport(ERROR,
-	(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-	 errmsg("cannot convert infinity to %s", "pg_lsn")));
-	}
-
-	/* Convert to variable format and thence to pg_lsn */
-	init_var_from_num(num, &x);
-
-	if (!numericvar_to_uint64(&x, (uint64 *) &result))
-		ereport(ERROR,
-(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
- errmsg("pg_lsn out of range")));
+	result = (XLogRecPtr) numeric_to_uint64_type(num, "pg_lsn");
 
 	PG_RETURN_LSN(result);
 }
diff --git a/src/include/utils/numeric.h b/src/include/utils/numeric.h
index 21cd5a5cbf5..1b9ad3b5b58 100644
--- a/src/include/utils/numeric.h
+++ b/src/include/utils/numeric.h
@@ -74,6 +74,7 @@ extern char *numeric_normalize(Numeric num);
 
 extern Numeric int64_to_numeric(int64 val);
 extern Numeric int64_div_fast_to_numeric(int64 val1, int log10val2);
+extern uint64 numeric_to_uint64_type(Numeric num, char *typeName);
 
 extern Numeric numeric_add_opt_error(Numeric num1, Numeric num2,
 	 bool *have_error);
-- 
2.18.0



Re: TAP test to cover "EndOfLogTLI != replayTLI" case

2022-02-14 Thread Amul Sul
On Mon, Feb 14, 2022 at 3:07 PM Heikki Linnakangas  wrote:
>
> On 28/01/2022 12:10, Amul Sul wrote:
> > On Thu, Jan 27, 2022 at 12:01 PM Michael Paquier  
> > wrote:
> >>
> >> After reading this patch and this thread, I have noticed that you are
> >> testing the same thing as Heikki here, patch 0001:
> >> https://www.postgresql.org/message-id/52bc9ccd-8591-431b-0086-15d9acf25...@iki.fi
> >>
> >> The patch sent on the other thread has a better description and shape,
> >> so perhaps we'd better drop what is posted here in favor of the other
> >> version.  Thoughts?
> >
> > Yes, I do agree with you. Thanks for the comparison.
>
> Pushed the test from that other thread now. Thanks!
>

Thank you !

Regards,
Amul




Re: TAP test to cover "EndOfLogTLI != replayTLI" case

2022-01-28 Thread Amul Sul
On Thu, Jan 27, 2022 at 12:01 PM Michael Paquier  wrote:
>
> On Thu, Jan 20, 2022 at 12:13:08PM +0530, Amul Sul wrote:
> > I found the cause for the test failing on window -- is due to the
> > custom archive command setting which wasn't setting the correct window
> > archive directory path.
>
> After reading this patch and this thread, I have noticed that you are
> testing the same thing as Heikki here, patch 0001:
> https://www.postgresql.org/message-id/52bc9ccd-8591-431b-0086-15d9acf25...@iki.fi
>
> The patch sent on the other thread has a better description and shape,
> so perhaps we'd better drop what is posted here in favor of the other
> version.  Thoughts?

Yes, I do agree with you. Thanks for the comparison.

Regards,
Amul




Re: generalized conveyor belt storage

2022-01-27 Thread Amul Sul
 On Wed, Jan 5, 2022 at 1:12 AM Robert Haas  wrote:
>
> On Wed, Dec 29, 2021 at 7:08 AM Amul Sul  wrote:
Though the patch is WIP, here are a few comments that I found while
reading it and thought might help:
> >
> > +   {
> > +   if (meta->cbm_oldest_index_segment ==
> > +   meta->cbm_newest_index_segment)
> > +   elog(ERROR, "must remove last index segment when only one 
> > remains");
> > +   meta->cbm_oldest_index_segment = segno;
> > +   }
> >
> > How about having to assert or elog to ensure that 'segno' is indeed
> > the successor?
>
> This code doesn't have any inexpensive way to know whether segno is
> the successor. To figure that out you'd need to look at the latest
> index page that does exist, but this function's job is just to look at
> the metapage. Besides, the eventual caller will have just looked up
> that value in order to pass it to this function, so double-checking
> what we just computed right before doesn't really make sense.
>

Ok.

Assert(meta->cbm_oldest_index_segment < segno) was what I had in
mind, since the segment to be removed lies between
meta->cbm_oldest_index_segment and segno, IIUC.

> [...]
> Updated patch attached, also with a fix for the mistake in the readme
> that Matthias found, and a few other bug fixes.
>

A few more comments for this version:

+void
+cb_cache_invalidate(CBCache *cache, CBPageNo index_start,
+   uint64 index_segments_moved)
+{
+   if (index_segments_moved != cache->index_segments_moved)
+   {
+   cb_iseg_reset(cache->iseg);
+   cache->index_segments_moved = index_segments_moved;
+   }
+   else if (index_start > cache->oldest_possible_start)
+   {
+   cb_iseg_iterator it;
+   cb_iseg_entry *entry;
+
+   cb_iseg_start_iterate(cache->iseg, &it);
+   while ((entry = cb_iseg_iterate(cache->iseg, &it)) != NULL)
+   if (entry->index_segment_start < index_start)
+   cb_iseg_delete_item(cache->iseg, entry);
+   }
+}

Shouldn't we update oldest_possible_start once all the entries
preceding index_start are deleted?
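
A minimal sketch of the update I mean, placed after the delete loop
in that else-if branch (assuming index_start is the right new value):

        /* nothing older than index_start can remain in the cache now */
        cache->oldest_possible_start = index_start;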
--

+CBSegNo
+cbfsmpage_find_free_segment(Page page)
+{
+   CBFSMPageData *fsmp = cb_fsmpage_get_special(page);
+   unsigned    i;
+   unsigned    j;
+
+   StaticAssertStmt(CB_FSMPAGE_FREESPACE_BYTES % sizeof(uint64) == 0,
+"CB_FSMPAGE_FREESPACE_BYTES should be a multiple of 8");
+

I am a bit confused about this assertion -- why is that so?
Why should CB_FSMPAGE_FREESPACE_BYTES be a multiple of 8?
Do you mean CB_FSM_SEGMENTS_PER_FSMPAGE instead of CB_FSMPAGE_FREESPACE_BYTES?
--

+/*
+ * Add index entries for logical pages beginning at 'pageno'.
+ *
+ * It is the caller's responsibility to supply the correct index
page, and
+ * to make sure that there is enough room for the entries to be added.
+ */
+void
+cb_indexpage_add_index_entries(Page page,
+  unsigned pageoffset,
+  unsigned num_index_entries,
+  CBSegNo *index_entries)

The first comment line says "...logical pages beginning at 'pageno'",
but there is no 'pageno' in that function; does it mean pageoffset?
--

+ * Sets *pageno to the first logical page covered by this index page.
+ *
+ * Returns the segment number to which the obsolete index entry points.
+ */
+CBSegNo
+cb_indexpage_get_obsolete_entry(Page page, unsigned *pageoffset,
+   CBPageNo *first_pageno)
+{
+   CBIndexPageData *ipd = cb_indexpage_get_special(page);
+
+   *first_pageno = ipd->cbidx_first_page;
+
+   while (*pageoffset < CB_INDEXPAGE_INDEX_ENTRIES &&
+  ipd->cbidx_entry[*pageoffset] != CB_INVALID_SEGMENT)
+   ++*pageoffset;
+
+   return ipd->cbidx_entry[*pageoffset];
+}

Here I think the comment line should say *first_pageno instead of
*pageno.  The second line says "Returns the segment number to which
the obsolete index entry points."; I am not sure that is correct, or
I might have misunderstood it. It looks like the function returns
CB_INVALID_SEGMENT or the ipd->cbidx_entry[CB_INDEXPAGE_INDEX_ENTRIES]
value, which could be a garbage value due to out-of-bounds memory
access.
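
For instance, a defensive version could look like this (a sketch
only; returning CB_INVALID_SEGMENT here is just one option):

    while (*pageoffset < CB_INDEXPAGE_INDEX_ENTRIES &&
           ipd->cbidx_entry[*pageoffset] != CB_INVALID_SEGMENT)
        ++*pageoffset;

    /* avoid reading past the entry array when every entry is valid */
    if (*pageoffset >= CB_INDEXPAGE_INDEX_ENTRIES)
        return CB_INVALID_SEGMENT;

    return ipd->cbidx_entry[*pageoffset];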
--

+/*
+ * Copy the indicated number of index entries out of the metapage.
+ */
+void
+cb_metapage_get_index_entries(CBMetapageData *meta, unsigned num_index_entries,
+ CBSegNo *index_entries)
+{
+   Assert(num_index_entries <= cb_metapage_get_index_entries_used(meta));

IMO, having an elog instead of the assert would be good, like
cb_metapage_remove_index_entries() has an elog for (used < count).
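
i.e., something along these lines (a sketch; the message wording is
only illustrative):

    unsigned    used = cb_metapage_get_index_entries_used(meta);

    if (num_index_entries > used)
        elog(ERROR, "can't copy %u index entries when only %u are used",
             num_index_entries, used);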
--

+ * Copyright (c) 2016-2021, PostgreSQL Global Development Group

This is rather a question for my own knowledge: why does this
copyright year start from 2016? I thought we would add only the
current year for a newly added file.

Regards,
Amul




Re: Correct error message for end-of-recovery record TLI

2022-01-24 Thread Amul Sul
On Tue, Jan 25, 2022 at 10:08 AM Michael Paquier  wrote:
>
> On Wed, Dec 01, 2021 at 07:09:34PM +, Bossart, Nathan wrote:
> > The patch no longer applied, so I went ahead and rebased it.
>
> This was on the CF stack for some time, so applied.  I have also
> changed the messages produced for the shutdown and online checkpoint
> records as they used the same messages so as one can get more
> context depending on the record types.

A ton of thanks for the improvement & the commit.

Regards,
Amul




Re: TAP test to cover "EndOfLogTLI != replayTLI" case

2022-01-19 Thread Amul Sul
On Tue, Jan 18, 2022 at 10:31 AM Kyotaro Horiguchi
 wrote:
>
> At Tue, 18 Jan 2022 12:25:15 +0800, Julien Rouhaud  wrote 
> in
> > Hi,
> >
> > On Mon, Jan 17, 2022 at 09:33:57PM +0800, Julien Rouhaud wrote:
> > >
> > > Thanks for the updated patch!  Note that thanks to Andres and Thomas 
> > > work, you
> > > can now easily rely on the exact same CI than the cfbot on your own github
> > > repository, if you need to debug something on a platform you don't have 
> > > access
> > > to.  You can check the documentation at [1] for more detail on how to 
> > > setup the
> > > CI.
> >
> > The cfbot reports that the patch still fails on Windows but also failed on
> > macos with the same error: https://cirrus-ci.com/task/5655777858813952:
> >
> > [14:20:43.950] #   Failed test 'check standby content before timeline 
> > switch 0/500FAF8'
> > [14:20:43.950] #   at t/003_recovery_targets.pl line 239.
> > [14:20:43.950] #  got: '6000'
> > [14:20:43.950] # expected: '7000'
> > [14:20:43.950] # Looks like you failed 1 test of 10.
> >
> > I'm switching the CF entry to Waiting on Author.
>
> The most significant advantages of the local CI setup are
>
>  - CI immediately responds to push.
>
>  - You can dirty the code with additional logging aid as much as you
>like to see closely what is going on.  It makes us hesitant to do
>the same on this ML:p
>

Indeed :)

I found the cause of the test failing on Windows -- it is due to the
custom archive command setting, which wasn't setting the correct
Windows archive directory path.

There is no option to choose a custom WAL archive and restore path in
the TAP tests. The attached 0001 patch proposes the same, enabling
you to choose a custom WAL archive and restore directory. The only
concern I have is that $node->info() will print the wrong archive
directory path in that case; do we need to fix that?  We might need to
store archive_dir like base_dir, but I wasn't sure about that.  Thoughts?
Comments?

Regards,
Amul
From 4bc968d3508e9f608fd572f40f39e2cb374d53f4 Mon Sep 17 00:00:00 2001
From: Amul Sul 
Date: Wed, 19 Jan 2022 23:22:02 -0500
Subject: [PATCH v4 1/3] Allow TAP tests to choose custom WAL archive and
 restore path.

---
 src/test/perl/PostgreSQL/Test/Cluster.pm | 14 ++
 1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/src/test/perl/PostgreSQL/Test/Cluster.pm b/src/test/perl/PostgreSQL/Test/Cluster.pm
index b7d4c24553..6a562db2ce 100644
--- a/src/test/perl/PostgreSQL/Test/Cluster.pm
+++ b/src/test/perl/PostgreSQL/Test/Cluster.pm
@@ -1033,10 +1033,13 @@ primary_conninfo='$root_connstr'
 # Internal routine to enable archive recovery command on a standby node
 sub enable_restoring
 {
-	my ($self, $root_node, $standby) = @_;
-	my $path = PostgreSQL::Test::Utils::perl2host($root_node->archive_dir);
+	my ($self, $root_node, $standby, $path) = @_;
 	my $name = $self->name;
 
+	# Default archive directory will be used if not specified.
+	$path = $root_node->archive_dir if (!defined($path));
+	$path = PostgreSQL::Test::Utils::perl2host($path);
+
 	print "### Enabling WAL restore for node \"$name\"\n";
 
 	# On Windows, the path specified in the restore command needs to use
@@ -1101,10 +1104,13 @@ sub set_standby_mode
 # Internal routine to enable archiving
 sub enable_archiving
 {
-	my ($self) = @_;
-	my $path   = PostgreSQL::Test::Utils::perl2host($self->archive_dir);
+	my ($self, $path) = @_;
 	my $name   = $self->name;
 
+	# Default archive directory will be used if not specified.
+	$path = $self->archive_dir if (!defined($path));
+	$path = PostgreSQL::Test::Utils::perl2host($path);
+
 	print "### Enabling WAL archiving for node \"$name\"\n";
 
 	# On Windows, the path specified in the restore command needs to use
-- 
2.18.0

From 6c730ee10b9e64d90982aa555283c9094a5fdb87 Mon Sep 17 00:00:00 2001
From: Amul Sul 
Date: Wed, 19 Jan 2022 23:25:07 -0500
Subject: [PATCH v4 2/3] TAP test for EndOfLogTLI

---
 src/test/recovery/t/003_recovery_targets.pl | 47 -
 1 file changed, 46 insertions(+), 1 deletion(-)

diff --git a/src/test/recovery/t/003_recovery_targets.pl b/src/test/recovery/t/003_recovery_targets.pl
index 24da78c0bc..b3f7076540 100644
--- a/src/test/recovery/t/003_recovery_targets.pl
+++ b/src/test/recovery/t/003_recovery_targets.pl
@@ -6,7 +6,7 @@ use strict;
 use warnings;
 use PostgreSQL::Test::Cluster;
 use PostgreSQL::Test::Utils;
-use Test::More tests => 9;
+use Test::More tests => 10;
 use Time::HiRes qw(usleep);
 
 # Create and test a standby from given backup, with a certain recovery target.
@@ -182,3 +182,48 @@ $logfile = slurp_file($node_standby->logfile());
 ok( $logfile =~
 	  qr/FATAL: .* recovery ended before configured rec

Re: TAP test to cover "EndOfLogTLI != replayTLI" case

2022-01-17 Thread Amul Sul
On Sat, Jan 15, 2022 at 11:35 AM Julien Rouhaud  wrote:
>
> Hi,
>
> On Mon, Jan 10, 2022 at 09:46:23AM +0530, Amul Sul wrote:
> >
> > Thanks for the note, I can see the same test is failing on my centos
> > vm too with the latest master head(376ce3e404b).  The failing reason is
> > the "recovery_target_inclusive = off" setting which is unnecessary for
> > this test, the attached patch removing the same.
>
> This version still fails on windows according to the cfbot:
>
> https://cirrus-ci.com/task/5882621321281536
>
> [18:14:02.639] c:\cirrus>call perl src/tools/msvc/vcregress.pl recoverycheck
> [18:14:56.114]
> [18:14:56.122] #   Failed test 'check standby content before timeline switch 
> 0/500FB30'
> [18:14:56.122] #   at t/003_recovery_targets.pl line 234.
> [18:14:56.122] #  got: '6000'
> [18:14:56.122] # expected: '7000'
>
> I'm switching the cf entry to Waiting on Author.

Thanks for the note.

I am not sure what really went wrong, but I think the 'standby_9'
server shut down too early, before it got a chance to archive a
required WAL file. The attached patch tries to shut down that server
only after the required WAL is archived; unfortunately, I couldn't
test that on Windows, so let's see how cfbot reacts to this version.

Regards,
Amul
From 08d19c0ef2f464e8bf722ce13457acf3f9be47e8 Mon Sep 17 00:00:00 2001
From: Amul Sul 
Date: Mon, 17 Jan 2022 06:25:30 -0500
Subject: [PATCH v3] TAP test for EndOfLogTLI

---
 src/test/recovery/t/003_recovery_targets.pl | 57 -
 1 file changed, 56 insertions(+), 1 deletion(-)

diff --git a/src/test/recovery/t/003_recovery_targets.pl b/src/test/recovery/t/003_recovery_targets.pl
index 24da78c0bcd..928799b9490 100644
--- a/src/test/recovery/t/003_recovery_targets.pl
+++ b/src/test/recovery/t/003_recovery_targets.pl
@@ -6,7 +6,7 @@ use strict;
 use warnings;
 use PostgreSQL::Test::Cluster;
 use PostgreSQL::Test::Utils;
-use Test::More tests => 9;
+use Test::More tests => 10;
 use Time::HiRes qw(usleep);
 
 # Create and test a standby from given backup, with a certain recovery target.
@@ -182,3 +182,58 @@ $logfile = slurp_file($node_standby->logfile());
 ok( $logfile =~
 	  qr/FATAL: .* recovery ended before configured recovery target was reached/,
 	'recovery end before target reached is a fatal error');
+
+# Test to cover a case where we are looking for a WAL record that ought to be
+# in, e.g., 00010001 but we don't find it; instead we find
+# 00020003 because of various reasons, such as there was a
+# timeline switch in that segment and we were reading the old WAL from a
+# segment belonging to a higher timeline, or our recovery target timeline is 2,
+# or something that has 2 in its history.
+
+# Insert a few more rows into the primary
+$node_primary->safe_psql('postgres',
+	"INSERT INTO tab_int VALUES (generate_series(6001,7000))");
+my $lsn6 = $node_primary->safe_psql('postgres',
+	"SELECT pg_current_wal_lsn()");
+
+# Set up a new standby and enable WAL archiving to archive WAL files at the same
+# location as the primary.
+my $archive_cmd = $node_primary->safe_psql('postgres',
+	"SELECT current_setting('archive_command')");
+$node_standby = PostgreSQL::Test::Cluster->new('standby_9');
+$node_standby->init_from_backup(
+	$node_primary, 'my_backup',
+	has_streaming => 1);
+$node_standby->append_conf(
+'postgresql.conf', qq(
+archive_mode = on
+archive_command = '$archive_cmd'
+));
+$node_standby->start;
+# Wait until necessary replay has been done on standby
+$node_primary->wait_for_catchup($node_standby, 'replay',
+	$node_primary->lsn('write'));
+$node_standby->promote;
+$node_standby->safe_psql('postgres',
+	"INSERT INTO tab_int VALUES (generate_series(7001,8000))");
+# Force archiving of WAL file
+my $last_wal = $node_standby->safe_psql('postgres',
+	"SELECT pg_walfile_name(pg_switch_wal())");
+# Wait until this WAL file is archived
+my $check_archive = "SELECT last_archived_wal >= '$last_wal' FROM pg_stat_archiver";
+$node_standby->poll_query_until('postgres', $check_archive)
+	or die "Timed out while waiting for $last_wal file archive";
+$node_standby->stop;
+
+# Another standby whose recovery target LSN will be in a WAL file that has
+# a different TLI than the one the target LSN belongs to.
+$node_standby = PostgreSQL::Test::Cluster->new('standby_10');
+$node_standby->init_from_backup(
+	$node_primary, 'my_backup',
+	has_restoring => 1);
+$node_standby->append_conf(
+'postgresql.conf', qq(recovery_target_lsn = '$lsn6'));
+$node_standby->start;
+my $result = $node_standby->safe_psql('postgres',
+	"SELECT count(*) FROM tab_int");
+is($result, '7000', "check standby content before timeline switch $lsn6");
-- 
2.18.0



Re: TAP test to cover "EndOfLogTLI != replayTLI" case

2022-01-09 Thread Amul Sul
On Mon, Jan 10, 2022 at 8:25 AM Andres Freund  wrote:
>
> Hi,
>
> On 2021-11-23 11:43:21 +0530, Amul Sul wrote:
> > Attached patch covers a case where TLI in the filename for a
> > record being read is different from where it belongs to. In other
> > words, it covers following case noted in StartupXLOG():
>
> > Thoughts? Suggestions?
>
> It seems the test isn't quite reliable. It occasionally fails on freebsd,
> macos, linux and always on windows (starting with the new CI stuff, before the
> test wasn't run).
>
> See 
> https://cirrus-ci.com/github/postgresql-cfbot/postgresql/commitfest/36/3427
>

Thanks for the note; I can see the same test failing on my CentOS
VM too with the latest master head (376ce3e404b).  The failure is
caused by the "recovery_target_inclusive = off" setting, which is
unnecessary for this test; the attached patch removes it.

Regards,
Amul
From 88ae9ea5a33c1ecc5b5493dae9c016ef19fbf88f Mon Sep 17 00:00:00 2001
From: Amul Sul 
Date: Sun, 9 Jan 2022 23:10:07 -0500
Subject: [PATCH v2] TAP test for EndOfLogTLI

---
 src/test/recovery/t/003_recovery_targets.pl | 52 -
 1 file changed, 51 insertions(+), 1 deletion(-)

diff --git a/src/test/recovery/t/003_recovery_targets.pl b/src/test/recovery/t/003_recovery_targets.pl
index 24da78c0bcd..cf72b5d9343 100644
--- a/src/test/recovery/t/003_recovery_targets.pl
+++ b/src/test/recovery/t/003_recovery_targets.pl
@@ -6,7 +6,7 @@ use strict;
 use warnings;
 use PostgreSQL::Test::Cluster;
 use PostgreSQL::Test::Utils;
-use Test::More tests => 9;
+use Test::More tests => 10;
 use Time::HiRes qw(usleep);
 
 # Create and test a standby from given backup, with a certain recovery target.
@@ -182,3 +182,53 @@ $logfile = slurp_file($node_standby->logfile());
 ok( $logfile =~
 	  qr/FATAL: .* recovery ended before configured recovery target was reached/,
 	'recovery end before target reached is a fatal error');
+
+# Test to cover a case where we are looking for a WAL record that ought to be
+# in, e.g., 00010001 but we don't find it; instead we find
+# 00020003 because of various reasons, such as there was a
+# timeline switch in that segment and we were reading the old WAL from a
+# segment belonging to a higher timeline, or our recovery target timeline is 2,
+# or something that has 2 in its history.
+
+# Insert a few more rows into the primary
+$node_primary->safe_psql('postgres',
+	"INSERT INTO tab_int VALUES (generate_series(6001,7000))");
+my $lsn6 = $node_primary->safe_psql('postgres',
+	"SELECT pg_current_wal_lsn()");
+
+# Set up a new standby and enable WAL archiving to archive WAL files at the same
+# location as the primary.
+my $archive_cmd = $node_primary->safe_psql('postgres',
+	"SELECT current_setting('archive_command')");
+$node_standby = PostgreSQL::Test::Cluster->new('standby_9');
+$node_standby->init_from_backup(
+	$node_primary, 'my_backup',
+	has_streaming => 1);
+$node_standby->append_conf(
+'postgresql.conf', qq(
+archive_mode = on
+archive_command = '$archive_cmd'
+));
+$node_standby->start;
+# Wait until necessary replay has been done on standby
+$node_primary->wait_for_catchup($node_standby, 'replay',
+	$node_primary->lsn('write'));
+$node_standby->promote;
+$node_standby->safe_psql('postgres',
+	"INSERT INTO tab_int VALUES (generate_series(7001,8000))");
+# Force archiving of WAL file
+$node_standby->safe_psql('postgres', "SELECT pg_switch_wal()");
+$node_standby->stop;
+
+# Another standby whose recovery target LSN will be in a WAL file that has
+# a different TLI than the one the target LSN belongs to.
+$node_standby = PostgreSQL::Test::Cluster->new('standby_10');
+$node_standby->init_from_backup(
+	$node_primary, 'my_backup',
+	has_restoring => 1);
+$node_standby->append_conf(
+'postgresql.conf', qq(recovery_target_lsn = '$lsn6'));
+$node_standby->start;
+my $result = $node_standby->safe_psql('postgres',
+	"SELECT count(*) FROM tab_int");
+is($result, '7000', "check standby content before timeline switch $lsn6");
-- 
2.18.0



Re: generic plans and "initial" pruning

2022-01-05 Thread Amul Sul
On Fri, Dec 31, 2021 at 7:56 AM Amit Langote  wrote:
>
> On Tue, Dec 28, 2021 at 22:12 Ashutosh Bapat  
> wrote:
>>
>> On Sat, Dec 25, 2021 at 9:06 AM Amit Langote  wrote:
>> >
>> > Executing generic plans involving partitions is known to become slower
>> > as partition count grows due to a number of bottlenecks, with
>> > AcquireExecutorLocks() showing at the top in profiles.
>> >
>> > Previous attempt at solving that problem was by David Rowley [1],
>> > where he proposed delaying locking of *all* partitions appearing under
>> > an Append/MergeAppend until "initial" pruning is done during the
>> > executor initialization phase.  A problem with that approach that he
>> > has described in [2] is that leaving partitions unlocked can lead to
>> > race conditions where the Plan node belonging to a partition can be
>> > invalidated when a concurrent session successfully alters the
>> > partition between AcquireExecutorLocks() saying the plan is okay to
>> > execute and then actually executing it.
>> >
>> > However, using an idea that Robert suggested to me off-list a little
>> > while back, it seems possible to determine the set of partitions that
>> > we can safely skip locking.  The idea is to look at the "initial" or
>> > "pre-execution" pruning instructions contained in a given Append or
>> > MergeAppend node when AcquireExecutorLocks() is collecting the
>> > relations to lock and consider relations from only those sub-nodes
>> > that survive performing those instructions.   I've attempted
>> > implementing that idea in the attached patch.
>> >
>>
>> In which cases, we will have "pre-execution" pruning instructions that
>> can be used to skip locking partitions? Can you please give a few
>> examples where this approach will be useful?
>
>
> This is mainly to be useful for prepared queries, so something like:
>
> prepare q as select * from partitioned_table where key = $1;
>
> And that too when execute q(…) uses a generic plan. Generic plans are 
> problematic because it must contain nodes for all partitions (without any 
> plan time pruning), which means CheckCachedPlan() has to spend time 
> proportional to the number of partitions to determine that the plan is still 
> usable / has not been invalidated; most of that is AcquireExecutorLocks().
>
> Other bottlenecks, not addressed in this patch, pertain to some executor 
> startup/shutdown subroutines that process the range table of a PlannedStmt in 
> its entirety, whose length is also proportional to the number of partitions 
> when the plan is generic.
>
>> The benchmark is showing good results, indeed.
>
Indeed.

Here are a few comments for the v1 patch:

+   /* Caller error if we get here without contains_init_steps */
+   Assert(pruneinfo->contains_init_steps);

-   prunedata = prunestate->partprunedata[i];
-   pprune = &prunedata->partrelprunedata[0];

-   /* Perform pruning without using PARAM_EXEC Params */
-   find_matching_subplans_recurse(prunedata, pprune, true, &result);
+   if (parentrelids)
+   *parentrelids = NULL;

There are two blank lines after the Assert.
--

+   /* Set up EState if not in the executor proper. */
+   if (estate == NULL)
+   {
+   estate = CreateExecutorState();
+   estate->es_param_list_info = params;
+   free_estate = true;
}

... [Skip]

+   if (free_estate)
+   {
+   FreeExecutorState(estate);
+   estate = NULL;
}

I think this work should be left to the caller.
--

/*
 * Stuff that follows matches exactly what ExecCreatePartitionPruneState()
 * does, except we don't need a PartitionPruneState here, so don't call
 * that function.
 *
 * XXX some refactoring might be good.
 */

+1; while doing that, it would be nice if foreach_current_index()
were used instead of the i & j sequence in the respective foreach()
blocks, IMO.
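
i.e., roughly like this (a sketch; pinfolist is only a placeholder
name here):

    foreach(lc, pinfolist)
    {
        int    i = foreach_current_index(lc);  /* replaces the manual i++ */

        /* ... existing loop body, using i as before ... */
    }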
--

+   while ((i = bms_next_member(validsubplans, i)) >= 0)
+   {
+   Plan   *subplan = list_nth(subplans, i);
+
+   context->relations =
+   bms_add_members(context->relations,
+   get_plan_scanrelids(subplan));
+   }

I think GetLockableRelations_worker() can be used instead of
get_plan_scanrelids(); if so, there is no need to add the
get_plan_scanrelids() function.
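
i.e., something like the following sketch, assuming
GetLockableRelations_worker() has the usual walker-style signature
taking a Node and the context:

    while ((i = bms_next_member(validsubplans, i)) >= 0)
    {
        Plan   *subplan = list_nth(subplans, i);

        /* let the existing walker collect the scan relations */
        GetLockableRelations_worker((Node *) subplan, context);
    }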
--

 /* Nodes containing prunable subnodes. */
+   case T_MergeAppend:
+   {
+   PlannedStmt *plannedstmt = context->plannedstmt;
+   List   *rtable = plannedstmt->rtable;
+   ParamListInfo params = context->params;
+   PartitionPruneInfo *pruneinfo;
+   Bitmapset  *validsubplans;
+   Bitmapset  *parentrelids;

...
if (pruneinfo && pruneinfo->contains_init_steps)
{
int i;
...
   return false;
}
}
break;

Most of the declarations need to be moved inside 

Re: O(n) tasks cause lengthy startups and checkpoints

2022-01-02 Thread Amul Sul
On Mon, Jan 3, 2022 at 2:56 AM Andres Freund  wrote:
>
> Hi,
>
> On 2021-12-14 20:23:57 +, Bossart, Nathan wrote:
> > As promised, here is v2.  This patch set includes handling for all
> > four tasks noted upthread.  I'd still consider this a work-in-
> > progress, as I've done minimal testing.  At the very least, it should
> > demonstrate what an auxiliary process approach might look like.
>
> This generates a compiler warning:
> https://cirrus-ci.com/task/5740581082103808?logs=mingw_cross_warning#L378
>

Somehow, I am not getting these compiler warnings on the latest master
head (69872d0bbe6).

Here are a few minor comments for the v2 version that I thought would help:

+ * Copyright (c) 2021, PostgreSQL Global Development Group

Time to change the year :)
--

+
+   /* These operations are really just a minimal subset of
+* AbortTransaction().  We don't have very many resources to worry
+* about.
+*/

Incorrect formatting, the first line should be empty in the multiline
code comment.
--

+   XLogRecPtr  logical_rewrite_mappings_cutoff;   /* can remove older mappings */
+   XLogRecPtr  logical_rewrite_mappings_cutoff_set;

It looks like logical_rewrite_mappings_cutoff is set only once and
never gets reset. If that is true, then I think the
logical_rewrite_mappings_cutoff_set variable can be skipped
completely by setting the initial logical_rewrite_mappings_cutoff to
InvalidXLogRecPtr; that will do the needful.
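
i.e., a sketch (new_cutoff here stands for whatever value would have
been assigned along with setting the flag):

    /* InvalidXLogRecPtr doubles as "not set yet"; no separate flag needed */
    if (XLogRecPtrIsInvalid(logical_rewrite_mappings_cutoff))
        logical_rewrite_mappings_cutoff = new_cutoff;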
--

Regards,
Amul




Re: Multi-Column List Partitioning

2022-01-02 Thread Amul Sul
On Wed, Dec 29, 2021 at 7:26 PM Nitin Jadhav
 wrote:
>
>
> > -* For range partitioning, we must only perform pruning with values
> > -* for either all partition keys or a prefix thereof.
> > +* For range partitioning and list partitioning, we must only 
> > perform
> > +* pruning with values for either all partition keys or a prefix
> > +* thereof.
> > */
> > -   if (keyno > nvalues && context->strategy == 
> > PARTITION_STRATEGY_RANGE)
> > +   if (keyno > nvalues && (context->strategy == 
> > PARTITION_STRATEGY_RANGE ||
> > +   context->strategy == 
> > PARTITION_STRATEGY_LIST))
> >break;
> >
> > I think this is not true for multi-value list partitions, we might
> > still want prune partitions for e.g. (100, IS NULL, 20).  Correct me
> > if I am missing something here.
>
> AFAIK, the above condition/comments says that, either we should
> include all keys or prefixes of the partition keys to get the
> partition pruning results. For example if we have a table with 2
> columns and both are present in the partition key. Let the column
> names be 'a' and 'b'.
>
> SELECT * FROM table WHERE a=1 AND b=1;  - This query works for pruning
> and it refers to a comment which says all partition keys are included.
> SELECT * FROM table WHERE b=1;  - Here partition pruning does not work
> as it does not contain prefix of the partition keys.
> SELECT * FROM table WHERE a=1; - This query works fine as column 'a'
> is prefix of partition keys.
>
> Please let me know if you need more information.

That is exactly what I was assuming to be incorrect. The prefix
dependency holds for range partitioning, but why should that be the
case for list partitioning? I think the partitioning keys in the list
are not dependent on each other, AFAICU. If you prune list partitions
based on the b=1 value, that is still correct and gives the correct
result; correct me if I am wrong.

> ---
>
> > +/*
> > + * get_min_and_max_offset
> > + *
> > + * Fetches the minimum and maximum offset of the matching partitions.
> > + */
> >
> > ...
> >
> > +/*
> > + * get_min_or_max_off
> > + *
> > + * Fetches either minimum or maximum offset of the matching partitions
> > + * depending on the value of is_min parameter.
> > + */
> >
> > I am not sure we really have to have separate functions but if needed
> > then I would prefer to have a separate function for each min and max
> > rather than combining.
>
> If we don't make a separate function, then we have to include this
> code in get_matching_list_bounds() which is already a big function. I
> just made a separate function to not increase the complexity of
> get_matching_list_bounds() and most of the code present in
> get_min_or_max_off() is common for min and max calculation. If we make
> it separate then there might be a lot of duplications. Please let me
> know if you still feel if any action is required.

Hmm, ok. I personally didn't like having two functions -- one that
gives both min and max, and another that gives only min or max --
but others could have different opinions.

How about keeping only one function, say get_min_max_off(), and
fetching values based on the minoff & maxoff arguments -- i.e., if
minoff is not NULL then fetch that value, otherwise skip it, and the
same for maxoff?
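
A rough sketch of the shape I am imagining (the names and the
boundinfo argument are illustrative only):

static void
get_min_max_off(PartitionBoundInfo boundinfo, int *minoff, int *maxoff)
{
    if (minoff)
    {
        /* existing minimum-offset computation goes here */
    }
    if (maxoff)
    {
        /* existing maximum-offset computation goes here */
    }
}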

> ---
>
> > +   if (part_scheme->strategy != PARTITION_STRATEGY_LIST)
> > +   {
> > +   *clause_is_not_null = (nulltest->nulltesttype == IS_NOT_NULL);
> > +   return PARTCLAUSE_MATCH_NULLNESS;
> > +   }
> > +
> > +   expr = makeConst(UNKNOWNOID, -1, InvalidOid, -2, (Datum) 0,
> > true, false);
> > +   partclause = (PartClauseInfo *) palloc(sizeof(PartClauseInfo));
> > +
> > +   partclause->keyno = partkeyidx;
> > +   partclause->expr = (Expr *) expr;
> > +   partclause->is_null = true;
> > +
> > +   if (nulltest->nulltesttype == IS_NOT_NULL)
> > +   {
> > +   partclause->op_is_ne = true;
> > +   partclause->op_strategy = InvalidStrategy;
> > +   }
> > +   else
> > +   {
> > +   partclause->op_is_ne = false;
> > +   partclause->op_strategy = BTEqualStrategyNumber;
> > +   }
> >
> > -   return PARTCLAUSE_MATCH_NULLNESS;
> > +   *pc = partclause;
> > +   return PARTCLAUSE_MATCH_CLAUSE;
> >
> > I still believe considering NULL value for match clause is not a
> > fundamentally correct thing. And that is only for List partitioning
> > which isn't aligned with the other partitioning.
>
> As other partitions which support multiple partition keys (Range
> partitioning) do not support NULL values. This feature supports
> multiple partition keys with list partitioning and it also supports
> NULL values. With the existing design, I have tried to support this
> feature with minimal changes as possible. If this is not the right
> approach to support NULL values, I would like to know how we can
> support multiple NULL values. Kindly 

Re: generalized conveyor belt storage

2021-12-29 Thread Amul Sul
On Wed, Dec 15, 2021 at 9:04 PM Robert Haas  wrote:
>
> On Wed, Dec 15, 2021 at 10:03 AM Matthias van de Meent
>  wrote:
> [...]

Though the patch is WIP, here are a few comments that I found while
reading it and thought might help:

+   {
+   if (meta->cbm_oldest_index_segment ==
+   meta->cbm_newest_index_segment)
+   elog(ERROR, "must remove last index segment when only one remains");
+   meta->cbm_oldest_index_segment = segno;
+   }

How about having an assert or elog to ensure that 'segno' is indeed
the successor?
--

+   if (meta->cbm_index[offset] != offset)
+   elog(ERROR,
+"index entry at offset %u was expected to be %u but found %u",
+offset, segno, meta->cbm_index[offset]);

Shouldn't the IF condition be: meta->cbm_index[offset] != segno?
--

+   if (segno >= CB_METAPAGE_FREESPACE_BYTES * BITS_PER_BYTE)
+   elog(ERROR, "segment %u out of range for metapage fsm", segno);

I think CB_FSM_SEGMENTS_FOR_METAPAGE should be used here, like in
cb_metapage_set_fsm_bit().
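
i.e. (sketch):

    if (segno >= CB_FSM_SEGMENTS_FOR_METAPAGE)
        elog(ERROR, "segment %u out of range for metapage fsm", segno);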
--

+/*
+ * Increment the count of segments allocated.
+ */
+void
+cb_metapage_increment_next_segment(CBMetapageData *meta, CBSegNo segno)
+{
+   if (segno != meta->cbm_next_segment)
+   elog(ERROR, "extending to create segment %u but next segment is %u",
+segno, meta->cbm_next_segment);

I didn't understand this error; what does it mean?  It would be
helpful to add a brief note about what it means and why we are
throwing it, and/or to rephrase the error message a bit.
--

+++ b/src/backend/access/conveyor/cbxlog.c
@@ -0,0 +1,442 @@
+/*-
+ *
+ * cbxlog.c
+ *   XLOG support for conveyor belts.
+ *
+ * For each REDO function in this file, see cbmodify.c for the
+ * corresponding function that performs the modification during normal
+ * running and logs the record that we REDO here.
+ *
+ * Copyright (c) 2016-2021, PostgreSQL Global Development Group
+ *
+ * src/backend/access/conveyor/cbmodify.c

Incorrect file name: s/cbmodify.c/cbxlog.c/
--

+   can_allocate_segment =
+   (free_segno_first_blkno < possibly_not_on_disk_blkno)

Shouldn't the condition be '<='?
--

+ * ConveyorBeltPhysicalTruncate. For more aggressive cleanup options, see
+ * ConveyorBeltCompact or ConveyorBeltRewrite.

I didn't find the ConveyorBeltCompact or ConveyorBeltRewrite code;
perhaps it is yet to be implemented?
--

+ * the value computed here here if the entry at that offset is already

"here" twice.
--

And a few typos:

othr
semgents
fucntion
refrenced
initialied
remve
extrordinarily
implemenation
--

Regards,
Amul



