Re: [meson] add missing pg_attribute_aligned for MSVC in meson build

2022-10-13 Thread Michael Paquier
On Fri, Oct 14, 2022 at 10:59:28AM +0800, Junwang Zhao wrote:
> Commit ec3c9cc added pg_attribute_aligned for MSVC[1],
> which was pushed one day before the meson commits,
> so the meson build missed this feature.
> 
> [1]: 
> https://www.postgresql.org/message-id/caaaqye-hbtzvr3msomtk+hyw2s0e0oapzmw8icsmytma+mn...@mail.gmail.com

Right, thanks!  And it is possible to rely on _MSC_VER for that in
this code path.
--
Michael


signature.asc
Description: PGP signature


Re: Patch proposal: make use of regular expressions for the username in pg_hba.conf

2022-10-13 Thread Michael Paquier
On Wed, Oct 12, 2022 at 08:17:14AM +0200, Drouvot, Bertrand wrote:
> Indeed, ;-)

So, I have spent the last two days looking at all that, studying the
structure of the patch and the existing HEAD code, and it looks like
a few things could be consolidated.

First, as of HEAD, AuthToken is only used for elements in a list of
role and database names in hba.conf before filling in each HbaLine,
hence we limit its usage to the initial parsing.  The patch assigns an
optional regex_t to it, then extends the use of AuthToken for single
hostname entries in pg_hba.conf.  First things first: shouldn't we
combine ident_user and "re" together in the same structure?  Even if
we finish by not using AuthToken to store the computed regex, it seems
to me that we'd better use the same base structure for pg_ident.conf
and pg_hba.conf.  Looking closely at the patch, it would expand
the use of AuthToken outside its original context.  I have also looked
at make_auth_token(), and wondered whether it would be possible to have this
routine compile the regexes.  This approach would not stick with
pg_ident.conf though, as we validate the fields in each line when we
put our hands on ident_user and after the base validation of a line
(number of fields, etc.).  So with all that in mind, it feels right to
not use AuthToken at all when building each HbaLine and each
IdentLine, but a new, separate, structure.  We could call that an
AuthItem (string, its compiled regex) perhaps?  It could have its own
make() routine, taking an AuthToken as input and running
pg_regcomp().  Better ideas for this new structure would be welcome;
the idea is that we'd store the post-parsing state of an
AuthToken into something that has a compiled regex.  We could finish by
using AuthToken at the end and expand its use, but it does not feel
completely right either to have a make() routine but not be able to
compile its regular expression when creating the AuthToken.
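
To make that a bit more concrete, here is a rough sketch of the kind of
structure I have in mind.  The names AuthItem and make_auth_item() are
only placeholders, and the regex compilation details are elided:

typedef struct AuthItem
{
	char	   *str;			/* original string from the AuthToken */
	regex_t		regex;			/* compiled regex, when str starts with '/' */
} AuthItem;

static AuthItem *
make_auth_item(AuthToken *token)
{
	AuthItem   *item = palloc0(sizeof(AuthItem));

	item->str = pstrdup(token->string);
	if (item->str[0] == '/')
	{
		/* compile str + 1 with pg_regcomp() and report any error here */
	}
	return item;
}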

The logic in charge of compiling the regular expressions could be
consolidated more.  The patch refactors the logic with
token_regcomp(), uses it for the user names (ident_user in
parse_ident_line() from pg_ident.conf), then extends it to the hostnames
(single item) and the role/database names (list possible in these
cases).  This approach looks right to me.  Once we plug in an AuthItem
to IdentLine, token_regcomp() could be changed so that it takes an
AuthToken as input, saving the compiled regex_t directly in the input
structure.

At the end, the global structure of the code should, I guess, respect
the following rules:
- The number of places where we check if a string is a regex should be
minimal (i.e. strings beginning with '/').
- Only one code path of hba.c should call pg_regcomp() (the patch does
that), and only one code path should call pg_regexec() (two code paths
of hba.c do that with the patch, because of the need to store the matching
expressions).  This should minimize the areas where we call
pg_mb2wchar_with_len(), for one.

About this last point, token_regexec() does not include
check_ident_usermap() in its logic, and it seems to me that it should.
The difference is with the expected regmatch_t structures, so
shouldn't token_regexec() be extended with two extra arguments: an array
of regmatch_t and the number of elements in the array?  This would
save some of the logic around pg_mb2wchar_with_len(), for
example.  To make all that work, token_regexec() should return an int,
coming from pg_regexec(), and emit no specific error strings as we don't want
to spam the logs when checking hosts, roles and databases in
pg_hba.conf.
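
Roughly, I am thinking of a shape like the following.  This is only a
sketch, and it assumes the compiled regex ends up stored in the token
structure (whatever we call it), which is not what HEAD has today:

static int
token_regexec(const char *match, AuthToken *token,
			  size_t nmatch, regmatch_t *pmatch)
{
	pg_wchar   *wmatchstr;
	int			wmatchlen;
	int			r;

	wmatchstr = palloc((strlen(match) + 1) * sizeof(pg_wchar));
	wmatchlen = pg_mb2wchar_with_len(match, wmatchstr, strlen(match));

	r = pg_regexec(&token->regex, wmatchstr, wmatchlen, 0, NULL,
				   nmatch, pmatch, 0);

	pfree(wmatchstr);
	return r;					/* REG_OKAY, REG_NOMATCH or an error code */
}

check_ident_usermap() would then pass its regmatch_t array, while the
checks for hosts, roles and databases would pass nmatch = 0 and
pmatch = NULL.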

   /* Check if it has a CIDR suffix and if so isolate it */
-  cidr_slash = strchr(str, '/');
-  if (cidr_slash)
-  *cidr_slash = '\0';
+  if (!is_regexp)
+  {
+  cidr_slash = strchr(str, '/');
+  if (cidr_slash)
+  *cidr_slash = '\0';
+  }
[...]
/* Get the netmask */
-   if (cidr_slash)
+   if (cidr_slash && !is_regexp)
{
Some of the code handling regexes for hostnames itches me a bit, like
this one.  Perhaps it would be better to evaluate this interaction
with regular expressions separately.  The database and role names
don't have this need, so their cases are much simpler to think about.

The code could be split to tackle things step-by-step:
- One refactoring patch to introduce token_regcomp() and
token_regexec(), with the introduction of a new structure that
includes the compiled regexes.  (Feel free to counterargue about the
use of AuthToken for this purpose, of course!)
- Plug in the refactored logic for the lists of role names and
database names in pg_hba.conf.
- Handle the case of single host entries in pg_hba.conf.
--
Michael


signature.asc
Description: PGP signature


Re: hash_xlog_split_allocate_page: failed to acquire cleanup lock

2022-10-13 Thread Amit Kapila
On Fri, Oct 14, 2022 at 2:25 AM Andres Freund  wrote:
>
> >
> > Attached are two patches. The first patch is what Robert has proposed
> > with some changes in comments to emphasize the fact that cleanup lock
> > on the new bucket is just to be consistent with the old bucket page
> > locking as we are initializing it just before checking for cleanup
> > lock. In the second patch, I removed the acquisition of cleanup lock
> > on the new bucket page and changed the comments/README accordingly.
> >
> > I think we can backpatch the first patch and the second patch can be
> > just a HEAD-only patch. Does that sound reasonable to you?
>
> Not particularly, no. I don't understand how "overwrite a page and then get a
> cleanup lock" can sensibly be described by this comment:
>
> > +++ b/src/backend/access/hash/hashpage.c
> > @@ -807,7 +807,8 @@ restart_expand:
> >* before changing the metapage's mapping info, in case we can't get 
> > the
> >* disk space.  Ideally, we don't need to check for cleanup lock on 
> > new
> >* bucket as no other backend could find this bucket unless meta page 
> > is
> > -  * updated.  However, it is good to be consistent with old bucket 
> > locking.
> > +  * updated and we initialize the page just before it.  However, it is 
> > just
> > +  * to be consistent with old bucket locking.
> >*/
> >   buf_nblkno = _hash_getnewbuf(rel, start_nblkno, MAIN_FORKNUM);
> >   if (!IsBufferCleanupOK(buf_nblkno))
>
> This is basically saying "I am breaking basic rules of locking just to be
> consistent", no?
>

Fair point. How about something like: "XXX Do we really need to check
for cleanup lock on the new bucket? Here, we initialize the page, so
ideally we don't need to perform any operation that requires such a
check."?

Feel free to suggest something better.

-- 
With Regards,
Amit Kapila.




Re: make_ctags: use -I option to ignore pg_node_attr macro

2022-10-13 Thread Yugo NAGATA
On Thu, 13 Oct 2022 15:35:09 +0900 (JST)
Tatsuo Ishii  wrote:

> > OK, that sounds good then.  I would make a feature request to have a
> > switch that supresses creation of these links, then.
> 
> Ok, I have added "-n" option to make_ctags so that it skips to create
> the links.
> 
> Also I have changed make_etags so that it exec make_ctags, which seems
> to be the consensus.

Thank you for following up on my patch.
I fixed the patch to allow using both -e and -n options together.

Regards,
Yugo Nagata

> 
> Best reagards,
> --
> Tatsuo Ishii
> SRA OSS LLC
> English: http://www.sraoss.co.jp/index_en/
> Japanese:http://www.sraoss.co.jp


-- 
Yugo NAGATA 
diff --git a/src/tools/make_ctags b/src/tools/make_ctags
index 912b6fafac..8062ff2d3e 100755
--- a/src/tools/make_ctags
+++ b/src/tools/make_ctags
@@ -1,12 +1,37 @@
 #!/bin/sh
 
-# src/tools/make_ctags
+# src/tools/make_ctags [-e] [-n]
+# If -e is specified, generate tags files for emacs.
+# If -n is specified, don't create symbolic links of tags file.
+usage="$0 [-e][-n]"
+if [ $# -gt 2 ];then
+echo $usage
+exit 1
+fi
+
+mode=
+no_symlink=
+tags_file=tags
+
+while [ $# -gt 0 ]
+do
+if [ $1 = "-e" ];then
+		mode="-e"
+		tags_file=TAGS
+elif [ $1 = "-n" ];then
+		no_symlink=yes
+else
+		echo $usage
+		exit 1
+fi
+shift
+done
 
 command -v ctags >/dev/null || \
 	{ echo "'ctags' program not found" 1>&2; exit 1; }
 
 trap "ret=$?; rm -rf /tmp/$$; exit $ret" 0 1 2 3 15
-rm -f ./tags
+rm -f ./$tags_file
 
 IS_EXUBERANT=""
 ctags --version 2>&1 | grep Exuberant && IS_EXUBERANT="Y"
@@ -34,9 +59,17 @@ then	FLAGS="--c-kinds=+dfmstuv"
 else	FLAGS="-dt"
 fi
 
+# Use -I option to ignore a macro
+if [ "$IS_EXUBERANT" ]
+then	IGNORE_IDENTIFIES="-I pg_node_attr+"
+else	IGNORE_IDENTIFIES=
+fi
+
 # this is outputting the tags into the file 'tags', and appending
-find `pwd`/ -type f -name '*.[chyl]' -print |
-	xargs ctags -a -f tags "$FLAGS"
+find `pwd`/ \( -name tmp_install -prune -o -name tmp_check -prune \) \
+	-o \( -name "*.[chly]" -o -iname "*makefile*" -o -name "*.mk" -o -name "*.in" \
+	-o -name "*.sql" -o -name "*.p[lm]" \) -type f -print |
+	xargs ctags $mode -a -f $tags_file "$FLAGS" "$IGNORE_IDENTIFIES"
 
 # Exuberant tags has a header that we cannot sort in with the other entries
 # so we skip the sort step
@@ -45,10 +78,13 @@ find `pwd`/ -type f -name '*.[chyl]' -print |
 if [ ! "$IS_EXUBERANT" ]
 then	LC_ALL=C
 	export LC_ALL
-	sort tags >/tmp/$$ && mv /tmp/$$ tags
+	sort $tags_file >/tmp/$$ && mv /tmp/$$ $tags_file
 fi
 
-find . \( -name 'CVS' -prune \) -o \( -name .git -prune \) -o -type d -print |
-while read DIR
-do	[ "$DIR" != "." ] && ln -f -s `echo "$DIR" | sed 's;/[^/]*;/..;g'`/tags "$DIR"/tags
-done
+# create symbolic links
+if [ ! "$no_symlink" ];then
+find . \( -name 'CVS' -prune \) -o \( -name .git -prune \) -o -type d -print |
+	while read DIR
+	do	[ "$DIR" != "." ] && ln -f -s `echo "$DIR" | sed 's;/[^/]*;/..;g'`/$tags_file "$DIR"/$tags_file
+	done
+fi
diff --git a/src/tools/make_etags b/src/tools/make_etags
index 9288ef7b14..afc57e3e89 100755
--- a/src/tools/make_etags
+++ b/src/tools/make_etags
@@ -1,16 +1,6 @@
 #!/bin/sh
-
 # src/tools/make_etags
 
-command -v etags >/dev/null || \
-	{ echo "'etags' program not found" 1>&2; exit 1; }
-
-rm -f ./TAGS
-
-find `pwd`/ -type f -name '*.[chyl]' -print |
-	xargs etags --append -o TAGS
-
-find . \( -name CVS -prune \) -o \( -name .git -prune \) -o -type d -print |
-while read DIR
-do	[ "$DIR" != "." ] && ln -f -s `pwd`/TAGS "$DIR"
-done
+cdir=`dirname $0`
+dir=`(cd $cdir && pwd)`
+exec $dir/make_ctags -e $*


thinko in basic_archive.c

2022-10-13 Thread Nathan Bossart
Hi hackers,

I intended for the temporary file name generated by basic_archive.c to
include the current timestamp so that the name was "sufficiently unique."
Of course, this could also be used to determine the creation time, but you
can just as easily use stat(1) for that.  In any case, I forgot to divide
the microseconds field by 1000 to obtain the current timestamp in
milliseconds, so while the value is unique, it's also basically garbage.
I've attached a small patch that fixes this so that the temporary file name
includes the timestamp in milliseconds for when it was created.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
diff --git a/contrib/basic_archive/basic_archive.c b/contrib/basic_archive/basic_archive.c
index 776a386e35..a02708fc12 100644
--- a/contrib/basic_archive/basic_archive.c
+++ b/contrib/basic_archive/basic_archive.c
@@ -265,7 +265,7 @@ basic_archive_file_internal(const char *file, const char *path)
 	 */
 	gettimeofday(&tv, NULL);
 	if (pg_mul_u64_overflow((uint64) 1000, (uint64) tv.tv_sec, &epoch) ||
-		pg_add_u64_overflow(epoch, (uint64) tv.tv_usec, &epoch))
+		pg_add_u64_overflow(epoch, (uint64) tv.tv_usec / 1000, &epoch))
 		elog(ERROR, "could not generate temporary file name for archiving");
 
 	snprintf(temp, sizeof(temp), "%s/%s.%s.%d." UINT64_FORMAT,


Re: Perform streaming logical transactions by background workers and parallel apply

2022-10-13 Thread Amit Kapila
On Wed, Oct 12, 2022 at 7:41 AM wangw.f...@fujitsu.com
 wrote:
>
> On Fri, Oct 7, 2022 at 14:18 PM Hou, Zhijie/侯 志杰  
> wrote:
> > Attach the new version patch set which addressed most of the comments.
>
> Rebased the patch set because the new change in HEAD (776e1c8).
>
> Attach the new patch set.
>

+static void
+HandleParallelApplyMessage(ParallelApplyWorkerInfo *winfo, StringInfo msg)
{
...
+ case 'X': /* Terminate, indicating clean exit */
+ {
+ shm_mq_detach(winfo->error_mq_handle);
+ winfo->error_mq_handle = NULL;
+ break;
+ }
...
}

I don't see the use of this message in the patch. If this is not
required by the latest version then we can remove it and its
corresponding handling in parallel_apply_start_worker(). I am
referring to the below code in parallel_apply_start_worker():

+ if (tmp_winfo->error_mq_handle == NULL)
+ {
+ /*
+ * Release the worker information and try next one if the parallel
+ * apply worker exited cleanly.
+ */
+ ParallelApplyWorkersList =
foreach_delete_current(ParallelApplyWorkersList, lc);
+ shm_mq_detach(tmp_winfo->mq_handle);
+ dsm_detach(tmp_winfo->dsm_seg);
+ pfree(tmp_winfo);
+ }

-- 
With Regards,
Amit Kapila.




Re: Fix error message for MERGE foreign tables

2022-10-13 Thread Richard Guo
On Fri, Oct 14, 2022 at 12:07 PM Richard Guo  wrote:

>
> On Fri, Oct 14, 2022 at 10:59 AM bt22nakamorit <
> bt22nakamo...@oss.nttdata.com> wrote:
>
>> Hi,
>>
>> MERGE command does not accept foreign tables as targets.
>> When a foreign table is specified as a target, it shows error messages
>> like this:
>>
>> -- ERROR:  cannot execute MERGE on relation "child1"
>> -- DETAIL:  This operation is not supported for foreign tables.
>>
>> However, when a partitioned table includes foreign tables as partitions
>> and MERGE is executed on the partitioned table, following error message
>> shows.
>>
>> -- ERROR:  unexpected operation: 5
>>
>> The latter error message is unclear, and should be the same as the
>> former one.
>> The attached patch adds the code to display error the former error
>> messages in the latter case.
>> Any thoughts?
>
>
> +1. The new message is an improvement to the default one.
>
> I wonder if we can provide more details in the error message, such as
> foreign table name.
>

Maybe something like below, so that we keep it consistent with the case
of a foreign table being specified as a target.

--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -1872,6 +1872,13 @@ postgresPlanForeignModify(PlannerInfo *root,
 returningList,
 &retrieved_attrs);
break;
+   case CMD_MERGE:
+   ereport(ERROR,
+   (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+errmsg("cannot execute MERGE on relation \"%s\"",
+   RelationGetRelationName(rel)),
+errdetail_relkind_not_supported(rel->rd_rel->relkind)));
+   break;

Thanks
Richard


Re: Fix error message for MERGE foreign tables

2022-10-13 Thread Richard Guo
On Fri, Oct 14, 2022 at 10:59 AM bt22nakamorit <
bt22nakamo...@oss.nttdata.com> wrote:

> Hi,
>
> MERGE command does not accept foreign tables as targets.
> When a foreign table is specified as a target, it shows error messages
> like this:
>
> -- ERROR:  cannot execute MERGE on relation "child1"
> -- DETAIL:  This operation is not supported for foreign tables.
>
> However, when a partitioned table includes foreign tables as partitions
> and MERGE is executed on the partitioned table, following error message
> shows.
>
> -- ERROR:  unexpected operation: 5
>
> The latter error message is unclear, and should be the same as the
> former one.
> The attached patch adds the code to display error the former error
> messages in the latter case.
> Any thoughts?


+1. The new message is an improvement to the default one.

I wonder if we can provide more details in the error message, such as
foreign table name.

Thanks
Richard


Re: GUC values - recommended way to declare the C variables?

2022-10-13 Thread Tom Lane
Michael Paquier  writes:
> On Thu, Oct 13, 2022 at 11:14:57PM -0400, Tom Lane wrote:
>> For strings, maybe the rule could be "the
>> old value must be NULL or strcmp-equal to the boot_val".

> pg_strcasecmp()'d would be more flexible here?

Don't see the point for that.  The case we're talking about is
where the variable is declared like

char *my_guc_variable = "foo_bar";

where the initialization value is going to be a compile-time
constant.  I don't see why we'd need to allow any difference
between that constant and the one used in guc_tables.c.

On the other hand, we could insist that string values be strcmp-equal with
no allowance for NULL.  But that probably results in duplicate copies of
the string constant, and I'm not sure it buys anything in most places.
Allowing NULL doesn't seem like it creates any extra hazard for early
references, because they'd just crash if they try to use the value
while it's still NULL.

regards, tom lane




Re: GUC values - recommended way to declare the C variables?

2022-10-13 Thread Michael Paquier
On Thu, Oct 13, 2022 at 11:14:57PM -0400, Tom Lane wrote:
> Could we fix the out-of-sync risk by having InitializeGUCOptions insist
> that the pre-existing value of the variable match what is in guc_tables.c?
> That may not work for string values but I think we could insist on it
> for other GUC data types.  For strings, maybe the rule could be "the
> old value must be NULL or strcmp-equal to the boot_val".

pg_strcasecmp() would be more flexible here?  Sometimes the
character casing on the values is not entirely consistent, but no
objections to use something stricter, either.
--
Michael


signature.asc
Description: PGP signature


Re: [RFC] building postgres with meson - v13

2022-10-13 Thread Tom Lane
"shiy.f...@fujitsu.com"  writes:
> On Fri, Oct 14, 2022 12:40 AM Andres Freund  wrote:
>> It'd be a fair amount of work, both initially and to maintain it, to generate
>> something compatible. I can see some benefit in showing some feature
>> influencing output in --configure, but compatible output doesn't seem worth 
>> it
>> to me.

> And it's ok for me that the output is not exactly the same as before.

Yeah, the output doesn't have to be exactly the same.  But it had better
show all the options influencing the compilation, or else we will be
looking for other ways to reconstruct that information.

regards, tom lane




RE: [RFC] building postgres with meson - v13

2022-10-13 Thread shiy.f...@fujitsu.com
On Fri, Oct 14, 2022 12:40 AM Andres Freund  wrote:
> 
> Hi,
> 
> On 2022-10-13 09:24:51 +, shiy.f...@fujitsu.com wrote:
> > I noticed that `pg_config --configure` didn't show the options given when
> > building with meson.
> 
> Yes, that was noted somewhere on this thread.
> 
> 
> > Maybe it would be better if pg_config can output this information, to be
> > consistent with the output when building with `./configure` and `make`.
> >
> > The output when building with `./configure` and `make`:
> > $ pg_config --configure
> >  '--prefix=/home/postgres/install/' '--cache' 'gcc.cache' '--enable-dtrace' 
> > '--
> with-icu' '--enable-cassert'
> 
> It'd be a fair amount of work, both initially and to maintain it, to generate
> something compatible. I can see some benefit in showing some feature
> influencing output in --configure, but compatible output doesn't seem worth it
> to me.
> 

I agree that there are some benefits to showing that, which helps to confirm the
build options. Although that can be confirmed from the compile log, the log
may not be available all the time.

And it's ok for me that the output is not exactly the same as before.

Regards,
Shi yu




Re: GUC values - recommended way to declare the C variables?

2022-10-13 Thread Tom Lane
Peter Smith  writes:
> I agree if constants are used in both places then there will always be
> some risk they can get out of sync again.

Yeah.

> But probably it is no problem to just add #defines (e.g. in
> logicallauncher.h?) to be commonly used for the C variable declaration
> and also in the guc_tables.

The problem is exactly that there's no great place to put those #define's,
at least not without incurring a lot of fresh #include bloat.

Also, if you did it like that, then it doesn't really address Michael's
desire to see the default value in the variable declaration.

I do lean towards having the data available, mainly because of the
fear I mentioned upthread that some GUCs may be accessed before
InitializeGUCOptions runs.

Could we fix the out-of-sync risk by having InitializeGUCOptions insist
that the pre-existing value of the variable match what is in guc_tables.c?
That may not work for string values but I think we could insist on it
for other GUC data types.  For strings, maybe the rule could be "the
old value must be NULL or strcmp-equal to the boot_val".
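
As a rough sketch of what I have in mind (the function name
check_GUC_init and where it gets called from are just placeholders,
and the Asserts could as well be elogs):

static void
check_GUC_init(struct config_generic *gconf)
{
	switch (gconf->vartype)
	{
		case PGC_INT:
			{
				struct config_int *conf = (struct config_int *) gconf;

				/* the declared initializer must match boot_val */
				Assert(*conf->variable == conf->boot_val);
				break;
			}
		case PGC_STRING:
			{
				struct config_string *conf = (struct config_string *) gconf;

				/* old value must be NULL or strcmp-equal to boot_val */
				Assert(*conf->variable == NULL ||
					   strcmp(*conf->variable, conf->boot_val) == 0);
				break;
			}
		default:
			break;
	}
}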

regards, tom lane




[meson] add missing pg_attribute_aligned for MSVC in meson build

2022-10-13 Thread Junwang Zhao
Hi Andres,

Commit ec3c9cc added pg_attribute_aligned for MSVC[1],
which was pushed one day before the meson commits,
so the meson build missed this feature.

[1]: 
https://www.postgresql.org/message-id/caaaqye-hbtzvr3msomtk+hyw2s0e0oapzmw8icsmytma+mn...@mail.gmail.com

-- 
Regards
Junwang Zhao


0001-meson-add-missing-pg_attribute_aligned-for-MSVC.patch
Description: Binary data


Fix error message for MERGE foreign tables

2022-10-13 Thread bt22nakamorit

Hi,

MERGE command does not accept foreign tables as targets.
When a foreign table is specified as a target, it shows error messages 
like this:


-- ERROR:  cannot execute MERGE on relation "child1"
-- DETAIL:  This operation is not supported for foreign tables.

However, when a partitioned table includes foreign tables as partitions
and MERGE is executed on the partitioned table, the following error message
is shown.


-- ERROR:  unexpected operation: 5

The latter error message is unclear, and should be the same as the 
former one.
The attached patch adds code to display the former error
message in the latter case.

Any thoughts?

Best,
Tatsuhiro Nakamori

diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index d98709e5e8..b4fd54d913 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -1872,6 +1872,9 @@ postgresPlanForeignModify(PlannerInfo *root,
 			 returningList,
 			 &retrieved_attrs);
 			break;
+		case CMD_MERGE:
+			elog(ERROR, "MERGE not permitted on foreign tables");
+			break;
 		default:
 			elog(ERROR, "unexpected operation: %d", (int) operation);
 			break;


Re: create subscription - improved warning message

2022-10-13 Thread Peter Smith
On Thu, Oct 13, 2022 at 9:07 AM Peter Smith  wrote:
>
> On Thu, Oct 13, 2022 at 2:01 AM Tom Lane  wrote:
> >
> > Alvaro Herrera  writes:
> > > On 2022-Oct-12, Amit Kapila wrote:
> > >> Okay, then I think we can commit the last error message patch of Peter
> > >> [1] as we have an agreement on the same, and then work on this as a
> > >> separate patch. What do you think?
> >
> > > No objection.
> >
> > Yeah, the v3-0001 patch is fine.  I agree that the docs change needs
> > more work.
>
> Thanks to everybody for the feedback/suggestions. I will work on
> improving the documentation for this and post something in a day or
> so.


PSA a patch for adding examples of how to activate a subscription that
was originally created in a disconnected mode.

The new docs are added as part of the "Logical Replication -
Subscription" section 31.2.

The CREATE SUBSCRIPTION reference page was also updated to include
links to the new docs.

Feedback welcome.

--
Kind Regards,
Peter Smith.
Fujitsu Australia.


v4-0001-create-subscriptipon-pgdocs-for-deferred-slot-cre.patch
Description: Binary data


RE: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher

2022-10-13 Thread shiy.f...@fujitsu.com
On Wed, Aug 24, 2022 12:25 AM Önder Kalacı  wrote:
> Hi,
> 
> Thanks for the review!
> 

Thanks for your reply.

> 
> >
> > 1.
> > In FilterOutNotSuitablePathsForReplIdentFull(), is
> > "nonPartialIndexPathList" a
> > good name for the list? Indexes on only expressions are also be filtered.
> >
> > +static List *
> > +FilterOutNotSuitablePathsForReplIdentFull(List *pathlist)
> > +{
> > +   ListCell   *lc;
> > +   List *nonPartialIndexPathList = NIL;
> >
> >
> Yes, true. We only started filtering the non-partial ones first. Now
> changed to *suitableIndexList*, does that look right?
> 

That looks ok to me.

> 
> 
> > 3.
> > It looks we should change the comment for FindReplTupleInLocalRel() in this
> > patch.
> >
> > /*
> >  * Try to find a tuple received from the publication side (in
> > 'remoteslot') in
> >  * the corresponding local relation using either replica identity index,
> >  * primary key or if needed, sequential scan.
> >  *
> >  * Local tuple, if found, is returned in '*localslot'.
> >  */
> > static bool
> > FindReplTupleInLocalRel(EState *estate, Relation localrel,
> >
> >
> I made a small change, just adding "index". Do you expect a larger change?
> 
> 

I think that's sufficient.

> 
> 
> > 5.
> > +   if (!AttributeNumberIsValid(mainattno))
> > +   {
> > +   /*
> > +* There are two cases to consider. First, if the
> > index is a primary or
> > +* unique key, we cannot have any indexes with
> > expressions. So, at this
> > +* point we are sure that the index we deal is not
> > these.
> > +*/
> > +   Assert(RelationGetReplicaIndex(rel) !=
> > RelationGetRelid(idxrel) &&
> > +  RelationGetPrimaryKeyIndex(rel) !=
> > RelationGetRelid(idxrel));
> > +
> > +   /*
> > +* For a non-primary/unique index with an
> > expression, we are sure that
> > +* the expression cannot be used for replication
> > index search. The
> > +* reason is that we create relevant index paths
> > by providing column
> > +* equalities. And, the planner does not pick
> > expression indexes via
> > +* column equality restrictions in the query.
> > +*/
> > +   continue;
> > +   }
> >
> > Is it possible that it is a usable index with an expression? I think
> > indexes
> > with an expression has been filtered in
> > FilterOutNotSuitablePathsForReplIdentFull(). If it can't be a usable index
> > with
> > an expression, maybe we shouldn't use "continue" here.
> >
> 
> 
> 
> Ok, I think there are some confusing comments in the code, which I updated.
> Also, added one more explicit Assert to make the code a little more
> readable.
> 
> We can support indexes involving expressions but not indexes that are only
> consisting of expressions. FilterOutNotSuitablePathsForReplIdentFull()
> filters out the latter, see IndexOnlyOnExpression().
> 
> So, for example, if we have an index as below, we are skipping the
> expression while building the index scan keys:
> 
> CREATE INDEX people_names ON people (firstname, lastname, (id || '_' ||
> sub_id));
> 
> We can consider removing `continue`, but that'd mean we should also adjust
> the following code-block to handle indexprs. To me, that seems like an edge
> case to implement at this point, given such an index is probably not
> common. Do you think should I try to use the indexprs as well while
> building the scan key?
> 
> I'm mostly trying to keep the complexity small. If you suggest this
> limitation should be lifted, I can give it a shot. I think the limitation I
> leave here is with a single sentence: *The index on the subscriber can only
> use simple column references.  *
> 

Thanks for your explanation. I get it and think it's OK.

> > 6.
> > In the following case, I got a result which is different from HEAD, could
> > you
> > please look into it?
> >
> > -- publisher
> > CREATE TABLE test_replica_id_full (x int);
> > ALTER TABLE test_replica_id_full REPLICA IDENTITY FULL;
> > CREATE PUBLICATION tap_pub_rep_full FOR TABLE test_replica_id_full;
> >
> > -- subscriber
> > CREATE TABLE test_replica_id_full (x int, y int);
> > CREATE INDEX test_replica_id_full_idx ON test_replica_id_full(x,y);
> > CREATE SUBSCRIPTION tap_sub_rep_full_0 CONNECTION 'dbname=postgres
> > port=5432' PUBLICATION tap_pub_rep_full;
> >
> > -- publisher
> > INSERT INTO test_replica_id_full VALUES (1);
> > UPDATE test_replica_id_full SET x = x + 1 WHERE x = 1;
> >
> > The data in subscriber:
> > on HEAD:
> > postgres=# select * from test_replica_id_full ;
> >  x | y
> > ---+---
> >  2 |
> > (1 row)
> >
> > After applying the patch:
> > postgres=# select * from test_replica_id_full ;
> >  x | y
> > ---+---
> >  1 |
> > (1 row)
> >

Re: GUC values - recommended way to declare the C variables?

2022-10-13 Thread Peter Smith
On Fri, Oct 14, 2022 at 8:26 AM Nathan Bossart  wrote:
>
> On Thu, Oct 13, 2022 at 09:47:25AM +0900, Michael Paquier wrote:
> > So, the initial values of max_wal_senders and max_replication_slots
> > became out of sync with their defaults in guc_tables.c.  FWIW, I would
> > argue the opposite way: rather than removing the initializations, I
> > would fix and keep them as these references can be useful when
> > browsing the area of the code related to such GUCs, without having to
> > look at guc_tables.c for this information.
>
> Well, those initializations are only useful when they are kept in sync,
> which, as demonstrated by this patch, isn't always the case.  I don't have
> a terribly strong opinion about this, but I'd lean towards reducing the
> number of places that track the default value of GUCs.
>

I agree if constants are used in both places then there will always be
some risk they can get out of sync again.

But probably it is no problem to just add #defines (e.g. in
logicallauncher.h?) to be commonly used for the C variable declaration
and also in the guc_tables. I chose not to do it that way only because it
didn't seem to be the typical convention for the other numeric
GUCs I looked at, but it's fine by me if that way is preferred.
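
For illustration only (the macro name and its location are hypothetical,
using max_logical_replication_workers as the example):

/* in logicallauncher.h */
#define DEFAULT_MAX_LOGICAL_REPLICATION_WORKERS	4

/* in launcher.c */
int			max_logical_replication_workers = DEFAULT_MAX_LOGICAL_REPLICATION_WORKERS;

/* in guc_tables.c, as the boot_val of the config_int entry */
	DEFAULT_MAX_LOGICAL_REPLICATION_WORKERS,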

--
Kind Regards,
Peter Smith.
Fujitsu Australia




Re: Use LIMIT instead of Unique for DISTINCT when all distinct pathkeys are redundant

2022-10-13 Thread Richard Guo
On Thu, Oct 13, 2022 at 6:43 PM David Rowley  wrote:

> On Thu, 13 Oct 2022 at 21:17, Richard Guo  wrote:
> >
> > On Thu, Oct 13, 2022 at 2:48 PM David Rowley 
> wrote:
> >> To stop the planner from adding that final sort, I opted to hack the
> >> LimitPath's pathkeys to say that it's already sorted by the
> >> PlannerInfo's sort_pathkeys.  That feels slightly icky, but it does
> >> seem a little wasteful to initialise a sort node on every execution of
> >> the plan to sort a single tuple.
> >
> >
> > I don't get how this plan comes out. It seems not correct because Limit
> > node above an unsorted path would give us an unpredictable row.
>
> Actually, you're right. That manual setting of the pathkeys is an
> unneeded remanent from a bug I fixed before sending out v2.  It can
> just be removed.
>
> I've attached the v3 patch.


The v3 patch looks good to me.

Thanks
Richard


Re: Support tls-exporter as channel binding for TLSv1.3

2022-10-13 Thread Michael Paquier
On Thu, Oct 13, 2022 at 10:30:37AM -0700, Jacob Champion wrote:
> Is the intent to backport tls-exporter client support? Or is the
> compatibility break otherwise acceptable?

Well, I'd rather say yes thanks to the complexity avoided in the
backend as that's the most sensible piece security-wise, even if we
always require a certificate to exist in PG.  A server attempting to
trick a client into downgrading would still be a problem :/

tls-exporter would be a new feature, so backporting is out of the
picture.

> It seemed like there was also some general interest in proxy TLS
> termination (see also the PROXY effort, though it has stalled a bit).
> For that, I would think tls-server-end-point is an important feature.

Oh, okay.  That's an argument in favor of not doing that, then.
Perhaps we'd better revisit the introduction of tls-exporter once we
know more about all that, and it looks like we would need a way to be
able to negotiate which channel binding to use (I recall that the
surrounding RFCs allowed some extra negotiation, vaguely, but my
impression may be wrong).
--
Michael


signature.asc
Description: PGP signature


Re: Incorrect comment regarding command completion tags

2022-10-13 Thread David Rowley
On Fri, 14 Oct 2022 at 11:38, Mark Dilger  wrote:
>
> > On Oct 13, 2022, at 2:56 PM, David Rowley  wrote:
> > In the attached, I rewrote the comment to remove mention of the
> > Asserts. I also tried to form the comment in a way that's more
> > understandable about why we always write a "0" in "INSERT 0 ".
>
> Your wording is better.  +1

Thanks for having a look. I adjusted the wording slightly as I had
written "ancient" in regards to PostgreSQL 11 and earlier.  It's
probably a bit early to call a supported version of PostgreSQL ancient
so I just decided to mention the version number instead.

I pushed the resulting patch.

Thanks for having a look.

David




Re: New "single-call SRF" APIs are very confusingly named

2022-10-13 Thread Andres Freund
Hi,

On 2022-10-14 10:28:34 +0900, Michael Paquier wrote:
> On Thu, Oct 13, 2022 at 12:48:20PM -0700, Andres Freund wrote:
> > Maybe something like InitMaterializedSRF() w/
> > MAT_SRF_(USE_EXPECTED_DESC|BLESS)
> 
> Or just SetMaterializedFuncCall()?

I think starting any function that's not a setter with Set* is very likely to
be misunderstood (SetReturning* is clearer, but long). This just reads like
you're setting the materialized function call on something.


> Do we always have to mention the SRF part of it once we tell about the
> materialization part?

Yes. The SRF is the important part.


> The latter sort implies the former once a function returns multiple tuples.

There's lot of other other things that can be materialized.


> I don't mind doing some renaming of all that even post-release, though
> comes the question of keeping some compabitility macros for
> compilation in case one uses these routines?

Agreed that we'd need compat. I think it'd need to be compatibility function,
not just renaming via macro, so we keep ABI compatibility.

Greetings,

Andres Freund




Re: New "single-call SRF" APIs are very confusingly named

2022-10-13 Thread Michael Paquier
On Thu, Oct 13, 2022 at 12:48:20PM -0700, Andres Freund wrote:
> I unfortunately just noticed this now, just after we released...

Thanks for the feedback.

> Even leaving the confusion with ExprSingleResult aside, calling it "Single"
> still seems very non-descriptive. I assume it's named to contrast with
> init_MultiFuncCall() etc. While those are also not named greatly, they're not
> typically used directly but wraped in SRF_FIRSTCALL_INIT etc.

This is mentioned on the original thread here, in reference to the point that we
go through the function a single time:
https://www.postgresql.org/message-id/yh8sbtuzkhq7j...@paquier.xyz

> I also quite intensely dislike SRF_SINGLE_USE_EXPECTED. It sounds like it's
> saying that a single use of the SRF is expected, but that's not at all what it
> means: "use expectedDesc as tupdesc".

Okay.  Something like the USE_EXPECTED_DESC you are suggesting or
USE_EXPECTED_TUPLE_DESC would be fine by me.

> I'm also confused by SRF_SINGLE_BLESS - the comment says "validate tuple for
> SRF". BlessTupleDesc can't really be described as validation, or am I missing
> something?

I'd rather keep the flag name to match the history behind this API.
How about updating the comment as of "complete tuple descriptor, for a
transient RECORD datatype", or something like that?

> Maybe something like InitMaterializedSRF() w/
> MAT_SRF_(USE_EXPECTED_DESC|BLESS)

Or just SetMaterializedFuncCall()?  Do we always have to mention the
SRF part of it once we tell about the materialization part?  The
latter sort implies the former once a function returns multiple
tuples.
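
For reference, this is roughly how the current API is used in a
set-returning function (a minimal sketch; the function name is made up,
and passing 0 as the flags means the tuple descriptor is built from the
function's declared signature).  Whatever name we settle on, the shape
of the call would stay the same:

Datum
my_set_returning_func(PG_FUNCTION_ARGS)
{
	ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
	Datum		values[1];
	bool		nulls[1] = {false};

	/* one call sets up the tuplestore and the tuple descriptor */
	SetSingleFuncCall(fcinfo, 0);

	values[0] = Int32GetDatum(42);
	tuplestore_putvalues(rsinfo->setResult, rsinfo->setDesc, values, nulls);

	return (Datum) 0;
}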

I don't mind doing some renaming of all that even post-release, though
that raises the question of keeping some compatibility macros for
compilation in case one uses these routines.  Any existing extension
code works out-of-the-box without these new routines, so the chance of
somebody using the new stuff outside core sounds rather limited, but it
does not seem worth taking a risk, either.  And this has been in the
tree for a bit more than half a year now.
--
Michael


signature.asc
Description: PGP signature


Re: hash_xlog_split_allocate_page: failed to acquire cleanup lock

2022-10-13 Thread Peter Geoghegan
On Thu, Oct 13, 2022 at 6:10 PM Andres Freund  wrote:
> My point here is a lot more mundane. The code essentially does
> _hash_pageinit(), overwriting the whole page, and *then* conditionally
> acquires a cleanup lock.  It simply is bogus code.

I understood that that was what you meant. It's easy to see why this
code is broken, but to me it seems related to having too much
confidence in what is possible while relying on cleanup locks. That's
just my take.

-- 
Peter Geoghegan




Re: hash_xlog_split_allocate_page: failed to acquire cleanup lock

2022-10-13 Thread Andres Freund
Hi,

On 2022-10-13 17:46:25 -0700, Peter Geoghegan wrote:
> On Fri, Sep 30, 2022 at 12:05 PM Andres Freund  wrote:
> > My problem with this approach is that the whole cleanup lock is hugely
> > misleading as-is.
>
> While nbtree VACUUM does use cleanup locks, they don't protect the
> index structure itself -- it actually functions as an interlock
> against concurrent TID recycling, which might otherwise confuse
> in-flight index scans. That's why we need cleanup locks for VACUUM,
> but not for index deletions, even though the physical modifications
> that are performed to physical leaf pages are identical (the WAL
> records are almost identical). Clearly the use of cleanup locks is not
> really about protecting the leaf page itself -- it's about using the
> physical leaf page as a proxy for the heap TIDs contained therein. A
> very narrow protocol with a very specific purpose.
>
> More generally, cleanup locks exist to protect transient references
> that point into a heap page. References held by one backend only. A
> TID, or a HeapTuple C pointer, or something similar. Cleanup locks are
> not intended to protect a physical data structure in the heap, either
> -- just a reference/pointer that points to the structure. There are
> implications for the physical page structure itself, of course, but
> that seems secondary. The guarantees are often limited to "never allow
> the backend holding the pin to become utterly confused".
>
> I am skeptical of the idea of using cleanup locks for anything more
> ambitious than this. Especially in index AM code. It seems
> uncomfortably close to "a buffer lock, but somehow also not a buffer
> lock".

My point here is a lot more mundane. The code essentially does
_hash_pageinit(), overwriting the whole page, and *then* conditionally
acquires a cleanup lock.  It simply is bogus code.

Greetings,

Andres Freund




Re: hash_xlog_split_allocate_page: failed to acquire cleanup lock

2022-10-13 Thread Peter Geoghegan
On Fri, Sep 30, 2022 at 12:05 PM Andres Freund  wrote:
> My problem with this approach is that the whole cleanup lock is hugely
> misleading as-is.

While nbtree VACUUM does use cleanup locks, they don't protect the
index structure itself -- it actually functions as an interlock
against concurrent TID recycling, which might otherwise confuse
in-flight index scans. That's why we need cleanup locks for VACUUM,
but not for index deletions, even though the physical modifications
that are performed to physical leaf pages are identical (the WAL
records are almost identical). Clearly the use of cleanup locks is not
really about protecting the leaf page itself -- it's about using the
physical leaf page as a proxy for the heap TIDs contained therein. A
very narrow protocol with a very specific purpose.

More generally, cleanup locks exist to protect transient references
that point into a heap page. References held by one backend only. A
TID, or a HeapTuple C pointer, or something similar. Cleanup locks are
not intended to protect a physical data structure in the heap, either
-- just a reference/pointer that points to the structure. There are
implications for the physical page structure itself, of course, but
that seems secondary. The guarantees are often limited to "never allow
the backend holding the pin to become utterly confused".

I am skeptical of the idea of using cleanup locks for anything more
ambitious than this. Especially in index AM code. It seems
uncomfortably close to "a buffer lock, but somehow also not a buffer
lock".

-- 
Peter Geoghegan




Re: archive modules

2022-10-13 Thread Michael Paquier
On Thu, Oct 13, 2022 at 02:53:38PM -0400, Tom Lane wrote:
> Yeah, it really does not work to use GUC hooks to enforce multi-variable
> constraints.  We've learned that the hard way (more than once, if memory
> serves).

414c2fd is one of the most recent ones.  Its thread is about the same
thing.
--
Michael


signature.asc
Description: PGP signature


Re: problems with making relfilenodes 56-bits

2022-10-13 Thread Matthias van de Meent
On Wed, 12 Oct 2022 at 23:13, Andres Freund  wrote:
>
> Hi,
>
> On 2022-10-12 22:05:30 +0200, Matthias van de Meent wrote:
> > On Wed, 5 Oct 2022 at 01:50, Andres Freund  wrote:
> > > I light dusted off my old varint implementation from [1] and converted the
> > > RelFileLocator and BlockNumber from fixed width integers to varint ones. 
> > > This
> > > isn't meant as a serious patch, but an experiment to see if this is a path
> > > worth pursuing.
> > >
> > > A run of installcheck in a cluster with autovacuum=off, 
> > > full_page_writes=off
> > > (for increased reproducability) shows a decent saving:
> > >
> > > master: 241106544 - 230 MB
> > > varint: 227858640 - 217 MB
> >
> > I think a signficant part of this improvement comes from the premise
> > of starting with a fresh database. tablespace OID will indeed most
> > likely be low, but database OID may very well be linearly distributed
> > if concurrent workloads in the cluster include updating (potentially
> > unlogged) TOASTed columns and the databases are not created in one
> > "big bang" but over the lifetime of the cluster. In that case DBOID
> > will consume 5B for a significant fraction of databases (anything with
> > OID >=2^28).
> >
> > My point being: I don't think that we should have different WAL
> > performance in databases which is dependent on which OID was assigned
> > to that database.
>
> To me this is raising the bar to an absurd level. Some minor space usage
> increase after oid wraparound and for very large block numbers isn't a huge
> issue - if you're in that situation you already have a huge amount of wal.

I didn't want to block all varlen encoding, I just want to make clear
that I don't think it's great for performance testing and consistency
across installations if WAL size (and thus part of your performance)
is dependent on which actual database/relation/tablespace combination
you're running your workload in.

With the 56-bit relfilenode, the size of a block reference would
realistically differ between 7 bytes and 23 bytes:

- tblspc=0=1B
  db=16386=3B
  rel=797=2B (787 = 4 * default # of data relations in a fresh DB in PG14, + 1)
  block=0=1B

vs

- tsp>=2^28 = 5B
  dat>=2^28 =5B
  rel>=2^49 =8B
  block>=2^28 =5B

That's a difference of 16 bytes, of which only the block number can
realistically be directly influenced by the user ("just don't have
relations larger than X blocks").
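
(For clarity, the byte counts above assume a simple 7-bits-per-byte
varint, along the lines of the sketch below; Andres' actual prototype
encoding may differ in detail, but the proportions are about the same.)

static int
varint_size(uint64 value)
{
	int			nbytes = 1;

	while (value >= 0x80)		/* more than 7 payload bits left */
	{
		value >>= 7;
		nbytes++;
	}
	return nbytes;
}

/*
 * varint_size(0) == 1, varint_size(797) == 2, varint_size(16386) == 3,
 * varint_size(UINT64CONST(1) << 28) == 5, varint_size(UINT64CONST(1) << 49) == 8
 */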

If applied to Dilip's pgbench transaction data, that would imply a
minimum per transaction wal usage of 509 bytes, and a maximum per
transaction wal usage of 609 bytes. That is nearly a 20% difference in
WAL size based only on the location of your data, and I'm just not
comfortable with that. Users have little or zero control over the
internal IDs we assign to these fields, while it would affect
performance fairly significantly.

(difference % between min/max wal size is unchanged (within 0.1%)
after accounting for record alignment)

> > 0002 - Rework XLogRecord
> > This makes many fields in the xlog header optional, reducing the size
> > of many xlog records by several bytes. This implements the design I
> > shared in my earlier message [1].
> >
> > 0003 - Rework XLogRecordBlockHeader.
> > This patch could be applied on current head, and saves some bytes in
> > per-block data. It potentially saves some bytes per registered
> > block/buffer in the WAL record (max 2 bytes for the first block, after
> > that up to 3). See the patch's commit message in the patch for
> > detailed information.
>
> The amount of complexity these two introduce seems quite substantial to
> me. Both from an maintenance and a runtime perspective. I think we'd be better
> off using building blocks like variable lengths encoded values than open
> coding it in many places.

I guess that's true for length fields, but I don't think dynamic
header field presence (the 0002 rewrite, and the omission of
data_length in 0003) is that bad. We already have dynamic data
inclusion through block ids 25x; I'm not sure why we couldn't do that
more compactly with bitfields as indicators instead (hence the dynamic
header size).

As for complexity, I think my current patchset is mostly complex due
to a lack of tooling. Note that decoding makes common use of
COPY_HEADER_FIELD, which we don't really have an equivalent for in
XLogRecordAssemble. I think the code for 0002 would improve
significantly in readability if such construct would be available.

To reduce complexity in 0003, I could drop the 'repeat id'
optimization, as that reduces the complexity significantly, at the
cost of not saving that 1 byte per registered block after the first.

Kind regards,

Matthias van de Meent




Re: Incorrect comment regarding command completion tags

2022-10-13 Thread Mark Dilger



> On Oct 13, 2022, at 2:56 PM, David Rowley  wrote:
> 
> I was looking at the code in EndCommand() and noticed a comment
> talking about some Asserts which didn't seem to exist in the code.
> The comment also talks about LastOid which looks like the name of a
> variable that's nowhere to be seen.
> 
> It looks like the Asserts did exists when the completion tag patch was
> being developed [1] but they disappeared somewhere along the line and
> the comment didn't get an update before 2f9661311 went in.
> 
> In the attached, I rewrote the comment to remove mention of the
> Asserts. I also tried to form the comment in a way that's more
> understandable about why we always write a "0" in "INSERT 0 ".

Your wording is better.  +1

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: New docs chapter on Transaction Management and related changes

2022-10-13 Thread Bruce Momjian
On Thu, Oct 13, 2022 at 11:54:51PM +0200, Erik Rijkers wrote:
> > [xact.diff]
> 
> I think that
>   'This chapter explains how the control the reliability of'
> 
> should be:
> 'This chapter explains how to control the reliability of'
> 
> 
> And in these lines:
> +   together in a transactional manner.  The commands PREPARE
> +   TRANSACTION, COMMIT PREPARED and
> +   ROLLBACK PREPARED.  Two-phase transactions
> 
> 'The commands'
> 
> should be
> 'The commands are'

Thanks, updated patch attached.  You can see the output at:

https://momjian.us/tmp/pgsql/

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Indecision is a decision.  Inaction is an action.  Mark Batterson

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 66312b53b8..024a3c5101 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -7224,12 +7224,14 @@ local0.*/var/log/postgresql
 
 
  %v
- Virtual transaction ID (backendID/localXID)
+ Virtual transaction ID (backendID/localXID);  see
+ 
  no
 
 
  %x
- Transaction ID (0 if none is assigned)
+ Transaction ID (0 if none is assigned);  see
+ 
  no
 
 
diff --git a/doc/src/sgml/datatype.sgml b/doc/src/sgml/datatype.sgml
index b030b36002..fdffba4442 100644
--- a/doc/src/sgml/datatype.sgml
+++ b/doc/src/sgml/datatype.sgml
@@ -4992,7 +4992,8 @@ WHERE ...
 xmin and xmax.  Transaction identifiers are 32-bit quantities.
 In some contexts, a 64-bit variant xid8 is used.  Unlike
 xid values, xid8 values increase strictly
-monotonically and cannot be reused in the lifetime of a database cluster.
+monotonically and cannot be reused in the lifetime of a database
+cluster.  See  for more details.

 

diff --git a/doc/src/sgml/filelist.sgml b/doc/src/sgml/filelist.sgml
index de450cd661..0d6be9a2fa 100644
--- a/doc/src/sgml/filelist.sgml
+++ b/doc/src/sgml/filelist.sgml
@@ -104,6 +104,7 @@
 
 
 
+
 
 
 
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index b5a2f94c4e..1e0d587932 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -24676,7 +24676,10 @@ SELECT collation for ('foo' COLLATE "de_DE");

 Returns the current transaction's ID.  It will assign a new one if the
 current transaction does not have one already (because it has not
-performed any database updates).
+performed any database updates);  see  for details.  If executed in a
+subtransaction this will return the top-level xid;  see  for details.

   
 
@@ -24693,6 +24696,7 @@ SELECT collation for ('foo' COLLATE "de_DE");
 ID is assigned yet.  (It's best to use this variant if the transaction
 might otherwise be read-only, to avoid unnecessary consumption of an
 XID.)
+If executed in a subtransaction this will return the top-level xid.

   
 
@@ -24736,6 +24740,8 @@ SELECT collation for ('foo' COLLATE "de_DE");

 Returns a current snapshot, a data structure
 showing which transaction IDs are now in-progress.
+Only top-level xids are included in the snapshot; subxids are not
+shown;  see  for details.

   
 
@@ -24790,7 +24796,8 @@ SELECT collation for ('foo' COLLATE "de_DE");
 Is the given transaction ID visible according
 to this snapshot (that is, was it completed before the snapshot was
 taken)?  Note that this function will not give the correct answer for
-a subtransaction ID.
+a subtransaction ID (subxid);  see  for
+details.

   
  
@@ -24802,8 +24809,9 @@ SELECT collation for ('foo' COLLATE "de_DE");
 wraps around every 4 billion transactions.  However,
 the functions shown in  use a
 64-bit type xid8 that does not wrap around during the life
-of an installation, and can be converted to xid by casting if
-required.  The data type pg_snapshot stores information about
+of an installation and can be converted to xid by casting if
+required;  see  for details. 
+The data type pg_snapshot stores information about
 transaction ID visibility at a particular moment in time.  Its components
 are described in .
 pg_snapshot's textual representation is
@@ -24849,7 +24857,7 @@ SELECT collation for ('foo' COLLATE "de_DE");
 xmax and not in this list was already completed at the time
 of the snapshot, and thus is either visible or dead according to its
 commit status.  This list does not include the transaction IDs of
-subtransactions.
+subtransactions (subxids).

   
  
diff --git a/doc/src/sgml/glossary.sgml b/doc/src/sgml/glossary.sgml
index d6d0a3a814..fe138a47bb 100644

Re: remove redundant memset() call

2022-10-13 Thread Daniel Gustafsson
> On 13 Oct 2022, at 21:59, Bruce Momjian  wrote:
> 
> On Thu, Oct 13, 2022 at 09:40:42PM +0200, Daniel Gustafsson wrote:
>>> On 13 Oct 2022, at 21:18, Nathan Bossart  wrote:
>>> 
>>> On Thu, Oct 13, 2022 at 03:15:13PM -0400, Bruce Momjian wrote:
 On Thu, Oct 13, 2022 at 12:12:35PM -0700, Zhihong Yu wrote:
> the memory has been zero'ed out by palloc0().
> 
> memcpy is not relevant w.r.t. resetting memory.
 
 Ah, good point.
>>> 
>>> Yeah, it looks like this was missed in ca7f8e2.
>> 
>> Agreed, it looks like I missed that one, I can't see any reason to keep it. 
>> Do
>> you want me to take care of it Bruce, and clean up after myself, or are you
>> already on it?
> 
> You can do it, thanks.

Done now.

--
Daniel Gustafsson   https://vmware.com/





Re: New docs chapter on Transaction Management and related changes

2022-10-13 Thread Erik Rijkers

On 13-10-2022 at 23:28, Bruce Momjian wrote:

On Tue, Sep 13, 2022 at 03:02:34PM +0100, Simon Riggs wrote:

Thanks Robert. I've tried to accommodate all of your thoughts, plus Alvaro's.

New v5 attached.

Happy to receive further comments.


I liked this patch very much.  It gives details on a lot of the
internals we expose to users.  Some of my changes were:

*  tightening the wording
*  restructuring the flow
*  splitting out user-visible details (prepared transactions) from
internals, e.g., xid, vxid, subtransactions
*  adding references from places in our docs to these new sections



[xact.diff]


I think that
  'This chapter explains how the control the reliability of'

should be:
'This chapter explains how to control the reliability of'


And in these lines:
+   together in a transactional manner.  The commands PREPARE
+   TRANSACTION, COMMIT PREPARED and
+   ROLLBACK PREPARED.  Two-phase transactions

'The commands'

should be
'The commands are'


thanks,

Erik Rijkers



I plan to apply this and backpatch it to all supported versions since
these details apply to all versions.  These docs should enable our users
to much better understand and monitor Postgres.

Updated patch attached.





Incorrect comment regarding command completion tags

2022-10-13 Thread David Rowley
I was looking at the code in EndCommand() and noticed a comment
talking about some Asserts which didn't seem to exist in the code.
The comment also talks about LastOid which looks like the name of a
variable that's nowhere to be seen.

It looks like the Asserts did exist when the completion tag patch was
being developed [1] but they disappeared somewhere along the line and
the comment didn't get an update before 2f9661311 went in.

In the attached, I rewrote the comment to remove mention of the
Asserts. I also tried to form the comment in a way that's more
understandable about why we always write a "0" in "INSERT 0 ".

David

[1] 
https://www.postgresql.org/message-id/1c642743-8e46-4246-b4a0-c9a638b3e...@enterprisedb.com
diff --git a/src/backend/tcop/dest.c b/src/backend/tcop/dest.c
index c952cbea8b..e6934d7b66 100644
--- a/src/backend/tcop/dest.c
+++ b/src/backend/tcop/dest.c
@@ -179,12 +179,11 @@ EndCommand(const QueryCompletion *qc, CommandDest dest, bool force_undecorated_o
 		 * We assume the tagname is plain ASCII and therefore requires no
 		 * encoding conversion.
 		 *
-		 * We no longer display LastOid, but to preserve the wire
-		 * protocol, we write InvalidOid where the LastOid used to be
-		 * written.
-		 *
-		 * All cases where LastOid was written also write nprocessed
-		 * count, so just Assert that rather than having an extra test.
+		 * In ancient versions of PostgreSQL, INSERT used to include the
+		 * Oid of the inserted record in the completion tag.  We no longer
+		 * support tables with Oids, so to maintain compatibility in the
+		 * wire protocol, we write a "0" for InvalidOid in the location
+		 * where we once wrote the inserted record's Oid.
 		 */
 		tag = qc->commandTag;
 		tagname = GetCommandTagName(tag);


Re: Allow WindowFuncs prosupport function to use more optimal WindowClause options

2022-10-13 Thread Zhihong Yu
On Wed, Oct 12, 2022 at 5:35 PM David Rowley  wrote:

> On Wed, 12 Oct 2022 at 16:33, Vik Fearing  wrote:
> > Per spec, the ROW_NUMBER() window function is not even allowed to have a
> > frame specified.
> >
> >  b) The window framing clause of WDX shall not be present.
> >
> > Also, the specification for ROW_NUMBER() is:
> >
> >  f) ROW_NUMBER() OVER WNS is equivalent to the :
> >
> >  COUNT (*) OVER (WNS1 ROWS UNBOUNDED PRECEDING)
> >
> >
> > So I don't think we need to test for anything at all and can
> > indiscriminately add or replace the frame with ROWS UNBOUNDED PRECEDING.
>
> Thanks for digging that out.
>
> Just above that I see:
>
> RANK() OVER WNS is equivalent to:
> ( COUNT (*) OVER (WNS1 RANGE UNBOUNDED PRECEDING)
> - COUNT (*) OVER (WNS1 RANGE CURRENT ROW) + 1 )
>
> and
>
> DENSE_RANK() OVER WNS is equivalent to the :
> COUNT (DISTINCT ROW ( VE1, ..., VEN ) )
> OVER (WNS1 RANGE UNBOUNDED PRECEDING)
>
> So it looks like the same can be done for rank() and dense_rank() too.
> I've added support for those in the attached.
>
> This also got me thinking that maybe we should be a bit more generic
> with the support function node tag name. After looking at the
> nodeWindowAgg.c code for a while, I wondered if we might want to add
> some optimisations in the future that makes WindowAgg not bother
> storing tuples for row_number(), rank() and dense_rank().  That might
> save a bit of overhead from the tuple store.  I imagined that we'd
> want to allow the expansion of this support request so that the
> support function could let the planner know if any tuples will be
> accessed by the window function or not.  The
> SupportRequestWFuncOptimizeFrameOpts name didn't seem very fitting for
> that so I adjusted it to become SupportRequestOptimizeWindowClause
> instead.
>
> The updated patch is attached.
>
> David
>
Hi,

+   req->frameOptions = (FRAMEOPTION_ROWS |
+                        FRAMEOPTION_START_UNBOUNDED_PRECEDING |
+                        FRAMEOPTION_END_CURRENT_ROW);

The bit combination appears multiple times in the patch.
Maybe define the combination as a constant in supportnodes.h and reference
it in the code.
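
As a purely illustrative sketch (the macro name below is invented for this
example and does not exist in supportnodes.h), it could look like:

    /* Frame spanning from the partition start up to the current row. */
    #define FRAMEOPTIONS_ROWS_UNBOUNDED_TO_CURRENT \
        (FRAMEOPTION_ROWS | \
         FRAMEOPTION_START_UNBOUNDED_PRECEDING | \
         FRAMEOPTION_END_CURRENT_ROW)

Each prosupport function handling the request could then simply assign
req->frameOptions = FRAMEOPTIONS_ROWS_UNBOUNDED_TO_CURRENT;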

Cheers


Re: New docs chapter on Transaction Management and related changes

2022-10-13 Thread Bruce Momjian
On Tue, Sep 13, 2022 at 03:02:34PM +0100, Simon Riggs wrote:
> Thanks Robert. I've tried to accommodate all of your thoughts, plus Alvaro's.
> 
> New v5 attached.
> 
> Happy to receive further comments.

I liked this patch very much.  It gives details on a lot of the
internals we expose to users.  Some of my changes were:

*  tightening the wording
*  restructuring the flow
*  splitting out user-visible details (prepared transactions) from
   internals, e.g., xid, vxid, subtransactions
*  adding references from places in our docs to these new sections

I plan to apply this and backpatch it to all supported versions since
these details apply to all versions.  These docs should enable our users
to much better understand and monitor Postgres.

Updated patch attached.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Indecision is a decision.  Inaction is an action.  Mark Batterson

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 66312b53b8..024a3c5101 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -7224,12 +7224,14 @@ local0.*/var/log/postgresql
 
 
  %v
- Virtual transaction ID (backendID/localXID)
+ Virtual transaction ID (backendID/localXID);  see
+ 
  no
 
 
  %x
- Transaction ID (0 if none is assigned)
+ Transaction ID (0 if none is assigned);  see
+ 
  no
 
 
diff --git a/doc/src/sgml/datatype.sgml b/doc/src/sgml/datatype.sgml
index b030b36002..fdffba4442 100644
--- a/doc/src/sgml/datatype.sgml
+++ b/doc/src/sgml/datatype.sgml
@@ -4992,7 +4992,8 @@ WHERE ...
 xmin and xmax.  Transaction identifiers are 32-bit quantities.
 In some contexts, a 64-bit variant xid8 is used.  Unlike
 xid values, xid8 values increase strictly
-monotonically and cannot be reused in the lifetime of a database cluster.
+monotonically and cannot be reused in the lifetime of a database
+cluster.  See  for more details.

 

diff --git a/doc/src/sgml/filelist.sgml b/doc/src/sgml/filelist.sgml
index de450cd661..0d6be9a2fa 100644
--- a/doc/src/sgml/filelist.sgml
+++ b/doc/src/sgml/filelist.sgml
@@ -104,6 +104,7 @@
 
 
 
+
 
 
 
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index b5a2f94c4e..1e0d587932 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -24676,7 +24676,10 @@ SELECT collation for ('foo' COLLATE "de_DE");

 Returns the current transaction's ID.  It will assign a new one if the
 current transaction does not have one already (because it has not
-performed any database updates).
+performed any database updates);  see  for details.  If executed in a
+subtransaction this will return the top-level xid;  see  for details.

   
 
@@ -24693,6 +24696,7 @@ SELECT collation for ('foo' COLLATE "de_DE");
 ID is assigned yet.  (It's best to use this variant if the transaction
 might otherwise be read-only, to avoid unnecessary consumption of an
 XID.)
+If executed in a subtransaction this will return the top-level xid.

   
 
@@ -24736,6 +24740,8 @@ SELECT collation for ('foo' COLLATE "de_DE");

 Returns a current snapshot, a data structure
 showing which transaction IDs are now in-progress.
+Only top-level xids are included in the snapshot; subxids are not
+shown;  see  for details.

   
 
@@ -24790,7 +24796,8 @@ SELECT collation for ('foo' COLLATE "de_DE");
 Is the given transaction ID visible according
 to this snapshot (that is, was it completed before the snapshot was
 taken)?  Note that this function will not give the correct answer for
-a subtransaction ID.
+a subtransaction ID (subxid);  see  for
+details.

   
  
@@ -24802,8 +24809,9 @@ SELECT collation for ('foo' COLLATE "de_DE");
 wraps around every 4 billion transactions.  However,
 the functions shown in  use a
 64-bit type xid8 that does not wrap around during the life
-of an installation, and can be converted to xid by casting if
-required.  The data type pg_snapshot stores information about
+of an installation and can be converted to xid by casting if
+required;  see  for details. 
+The data type pg_snapshot stores information about
 transaction ID visibility at a particular moment in time.  Its components
 are described in .
 pg_snapshot's textual representation is
@@ -24849,7 +24857,7 @@ SELECT collation for ('foo' COLLATE "de_DE");
 xmax and not in this list was already completed at the time
 of the snapshot, and thus is either visible or dead according to its
 commit status.  This list does not include the 

Re: GUC values - recommended way to declare the C variables?

2022-10-13 Thread Nathan Bossart
On Thu, Oct 13, 2022 at 09:47:25AM +0900, Michael Paquier wrote:
> So, the initial values of max_wal_senders and max_replication_slots
> became out of sync with their defaults in guc_tables.c.  FWIW, I would
> argue the opposite way: rather than removing the initializations, I
> would fix and keep them as these references can be useful when
> browsing the area of the code related to such GUCs, without having to
> look at guc_tables.c for this information.

Well, those initializations are only useful when they are kept in sync,
which, as demonstrated by this patch, isn't always the case.  I don't have
a terribly strong opinion about this, but I'd lean towards reducing the
number of places that track the default value of GUCs.
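
To illustrate the duplication (values here are only for illustration), the same
default ends up spelled both at the variable definition and in guc_tables.c,
and nothing forces the two to stay in sync:

    /* in some backend .c file */
    int  max_wal_senders = 10;   /* duplicates the boot_val in guc_tables.c */

    /* In guc_tables.c, the matching entry's boot_val is also 10; if either
     * place is changed without the other, the initializer silently goes stale. */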

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




WIP: Analyze whether our docs need more granular refentries.

2022-10-13 Thread Corey Huinker
In reviewing another patch, I noticed that the documentation had an xref to
a fairly large page of documentation (create_table.sgml), and I wondered if
that link was chosen because the original author genuinely felt the entire
page was relevant, or merely because a more granular link did not exist at
the time, and this link had been carried forward since then while the
referenced page grew in complexity.

In the interest of narrowing the problem down to a manageable size, I wrote
a script (attached) to find all xrefs and rank them by criteria [1] that I
believe hint at the possibility that the xrefs should be more granular
than they are.

I intend to use the script output below as a guide for manually reviewing
the references and seeing if there are opportunities to guide the reader to
the relevant section of those pages.

In case anyone is curious, here is a top excerpt of the script output:

file_name                      link_name                      link_count  line_count  num_refentries
-----------------------------  -----------------------------  ----------  ----------  --------------
ref/psql-ref.sgml              app-psql                               20        5215               1
ecpg.sgml                      ecpg-sql-allocate-descriptor            4       10101              17
ref/create_table.sgml          sql-createtable                        23        2437               1
ref/select.sgml                sql-select                             23        2207               1
ref/create_function.sgml       sql-createfunction                     30         935               1
ref/alter_table.sgml           sql-altertable                         12        1776               1
ref/pg_dump.sgml               app-pgdump                             11        1545               1
ref/pg_basebackup.sgml         app-pgbasebackup                       11        1008               1
ref/create_type.sgml           sql-createtype                         10        1029               1
ref/create_index.sgml          sql-createindex                         9         999               1
ref/postgres-ref.sgml          app-postgres                           10         845               1
ref/copy.sgml                  sql-copy                                7        1081               1
ref/create_role.sgml           sql-createrole                         13         511               1
ref/grant.sgml                 sql-grant                              13         507               1
ref/create_foreign_table.sgml  sql-createforeigntable                 14         455               1
ref/insert.sgml                sql-insert                              8         792               1
ref/pg_ctl-ref.sgml            app-pg-ctl                              8         713               1
ref/create_trigger.sgml        sql-createtrigger                       7         777               1
ref/set.sgml                   sql-set                                15         332               1
ref/create_aggregate.sgml      sql-createaggregate                     6         805               1
ref/initdb.sgml                app-initdb                              8         588               1
ref/create_policy.sgml         sql-createpolicy                        7         655               1
dblink.sgml                    contrib-dblink-connect                  1        2136              19
ref/create_subscription.sgml   sql-createsubscription                  9         472               1

Some of these will clearly be false positives. For instance, dblink.sgml
and ecpg.sgml have a lot of refentries, but they seem to lack a global
"top" refentry which I assumed would be there.

On the other hand, I have to wonder if the references to psql might really be
about a specific feature of the tool, and perhaps we can create refentries for
those features.

[1] The criteria are: the link must point to the first refentry in the file,
and the file must be at least 200 lines long; results are then ranked by
lines*references, weighted 2x when the link targets the top refentry even
though other refentries exist


xref-analysis.sh
Description: application/shellscript


Re: libpq support for NegotiateProtocolVersion

2022-10-13 Thread Nathan Bossart
On Thu, Oct 13, 2022 at 10:33:01AM +0200, Peter Eisentraut wrote:
> + if (their_version != conn->pversion)

Shouldn't this be 'their_version < conn->pversion'?  If the server supports
a later protocol than what is requested but not all the requested protocol
extensions, I think libpq would still report "protocol version not
supported."

> + 		appendPQExpBuffer(&conn->errorMessage,
> + 						  libpq_gettext("protocol version not supported by server: client uses %d.%d, server supports %d.%d\n"),
> + 						  PG_PROTOCOL_MAJOR(conn->pversion), PG_PROTOCOL_MINOR(conn->pversion),
> + 						  PG_PROTOCOL_MAJOR(their_version), PG_PROTOCOL_MINOR(their_version));

Should this match the error in postmaster.c and provide the range of
versions the server supports?  The FATAL in postmaster.c is for the major
version, but I believe the same information is relevant when a
NegotiateProtocolVersion message is sent.

ereport(FATAL,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 errmsg("unsupported frontend protocol %u.%u: 
server supports %u.0 to %u.%u",

> + 	else
> + 		appendPQExpBuffer(&conn->errorMessage,
> + 						  libpq_gettext("protocol extension not supported by server: %s\n"), buf.data);

nitpick: s/extension/extensions

What if neither the protocol version nor the requested extensions are
supported?  Right now, I think only the unsupported protocol version is
reported in that case, but presumably we could report both pretty easily.
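
A rough sketch of how both could be reported (illustrative only; the variable
counting unsupported extensions is hypothetical):

	if (their_version < conn->pversion)
		appendPQExpBuffer(&conn->errorMessage,
						  libpq_gettext("protocol version not supported by server: client uses %d.%d, server supports %d.%d\n"),
						  PG_PROTOCOL_MAJOR(conn->pversion), PG_PROTOCOL_MINOR(conn->pversion),
						  PG_PROTOCOL_MAJOR(their_version), PG_PROTOCOL_MINOR(their_version));
	if (num_unsupported_extensions > 0)
		appendPQExpBuffer(&conn->errorMessage,
						  libpq_gettext("protocol extensions not supported by server: %s\n"),
						  buf.data);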

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: hash_xlog_split_allocate_page: failed to acquire cleanup lock

2022-10-13 Thread Andres Freund
Hi,

On 2022-10-06 12:44:24 +0530, Amit Kapila wrote:
> On Sat, Oct 1, 2022 at 12:35 AM Andres Freund  wrote:
> > My problem with this approach is that the whole cleanup lock is hugely
> > misleading as-is. As I noted in
> > https://www.postgresql.org/message-id/20220817193032.z35vdjhpzkgldrd3%40awork3.anarazel.de
> > we take the cleanup lock *after* re-initializing the page. Thereby
> > completely breaking the properties that a cleanup lock normally tries to
> > guarantee.
> >
> > Even if that were to achieve something useful (doubtful in this case),
> > it'd need a huge comment explaining what's going on.
> >
>
> Attached are two patches. The first patch is what Robert has proposed
> with some changes in comments to emphasize the fact that cleanup lock
> on the new bucket is just to be consistent with the old bucket page
> locking as we are initializing it just before checking for cleanup
> lock. In the second patch, I removed the acquisition of cleanup lock
> on the new bucket page and changed the comments/README accordingly.
>
> I think we can backpatch the first patch and the second patch can be
> just a HEAD-only patch. Does that sound reasonable to you?

Not particularly, no. I don't understand how "overwrite a page and then get a
cleanup lock" can sensibly be described by this comment:

> +++ b/src/backend/access/hash/hashpage.c
> @@ -807,7 +807,8 @@ restart_expand:
>* before changing the metapage's mapping info, in case we can't get the
>* disk space.  Ideally, we don't need to check for cleanup lock on new
>* bucket as no other backend could find this bucket unless meta page is
> -  * updated.  However, it is good to be consistent with old bucket locking.
> +  * updated and we initialize the page just before it.  However, it is just
> +  * to be consistent with old bucket locking.
>*/
>   buf_nblkno = _hash_getnewbuf(rel, start_nblkno, MAIN_FORKNUM);
>   if (!IsBufferCleanupOK(buf_nblkno))

This is basically saying "I am breaking basic rules of locking just to be
consistent", no?

Greetings,

Andres Freund




Re: remove redundant memset() call

2022-10-13 Thread Bruce Momjian
On Thu, Oct 13, 2022 at 09:40:42PM +0200, Daniel Gustafsson wrote:
> > On 13 Oct 2022, at 21:18, Nathan Bossart  wrote:
> > 
> > On Thu, Oct 13, 2022 at 03:15:13PM -0400, Bruce Momjian wrote:
> >> On Thu, Oct 13, 2022 at 12:12:35PM -0700, Zhihong Yu wrote:
> >>> the memory has been zero'ed out by palloc0().
> >>> 
> >>> memcpy is not relevant w.r.t. resetting memory.
> >> 
> >> Ah, good point.
> > 
> > Yeah, it looks like this was missed in ca7f8e2.
> 
> Agreed, it looks like I missed that one, I can't see any reason to keep it.  
> Do
> you want me to take care of it Bruce, and clean up after myself, or are you
> already on it?

You can do it, thanks.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Indecision is a decision.  Inaction is an action.  Mark Batterson





Re: Lack of PageSetLSN in heap_xlog_visible

2022-10-13 Thread Jeff Davis
On Thu, 2022-10-13 at 12:50 +0300, Konstantin Knizhnik wrote:
>      /*
>       * We don't bump the LSN of the heap page when setting the visibility
>       * map bit (unless checksums or wal_hint_bits is enabled, in which
>       * case we must), because that would generate an unworkable volume of
>       * full-page writes.

It clearly says there that it must set the page LSN, but I don't see
where that's happening. It seems to go all the way back to the original
checksums commit, 96ef3b8ff1.

I can reproduce a case where a replica ends up with a different page
header than the primary (checksums enabled):

  Primary:
create extension pageinspect;
create table t(i int) with (autovacuum_enabled=off);
insert into t values(0);

  Shut down and restart primary and replica.

  Primary:
insert into t values(1);
vacuum t;

  Crash replica and let it recover.

  Shut down and restart primary and replica.

  Primary:
select * from page_header(get_raw_page('t', 0));

  Replica:
select * from page_header(get_raw_page('t', 0));

The LSN on the replica is lower, but the flags are the same
(PD_ALL_VISIBLE set). That's a problem, right? The checksums are valid
on both, though.

It may violate our torn page protections for checksums, as well, but I
couldn't construct a scenario for that because recovery can only create
restartpoints at certain times.
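
For concreteness, the kind of change being discussed would presumably sit in
heap_xlog_visible() right where PD_ALL_VISIBLE is set on the heap page,
roughly like the sketch below (illustrative only, not a reviewed fix):

	PageSetAllVisible(page);
	if (XLogHintBitIsNeeded())
		PageSetLSN(page, lsn);	/* honor the "in which case we must" above */
	MarkBufferDirty(buffer);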

> But it still not clear for me that not bumping LSN in this place is 
> correct if wal_log_hints is set.
> In this case we will have VM page with larger LSN than heap page, 
> because visibilitymap_set
> bumps LSN of VM page. It means that in theory after recovery we may
> have 
> page marked as all-visible in VM,
> but not having PD_ALL_VISIBLE  in page header. And it violates VM 
> constraint:

I'm not quite following this scenario. If the heap page has a lower LSN
than the VM page, how could we recover to a point where the VM bit is
set but the heap flag isn't? And what does it have to do with
wal_log_hints/checksums?


-- 
Jeff Davis
PostgreSQL Contributor Team - AWS






New "single-call SRF" APIs are very confusingly named

2022-10-13 Thread Andres Freund
Hi,

I unfortunately just noticed this now, just after we released...

In

commit 9e98583898c347e007958c8a09911be2ea4acfb9
Author: Michael Paquier 
Date:   2022-03-07 10:26:29 +0900

Create routine able to set single-call SRFs for Materialize mode


a new helper was added:

#define SRF_SINGLE_USE_EXPECTED 0x01/* use expectedDesc as tupdesc */
#define SRF_SINGLE_BLESS0x02/* validate tuple for SRF */
extern void SetSingleFuncCall(FunctionCallInfo fcinfo, bits32 flags);


I think the naming here is very poor. For one, "Single" here conflicts with
ExprSingleResult which indicates "expression does not return a set",
i.e. precisely the opposite what SetSingleFuncCall() is used for. For another
the "Set" in SetSingleFuncCall makes it sound like it's function setting a
property.

Even leaving the confusion with ExprSingleResult aside, calling it "Single"
still seems very non-descriptive. I assume it's named to contrast with
init_MultiFuncCall() etc. While those are also not named greatly, they're not
typically used directly but wrapped in SRF_FIRSTCALL_INIT etc.

I also quite intensely dislike SRF_SINGLE_USE_EXPECTED. It sounds like it's
saying that a single use of the SRF is expected, but that's not at all what it
means: "use expectedDesc as tupdesc".

I'm also confused by SRF_SINGLE_BLESS - the comment says "validate tuple for
SRF". BlessTupleDesc can't really be described as validation, or am I missing
something?

This IMO needs to be cleaned up.


Maybe something like InitMaterializedSRF() w/
MAT_SRF_(USE_EXPECTED_DESC|BLESS)
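
Spelled out, that suggestion could look roughly like the following (a sketch of
the proposed names mirroring the existing declarations quoted above, not
committed code):

	#define MAT_SRF_USE_EXPECTED_DESC	0x01	/* use expectedDesc as tupdesc */
	#define MAT_SRF_BLESS				0x02	/* complete the tuple descriptor */
	extern void InitMaterializedSRF(FunctionCallInfo fcinfo, bits32 flags);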

Greetings,

Andres Freund




Re: remove redundant memset() call

2022-10-13 Thread Daniel Gustafsson
> On 13 Oct 2022, at 21:18, Nathan Bossart  wrote:
> 
> On Thu, Oct 13, 2022 at 03:15:13PM -0400, Bruce Momjian wrote:
>> On Thu, Oct 13, 2022 at 12:12:35PM -0700, Zhihong Yu wrote:
>>> the memory has been zero'ed out by palloc0().
>>> 
>>> memcpy is not relevant w.r.t. resetting memory.
>> 
>> Ah, good point.
> 
> Yeah, it looks like this was missed in ca7f8e2.

Agreed, it looks like I missed that one, I can't see any reason to keep it.  Do
you want me to take care of it Bruce, and clean up after myself, or are you
already on it?

--
Daniel Gustafsson   https://vmware.com/





Re: remove redundant memset() call

2022-10-13 Thread Nathan Bossart
On Thu, Oct 13, 2022 at 03:15:13PM -0400, Bruce Momjian wrote:
> On Thu, Oct 13, 2022 at 12:12:35PM -0700, Zhihong Yu wrote:
>> the memory has been zero'ed out by palloc0().
>> 
>> memcpy is not relevant w.r.t. resetting memory.
> 
> Ah, good point.

Yeah, it looks like this was missed in ca7f8e2.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: remove redundant memset() call

2022-10-13 Thread Bruce Momjian
On Thu, Oct 13, 2022 at 12:12:35PM -0700, Zhihong Yu wrote:
> On Thu, Oct 13, 2022 at 12:10 PM Bruce Momjian  wrote:
> 
> On Thu, Oct 13, 2022 at 10:55:08AM -0700, Zhihong Yu wrote:
> > Hi,
> > I was looking at combo_init in contrib/pgcrypto/px.c .
> >
> > There is a memset() call following palloc0() - the call is redundant.
> >
> > Please see the patch for the proposed change.
> >
> > Thanks
> 
> > diff --git a/contrib/pgcrypto/px.c b/contrib/pgcrypto/px.c
> > index 3b098c6151..d35ccca777 100644
> > --- a/contrib/pgcrypto/px.c
> > +++ b/contrib/pgcrypto/px.c
> > @@ -203,7 +203,6 @@ combo_init(PX_Combo *cx, const uint8 *key, unsigned
> klen,
> >       if (klen > ks)
> >               klen = ks;
> >       keybuf = palloc0(ks);
> > -     memset(keybuf, 0, ks);
> >       memcpy(keybuf, key, klen);
> > 
> >       err = px_cipher_init(c, keybuf, klen, ivbuf);
> 
> Uh, the memset() is ks length but the memcpy() is klen, and the above
> test allows ks to be larger than klen.
> 
> --
>   Bruce Momjian          https://momjian.us
>   EDB                                      https://enterprisedb.com
> 
>   Indecision is a decision.  Inaction is an action.  Mark Batterson
> 
> 
> Hi,
> the memory has been zero'ed out by palloc0().
> 
> memcpy is not relevant w.r.t. resetting memory.

Ah, good point.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Indecision is a decision.  Inaction is an action.  Mark Batterson





Re: remove redundant memset() call

2022-10-13 Thread Zhihong Yu
On Thu, Oct 13, 2022 at 12:10 PM Bruce Momjian  wrote:

> On Thu, Oct 13, 2022 at 10:55:08AM -0700, Zhihong Yu wrote:
> > Hi,
> > I was looking at combo_init in contrib/pgcrypto/px.c .
> >
> > There is a memset() call following palloc0() - the call is redundant.
> >
> > Please see the patch for the proposed change.
> >
> > Thanks
>
> > diff --git a/contrib/pgcrypto/px.c b/contrib/pgcrypto/px.c
> > index 3b098c6151..d35ccca777 100644
> > --- a/contrib/pgcrypto/px.c
> > +++ b/contrib/pgcrypto/px.c
> > @@ -203,7 +203,6 @@ combo_init(PX_Combo *cx, const uint8 *key, unsigned
> klen,
> >   if (klen > ks)
> >   klen = ks;
> >   keybuf = palloc0(ks);
> > - memset(keybuf, 0, ks);
> >   memcpy(keybuf, key, klen);
> >
> >   err = px_cipher_init(c, keybuf, klen, ivbuf);
>
> Uh, the memset() is ks length but the memcpy() is klen, and the above
> test allows ks to be larger than klen.
>
> --
>   Bruce Momjian  https://momjian.us
>   EDB  https://enterprisedb.com
>
>   Indecision is a decision.  Inaction is an action.  Mark Batterson
>
> Hi,
the memory has been zero'ed out by palloc0().

memcpy is not relevant w.r.t. resetting memory.

Cheers


Re: remove redundant memset() call

2022-10-13 Thread Bruce Momjian
On Thu, Oct 13, 2022 at 10:55:08AM -0700, Zhihong Yu wrote:
> Hi,
> I was looking at combo_init in contrib/pgcrypto/px.c .
> 
> There is a memset() call following palloc0() - the call is redundant.
> 
> Please see the patch for the proposed change.
> 
> Thanks

> diff --git a/contrib/pgcrypto/px.c b/contrib/pgcrypto/px.c
> index 3b098c6151..d35ccca777 100644
> --- a/contrib/pgcrypto/px.c
> +++ b/contrib/pgcrypto/px.c
> @@ -203,7 +203,6 @@ combo_init(PX_Combo *cx, const uint8 *key, unsigned klen,
>   if (klen > ks)
>   klen = ks;
>   keybuf = palloc0(ks);
> - memset(keybuf, 0, ks);
>   memcpy(keybuf, key, klen);
>  
>   err = px_cipher_init(c, keybuf, klen, ivbuf);

Uh, the memset() is ks length but the memcpy() is klen, and the above
test allows ks to be larger than klen.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Indecision is a decision.  Inaction is an action.  Mark Batterson





Re: Suppressing useless wakeups in walreceiver

2022-10-13 Thread Nathan Bossart
On Thu, Oct 13, 2022 at 12:37:39PM +0200, Alvaro Herrera wrote:
> I think in 0001 we should put more stuff in the state struct --
> specifically these globals:
> 
> static intrecvFile = -1;
> static TimeLineID recvFileTLI = 0;
> static XLogSegNo recvSegNo = 0;
> 
> The main reason is that it seems odd to have startpointTLI in the struct
> used in some places together with a file-global recvFileTLI which isn't.
> The way one is passed as argument and the other as part of a struct
> seemed too distracting.  This should reduce the number of moving parts,
> ISTM.

Makes sense.  Do you think the struct should be file-global so that it
doesn't need to be provided as an argument to most of the static functions
in this file?
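
As a rough sketch of what folding those globals into the struct might look like
(names here are illustrative only, not the actual patch):

	typedef struct WalRcvStreamState
	{
		TimeLineID	startpointTLI;	/* timeline we started streaming from */
		int			recvFile;		/* currently open WAL segment file, or -1 */
		TimeLineID	recvFileTLI;	/* timeline of the open segment */
		XLogSegNo	recvSegNo;		/* segment number of the open segment */
	} WalRcvStreamState;

Making one instance of this file-global would let the static helpers share it
without threading a pointer through every call, which is the trade-off being
asked about above.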

> One thing that confused me for a moment is that we have some state in
> walrcv and some more state in 'state'.  The difference is pretty obvious
> once you look at the other, but it suggest to me that a better variable
> name for the latter is 'localstate' to more obviously distinguish them.

Sure, I'll change it to 'localstate'.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: archive modules

2022-10-13 Thread Tom Lane
Nathan Bossart  writes:
> On Thu, Oct 13, 2022 at 03:25:27PM +0530, Bharath Rupireddy wrote:
>> The intent here looks reasonable to me. However, why should the user
>> be able to set both archive_command and archive_library in the first
>> place only to later fail in LoadArchiveLibrary() per the patch? IMO,
>> the check_hook() is the right way to disallow any sorts of GUC
>> misconfigurations, no?

> There was some discussion upthread about using the GUC hooks to enforce
> this [0].  In general, it doesn't seem to be a recommended practice.

Yeah, it really does not work to use GUC hooks to enforce multi-variable
constraints.  We've learned that the hard way (more than once, if memory
serves).

regards, tom lane




Re: PG upgrade 14->15 fails - database contains our own extension

2022-10-13 Thread Tom Lane
I wrote:
> We might be able to put in some kluge in pg_dump to make it less
> likely to fail with existing DBs, but I think the true fix lies
> in adding that dependency.

Here's a draft patch for that.  I'm unsure whether it's worth
back-patching; even if we did, we couldn't guarantee that existing
databases would have the additional pg_depend entries.

If we do only put it in HEAD, maybe we should break compatibility
to the extent of changing IsBinaryCoercible's API rather than
inventing a separate call.  I'm still not excited about recording
additional dependencies elsewhere, but that path would leave us
with cleaner code if we eventually do that.

regards, tom lane

diff --git a/src/backend/catalog/pg_cast.c b/src/backend/catalog/pg_cast.c
index 1812bb7fcc..b50a56954d 100644
--- a/src/backend/catalog/pg_cast.c
+++ b/src/backend/catalog/pg_cast.c
@@ -35,13 +35,20 @@
  * Caller must have already checked privileges, and done consistency
  * checks on the given datatypes and cast function (if applicable).
  *
+ * Since we allow binary coercibility of the datatypes to the cast
+ * function's input and result, there could be one or two WITHOUT FUNCTION
+ * casts that this one depends on.  We don't record that explicitly
+ * in pg_cast, but we still need to make dependencies on those casts.
+ *
  * 'behavior' indicates the types of the dependencies that the new
- * cast will have on its input and output types and the cast function.
+ * cast will have on its input and output types, the cast function,
+ * and the other casts if any.
  * 
  */
 ObjectAddress
-CastCreate(Oid sourcetypeid, Oid targettypeid, Oid funcid, char castcontext,
-		   char castmethod, DependencyType behavior)
+CastCreate(Oid sourcetypeid, Oid targettypeid,
+		   Oid funcid, Oid incastid, Oid outcastid,
+		   char castcontext, char castmethod, DependencyType behavior)
 {
 	Relation	relation;
 	HeapTuple	tuple;
@@ -102,6 +109,18 @@ CastCreate(Oid sourcetypeid, Oid targettypeid, Oid funcid, char castcontext,
 		add_exact_object_address(&referenced, addrs);
 	}
 
+	/* dependencies on casts required for function */
+	if (OidIsValid(incastid))
+	{
+		ObjectAddressSet(referenced, CastRelationId, incastid);
+		add_exact_object_address(&referenced, addrs);
+	}
+	if (OidIsValid(outcastid))
+	{
+		ObjectAddressSet(referenced, CastRelationId, outcastid);
+		add_exact_object_address(&referenced, addrs);
+	}
+
 	record_object_address_dependencies(&myself, addrs, behavior);
 	free_object_addresses(addrs);
 
diff --git a/src/backend/commands/functioncmds.c b/src/backend/commands/functioncmds.c
index e6fcfc23b9..1f820c93e9 100644
--- a/src/backend/commands/functioncmds.c
+++ b/src/backend/commands/functioncmds.c
@@ -1526,6 +1526,8 @@ CreateCast(CreateCastStmt *stmt)
 	char		sourcetyptype;
 	char		targettyptype;
 	Oid			funcid;
+	Oid			incastid = InvalidOid;
+	Oid			outcastid = InvalidOid;
 	int			nargs;
 	char		castcontext;
 	char		castmethod;
@@ -1603,7 +1605,9 @@ CreateCast(CreateCastStmt *stmt)
 			ereport(ERROR,
 	(errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
 	 errmsg("cast function must take one to three arguments")));
-		if (!IsBinaryCoercible(sourcetypeid, procstruct->proargtypes.values[0]))
+		if (!IsBinaryCoercibleWithCast(sourcetypeid,
+									   procstruct->proargtypes.values[0],
+									   &incastid))
 			ereport(ERROR,
 	(errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
 	 errmsg("argument of cast function must match or be binary-coercible from source data type")));
@@ -1617,7 +1621,9 @@ CreateCast(CreateCastStmt *stmt)
 	(errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
 	 errmsg("third argument of cast function must be type %s",
 			"boolean")));
-		if (!IsBinaryCoercible(procstruct->prorettype, targettypeid))
+		if (!IsBinaryCoercibleWithCast(procstruct->prorettype,
+									   targettypeid,
+									   &outcastid))
 			ereport(ERROR,
 	(errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
 	 errmsg("return data type of cast function must match or be binary-coercible to target data type")));
@@ -1756,8 +1762,8 @@ CreateCast(CreateCastStmt *stmt)
 			break;
 	}
 
-	myself = CastCreate(sourcetypeid, targettypeid, funcid, castcontext,
-		castmethod, DEPENDENCY_NORMAL);
+	myself = CastCreate(sourcetypeid, targettypeid, funcid, incastid, outcastid,
+		castcontext, castmethod, DEPENDENCY_NORMAL);
 	return myself;
 }
 
diff --git a/src/backend/commands/typecmds.c b/src/backend/commands/typecmds.c
index 33b64fd279..b7c3dded17 100644
--- a/src/backend/commands/typecmds.c
+++ b/src/backend/commands/typecmds.c
@@ -1705,7 +1705,9 @@ DefineRange(ParseState *pstate, CreateRangeStmt *stmt)
 			   );
 
 	/* Create cast from the range type to its multirange type */
-	CastCreate(typoid, multirangeOid, castFuncOid, 'e', 'f', DEPENDENCY_INTERNAL);
+	CastCreate(typoid, multirangeOid, castFuncOid, InvalidOid, InvalidOid,
+			   COERCION_CODE_EXPLICIT, COERCION_METHOD_FUNCTION,
+			   

Re: proposal: possibility to read dumped table's name from file

2022-10-13 Thread Andres Freund
Hi,

On 2022-10-07 07:26:08 +0200, Pavel Stehule wrote:
> I am sending version with handy written parser and meson support

Given this is a new approach it seems inaccurate to have the CF entry marked
ready-for-committer. I've updated it to needs-review.

Greetings,

Andres Freund




Re: archive modules

2022-10-13 Thread Nathan Bossart
On Thu, Oct 13, 2022 at 03:25:27PM +0530, Bharath Rupireddy wrote:
> The intent here looks reasonable to me. However, why should the user
> be able to set both archive_command and archive_library in the first
> place only to later fail in LoadArchiveLibrary() per the patch? IMO,
> the check_hook() is the right way to disallow any sorts of GUC
> misconfigurations, no?

There was some discussion upthread about using the GUC hooks to enforce
this [0].  In general, it doesn't seem to be a recommended practice.  One
basic example of the problems with this approach is the following:

  1. Set archive_command and leave archive_library unset and restart
 the server.
  2. Unset archive_command and set archive_library and call 'pg_ctl
 reload'.

After these steps, you'll see the following log messages:

2022-10-13 10:58:42.112 PDT [1562524] LOG:  received SIGHUP, reloading configuration files
2022-10-13 10:58:42.114 PDT [1562524] LOG:  cannot set "archive_library" when "archive_command" is specified
2022-10-13 10:58:42.114 PDT [1562524] DETAIL:  Only one of "archive_library" or "archive_command" can be specified.
2022-10-13 10:58:42.114 PDT [1562524] LOG:  parameter "archive_command" changed to ""
2022-10-13 10:58:42.114 PDT [1562524] LOG:  configuration file "/home/nathan/pgdata/postgresql.conf" contains errors; unaffected changes were applied

[0] https://postgr.es/m/20220914200305.GA2984249%40nathanxps13

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




remove redundant memset() call

2022-10-13 Thread Zhihong Yu
Hi,
I was looking at combo_init in contrib/pgcrypto/px.c .

There is a memset() call following palloc0() - the call is redundant.

Please see the patch for the proposed change.

Thanks


remove-redundant-memset-call.patch
Description: Binary data


Re: Suppressing useless wakeups in walreceiver

2022-10-13 Thread Nathan Bossart
On Thu, Oct 13, 2022 at 03:35:14PM +0530, Bharath Rupireddy wrote:
> I think the hot standby feedback message gets sent too frequently
> without honouring the wal_receiver_status_interval because the 'now'
> is actually not the current time with your patch but 'now +
> XLogWalRcvSendReply()'s time'. However, it's possible that I may be
> wrong here.

Is your concern that the next wakeup time will be miscalculated because of
a stale value in 'now'?  I think that's existing behavior, as we save the
current time before sending the message in XLogWalRcvSendReply and
XLogWalRcvSendHSFeedback.  The only way to completely avoid this would be
to call GetCurrentTimestamp() after the message has been sent and to use
that value to calculate when the next message should be sent.  However, I
am skeptical there's really any practical impact.  Is it really a problem
if a message that should be sent every 30 seconds is sent at an interval of
29.9 seconds sometimes?  I think it could go the other way, too.  If 'now'
is stale, we might decide not to send a feedback message until the next
loop iteration, so the interval might be 30.1 seconds.  I don't think it's
realistic to expect complete accuracy here.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: Support tls-exporter as channel binding for TLSv1.3

2022-10-13 Thread Jacob Champion
On Wed, Oct 12, 2022 at 11:01 PM Michael Paquier  wrote:
> One thing that would reduce the complexity of the equation is
> to drop support for tls-server-end-point in the backend in PG >= 16 as
> the versions of OpenSSL we still support on HEAD would cover
> completely tls-exporter.

Is the intent to backport tls-exporter client support? Or is the
compatibility break otherwise acceptable?

It seemed like there was also some general interest in proxy TLS
termination (see also the PROXY effort, though it has stalled a bit).
For that, I would think tls-server-end-point is an important feature.

--Jacob




Re: pg_stat_bgwriter.buffers_backend is pretty meaningless (and more?)

2022-10-13 Thread Melanie Plageman
On Mon, Oct 10, 2022 at 7:43 PM Maciek Sakrejda  wrote:
>
> Thanks for working on this! Like Lukas, I'm excited to see more
> visibility into important parts of the system like this.

Thanks for taking another look!

>
> On Mon, Oct 10, 2022 at 11:49 AM Melanie Plageman
>  wrote:
> >
> > I've gone ahead and implemented option 1 (commented below).
>
> No strong opinion on 1 versus 2, but I guess at least partly because I
> don't understand the implications (I do understand the difference,
> just not when it might be important in terms of stats). Can we think
> of a situation where combining stats about initial additions with
> pinned additions hides some behavior that might be good to understand
> and hard to pinpoint otherwise?

I think that it makes sense to count both the initial buffers added to
the ring and subsequent shared buffers added to the ring (either when
the current strategy buffer is pinned or in use or when a bulkread
rejects dirty strategy buffers in favor of new shared buffers) as
strategy clocksweeps because of how the statistic would be used.

Clocksweeps give you an idea of how much of your working set is cached
(setting aside initially reading data into shared buffers when you are
warming up the db). You may use clocksweeps to determine if you need to
make shared buffers larger.

Distinguishing strategy buffer clocksweeps from shared buffer
clocksweeps allows us to avoid enlarging shared buffers if most of the
clocksweeps are to bring in blocks for the strategy operation.

However, I could see an argument that discounting strategy clocksweeps
done because the current strategy buffer is pinned makes the number of
shared buffer clocksweeps artificially low since those other queries
using the buffer would have suffered a cache miss were it not for the
strategy. And, in this case, you would take strategy clocksweeps
together with shared clocksweeps to make your decision. And if we
include buffers initially added to the strategy ring in the strategy
clocksweep statistic, this number may be off because those blocks may
not be needed in the main shared working set. But you won't know that
until you try to reuse the buffer and it is pinned. So, I think we don't
have a better option than counting initial buffers added to the ring as
strategy clocksweeps (as opposed to as reuses).

So, in answer to your question, no, I cannot think of a scenario like
that.

Sitting down and thinking about that for a long time did, however, help
me realize that some of my code comments were misleading (and some
incorrect). I will update these in the next version once we agree on
updated docs.

It also made me remember that I am incorrectly counting rejected buffers
as reused. I'm not sure if it is a good idea to subtract from reuses
when a buffer is rejected. Waiting until after it is rejected to count
the reuse will take some other code changes. Perhaps we could also count
rejections in the stats?

>
> I took a look at the latest docs (as someone mostly familiar with
> internals at only a pretty high level, so probably somewhat close to
> the target audience) and have some feedback.
>
> + 
> +   role="column_definition">
> +   backend_type text
> +  
> +  
> +   Type of backend (e.g. background worker, autovacuum worker).
> +  
> + 
>
> Not critical, but is there a list of backend types we could
> cross-reference elsewhere in the docs?

The most I could find was this longer explanation (with exhaustive list
of types) in pg_stat_activity docs [1]. I could duplicate what it says
or I could link to the view and say "see pg_stat_activity" for a
description of backend_type" or something like that (to keep them from
getting out of sync as new backend_types are added. I suppose I could
also add docs on backend_types, but I'm not sure where something like
that would go.

>
> From the io_context column description:
>
> +   The autovacuum daemon, explicit VACUUM,
> explicit
> +   ANALYZE, many bulk reads, and many bulk
> writes use a
> +   fixed amount of memory, acquiring the equivalent number of
> shared
> +   buffers and reusing them circularly to avoid occupying an
> undue portion
> +   of the main shared buffer pool.
> +  
>
> I don't understand how this is relevant to the io_context column.
> Could you expand on that, or am I just missing something obvious?
>

I'm trying to explain why those other IO Contexts exist (bulkread,
bulkwrite, vacuum) and why they are separate from shared buffers.
Should I cut it altogether or preface it with something like: these are
counted separately from shared buffers because...?


> + 
> +   role="column_definition">
> +   extended bigint
> +  
> +  
> +   Extends of relations done by this
> backend_type in
> +   order to write data in this io_context.
> +  
> + 
>
> I understand what this is, but not why this is something I might want
> to know about.

Unlike writes, backends largely have to 

Re: Summary function for pg_buffercache

2022-10-13 Thread Andres Freund
Hi,

On 2022-10-12 12:27:54 -0700, Andres Freund wrote:
> I intentionally put my changes into a fixup commit, in case you want to look
> at the differences.

I pushed the (combined) patch now. Thanks for your contribution!

Greetings,

Andres Freund




Re: [RFC] building postgres with meson - v13

2022-10-13 Thread Andres Freund
Hi,

On 2022-10-13 09:24:51 +, shiy.f...@fujitsu.com wrote:
> I noticed that `pg_config --configure` didn't show the options given when
> building with meson.

Yes, that was noted somewhere on this thread.


> Maybe it would be better if pg_config can output this information, to be
> consistent with the output when building with `./configure` and `make`.
>
> The output when building with `./configure` and `make`:
> $ pg_config --configure
>  '--prefix=/home/postgres/install/' '--cache' 'gcc.cache' '--enable-dtrace' 
> '--with-icu' '--enable-cassert'

It'd be a fair amount of work, both initially and to maintain it, to generate
something compatible. I can see some benefit in having --configure show some
feature-influencing output, but fully compatible output doesn't seem worth it
to me.

Greetings,

Andres Freund




Re: Bloom filter Pushdown Optimization for Merge Join

2022-10-13 Thread Zhihong Yu
On Thu, Oct 13, 2022 at 7:30 AM Zhihong Yu  wrote:

>
>
> On Wed, Oct 12, 2022 at 4:35 PM Zhihong Yu  wrote:
>
>>
>>
>> On Wed, Oct 12, 2022 at 3:36 PM Lyu Pan  wrote:
>>
>>> Hello Zhihong Yu & Tomas Vondra,
>>>
>>> Thank you so much for your review and feedback!
>>>
>>> We made some updates based on previous feedback and attached the new
>>> patch set. Due to time constraints, we didn't get to resolve all the
>>> comments, and we'll continue to improve this patch.
>>>
>>> > In this prototype, the cost model is based on an assumption that there
>>> is
>>> > a linear relationship between the performance gain from using a
>>> semijoin
>>> > filter and the estimated filtering rate:
>>> > % improvement to Merge Join cost = 0.83 * estimated filtering rate -
>>> 0.137.
>>> >
>>> > How were the coefficients (0.83 and 0.137) determined ?
>>> > I guess they were based on the results of running certain workload.
>>>
>>> Right, the coefficients (0.83 and 0.137) determined are based on some
>>> preliminary testings. The current costing model is pretty naive and
>>> we'll work on a more robust costing model in future work.
>>>
>>>
>>> > I agree, in principle, although I think the current logic / formula is
>>> a
>>> > bit too crude and fitted to the simple data used in the test. I think
>>> > this needs to be formulated as a regular costing issue, considering
>>> > stuff like cost of the hash functions, and so on.
>>> >
>>> > I think this needs to do two things:
>>> >
>>> > 1) estimate the cost of building the bloom filter - This shall depend
>>> on
>>> > the number of rows in the inner relation, number/cost of the hash
>>> > functions (which may be higher for some data types), etc.
>>> >
>>> > 2) estimate improvement for the probing branch - Essentially, we need
>>> to
>>> > estimate how much we save by filtering some of the rows, but this also
> >> > needs to include the cost of probing the bloom filter.
>>> >
>>> > This will probably require some improvements to the lib/bloomfilter, in
>>> > order to estimate the false positive rate - this may matter a lot for
>>> > large data sets and small work_mem values. The bloomfilter library
>>> > simply reduces the size of the bloom filter, which increases the false
>>> > positive rate. At some point it'll start reducing the benefit.
>>> >
>>>
>>> These suggestions make a lot of sense. The current costing model is
>>> definitely not good enough, and we plan to work on a more robust
>>> costing model as we continue to improve the patch.
>>>
>>>
>>> > OK. Could also build the bloom filter in shared memory?
>>> >
>>>
>>> We thought about this approach but didn't prefer this one because if
>>> all worker processes share the same bloom filter in shared memory, we
>>> need to frequently lock and unlock the bloom filter to avoid race
>>> conditions. So we decided to have each worker process create its own
>>> bloom filter.
>>>
>>>
>>> > IMHO we shouldn't make too many conclusions from these examples. Yes,
>>> it
>>> > shows merge join can be improved, but for cases where a hashjoin works
>>> > better so we wouldn't use merge join anyway.
>>> >
>>> > I think we should try constructing examples where either merge join
>>> wins
>>> > already (and gets further improved by the bloom filter), or would lose
>>> > to hash join and the bloom filter improves it enough to win.
>>> >
>>> > AFAICS that requires a join of two large tables - large enough that
>>> hash
>>> > join would need to be batched, or pre-sorted inputs (which eliminates
>>> > the explicit Sort, which is the main cost in most cases).
>>> >
>>> > The current patch only works with sequential scans, which eliminates
>>> the
>>> > second (pre-sorted) option. So let's try the first one - can we invent
>>> > an example with a join of two large tables where a merge join would
>>> win?
>>> >
>>> > Can we find such example in existing benchmarks like TPC-H/TPC-DS.
>>> >
>>>
>>> Agreed. The current examples are only intended to show us that using
>>> bloom filters in merge join could improve the merge join performance
>>> in some cases. We are working on testing more examples that merge join
>>> with bloom filter could out-perform hash join, which should be more
>>> persuasive.
>>>
>>>
>>> > The bloom filter is built by the first seqscan (on t0), and then used
>>> by
>>> > the second seqscan (on t1). But this only works because we always run
>>> > the t0 scan to completion (because we're feeding it into Sort) before
>>> we
>>> > start scanning t1.
>>> >
>>> > But when the scan on t1 switches to an index scan, it's over - we'd be
>>> > building the filter without being able to probe it, and when we finish
>>> > building it we no longer need it. So this seems pretty futile.
>>> >
>>> > It might still improve plans like
>>> >
>>> >->  Merge Join
>>> >  Merge Cond: (t0.c1 = t1.c1)
>>> >  SemiJoin Filter Created Based on: (t0.c1 = t1.c1)
>>> >  SemiJoin Estimated Filtering Rate: 1.
>>> > ->  Sort

Is anybody planning to release pglogical for v15 ?

2022-10-13 Thread Hannu Krosing
So, is anybody planning to release pglogical for v15 ?

There are still a few things that one can do in pglogical but not in
native / built-in replication ...


Best Regards
Hannu


Re: PG upgrade 14->15 fails - database contains our own extension

2022-10-13 Thread Tom Lane
I wrote:
> Hmm ... I think it's a very ancient bug that somehow David has avoided
> tripping over up to now.

Looking closer, I don't see how b55f2b692 could have changed pg_dump's
opinion of the order to sort these three casts in; that sort ordering
logic is old enough to vote.  So I'm guessing that in fact this *never*
worked.  Perhaps this extension has never been through pg_upgrade before,
or at least not with these casts?

> We might be able to put in some kluge in pg_dump to make it less
> likely to fail with existing DBs, but I think the true fix lies
> in adding that dependency.

I don't see any painless way to fix this in pg_dump, and I'm inclined
not to bother trying if it's not a regression.  Better to spend the
effort on the backend-side fix.

On the backend side, really anyplace that we consult IsBinaryCoercible
during DDL is at hazard.  While there aren't a huge number of such
places, there's certainly more than just CreateCast.  I'm trying to
decide how much trouble it's worth going to there.  I could be wrong,
but I think that only the cast-vs-cast case is really likely to be
problematic for pg_dump, given that it dumps casts pretty early now.
So it might be sufficient to fix that one case.

regards, tom lane




Re: PG upgrade 14->15 fails - database contains our own extension

2022-10-13 Thread Tom Lane
Robert Haas  writes:
> CREATE CAST (integer AS public.lbuid) WITH FUNCTION
> pg_catalog.int8(integer) AS IMPLICIT;
> CREATE CAST (bigint AS public.lbuid) WITHOUT FUNCTION AS IMPLICIT;
> CREATE CAST (public.lbuid AS bigint) WITHOUT FUNCTION AS IMPLICIT;

> That's not a valid dump ordering, and if I drop the casts and try to
> recreate them that way, it fails in the same way you saw.

> My guess is that this is a bug in Tom's commit
> b55f2b6926556115155930c4b2d006c173f45e65, "Adjust pg_dump's priority
> ordering for casts."

Hmm ... I think it's a very ancient bug that somehow David has avoided
tripping over up to now.  Namely, that we require the bigint->lbuid
implicit cast to exist in order to make that WITH FUNCTION cast, but
we fail to record it as a dependency during CastCreate.  So pg_dump
is flying blind as to the required restoration order, and if it ever
worked, you were just lucky.

We might be able to put in some kluge in pg_dump to make it less
likely to fail with existing DBs, but I think the true fix lies
in adding that dependency.

(I'm pretty skeptical about it being a good idea to have a set of
casts like this, but I don't suppose pg_dump is chartered to
editorialize on that.)

regards, tom lane




Re: Bloom filter Pushdown Optimization for Merge Join

2022-10-13 Thread Zhihong Yu
On Wed, Oct 12, 2022 at 4:35 PM Zhihong Yu  wrote:

>
>
> On Wed, Oct 12, 2022 at 3:36 PM Lyu Pan  wrote:
>
>> Hello Zhihong Yu & Tomas Vondra,
>>
>> Thank you so much for your review and feedback!
>>
>> We made some updates based on previous feedback and attached the new
>> patch set. Due to time constraints, we didn't get to resolve all the
>> comments, and we'll continue to improve this patch.
>>
>> > In this prototype, the cost model is based on an assumption that there
>> is
>> > a linear relationship between the performance gain from using a semijoin
>> > filter and the estimated filtering rate:
>> > % improvement to Merge Join cost = 0.83 * estimated filtering rate -
>> 0.137.
>> >
>> > How were the coefficients (0.83 and 0.137) determined ?
>> > I guess they were based on the results of running certain workload.
>>
>> Right, the coefficients (0.83 and 0.137) determined are based on some
>> preliminary testings. The current costing model is pretty naive and
>> we'll work on a more robust costing model in future work.
>>
>>
>> > I agree, in principle, although I think the current logic / formula is a
>> > bit too crude and fitted to the simple data used in the test. I think
>> > this needs to be formulated as a regular costing issue, considering
>> > stuff like cost of the hash functions, and so on.
>> >
>> > I think this needs to do two things:
>> >
>> > 1) estimate the cost of building the bloom filter - This shall depend on
>> > the number of rows in the inner relation, number/cost of the hash
>> > functions (which may be higher for some data types), etc.
>> >
>> > 2) estimate improvement for the probing branch - Essentially, we need to
>> > estimate how much we save by filtering some of the rows, but this also
> >> needs to include the cost of probing the bloom filter.
>> >
>> > This will probably require some improvements to the lib/bloomfilter, in
>> > order to estimate the false positive rate - this may matter a lot for
>> > large data sets and small work_mem values. The bloomfilter library
>> > simply reduces the size of the bloom filter, which increases the false
>> > positive rate. At some point it'll start reducing the benefit.
>> >
>>
>> These suggestions make a lot of sense. The current costing model is
>> definitely not good enough, and we plan to work on a more robust
>> costing model as we continue to improve the patch.
>>
>>
>> > OK. Could also build the bloom filter in shared memory?
>> >
>>
>> We thought about this approach but didn't prefer this one because if
>> all worker processes share the same bloom filter in shared memory, we
>> need to frequently lock and unlock the bloom filter to avoid race
>> conditions. So we decided to have each worker process create its own
>> bloom filter.
>>
>>
>> > IMHO we shouldn't make too many conclusions from these examples. Yes, it
>> > shows merge join can be improved, but for cases where a hashjoin works
>> > better so we wouldn't use merge join anyway.
>> >
>> > I think we should try constructing examples where either merge join wins
>> > already (and gets further improved by the bloom filter), or would lose
>> > to hash join and the bloom filter improves it enough to win.
>> >
>> > AFAICS that requires a join of two large tables - large enough that hash
>> > join would need to be batched, or pre-sorted inputs (which eliminates
>> > the explicit Sort, which is the main cost in most cases).
>> >
>> > The current patch only works with sequential scans, which eliminates the
>> > second (pre-sorted) option. So let's try the first one - can we invent
>> > an example with a join of two large tables where a merge join would win?
>> >
>> > Can we find such example in existing benchmarks like TPC-H/TPC-DS.
>> >
>>
>> Agreed. The current examples are only intended to show us that using
>> bloom filters in merge join could improve the merge join performance
>> in some cases. We are working on testing more examples that merge join
>> with bloom filter could out-perform hash join, which should be more
>> persuasive.
>>
>>
>> > The bloom filter is built by the first seqscan (on t0), and then used by
>> > the second seqscan (on t1). But this only works because we always run
>> > the t0 scan to completion (because we're feeding it into Sort) before we
>> > start scanning t1.
>> >
>> > But when the scan on t1 switches to an index scan, it's over - we'd be
>> > building the filter without being able to probe it, and when we finish
>> > building it we no longer need it. So this seems pretty futile.
>> >
>> > It might still improve plans like
>> >
>> >->  Merge Join
>> >  Merge Cond: (t0.c1 = t1.c1)
>> >  SemiJoin Filter Created Based on: (t0.c1 = t1.c1)
>> >  SemiJoin Estimated Filtering Rate: 1.
>> > ->  Sort
>> >Sort Key: t0.c1
>> >->  Seq Scan on t0
>> >  ->  Index Scan on t1
>> >
>> > But I don't know how common/likely that actually is. I'd expect to have
>> > an index on 

Re: PG upgrade 14->15 fails - database contains our own extension

2022-10-13 Thread Robert Haas
On Thu, Oct 13, 2022 at 9:57 AM David Turoň  wrote:
> pg_restore: creating TYPE "public.lbuid"
> pg_restore: creating CAST "CAST (integer AS "public"."lbuid")"
> pg_restore: while PROCESSING TOC:
> pg_restore: from TOC entry 3751; 2605 16393 CAST CAST (integer AS 
> "public"."lbuid") (no owner)
> pg_restore: error: could not execute query: ERROR:  return data type of cast 
> function must match or be binary-coercible to target data type
> Command was: CREATE CAST (integer AS "public"."lbuid") WITH FUNCTION 
> "pg_catalog"."int8"(integer) AS IMPLICIT;

I think the error is complaining that the return type of
int8(integer), which is bigint, needs to be coercible WITHOUT FUNCTION to
lbuid. Your extension contains such a cast, but at the point when the
error occurs, it hasn't been restored yet. That suggests that either
the cast didn't get included in the dump file, or it got included in
the wrong order. A quick test suggest the latter. If I execute your
SQL file and the dump the database, I get:

CREATE CAST (integer AS public.lbuid) WITH FUNCTION
pg_catalog.int8(integer) AS IMPLICIT;
CREATE CAST (bigint AS public.lbuid) WITHOUT FUNCTION AS IMPLICIT;
CREATE CAST (public.lbuid AS bigint) WITHOUT FUNCTION AS IMPLICIT;

That's not a valid dump ordering, and if I drop the casts and try to
recreate them that way, it fails in the same way you saw.

My guess is that this is a bug in Tom's commit
b55f2b6926556115155930c4b2d006c173f45e65, "Adjust pg_dump's priority
ordering for casts."

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Move backup-related code to xlogbackup.c/.h

2022-10-13 Thread Bharath Rupireddy
On Thu, Oct 13, 2022 at 4:43 PM Alvaro Herrera  wrote:
>

Thanks for reviewing.

> > Hm. Agree. But, that requires us to include xlogbackup.h in
> > xlog_internal.h for SessionBackupState enum in
> > ResetXLogBackupActivity(). Is that okay?
>
> It's not great, but it's not *that* bad, ISTM, mainly because
> xlog_internal.h will affect less stuff than xlog.h.

Moved them to xlog_internal.h without xlogbackup.h included, please see below.

> > SessionBackupState and it needs to be set before we release WAL insert
> > locks, see the comment [1].
>
> I see.  Maybe we could keep that enum in xlog.h, instead.

It's not required now, please see below.

> While looking at how that works: I think calling a local variable
> "session_backup_state" is super confusing, seeing that we have a
> file-global variable called sessionBackupState.  I recommend naming the
> local "newstate" or something along those lines instead.
>
> I wonder why does pg_backup_start_callback() not change the backup state
> before your patch.  This seems a gratuitous difference, or is it?  If
> you change that code so that it also sets the status to BACKUP_NONE,
> then you can pass a bare SessionBackupState to ResetXLogBackupActivity
> rather than a pointer to one, which is a very strange arrangement that
> exists only so that you can have a third state (NULL) meaning "don't
> change state" -- that looks quite weird.
>
> Alternatively, if you don't want or can't change
> pg_backup_start_callback to pass a valid state value, another solution
> might be to pass a separate boolean "change state".
>
> But I would look at having another patch before your series that changes
> pg_backup_start_callback to make the code identical for the three
> callers, then you can simplify the patched code.

The pg_backup_start_callback() can just go ahead and reset
sessionBackupState. However, that leads to the complete removal of
pg_backup_start_callback() itself and to using do_pg_abort_backup()
consistently across the backup code, saving 20 LOC; attached as v5-0001.

With this, the other patches would get simplified a bit too,
xlogbackup.h footprint got reduced now.

Please find the v5 patch-set. 0002-0004 moves the backup code to
xlogbackup.c/.h.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
From 5b2e8f86e388fbbe9f7eb276ca137b14ac0e2fa3 Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy 
Date: Thu, 13 Oct 2022 12:18:46 +
Subject: [PATCH v5] Use do_pg_abort_backup() consistently across the backup
 code

The backup code uses a separate abort callback, pg_backup_start_callback(),
even though it already has do_pg_abort_backup(). These two
before_shmem_exit() callbacks do more or less the same thing. The only
difference between them is resetting sessionBackupState to
SESSION_BACKUP_NONE. It is actually safe for us to reset
sessionBackupState on aborts. This lets us remove
pg_backup_start_callback() and use do_pg_abort_backup() consistently
throughout.

Author: Bharath Rupireddy per suggestion from Alvaro Herrera.
Discussion: https://www.postgresql.org/message-id/20221013111330.564fk5tkwe3ha77l%40alvherre.pgsql
---
 src/backend/access/transam/xlog.c | 23 ++-
 1 file changed, 2 insertions(+), 21 deletions(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 27085b15a8..c9c1c6f287 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -679,8 +679,6 @@ static void ReadControlFile(void);
 static void UpdateControlFile(void);
 static char *str_time(pg_time_t tnow);
 
-static void pg_backup_start_callback(int code, Datum arg);
-
 static int	get_sync_bit(int method);
 
 static void CopyXLogRecordToWAL(int write_len, bool isLogSwitch,
@@ -8321,7 +8319,7 @@ do_pg_backup_start(const char *backupidstr, bool fast, List **tablespaces,
 	WALInsertLockRelease();
 
 	/* Ensure we release forcePageWrites if fail below */
-	PG_ENSURE_ERROR_CLEANUP(pg_backup_start_callback, (Datum) 0);
+	PG_ENSURE_ERROR_CLEANUP(do_pg_abort_backup, DatumGetBool(false));
 	{
 		bool		gotUniqueStartpoint = false;
 		DIR		   *tblspcdir;
@@ -8531,7 +8529,7 @@ do_pg_backup_start(const char *backupidstr, bool fast, List **tablespaces,
 
 		state->starttime = (pg_time_t) time(NULL);
 	}
-	PG_END_ENSURE_ERROR_CLEANUP(pg_backup_start_callback, (Datum) 0);
+	PG_END_ENSURE_ERROR_CLEANUP(do_pg_abort_backup, DatumGetBool(false));
 
 	state->started_in_recovery = backup_started_in_recovery;
 
@@ -8541,23 +8539,6 @@ do_pg_backup_start(const char *backupidstr, bool fast, List **tablespaces,
 	sessionBackupState = SESSION_BACKUP_RUNNING;
 }
 
-/* Error cleanup callback for pg_backup_start */
-static void
-pg_backup_start_callback(int code, Datum arg)
-{
-	/* Update backup counters and forcePageWrites on failure */
-	WALInsertLockAcquireExclusive();
-
-	Assert(XLogCtl->Insert.runningBackups > 0);
-	XLogCtl->Insert.runningBackups--;
-
-	if (XLogCtl->Insert.runningBackups == 0)
-	{
-		

Re: Git tag for v15

2022-10-13 Thread Tom Lane
Alvaro Herrera  writes:
> If for whatever reason a problem is found before the tag has been
> posted, then a new Git commit is chosen and a new tarball published.
> The tag will then match the new commit, not the original obviously.
> I don't know what would happen if a problem were to be found *after* the
> tag has been pushed; normally that would just mean the fix would have to
> wait until the next minor version in that branch.  It would have to be
> something really serious in order for this process to be affected.
> As far as I know, this has never happened.

I don't think it's happened since we adopted Git, anyway.  The plan
if we did have to re-wrap at that point would be to increment the version
number(s), since not doing so would probably create too much confusion
about which tarballs were the correct ones.

regards, tom lane




Re: Move backup-related code to xlogbackup.c/.h

2022-10-13 Thread Robert Haas
On Thu, Oct 13, 2022 at 7:13 AM Alvaro Herrera  wrote:
> On 2022-Oct-13, Bharath Rupireddy wrote:
> > Hm. Agree. But, that requires us to include xlogbackup.h in
> > xlog_internal.h for SessionBackupState enum in
> > ResetXLogBackupActivity(). Is that okay?
>
> It's not great, but it's not *that* bad, ISTM, mainly because
> xlog_internal.h will affect less stuff than xlog.h.

This is unfortunately a lot less true than I would like. I count 75
places where we #include "access/xlog.h" and 53 where we #include
"access/xlog_internal.h". And many of those are in frontend code. I
feel like the contents of xlog_internal.h are a bit too eclectic.
Maybe stuff that has to do with the on-disk directory structure, like
XLOGDIR and XLOG_FNAME_LEN, as well as stuff that has to do with where
bytes are located, like XLByteToSeg, should move to another file.
Besides that, which is the biggest part of the file, there's also
stuff that has to do with the page and record format generally (like
XLOG_PAGE_MAGIC and SizeOfXLogShortPHD) and stuff that is used for
certain specific WAL record types (like xl_parameter_change and
xl_restore_point) and some random rmgr-related things (like RmgrData
and the stuff that follows) and the usual assortment of random GUCs
and global variables (like RecoveryTargetAction and
ArchiveRecoveryRequested). Maybe it doesn't make sense to split this
up into a thousand tiny little header files, but I think some
rethinking would be a good idea, because it really doesn't make much
sense to me to mix stuff that has to do with file-naming conventions,
which a bunch of frontend code needs to know about, together with a
bunch of backend-only things.

--
Robert Haas
EDB: http://www.enterprisedb.com




Re: Transparent column encryption

2022-10-13 Thread Mark Woodward
If memory serves me correctly, if you statically link openssl this will
work. If you are using ssl in a DLL, I believe that the DLL has its own
"c-library" and its own heap.

On Thu, Oct 13, 2022 at 9:43 AM Peter Eisentraut <
peter.eisentr...@enterprisedb.com> wrote:

> On 06.10.22 17:19, Andres Freund wrote:
> >>> psql error: stderr: 'OPENSSL_Uplink(7FFC165CBD50,08): no
> OPENSSL_Applink'
> >> What in the world is that about?  What scant information I could find
> >> suggests that it has something to do with building a "release" build
> against
> >> an "debug" build of the openssl library, or vice versa.  But this patch
> >> doesn't introduce any use of openssl that we haven't seen before.
> > It looks to me that one needs to compile, in some form,
> openssl/applink.c and
> > link it to the application. No idea why that'd be required now and not
> > earlier.
>
> I have figured this out.  The problem is that on Windows you can't
> reliably pass stdio FILE * handles between the application and OpenSSL.
> To give the helpful places I found some Google juice, I'll mention them
> here:
>
> - https://github.com/edenhill/librdkafka/pull/3602
> - https://github.com/edenhill/librdkafka/issues/3554
> - https://www.mail-archive.com/openssl-users@openssl.org/msg91029.html
>
>
>
>


PG upgrade 14->15 fails - database contains our own extension

2022-10-13 Thread David Turoň


Hi community,

I have a problem with pg_upgrade, tested from 14.5 to 15.0 rc2, when the
database contains our extension with one new type. Using pg_dump & restore
works well.

We made a workaround extension, for use with a JavaScript library, that
contains a new type representing bigint as text. So there is something like
an automatic conversion from SELECT (2^32)::text -> bigint when data is
stored, and (2^32) -> text when data is retrieved.

I'm not sure whether this is a PostgreSQL bug or whether we have something
wrong with the cast in our extension, so that's why I'm writing here.

Here is output of pg_upgrade:
(our extension name is lbuid, our type public.lbuid)

command: "/usr/pgsql-15/bin/pg_dump" --host /tmp/pg_upgrade_log --port
50432 --username postgres --schema-only --quote-all-identifiers
--binary-upgrade --format=custom
--file="/home/pgsql/data_new/pg_upgrade_output.d/20221013T104924.054/
dump/pg_upgrade_dump_16385.custom" 'dbname=lbstat' >>
"/home/pgsql/data_new/pg_upgrade_output.d/20221013T104924.054/log/pg_upgrade_dump_16385.log"
 2>&1


command: "/usr/pgsql-15/bin/pg_restore" --host /tmp/pg_upgrade_log --port
50432 --username postgres --create --exit-on-error --verbose --dbname
template1
"/home/pgsql/data_new/pg_upgrade_output.d/20221013T104924.054/dump/pg_upgrade_dump_1
6385.custom" >>
"/home/pgsql/data_new/pg_upgrade_output.d/20221013T104924.054/log/pg_upgrade_dump_16385.log"
 2>&1
pg_restore: connecting to database for restore
pg_restore: creating DATABASE "lbstat"
pg_restore: connecting to new database "lbstat"
pg_restore: creating DATABASE PROPERTIES "lbstat"
pg_restore: connecting to new database "lbstat"
pg_restore: creating pg_largeobject "pg_largeobject"
pg_restore: creating SCHEMA "public"
pg_restore: creating COMMENT "SCHEMA "public""
pg_restore: creating EXTENSION "lbuid"
pg_restore: creating COMMENT "EXTENSION "lbuid""
pg_restore: creating SHELL TYPE "public.lbuid"
pg_restore: creating FUNCTION "public.lbuid_in("cstring")"
pg_restore: creating FUNCTION "public.lbuid_out("public"."lbuid")"
pg_restore: creating TYPE "public.lbuid"
pg_restore: creating CAST "CAST (integer AS "public"."lbuid")"
pg_restore: while PROCESSING TOC:
pg_restore: from TOC entry 3751; 2605 16393 CAST CAST (integer AS
"public"."lbuid") (no owner)
pg_restore: error: could not execute query: ERROR:  return data type of
cast function must match or be binary-coercible to target data type
Command was: CREATE CAST (integer AS "public"."lbuid") WITH FUNCTION
"pg_catalog"."int8"(integer) AS IMPLICIT;

-- For binary upgrade, handle extension membership the hard way
ALTER EXTENSION "lbuid" ADD CAST (integer AS "public"."lbuid");


Hint is good, but this cast is already member of our extension:

lbstat=# ALTER EXTENSION lbuid ADD CAST (integer AS public.lbuid);
ERROR:  cast from integer to lbuid is already a member of extension "lbuid"


Database contains this:

CREATE EXTENSION lbuid ;
CREATE TABLE test(id lbuid);
INSERT INTO test VALUES ('1344456644646645456');

Tested on our distribution based on centos7.

When I drop this cast from the extension manually and then manually restore
it after pg_upgrade, the operation completes without failure.
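
Roughly, that manual workaround looks like this (a sketch; the cast
definition is taken from the pg_restore log above):

-- before running pg_upgrade
ALTER EXTENSION lbuid DROP CAST (integer AS public.lbuid);
DROP CAST (integer AS public.lbuid);

-- after pg_upgrade, recreate the cast and put it back into the extension
CREATE CAST (integer AS public.lbuid) WITH FUNCTION
    pg_catalog.int8(integer) AS IMPLICIT;
ALTER EXTENSION lbuid ADD CAST (integer AS public.lbuid);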

Attached are the extension files.

Thanks.

Best regards. David T.
(See attached file: lbuid.control)(See attached file: lbuid--0.1.0.sql)

--
-
Ing. David TUROŇ
LinuxBox.cz, s.r.o.
28. rijna 168, 709 01 Ostrava

tel.:+420 591 166 224
fax:+420 596 621 273
mobil:  +420 732 589 152
www.linuxbox.cz

mobil servis: +420 737 238 656
email servis: ser...@linuxbox.cz
-

lbuid.control
Description: Binary data


lbuid--0.1.0.sql
Description: Binary data


Re: Transparent column encryption

2022-10-13 Thread Peter Eisentraut

On 06.10.22 17:19, Andres Freund wrote:

psql error: stderr: 'OPENSSL_Uplink(7FFC165CBD50,08): no OPENSSL_Applink'

What in the world is that about?  What scant information I could find
suggests that it has something to do with building a "release" build against
an "debug" build of the openssl library, or vice versa.  But this patch
doesn't introduce any use of openssl that we haven't seen before.

It looks to me that one needs to compile, in some form, openssl/applink.c and
link it to the application. No idea why that'd be required now and not
earlier.


I have figured this out.  The problem is that on Windows you can't 
reliably pass stdio FILE * handles between the application and OpenSSL. 
To give the helpful places I found some Google juice, I'll mention them 
here:


- https://github.com/edenhill/librdkafka/pull/3602
- https://github.com/edenhill/librdkafka/issues/3554
- https://www.mail-archive.com/openssl-users@openssl.org/msg91029.html





Re: Tracking last scan time

2022-10-13 Thread Dave Page
Hi

On Wed, 12 Oct 2022 at 23:52, Andres Freund  wrote:

> Hi,
>
> On 2022-10-12 12:50:31 -0700, Andres Freund wrote:
> > I think this should have at a basic test in
> src/test/regress/sql/stats.sql. If
> > I can write one in a few minutes I'll go for that, otherwise will reply
> > detailing difficulties.
>
> Took a bit longer (+lunch). Attached.
>
>
> In the attached 0001, the patch to make
> GetCurrentTransactionStopTimestamp()
> set xactStopTimestamp, I added a few comment updates and an Assert() to
> ensure
> that CurrentTransactionState->state is
> TRANS_(DEFAULT|COMMIT|ABORT|PREPARE). I
> am worried that otherwise we might end up with someone ending up using it
> in a
> place before the end of the transaction, which'd then end up recording the
> wrong timestamp in the commit/abort record.
>
>
> For 0002, the commit adding lastscan, I added catversion/stats version
> bumps
> (because I was planning to commit it already...), a commit message, and
> that
> minor docs change mentioned earlier.
>
>
> 0003 adds the tests mentioned above. I plan to merge them with 0002, but
> left
> them separate for easier review for now.
>
> To be able to compare timestamps for > not just >= we need to make sure
> that
> two subsequent timestamps differ. The attached achieves this by sleeping
> for
> 100ms between those points - we do that in other places already. I'd
> started
> out with 10ms, which I am fairly sure would suffice, but then deciced to
> copy
> the existing 100ms sleeps.
>
> I verified tests pass under valgrind, debug_discard_caches and after I make
> pgstat_report_stat() only flush when force is passed in.
>

Thanks for that. It looks good to me, bar one comment (repeated 3 times in
the sql and expected files):

fetch timestamps from before the next test

"from " should be removed.

-- 
Dave Page
Blog: https://pgsnake.blogspot.com
Twitter: @pgsnake

EDB: https://www.enterprisedb.com


RE: possible typo for CREATE PUBLICATION description

2022-10-13 Thread osumi.takami...@fujitsu.com
Hi, Alvaro-san


On Thursday, October 13, 2022 8:38 PM Alvaro Herrera  
wrote:
> On 2022-Oct-13, osumi.takami...@fujitsu.com wrote:
> 
> > I noticed a possible typo in the doc for create publication.
> > This applies to PG15 as well.
> > Kindly have a look at the attached patch for it.
> 
> Yeah, looks good.  It actually applies all the way back to 10, so I pushed it 
> to
> all branches.  I included Peter's wording changes as well.
> 
> Thanks
You are right and thank you so much for taking care of this !


Best Regards,
Takamichi Osumi





Re: possible typo for CREATE PUBLICATION description

2022-10-13 Thread Alvaro Herrera
Hello

On 2022-Oct-13, osumi.takami...@fujitsu.com wrote:

> I noticed a possible typo in the doc for create publication.
> This applies to PG15 as well.
> Kindly have a look at the attached patch for it.

Yeah, looks good.  It actually applies all the way back to 10, so I
pushed it to all branches.  I included Peter's wording changes as well.

Thanks

-- 
Álvaro Herrera   48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
"Ellos andaban todos desnudos como su madre los parió, y también las mujeres,
aunque no vi más que una, harto moza, y todos los que yo vi eran todos
mancebos, que ninguno vi de edad de más de XXX años" (Cristóbal Colón)




Re: Git tag for v15

2022-10-13 Thread Alvaro Herrera
On 2022-Oct-12, Matthias van de Meent wrote:

> On Wed, 12 Oct 2022 at 20:08, Nemo  wrote:
> > Since v15 doesn't seem to be announced yet - this is confusing me. It
> > doesn't impact us anywhere (yet), since we haven't added the v15 release
> > cycle to our page yet, but I'd like to check if this is an incorrect tag
> > or? If not, what's the correct source for us to use for automation purposes?
> 
> Tags are usually published a few days in advance of the official
> release, so that all packagers have the time to bundle and prepare the
> release in their repositories. The official release is planned for
> tomorrow the 13th, as mentioned in [0].

To be more precise, our cadence goes pretty much every time like this
(both yearly major releases as well as quarterly minor):

- a Git commit is determined for a release on Monday (before noon
EDT/EST); a tarball produced from that commit is posted to registered
packagers

- By Tuesday evening EDT/EST, if no packagers have reported problems,
the tag corresponding to the given commit is pushed to the repo

- By Thursday noon EDT/EST the announcement is made and the packages are
made available.

Thus packagers have all of Monday and Tuesday to report problems.  They
typically don't, since the buildfarm alerts us soon enough to any
portability problems.  Packaging issues are normally found (and dealt
with) during beta.

If for whatever reason a problem is found before the tag has been
posted, then a new Git commit is chosen and a new tarball published.
The tag will then match the new commit, not the original obviously.
I don't know what would happen if a problem were to be found *after* the
tag has been pushed; normally that would just mean the fix would have to
wait until the next minor version in that branch.  It would have to be
something really serious in order for this process to be affected.
As far as I know, this has never happened.


As far as Postgres is concerned, you could automate things so that a tag
detected on a Tuesday is marked as released on the immediately following
Thursday (noon EDT/EST).  If you did this, you'd get it right 99% of the
time, and the only way to achieve 100% would be to have a bot that
follows the quarterly announcements in pgsql-announce.

-- 
Álvaro Herrera PostgreSQL Developer  —  https://www.EnterpriseDB.com/




Re: Allow tests to pass in OpenSSL FIPS mode

2022-10-13 Thread Alvaro Herrera
On 2022-Oct-13, Peter Eisentraut wrote:

> Right, that's the rest of my original patch.  I'll come back with an updated
> version of that.

However, there are some changes in brin_multi.out that are quite
surprising and suggest that we might have bugs in brin:

+WARNING:  unexpected number of results 31 for 
(macaddr8col,>,macaddr8,b1:d1:0e:7b:af:a4:42:12,33)
+WARNING:  unexpected number of results 17 for 
(macaddr8col,>=,macaddr8,d9:35:91:bd:f7:86:0e:1e,15)
+WARNING:  unexpected number of results 11 for 
(macaddr8col,<=,macaddr8,23:e8:46:63:86:07:ad:cb,13)
+WARNING:  unexpected number of results 4 for 
(macaddr8col,<,macaddr8,13:16:8e:6a:2e:6c:84:b4,6)

-- 
Álvaro HerreraBreisgau, Deutschland  —  https://www.EnterpriseDB.com/
"La victoria es para quien se atreve a estar solo"




Re: Move backup-related code to xlogbackup.c/.h

2022-10-13 Thread Alvaro Herrera
On 2022-Oct-13, Bharath Rupireddy wrote:

> Hm. Agree. But, that requires us to include xlogbackup.h in
> xlog_internal.h for SessionBackupState enum in
> ResetXLogBackupActivity(). Is that okay?

It's not great, but it's not *that* bad, ISTM, mainly because
xlog_internal.h will affect less stuff than xlog.h.

> SessionBackupState and it needs to be set before we release WAL insert
> locks, see the comment [1].

I see.  Maybe we could keep that enum in xlog.h, instead.

While looking at how that works: I think calling a local variable
"session_backup_state" is super confusing, seeing that we have a
file-global variable called sessionBackupState.  I recommend naming the
local "newstate" or something along those lines instead.

I wonder why does pg_backup_start_callback() not change the backup state
before your patch.  This seems a gratuitous difference, or is it?  If
you change that code so that it also sets the status to BACKUP_NONE,
then you can pass a bare SessionBackupState to ResetXLogBackupActivity
rather than a pointer to one, which is a very strange arrangement that
exists only so that you can have a third state (NULL) meaning "don't
change state" -- that looks quite weird.

Alternatively, if you don't want or can't change
pg_backup_start_callback to pass a valid state value, another solution
might be to pass a separate boolean "change state".

But I would look at having another patch before your series that changes
pg_backup_start_callback to make the code identical for the three
callers, then you can simplify the patched code.

> Should we just remove the
> SessionBackupState enum and convert SESSION_BACKUP_NONE and
> SESSION_BACKUP_RUNNING to just macros in xlogbackup.h and use integer
> type to pass the state across? I don't know what's better here. Do you
> have any thoughts on this?

No, please, no passing of unadorned magic numbers.

-- 
Álvaro Herrera   48°01'N 7°57'E  —  https://www.EnterpriseDB.com/




Re: hash_xlog_split_allocate_page: failed to acquire cleanup lock

2022-10-13 Thread Amit Kapila
On Wed, Oct 12, 2022 at 4:16 PM vignesh C  wrote:
>
> On Thu, 6 Oct 2022 at 12:44, Amit Kapila  wrote:
> >
> > On Sat, Oct 1, 2022 at 12:35 AM Andres Freund  wrote:
> > >
> > > This issue does occasionally happen in CI, as e.g. noted in this thread:
> > > https://www.postgresql.org/message-id/20220930185345.GD6256%40telsasoft.com
> > >
> > > On 2022-08-18 15:17:47 +0530, Amit Kapila wrote:
> > > > I agree with you that getting rid of the clean-up lock on the new
> > > > bucket is a more invasive patch and should be done separately if
> > > > required. Yesterday, I have done a brief analysis and I think that is
> > > > possible but it doesn't seem to be a good idea to backpatch it.
> > >
> > > My problem with this approach is that the whole cleanup lock is hugely
> > > misleading as-is. As I noted in
> > > https://www.postgresql.org/message-id/20220817193032.z35vdjhpzkgldrd3%40awork3.anarazel.de
> > > we take the cleanup lock *after* re-initializing the page. Thereby
> > > completely breaking the properties that a cleanup lock normally tries to
> > > guarantee.
> > >
> > > Even if that were to achieve something useful (doubtful in this case),
> > > it'd need a huge comment explaining what's going on.
> > >
> >
> > Attached are two patches. The first patch is what Robert has proposed
> > with some changes in comments to emphasize the fact that cleanup lock
> > on the new bucket is just to be consistent with the old bucket page
> > locking as we are initializing it just before checking for cleanup
> > lock. In the second patch, I removed the acquisition of cleanup lock
> > on the new bucket page and changed the comments/README accordingly.
> >
> > I think we can backpatch the first patch and the second patch can be
> > just a HEAD-only patch. Does that sound reasonable to you?
>
> Thanks for the patches.
> I have verified that the issue is fixed using a manual test upto
> REL_10_STABLE version and found it to be working fine.
>

Thanks for the verification. I am planning to push the first patch
(and backpatch it) next week (by next Tuesday) unless we have more
comments or Robert intends to push it.

-- 
With Regards,
Amit Kapila.




Re: Use LIMIT instead of Unique for DISTINCT when all distinct pathkeys are redundant

2022-10-13 Thread David Rowley
On Thu, 13 Oct 2022 at 21:17, Richard Guo  wrote:
>
> On Thu, Oct 13, 2022 at 2:48 PM David Rowley  wrote:
>> To stop the planner from adding that final sort, I opted to hack the
>> LimitPath's pathkeys to say that it's already sorted by the
>> PlannerInfo's sort_pathkeys.  That feels slightly icky, but it does
>> seem a little wasteful to initialise a sort node on every execution of
>> the plan to sort a single tuple.
>
>
> I don't get how this plan comes out. It seems not correct because Limit
> node above an unsorted path would give us an unpredictable row.

Actually, you're right. That manual setting of the pathkeys is an
unneeded remnant from a bug I fixed before sending out v2.  It can
just be removed.
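
For reference, a minimal example of the kind of query this targets (using
tenk1, as elsewhere in this thread) would be something like:

-- the equality qual makes the only DISTINCT pathkey redundant, so every
-- matching row has the same value of "four" and a LIMIT 1 is enough
SELECT DISTINCT four FROM tenk1 WHERE four = 0;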

I've attached the v3 patch.

David
diff --git a/src/backend/optimizer/plan/planner.c 
b/src/backend/optimizer/plan/planner.c
index 5d0fd6e072..839b034b29 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -4780,11 +4780,44 @@ create_final_distinct_paths(PlannerInfo *root, 
RelOptInfo *input_rel,
 
if (pathkeys_contained_in(needed_pathkeys, 
path->pathkeys))
{
-   add_path(distinct_rel, (Path *)
-create_upper_unique_path(root, 
distinct_rel,
-   
  path,
-   
  list_length(root->distinct_pathkeys),
-   
  numDistinctRows));
+   /*
+* distinct_pathkeys may have become empty if 
all of the
+* pathkeys were determined to be redundant.  
If all of the
+* pathkeys are redundant then each DISTINCT 
target must only
+* allow a single value, therefore all 
resulting tuples must
+* be identical.  We can uniquify these tuples 
simply by just
+* taking the first tuple.  All we do here is 
add a path to do
+* LIMIT 1 on path.  When doing DISTINCT ON we 
may still have
+* a non-NIL sort_pathkeys list, so we must 
still only do this
+* with paths which are contained in 
needed_pathkeys.
+*/
+   if (root->distinct_pathkeys == NIL)
+   {
+   Node   *limitCount;
+
+   limitCount = (Node *) 
makeConst(INT8OID, -1, InvalidOid,
+   
sizeof(int64),
+   
Int64GetDatum(1), false,
+   
FLOAT8PASSBYVAL);
+
+   /*
+* If the query already has a LIMIT 
clause, then we could
+* end up with a duplicate LimitPath in 
the final plan.
+* That does not seem worth troubling 
over too much.
+*/
+   add_path(distinct_rel, (Path *)
+
create_limit_path(root, distinct_rel, path, NULL,
+   
   limitCount, LIMIT_OPTION_COUNT,
+   
   0, 1));
+   }
+   else
+   {
+   add_path(distinct_rel, (Path *)
+
create_upper_unique_path(root, distinct_rel,
+   
  path,
+   
  list_length(root->distinct_pathkeys),
+   
  numDistinctRows));
+   }
}
}
 
@@ -4805,13 +4838,33 @@ create_final_distinct_paths(PlannerInfo *root, 
RelOptInfo *input_rel,
path = (Path *) create_sort_path(root, distinct_rel,
   

Re: Suppressing useless wakeups in walreceiver

2022-10-13 Thread Alvaro Herrera
I think in 0001 we should put more stuff in the state struct --
specifically these globals:

static int  recvFile = -1;
static TimeLineID recvFileTLI = 0;
static XLogSegNo recvSegNo = 0;

The main reason is that it seems odd to have startpointTLI in the struct
used in some places together with a file-global recvFileTLI which isn't.
The way one is passed as argument and the other as part of a struct
seemed too distracting.  This should reduce the number of moving parts,
ISTM.


One thing that confused me for a moment is that we have some state in
walrcv and some more state in 'state'.  The difference is pretty obvious
once you look at the other, but it suggests to me that a better variable
name for the latter is 'localstate' to more obviously distinguish them.


I was tempted to suggest that LogstreamResult would also be good to have
in the new struct, but that might be going a bit too far for a first
cut.

-- 
Álvaro Herrera PostgreSQL Developer  —  https://www.EnterpriseDB.com/




Re: Make finding openssl program a configure or meson option

2022-10-13 Thread Peter Eisentraut

On 12.10.22 03:08, Michael Paquier wrote:

On Tue, Oct 11, 2022 at 05:06:22PM +0200, Peter Eisentraut wrote:

Various test suites use the "openssl" program as part of their setup. There
isn't a way to override which openssl program is to be used, other than by
fiddling with the path, perhaps.  This has gotten increasingly problematic
with some of the work I have been doing, because different versions of
openssl have different capabilities and do different things by default.
This patch checks for an openssl binary in configure and meson setup, with
appropriate ways to override it.  This is similar to how "lz4" and "zstd"
are handled, for example.  The meson build system actually already did this,
but the result was only used in some places. This is now applied more
uniformly.


openssl-env allows the use of the environment variable of the same
name.  This reminds me a bit of the recent interferences with GZIP,
for example.


Sorry, what is "openssl-env"?  I can't find that anywhere.


This patch is missing one addition of set_single_env() in
vcregress.pl, and one update of install-windows.sgml where all the
supported environment variables for commands are listed.


Ok, I'll add that.





Re: Allow tests to pass in OpenSSL FIPS mode

2022-10-13 Thread Peter Eisentraut

On 12.10.22 03:18, Michael Paquier wrote:

On Tue, Oct 11, 2022 at 01:51:50PM +0200, Peter Eisentraut wrote:

Let's make a small start on this.  The attached patch moves the tests of the
md5() function to a separate test file.  That would ultimately make it
easier to maintain a variant expected file for FIPS mode where that function
will fail (similar to how we have done it for the pgcrypto tests).


Makes sense to me.  This slice looks fine.


Committed.


I think that the other md5() computations done in the main regression
test suite could just be switched to use one of the sha*() functions
as they just want to put their hands on text values.  It looks like a
few of them have some expectations with the output size and
generate_series(), though, but this could be tweaked by making the
series shorter, for example.


Right, that's the rest of my original patch.  I'll come back with an 
updated version of that.






Re: Move backup-related code to xlogbackup.c/.h

2022-10-13 Thread Bharath Rupireddy
On Thu, Oct 13, 2022 at 3:12 PM Alvaro Herrera  wrote:
>
> As I see it, xlog.h is a header that exports XLOG manipulations to the
> outside world (everything that produces WAL, as well as stuff that
> controls operation); xlog_internal is the header that exports xlog*.c
> internal stuff for other xlog*.c files and specialized frontends to use.
> These new functions are internal to xlogbackup.c and xlog.c, so IMO they
> belong in xlog_internal.h.
>
> Stuff that is used from xlog.c only by xlogrecovery.c should also appear
> in xlog_internal.h only, not xlog.h, so I suggest not to take that as
> precedent.  Also, that file (xlogrecovery.c) is pretty new so we haven't
> had time to nail down the .h layout yet.

Hm. Agree. But, that requires us to include xlogbackup.h in
xlog_internal.h for SessionBackupState enum in
ResetXLogBackupActivity(). Is that okay?

SessionBackupState and it needs to be set before we release WAL insert
locks, see the comment [1]. Should we just remove the
SessionBackupState enum and convert SESSION_BACKUP_NONE and
SESSION_BACKUP_RUNNING to just macros in xlogbackup.h and use integer
type to pass the state across? I don't know what's better here. Do you
have any thoughts on this?

[1]
 * You might think that WALInsertLockRelease() can be called before
 * cleaning up session-level lock because session-level lock doesn't need
 * to be protected with WAL insertion lock. But since
 * CHECK_FOR_INTERRUPTS() can occur in it, session-level lock must be
 * cleaned up before it.
 */

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: Suppressing useless wakeups in walreceiver

2022-10-13 Thread Bharath Rupireddy
On Tue, Oct 11, 2022 at 11:22 PM Nathan Bossart
 wrote:
>
> On Tue, Oct 11, 2022 at 09:34:25AM +0530, Bharath Rupireddy wrote:
> > now = t1;
> > XLogWalRcvSendReply() /* say it ran for a minute or so for whatever reasons 
> > */
> > XLogWalRcvSendHSFeedback() /* with patch walrecevier sends hot standby
> > feedback more often without properly honouring
> > wal_receiver_status_interval because the 'now' isn't actually the
> > current time as far as that function is concerned, it is
> > t1 + XLogWalRcvSendReply()'s time. */
> >
> > Well, is this really a problem? I'm not sure about that. Let's hear from 
> > others.
>
> For this example, the feedback message would just be sent in the next loop
> iteration instead.

I think the hot standby feedback message gets sent too frequently
without honouring the wal_receiver_status_interval because the 'now'
is actually not the current time with your patch but 'now +
XLogWalRcvSendReply()'s time'. However, it's possible that I may be
wrong here.

/*
 * Send feedback at most once per wal_receiver_status_interval.
 */
if (!TimestampDifferenceExceeds(sendTime, now,
wal_receiver_status_interval * 1000))
return;

As said upthread [1], I think the best way to move forward is to
separate the GetCurrentTimestamp() calls optimizations into 0003.

Do you have any further thoughts on review comments posted upthread [1]?

[1] 
https://www.postgresql.org/message-id/CALj2ACV5q2dbVRKwu2PL2s_YY0owZFTRxLdX%3Dt%2BdZ1iag15khA%40mail.gmail.com

-- 
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: Fast COPY FROM based on batch insert

2022-10-13 Thread Etsuro Fujita
On Thu, Oct 13, 2022 at 1:38 PM Andrey Lepikhov
 wrote:
> On 10/12/22 07:56, Etsuro Fujita wrote:
> > On Tue, Oct 11, 2022 at 3:06 PM Andrey Lepikhov
> >  wrote:
> >> I reviewed the patch one more time. Only one question: bistate and
> >> ri_FdwRoutine are strongly bounded. Maybe to add some assertion on
> >> (ri_FdwRoutine XOR bistate) ? Just to prevent possible errors in future.
> >
> > You mean the bistate member of CopyMultiInsertBuffer?
> Yes
> >
> > We do not use that member at all for foreign tables, so the patch
> > avoids initializing that member in CopyMultiInsertBufferInit() when
> > called for a foreign table.  So we have bistate = NULL for foreign
> > tables (and bistate != NULL for plain tables), as you mentioned above.
> > I think it is a good idea to add such assertions.  How about adding
> > them to CopyMultiInsertBufferFlush() and
> > CopyMultiInsertBufferCleanup() like the attached?  In the attached I
> > updated comments a bit further as well.
> Yes, quite enough.

I have committed the patch after tweaking comments a little bit further.

Best regards,
Etsuro Fujita




Re: archive modules

2022-10-13 Thread Bharath Rupireddy
On Mon, Oct 10, 2022 at 1:17 PM Peter Eisentraut
 wrote:
>
> On 05.10.22 20:57, Nathan Bossart wrote:
> >> What we are talking about here is, arguably, a misconfiguration, so it
> >> should result in an error.
> >
> > Okay.  What do you think about something like the attached?

The intent here looks reasonable to me. However, why should the user
be able to set both archive_command and archive_library in the first
place only to later fail in LoadArchiveLibrary() per the patch? IMO,
the check_hook() is the right way to disallow this sort of GUC
misconfiguration, no?

FWIW, I'm attaching a small patch that uses check_hook().
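
For example, assuming the server is already running with
archive_library = 'basic_archive', the hook is meant to reject a later
attempt like this (a hypothetical session, not taken from the patch):

ALTER SYSTEM SET archive_command = 'cp %p /mnt/server/archivedir/%f';
-- expected with the check hook in place:
-- ERROR:  cannot set "archive_command" when "archive_library" is specified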

> That looks like the right solution to me.
>
> Let's put that into PG 16, and maybe we can consider backpatching it.

+1 to backpatch to PG 15 where the archive modules feature was introduced.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
From c5969071dfb9064bbf9e22513bf2cef57bcfd84f Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy 
Date: Thu, 13 Oct 2022 09:37:40 +
Subject: [PATCH v1] Handle misconfigurations of archive_command and
 archive_library

The parameters archive_command and archive_library are mutually
exclusive. This patch errors out if the user tries to set both
of them at the same time.
---
 doc/src/sgml/config.sgml| 11 +++
 src/backend/access/transam/xlog.c   | 16 
 src/backend/postmaster/pgarch.c | 18 +-
 src/backend/utils/misc/guc_tables.c |  4 ++--
 src/include/utils/guc_hooks.h   |  4 
 5 files changed, 46 insertions(+), 7 deletions(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 66312b53b8..62e7d61da7 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -3597,9 +3597,10 @@ include_dir 'conf.d'


 This parameter can only be set in the postgresql.conf
-file or on the server command line.  It is ignored unless
-archive_mode was enabled at server start and
-archive_library is set to an empty string.
+file or on the server command line. It is not allowed to set both
+archive_command and archive_library
+at the same time; doing so will cause an error. This parameter is ignored
+unless archive_mode was enabled at server start.
 If archive_command is an empty string (the default) while
 archive_mode is enabled (and archive_library
 is set to an empty string), WAL archiving is temporarily
@@ -3625,7 +3626,9 @@ include_dir 'conf.d'
 The library to use for archiving completed WAL file segments.  If set to
 an empty string (the default), archiving via shell is enabled, and
  is used.  Otherwise, the specified
-shared library is used for archiving.  For more information, see
+shared library is used for archiving. It is not allowed to set both
+archive_library and archive_command
+at the same time, doing so will cause an error. For more information, see
  and
 .

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 27085b15a8..64714c6940 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -4462,6 +4462,22 @@ show_archive_command(void)
 		return "(disabled)";
 }
 
+/*
+ * GUC check_hook for archive_command
+ */
+bool
+check_archive_command(char **newval, void **extra, GucSource source)
+{
+	if (*newval && strcmp(*newval, "") != 0 && XLogArchiveLibrary[0] != '\0')
+	{
+		GUC_check_errmsg("cannot set \"archive_command\" when \"archive_library\" is specified");
+		GUC_check_errdetail("Only one of \"archive_command\" or \"archive_library\" can be specified.");
+		return false;
+	}
+
+	return true;
+}
+
 /*
  * GUC show_hook for in_hot_standby
  */
diff --git a/src/backend/postmaster/pgarch.c b/src/backend/postmaster/pgarch.c
index 3868cd7bd3..f8c05754a1 100644
--- a/src/backend/postmaster/pgarch.c
+++ b/src/backend/postmaster/pgarch.c
@@ -44,7 +44,7 @@
 #include "storage/procsignal.h"
 #include "storage/shmem.h"
 #include "storage/spin.h"
-#include "utils/guc.h"
+#include "utils/guc_hooks.h"
 #include "utils/memutils.h"
 #include "utils/ps_status.h"
 
@@ -146,6 +146,22 @@ static int	ready_file_comparator(Datum a, Datum b, void *arg);
 static void LoadArchiveLibrary(void);
 static void call_archive_module_shutdown_callback(int code, Datum arg);
 
+/*
+ * GUC check_hook for check_archive_library
+ */
+bool
+check_archive_library(char **newval, void **extra, GucSource source)
+{
+	if (*newval && strcmp(*newval, "") != 0 && XLogArchiveCommand[0] != '\0')
+	{
+		GUC_check_errmsg("cannot set \"archive_library\" when \"archive_command\" is specified");
+		GUC_check_errdetail("Only one of \"archive_library\" or \"archive_command\" can be specified.");
+		return false;
+	}
+
+	return true;
+}
+
 /* Report shared memory space needed by PgArchShmemInit */
 

Lack of PageSetLSN in heap_xlog_visible

2022-10-13 Thread Konstantin Knizhnik

Hi hackers!

heap_xlog_visible does not bump the heap page LSN when setting the
all-visible flag on it.

There is a long comment explaining it:

    /*
     * We don't bump the LSN of the heap page when setting the visibility
     * map bit (unless checksums or wal_hint_bits is enabled, in which
     * case we must), because that would generate an unworkable volume of
     * full-page writes.  This exposes us to torn page hazards, but since
     * we're not inspecting the existing page contents in any way, we
     * don't care.
     *
     * However, all operations that clear the visibility map bit *do* bump
     * the LSN, and those operations will only be replayed if the XLOG LSN
     * follows the page LSN.  Thus, if the page LSN has advanced past our
     * XLOG record's LSN, we mustn't mark the page all-visible, because
     * the subsequent update won't be replayed to clear the flag.
     */

But it is still not clear to me that not bumping the LSN here is correct if
wal_log_hints is set.
In this case we will have a VM page with a larger LSN than the heap page,
because visibilitymap_set bumps the LSN of the VM page. It means that in
theory, after recovery, we may have a page marked as all-visible in the VM
but without PD_ALL_VISIBLE in the page header. And that violates the VM
constraint:


 * When we *set* a visibility map during VACUUM, we must write WAL.  This may
 * seem counterintuitive, since the bit is basically a hint: if it is clear,
 * it may still be the case that every tuple on the page is visible to all
 * transactions; we just don't know that for certain.  The difficulty is that
 * there are two bits which are typically set together: the PD_ALL_VISIBLE bit
 * on the page itself, and the visibility map bit.  If a crash occurs after the
 * visibility map page makes it to disk and before the updated heap page makes
 * it to disk, redo must set the bit on the heap page.  Otherwise, the next
 * insert, update, or delete on the heap page will fail to realize that the
 * visibility map bit must be cleared, possibly causing index-only scans to
 * return wrong answers.






Re: Move backup-related code to xlogbackup.c/.h

2022-10-13 Thread Alvaro Herrera
On 2022-Oct-13, Bharath Rupireddy wrote:

> On Wed, Oct 12, 2022 at 1:04 PM Alvaro Herrera  
> wrote:

> > You added some commentary that these functions are tailor-made for
> > internal operations, and then declared them in the most public header
> > function that xlog has?  I think at the bare minimum, these prototypes
> > should be in xlog_internal.h, not xlog.h.
> 
> I removed such comments. These are functions used by xlogbackup.c to
> call back into xlog.c similar to the call back functions defined in
> xlog.h for xlogrecovery.c. And, most of the XLogCtl set/get sort of
> function declarations are in xlog.h. So, I'm retaining them in xlog.h.

As I see it, xlog.h is a header that exports XLOG manipulations to the
outside world (everything that produces WAL, as well as stuff that
controls operation); xlog_internal is the header that exports xlog*.c
internal stuff for other xlog*.c files and specialized frontends to use.
These new functions are internal to xlogbackup.c and xlog.c, so IMO they
belong in xlog_internal.h.

Stuff that is used from xlog.c only by xlogrecovery.c should also appear
in xlog_internal.h only, not xlog.h, so I suggest not to take that as
precedent.  Also, that file (xlogrecovery.c) is pretty new so we haven't
had time to nail down the .h layout yet.

-- 
Álvaro Herrera   48°01'N 7°57'E  —  https://www.EnterpriseDB.com/




RE: [RFC] building postgres with meson - v13

2022-10-13 Thread shiy.f...@fujitsu.com
Hi,

I noticed that `pg_config --configure` didn't show the options given when
building with meson. 

For example, 
meson setup build -Dcache=gcc.cache -Ddtrace=enabled -Dicu=enabled 
-Dcassert=true -Dprefix=/home/postgres/install_meson/
meson compile -C build 
meson install -C build

$ pg_config --configure

The options I specified (like dtrace) are not shown, but I found that they
actually take effect during compilation.
When specifying `-Ddtrace=enabled`, there is a log line like this, and with
`-Ddtrace=disabled` there is no such log:

[120/1834] /usr/bin/dtrace -C -h -s 
../src/include/utils/../../backend/utils/probes.d -o 
src/include/utils/probes.h.tmp

Maybe it would be better if pg_config could output this information, to be
consistent with the output when building with `./configure` and `make`.

The output when building with `./configure` and `make`:
$ pg_config --configure
 '--prefix=/home/postgres/install/' '--cache' 'gcc.cache' '--enable-dtrace' 
'--with-icu' '--enable-cassert'


Regards,
Shi yu




Re: Refactor to introduce pg_strcoll().

2022-10-13 Thread Peter Eisentraut

On 07.10.22 01:15, Jeff Davis wrote:

+ * Call ucol_strcollUTF8(), ucol_strcoll(), strcoll(), strcoll_l(), wcscoll(),
+ * or wcscoll_l() as appropriate for the given locale, platform, and database
+ * encoding. Arguments must be NUL-terminated. If the locale is not specified,
+ * use the database collation.
+ *
+ * If the collation is deterministic, break ties with memcmp(), and then with
+ * the string length.
+ */
+int
+pg_strcoll(const char *arg1, int len1, const char *arg2, int len2,
+  pg_locale_t locale)


It's a bit confusing that arguments must be NUL-terminated, but the 
length is still specified.  Maybe another sentence to explain that would 
be helpful.


The length arguments ought to be of type size_t, I think.





Question about pull_up_sublinks_qual_recurse

2022-10-13 Thread Andy Fan
Hi:

When I was working on another task, the following case caught my mind.

create table t1(a int, b int, c int);
create table t2(a int, b int, c int);
create table t3(a int, b int, c int);

explain (costs off) select * from t1
where exists (select 1 from t2
  where exists (select 1 from t3
   where t3.c = t1.c
   and t2.b = t3.b)
and t2.a = t1.a);


I got the plan like this:

QUERY PLAN
---
 Hash Semi Join
   Hash Cond: (t1.a = t2.a)
   Join Filter: (hashed SubPlan 2)
   ->  Seq Scan on t1
   ->  Hash
 ->  Seq Scan on t2
   SubPlan 2
 ->  Seq Scan on t3
(8 rows)

Note we CAN'T pull up the inner sublink which produced the SubPlan 2.


I traced the reason: after we pull up the outer sublink, we get:

select * from t1 semi join t2 on t2.a = t1.a AND
exists (select 1 from t3
where t3.c = t1.c
and t2.b = t3.b);

Later we try to pull up the EXISTS sublink against t1 OR t2 *separately*;
since this subselect references both t1 *AND* t2, we CAN'T pull up the
sublink. I am wondering why we have to pull it up against t1 OR t2 rather
than against JoinExpr(t1, t2); I think the latter is better.
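
In the same pseudo-SQL as above, pulling it up against the join would give
roughly:

select * from (t1 semi join t2 on t2.a = t1.a)
         semi join t3 on t3.c = t1.c and t2.b = t3.b;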

So I changed the code like this, got the plan I wanted, and 'make
installcheck' didn't report any failures.


   QUERY PLAN

 Hash Semi Join
   Hash Cond: ((t2.b = t3.b) AND (t1.c = t3.c))
   ->  Hash Semi Join
 Hash Cond: (t1.a = t2.a)
 ->  Seq Scan on t1
 ->  Hash
   ->  Seq Scan on t2
   ->  Hash
 ->  Seq Scan on t3
(9 rows)

@@ -553,10 +553,10 @@ pull_up_sublinks_qual_recurse(PlannerInfo *root, Node
*node,
  */
  j->quals = pull_up_sublinks_qual_recurse(root,
  j->quals,
- &j->larg,
- available_rels1,
- &j->rarg,
- child_rels);
+ jtlink1,
+ bms_union(available_rels1, child_rels),
+ NULL,
+ NULL);
  /* Return NULL representing constant TRUE */
  return NULL;
  }

Any feedback is welcome.

-- 
Best Regards
Andy Fan


libpq support for NegotiateProtocolVersion

2022-10-13 Thread Peter Eisentraut
We have the NegotiateProtocolVersion protocol message [0], but libpq 
doesn't actually handle it.


Say I increase the protocol number in libpq:

-   conn->pversion = PG_PROTOCOL(3, 0);
+   conn->pversion = PG_PROTOCOL(3, 1);

Then I get

psql: error: connection to server on socket "/tmp/.s.PGSQL.65432" 
failed: expected authentication request from server, but received v


And the same for a protocol option (_pq_.something).

Over in the column encryption patch, I'm proposing to add such a 
protocol option, and the above is currently the behavior when the server 
doesn't support it.


The attached patch adds explicit handling of this protocol message to 
libpq.  So the output in the above case would then be:


psql: error: connection to server on socket "/tmp/.s.PGSQL.65432" 
failed: protocol version not supported by server: client uses 3.1, 
server supports 3.0


Or to test a protocol option:

@@ -2250,6 +2291,8 @@ build_startup_packet(const PGconn *conn, char *packet,
if (conn->client_encoding_initial && conn->client_encoding_initial[0])
ADD_STARTUP_OPTION("client_encoding", 
conn->client_encoding_initial);


+   ADD_STARTUP_OPTION("_pq_.foobar", "1");
+
/* Add any environment-driven GUC settings needed */
for (next_eo = options; next_eo->envName; next_eo++)
{

Result:

psql: error: connection to server on socket "/tmp/.s.PGSQL.65432" 
failed: protocol extension not supported by server: _pq_.foobar



[0]: 
https://www.postgresql.org/docs/devel/protocol-message-formats.html#PROTOCOL-MESSAGE-FORMATS-NEGOTIATEPROTOCOLVERSION

From 93df3a8d0a15b718669e4b21d8455a91ca1896fd Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Thu, 13 Oct 2022 10:22:49 +0200
Subject: [PATCH] libpq: Handle NegotiateProtocolVersion message

Before, receiving a NegotiateProtocolVersion message would result in a
confusing error message like

expected authentication request from server, but received v

This adds proper handling of this protocol message and produces an
on-topic error message from it.
---
 src/interfaces/libpq/fe-connect.c   | 18 ++---
 src/interfaces/libpq/fe-protocol3.c | 41 +
 src/interfaces/libpq/libpq-int.h|  1 +
 3 files changed, 57 insertions(+), 3 deletions(-)

diff --git a/src/interfaces/libpq/fe-connect.c 
b/src/interfaces/libpq/fe-connect.c
index 746e9b4f1efc..1c0d8243a6ca 100644
--- a/src/interfaces/libpq/fe-connect.c
+++ b/src/interfaces/libpq/fe-connect.c
@@ -3246,10 +3246,11 @@ PQconnectPoll(PGconn *conn)
 
/*
 * Validate message type: we expect only an 
authentication
-* request or an error here.  Anything else 
probably means
-* it's not Postgres on the other end at all.
+* request, NegotiateProtocolVersion, or an 
error here.
+* Anything else probably means it's not 
Postgres on the other
+* end at all.
 */
-   if (!(beresp == 'R' || beresp == 'E'))
+   if (!(beresp == 'R' || beresp == 'v' || beresp 
== 'E'))
{
appendPQExpBuffer(>errorMessage,
  
libpq_gettext("expected authentication request from server, but received %c\n"),
@@ -3405,6 +3406,17 @@ PQconnectPoll(PGconn *conn)
 
goto error_return;
}
+   else if (beresp == 'v')
+   {
+   if 
(pqGetNegotiateProtocolVersion3(conn))
+   {
+   /* We'll come back when there 
is more data */
+   return PGRES_POLLING_READING;
+   }
+   /* OK, we read the message; mark data 
consumed */
+   conn->inStart = conn->inCursor;
+   goto error_return;
+   }
 
/* It is an authentication request. */
conn->auth_req_received = true;
diff --git a/src/interfaces/libpq/fe-protocol3.c 
b/src/interfaces/libpq/fe-protocol3.c
index f001137b7692..2f6c1494c1b5 100644
--- a/src/interfaces/libpq/fe-protocol3.c
+++ b/src/interfaces/libpq/fe-protocol3.c
@@ -1397,6 +1397,47 @@ reportErrorPosition(PQExpBuffer msg, const char *query, 
int loc, int encoding)
 }
 
 
+/*
+ * Attempt to read a NegotiateProtocolVersion message.
+ * Entry: 'v' message type and length have already been consumed.
+ * Exit: returns 0 if successfully consumed message.
+ * 

Re: Use LIMIT instead of Unique for DISTINCT when all distinct pathkeys are redundant

2022-10-13 Thread Richard Guo
On Thu, Oct 13, 2022 at 2:48 PM David Rowley  wrote:

> On Thu, 13 Oct 2022 at 16:47, Richard Guo  wrote:
> > Currently in the patch the optimization is done before we check for
> > presorted paths or do the explicit sort of the cheapest path. How about
> > we move this optimization into the branch where we've found any
> > presorted paths?  Maybe something like:
>
> I've attached a patch to that effect, but it turns out a bit more
> complex than what you imagined.  We still need to handle the case for
> when there's no path that has the required pathkeys and we must add a
> SortPath to the cheapest path. That requires adding some similar code
> to add the LimitPath after the foreach loop over the pathlist is over.


Thanks for the new patch. Previously I thought we would apply this
optimization only to adequately-presorted paths, so that we can just fetch
the first row from such a path. But yes, we can also do this optimization
for the explicit-sort case, so that we can get the result from a top-1
heapsort, just like the new patch does.


I was also getting some weird plans like:
>
> regression=# explain select distinct on (four) four,hundred from tenk1
> where four=0 order by 1,2;
>   QUERY PLAN
> --
>  Sort  (cost=0.20..0.20 rows=1 width=8)
>Sort Key: hundred
>->  Limit  (cost=0.00..0.19 rows=1 width=8)
>  ->  Seq Scan on tenk1  (cost=0.00..470.00 rows=2500 width=8)
>Filter: (four = 0)
>
> To stop the planner from adding that final sort, I opted to hack the
> LimitPath's pathkeys to say that it's already sorted by the
> PlannerInfo's sort_pathkeys.  That feels slightly icky, but it does
> seem a little wasteful to initialise a sort node on every execution of
> the plan to sort a single tuple.


I don't get how this plan comes out. It seems incorrect, because a Limit
node above an unsorted path would give us an unpredictable row. I tried
this query without the hack to LimitPath's pathkeys and I get the plans
below, with and without an index scan:

explain (costs off) select distinct on (four) four,hundred from tenk1
where four=0 order by 1,2;
 QUERY PLAN
-
 Result
   ->  Limit
 ->  Index Scan using tenk1_hundred on tenk1
   Filter: (four = 0)

explain (costs off) select distinct on (four) four,hundred from tenk1
where four=0 order by 1,2;
QUERY PLAN
--
 Limit
   ->  Sort
 Sort Key: hundred
 ->  Seq Scan on tenk1
   Filter: (four = 0)

Thanks
Richard


Re: ExecRTCheckPerms() and many prunable partitions

2022-10-13 Thread Amit Langote
On Wed, Oct 12, 2022 at 10:50 PM Peter Eisentraut
 wrote:
> On 06.10.22 15:29, Amit Langote wrote:
> > I tried in the attached 0004.  ModifyTable gets a new member
> > extraUpdatedColsBitmaps, which is List of Bitmapset "nodes".
> >
> > Actually, List of Bitmapsets turned out to be something that doesn't
> > just-work with our Node infrastructure, which I found out thanks to
> > -DWRITE_READ_PARSE_PLAN_TREES.  So, I had to go ahead and add
> > first-class support for copy/equal/write/read support for Bitmapsets,
> > such that writeNode() can write appropriately labeled versions of them
> > and nodeRead() can read them as Bitmapsets.  That's done in 0003.  I
> > didn't actually go ahead and make*all*  Bitmapsets in the plan trees
> > to be Nodes, but maybe 0003 can be expanded to do that.  We won't need
> > to make gen_node_support.pl emit *_BITMAPSET_FIELD() blurbs then; can
> > just use *_NODE_FIELD().
>
> Seeing that on 64-bit platforms we have a 4-byte padding gap in the
> Bitmapset struct, sticking a node tag in there seems pretty sensible.
> So turning Bitmapset into a proper Node and then making the other
> adjustments you describe makes sense to me.
>
> Making a new thread about this might be best.
>
> (I can't currently comment on the rest of the patch set.  So I don't
> know if you'll really end up needing lists of bitmapsets.  But from here
> it looks like turning bitmapsets into nodes might be a worthwhile change
> just by itself.)

Ok, thanks.  I'll start a new thread about it.

-- 
Thanks, Amit Langote
EDB: http://www.enterprisedb.com




Re: SI-read predicate locks on materialized views

2022-10-13 Thread Michael Paquier
On Fri, Sep 30, 2022 at 10:12:13AM +0900, Yugo NAGATA wrote:
> Thank you for the comment. Do you think it can be marked as Ready for Committer?

Matviews have been discarded from needing predicate locks since
3bf3ab8 and their introduction, where there was no concurrent flavor
of refresh yet.  Shouldn't this patch have at least an isolation test
to show the difference in terms of read-write conflicts with some
serializable transactions and REFRESH CONCURRENTLY?
--
Michael


signature.asc
Description: PGP signature


Re: pg_receivewal fail to streams when the partial file to write is not fully initialized present in the wal receiver directory

2022-10-13 Thread Bharath Rupireddy
On Thu, Sep 22, 2022 at 7:16 AM Bharath Rupireddy
 wrote:
>
> PSA v8 patch.

I simplified the code a bit by using a fixed temporary file name
(thanks Michael Paquier for the suggestion) much like the core does in
XLogFileInitInternal(). If there's any left-over temp file from a
crash or failure in dir_open_for_write(), that gets deleted in the
next call.

Please review the v9 patch further.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com


v9-0001-Make-WAL-file-initialization-by-pg_receivewal-ato.patch
Description: Binary data


RE: Add mssing test to test_decoding/meson.build

2022-10-13 Thread kuroda.hay...@fujitsu.com
Dear Michael,

> Thanks, applied.  This was an oversight of 7f13ac8, and the CI accepts
> the test.

I confirmed your commit. Great thanks!


Best Regards,
Hayato Kuroda
FUJITSU LIMITED





Re: Pluggable toaster

2022-10-13 Thread Nikita Malakhov
Hi hackers,

Fixed a warning that broke the build in the previous patchset.
Here's rebased patch:
v23-0001-toaster-interface.patch - Pluggable TOAST API interface with
reference (original) TOAST mechanics - new API is introduced but
reference TOAST is still left unchanged;
v23-0002-toaster-default.patch - Default TOAST mechanics is re-implemented
using TOAST API and is plugged in as Default Toaster;
v23-0003-toaster-docs.patch - Pluggable TOAST API documentation package

For TOAST API explanation please check /src/backend/access/README.toastapi

Actual GitHub branch resides at
https://github.com/postgrespro/postgres/tree/toasterapi_clean
Please note that, because development is ongoing, the actual branch
contains much more than is provided here.

--
Regards,
Nikita Malakhov
Postgres Professional
https://postgrespro.ru/


v23-0003-toaster-docs.patch.gz
Description: GNU Zip compressed data


v23-0001-toaster-interface.patch.gz
Description: GNU Zip compressed data


v23-0002-toaster-default.patch.gz
Description: GNU Zip compressed data


Re: possible typo for CREATE PUBLICATION description

2022-10-13 Thread Peter Smith
On Thu, Oct 13, 2022 at 6:16 PM osumi.takami...@fujitsu.com
 wrote:
>
> Hi,
>
>
> I noticed a possible typo in the doc for create publication.
> This applies to PG15 as well.
> Kindly have a look at the attached patch for it.
>
>

Your typo correction LGTM.

FWIW, maybe other parts of that paragraph can be tidied too. e.g. The
words "actually" and "So" didn't seem needed IMO.

~

BEFORE
For an INSERT ... ON CONFLICT command, the publication will publish
the operation that actually results from the command. So depending on
the outcome, it may be published as either INSERT or UPDATE, or it may
not be published at all.

SUGGESTION
For an INSERT ... ON CONFLICT command, the publication will publish
the operation that results from the command. Depending on the outcome,
it may be published as either INSERT or UPDATE, or it may not be
published at all.
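
As a concrete illustration of the behavior that paragraph describes
(table and column names are made up here):

    -- On the publisher:
    INSERT INTO t (id, val) VALUES (1, 'x')
        ON CONFLICT (id) DO UPDATE SET val = EXCLUDED.val;

    -- If no row with id = 1 existed, the statement is published as an
    -- INSERT; if one did, it is published as an UPDATE.  With
    -- ON CONFLICT (id) DO NOTHING and a conflicting row, nothing is
    -- published at all.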

--
Kind Regards,
Peter Smith.
Fujitsu Australia




possible typo for CREATE PUBLICATION description

2022-10-13 Thread osumi.takami...@fujitsu.com
Hi,


I noticed a possible typo in the documentation for CREATE PUBLICATION.
This applies to PG15 as well.
Kindly have a look at the attached patch for it.


Best Regards,
Takamichi Osumi



v1-0001-modify-a-typo.patch
Description: v1-0001-modify-a-typo.patch


Re: TRAP: FailedAssertion("prev_first_lsn < cur_txn->first_lsn", File: "reorderbuffer.c", Line: 927, PID: 568639)

2022-10-13 Thread Amit Kapila
On Wed, Oct 12, 2022 at 11:18 AM Masahiko Sawada  wrote:
>
> Summarizing this issue, the assertion check in AssertTXNLsnOrder()
> fails as reported because the current logical decoding cannot properly
> handle the case where the decoding restarts from NEW_CID. Since we
> don't make the association between top-level transaction and its
> subtransaction while decoding NEW_CID (ie, in
> SnapBuildProcessNewCid()), two transactions are created in
> ReorderBuffer as top-txn and have the same LSN. This failure happens
> on all supported versions.
>
> To fix the problem, one idea is that we make the association between
> top-txn and sub-txn during that by calling ReorderBufferAssignChild(),
> as Tomas proposed. On the other hand, since we don't guarantee to make
> the association between the top-level transaction and its
> sub-transactions until we try to decode the actual contents of the
> transaction, it makes sense to me that instead of trying to solve by
> making association, we need to change the code which are assuming that
> it is associated.
>
> I've attached the patch for this idea. With the patch, we skip the
> assertion checks in AssertTXNLsnOrder() until we reach the LSN at
> which we start decoding the contents of transaction, ie.
> start_decoding_at in SnapBuild. The minor concern is other way that
> the assertion check could miss some faulty cases where two unrelated
> top-transactions could have same LSN. With this patch, it will pass
> for such a case. Therefore, for transactions that we skipped checking,
> we do the check when we reach the LSN.
>

>
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -113,6 +113,15 @@ LogicalDecodingProcessRecord(LogicalDecodingContext *ctx, XLogReaderState *record)
  buf.origptr);
  }

+#ifdef USE_ASSERT_CHECKING
+ /*
+ * Check the order of transaction LSNs when we reached the start decoding
+ * LSN. See the comments in AssertTXNLsnOrder() for details.
+ */
+ if (SnapBuildGetStartDecodingAt(ctx->snapshot_builder) == buf.origptr)
+ AssertTXNLsnOrder(ctx->reorder);
+#endif
+
  rmgr = GetRmgr(XLogRecGetRmid(record));
>

I am not able to think of how/when this check would be useful.  We
skip the assertion check only for records that are prior to the
start_decoding_at point, and I think the ordering of those records
should already have been checked before the restart.
start_decoding_at will be either (a) the confirmed_flush location, or
(b) the LSN sent by the client, and any record prior to that must
have been processed before the restart.

Now, say we have commit records for multiple transactions that are
after start_decoding_at but all of whose changes are before
start_decoding_at; then we won't check their ordering at commit time,
but OTOH we would have checked it before the restart.  Isn't that
sufficient?

-- 
With Regards,
Amit Kapila.



