Re: [meson] add missing pg_attribute_aligned for MSVC in meson build
On Fri, Oct 14, 2022 at 10:59:28AM +0800, Junwang Zhao wrote:
> Commit ec3c9cc add pg_attribute_aligned in MSVC[1],
> which was pushed one day before the meson commits,
> so meson build missed this feature.
>
> [1]: https://www.postgresql.org/message-id/caaaqye-hbtzvr3msomtk+hyw2s0e0oapzmw8icsmytma+mn...@mail.gmail.com

Right, thanks!  And it is possible to rely on _MSC_VER for that in this code path.
--
Michael
Re: Patch proposal: make use of regular expressions for the username in pg_hba.conf
On Wed, Oct 12, 2022 at 08:17:14AM +0200, Drouvot, Bertrand wrote:
> Indeed, ;-)

So, I have spent the last two days looking at all that, studying the structure of the patch and the existing HEAD code, and it looks like a few things could be consolidated.

First, as of HEAD, AuthToken is only used for elements in a list of role and database names in hba.conf before filling in each HbaLine, hence we limit its usage to the initial parsing.  The patch assigns an optional regex_t to it, then extends the use of AuthToken to single hostname entries in pg_hba.conf.

First things first: shouldn't we combine ident_user and "re" together in the same structure?  Even if we finish by not using AuthToken to store the computed regex, it seems to me that we'd better use the same base structure for pg_ident.conf and pg_hba.conf.  Looking closely at the patch, it would expand the use of AuthToken outside its original context.  I have also looked at make_auth_token(), and wondered if it could be possible to have this routine compile the regexes.  This approach would not stick with pg_ident.conf though, as we validate the fields in each line when we put our hands on ident_user, after the base validation of a line (number of fields, etc.).

So with all that in mind, it feels right not to use AuthToken at all when building each HbaLine and each IdentLine, but a new, separate structure.  We could perhaps call it an AuthItem (a string plus its compiled regex)?  It could have its own make() routine, taking an AuthToken as input and processing pg_regcomp().  Better ideas for this new structure would be welcome; the idea is that we'd store the post-parsing state of an AuthToken in something that has a compiled regex.  We could finish by using AuthToken at the end and expand its use, but it does not feel completely right either to have a make() routine yet not be able to compile the regular expression when creating the AuthToken.
The logic in charge of compiling the regular expressions could be consolidated more.  The patch refactors the logic with token_regcomp(), uses it for the user names (ident_user in parse_ident_line() from pg_ident.conf), then extends it to the hostnames (single item) and the role/database names (a list is possible in these cases).  This approach looks right to me.  Once we plug an AuthItem into IdentLine, token_regcomp() could be changed so that it takes an AuthToken as input, saving the compiled regex_t directly in the input structure.

At the end, the global structure of the code should, I guess, respect the following rules:
- The number of places where we check if a string is a regex (aka a string beginning with '/') should be minimal.
- Only one code path of hba.c should call pg_regcomp() (the patch does that), and only one code path should call pg_regexec() (two code paths of hba.c do that with the patch, because of the need to store the matching expressions).  This should minimize the areas where we call pg_mb2wchar_with_len(), for one.

About this last point, token_regexec() does not include check_ident_usermap() in its logic, and it seems to me that it should.  The difference is in the expected regmatch_t structures, so shouldn't token_regexec() be extended with two extra arguments: an array of regmatch_t and the number of elements in the array?  This would simplify some of the logic around pg_mb2wchar_with_len(), for example.  To make all that work, token_regexec() should return an int, coming from pg_regexec(), but no specific error strings, as we don't want to spam the logs when checking hosts, roles and databases in pg_hba.conf.

 	/* Check if it has a CIDR suffix and if so isolate it */
-	cidr_slash = strchr(str, '/');
-	if (cidr_slash)
-		*cidr_slash = '\0';
+	if (!is_regexp)
+	{
+		cidr_slash = strchr(str, '/');
+		if (cidr_slash)
+			*cidr_slash = '\0';
+	}
[...]
 	/* Get the netmask */
-	if (cidr_slash)
+	if (cidr_slash && !is_regexp)
 	{

Some of the code handling regexes for hostnames itches me a bit, like this one.  Perhaps it would be better to evaluate this interaction with regular expressions separately.  The database and role names don't have this need, so their cases are much simpler to think about.  The code could be split to tackle things step-by-step:
- One refactoring patch to introduce token_regcomp() and token_regexec(), with the introduction of a new structure that includes the compiled regexes.  (Feel free to counterargue about the use of AuthToken for this purpose, of course!)
- Plug the refactored logic into the lists of role names and database names in pg_hba.conf.
- Handle the case of single host entries in pg_hba.conf.
--
Michael
Re: hash_xlog_split_allocate_page: failed to acquire cleanup lock
On Fri, Oct 14, 2022 at 2:25 AM Andres Freund wrote:
> > Attached are two patches. The first patch is what Robert has proposed
> > with some changes in comments to emphasize the fact that cleanup lock
> > on the new bucket is just to be consistent with the old bucket page
> > locking as we are initializing it just before checking for cleanup
> > lock. In the second patch, I removed the acquisition of cleanup lock
> > on the new bucket page and changed the comments/README accordingly.
> >
> > I think we can backpatch the first patch and the second patch can be
> > just a HEAD-only patch. Does that sound reasonable to you?
>
> Not particularly, no. I don't understand how "overwrite a page and then get a
> cleanup lock" can sensibly be described by this comment:
>
> > +++ b/src/backend/access/hash/hashpage.c
> > @@ -807,7 +807,8 @@ restart_expand:
> >	 * before changing the metapage's mapping info, in case we can't get the
> >	 * disk space.  Ideally, we don't need to check for cleanup lock on new
> >	 * bucket as no other backend could find this bucket unless meta page is
> > -	 * updated.  However, it is good to be consistent with old bucket locking.
> > +	 * updated and we initialize the page just before it.  However, it is just
> > +	 * to be consistent with old bucket locking.
> >	 */
> >	buf_nblkno = _hash_getnewbuf(rel, start_nblkno, MAIN_FORKNUM);
> >	if (!IsBufferCleanupOK(buf_nblkno))
>
> This is basically saying "I am breaking basic rules of locking just to be
> consistent", no?

Fair point. How about something like: "XXX Do we really need to check for cleanup lock on the new bucket? Here, we initialize the page, so ideally we don't need to perform any operation that requires such a check."?  Feel free to suggest something better.

--
With Regards,
Amit Kapila.
Re: make_ctags: use -I option to ignore pg_node_attr macro
On Thu, 13 Oct 2022 15:35:09 +0900 (JST) Tatsuo Ishii wrote:
> > OK, that sounds good then. I would make a feature request to have a
> > switch that supresses creation of these links, then.
>
> Ok, I have added "-n" option to make_ctags so that it skips to create
> the links.
>
> Also I have changed make_etags so that it exec make_ctags, which seems
> to be the consensus.

Thank you for following up on my patch. I fixed the patch to allow using both the -e and -n options together.

Regards,
Yugo Nagata

> Best reagards,
> --
> Tatsuo Ishii
> SRA OSS LLC
> English: http://www.sraoss.co.jp/index_en/
> Japanese:http://www.sraoss.co.jp

--
Yugo NAGATA

diff --git a/src/tools/make_ctags b/src/tools/make_ctags
index 912b6fafac..8062ff2d3e 100755
--- a/src/tools/make_ctags
+++ b/src/tools/make_ctags
@@ -1,12 +1,37 @@
 #!/bin/sh

-# src/tools/make_ctags
+# src/tools/make_ctags [-e] [-n]
+# If -e is specified, generate tags files for emacs.
+# If -n is specified, don't create symbolic links of tags file.
+
+usage="$0 [-e][-n]"
+if [ $# -gt 2 ];then
+	echo $usage
+	exit 1
+fi
+
+mode=
+no_symlink=
+tags_file=tags
+
+while [ $# -gt 0 ]
+do
+	if [ $1 = "-e" ];then
+		mode="-e"
+		tags_file=TAGS
+	elif [ $1 = "-n" ];then
+		no_symlink=yes
+	else
+		echo $usage
+		exit 1
+	fi
+	shift
+done

 command -v ctags >/dev/null || \
 	{ echo "'ctags' program not found" 1>&2; exit 1; }

 trap "ret=$?; rm -rf /tmp/$$; exit $ret" 0 1 2 3 15

-rm -f ./tags
+rm -f ./$tags_file

 IS_EXUBERANT=""
 ctags --version 2>&1 | grep Exuberant && IS_EXUBERANT="Y"
@@ -34,9 +59,17 @@
 then	FLAGS="--c-kinds=+dfmstuv"
 else	FLAGS="-dt"
 fi

+# Use -I option to ignore a macro
+if [ "$IS_EXUBERANT" ]
+then	IGNORE_IDENTIFIES="-I pg_node_attr+"
+else	IGNORE_IDENTIFIES=
+fi
+
 # this is outputting the tags into the file 'tags', and appending
-find `pwd`/ -type f -name '*.[chyl]' -print |
-	xargs ctags -a -f tags "$FLAGS"
+find `pwd`/ \( -name tmp_install -prune -o -name tmp_check -prune \) \
+	-o \( -name "*.[chly]" -o -iname "*makefile*" -o -name "*.mk" -o -name "*.in" \
+	-o -name "*.sql" -o -name "*.p[lm]" \) -type f -print |
+	xargs ctags $mode -a -f $tags_file "$FLAGS" "$IGNORE_IDENTIFIES"

 # Exuberant tags has a header that we cannot sort in with the other entries
 # so we skip the sort step
@@ -45,10 +78,13 @@ find `pwd`/ -type f -name '*.[chyl]' -print |
 if [ ! "$IS_EXUBERANT" ]
 then	LC_ALL=C
 	export LC_ALL
-	sort tags >/tmp/$$ && mv /tmp/$$ tags
+	sort $tags_file >/tmp/$$ && mv /tmp/$$ $tags_file
 fi

-find . \( -name 'CVS' -prune \) -o \( -name .git -prune \) -o -type d -print |
-while read DIR
-do	[ "$DIR" != "." ] && ln -f -s `echo "$DIR" | sed 's;/[^/]*;/..;g'`/tags "$DIR"/tags
-done
+# create symblink links
+if [ ! "$no_symlink" ];then
+	find . \( -name 'CVS' -prune \) -o \( -name .git -prune \) -o -type d -print |
+	while read DIR
+	do	[ "$DIR" != "." ] && ln -f -s `echo "$DIR" | sed 's;/[^/]*;/..;g'`/$tags_file "$DIR"/$tags_file
+	done
+fi
diff --git a/src/tools/make_etags b/src/tools/make_etags
index 9288ef7b14..afc57e3e89 100755
--- a/src/tools/make_etags
+++ b/src/tools/make_etags
@@ -1,16 +1,6 @@
 #!/bin/sh
-
 # src/tools/make_etags

-command -v etags >/dev/null || \
-	{ echo "'etags' program not found" 1>&2; exit 1; }
-
-rm -f ./TAGS
-
-find `pwd`/ -type f -name '*.[chyl]' -print |
-	xargs etags --append -o TAGS
-
-find . \( -name CVS -prune \) -o \( -name .git -prune \) -o -type d -print |
-while read DIR
-do	[ "$DIR" != "." ] && ln -f -s `pwd`/TAGS "$DIR"
-done
+cdir=`dirname $0`
+dir=`(cd $cdir && pwd)`
+exec $dir/make_ctags -e $*
thinko in basic_archive.c
Hi hackers,

I intended for the temporary file name generated by basic_archive.c to include the current timestamp so that the name was "sufficiently unique."  Of course, this could also be used to determine the creation time, but you can just as easily use stat(1) for that.  In any case, I forgot to divide the microseconds field by 1000 to obtain the current timestamp in milliseconds, so while the value is unique, it's also basically garbage.  I've attached a small patch that fixes this so that the temporary file name includes the timestamp in milliseconds for when it was created.

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

diff --git a/contrib/basic_archive/basic_archive.c b/contrib/basic_archive/basic_archive.c
index 776a386e35..a02708fc12 100644
--- a/contrib/basic_archive/basic_archive.c
+++ b/contrib/basic_archive/basic_archive.c
@@ -265,7 +265,7 @@ basic_archive_file_internal(const char *file, const char *path)
 	 */
 	gettimeofday(&tv, NULL);
 	if (pg_mul_u64_overflow((uint64) 1000, (uint64) tv.tv_sec, &epoch) ||
-		pg_add_u64_overflow(epoch, (uint64) tv.tv_usec, &epoch))
+		pg_add_u64_overflow(epoch, (uint64) tv.tv_usec / 1000, &epoch))
 		elog(ERROR, "could not generate temporary file name for archiving");

 	snprintf(temp, sizeof(temp), "%s/%s.%s.%d." UINT64_FORMAT,
Re: Perform streaming logical transactions by background workers and parallel apply
On Wed, Oct 12, 2022 at 7:41 AM wangw.f...@fujitsu.com wrote:
>
> On Fri, Oct 7, 2022 at 14:18 PM Hou, Zhijie/侯 志杰 wrote:
> > Attach the new version patch set which addressed most of the comments.
>
> Rebased the patch set because the new change in HEAD (776e1c8).
> Attach the new patch set.

+static void
+HandleParallelApplyMessage(ParallelApplyWorkerInfo *winfo, StringInfo msg)
{
...
+		case 'X':				/* Terminate, indicating clean exit */
+			{
+				shm_mq_detach(winfo->error_mq_handle);
+				winfo->error_mq_handle = NULL;
+				break;
+			}
...
}

I don't see the use of this message in the patch. If this is not required by the latest version then we can remove it and its corresponding handling in parallel_apply_start_worker(). I am referring to the below code in parallel_apply_start_worker():

+		if (tmp_winfo->error_mq_handle == NULL)
+		{
+			/*
+			 * Release the worker information and try next one if the parallel
+			 * apply worker exited cleanly.
+			 */
+			ParallelApplyWorkersList = foreach_delete_current(ParallelApplyWorkersList, lc);
+			shm_mq_detach(tmp_winfo->mq_handle);
+			dsm_detach(tmp_winfo->dsm_seg);
+			pfree(tmp_winfo);
+		}

--
With Regards,
Amit Kapila.
Re: Fix error message for MERGE foreign tables
On Fri, Oct 14, 2022 at 12:07 PM Richard Guo wrote:
>
> On Fri, Oct 14, 2022 at 10:59 AM bt22nakamorit <bt22nakamo...@oss.nttdata.com> wrote:
>>
>> Hi,
>>
>> MERGE command does not accept foreign tables as targets.
>> When a foreign table is specified as a target, it shows error messages
>> like this:
>>
>> -- ERROR: cannot execute MERGE on relation "child1"
>> -- DETAIL: This operation is not supported for foreign tables.
>>
>> However, when a partitioned table includes foreign tables as partitions
>> and MERGE is executed on the partitioned table, following error message
>> shows.
>>
>> -- ERROR: unexpected operation: 5
>>
>> The latter error message is unclear, and should be the same as the
>> former one.
>> The attached patch adds the code to display the former error
>> messages in the latter case.
>> Any thoughts?
>
> +1. The new message is an improvement to the default one.
>
> I wonder if we can provide more details in the error message, such as
> foreign table name.

Maybe something like below, so that we keep it consistent with the case of a foreign table being specified as a target.

--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -1872,6 +1872,13 @@ postgresPlanForeignModify(PlannerInfo *root,
 											returningList,
 											&retrieved_attrs);
 			break;
+
+		case CMD_MERGE:
+			ereport(ERROR,
+					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+					 errmsg("cannot execute MERGE on relation \"%s\"",
+							RelationGetRelationName(rel)),
+					 errdetail_relkind_not_supported(rel->rd_rel->relkind)));
+			break;

Thanks
Richard
Re: Fix error message for MERGE foreign tables
On Fri, Oct 14, 2022 at 10:59 AM bt22nakamorit <bt22nakamo...@oss.nttdata.com> wrote:
> Hi,
>
> MERGE command does not accept foreign tables as targets.
> When a foreign table is specified as a target, it shows error messages
> like this:
>
> -- ERROR: cannot execute MERGE on relation "child1"
> -- DETAIL: This operation is not supported for foreign tables.
>
> However, when a partitioned table includes foreign tables as partitions
> and MERGE is executed on the partitioned table, following error message
> shows.
>
> -- ERROR: unexpected operation: 5
>
> The latter error message is unclear, and should be the same as the
> former one.
> The attached patch adds the code to display the former error
> messages in the latter case.
> Any thoughts?

+1. The new message is an improvement to the default one.

I wonder if we can provide more details in the error message, such as foreign table name.

Thanks
Richard
Re: GUC values - recommended way to declare the C variables?
Michael Paquier writes:
> On Thu, Oct 13, 2022 at 11:14:57PM -0400, Tom Lane wrote:
>> For strings, maybe the rule could be "the
>> old value must be NULL or strcmp-equal to the boot_val".

> pg_strcasecmp()'d would be more flexible here?

Don't see the point for that.  The case we're talking about is where the variable is declared like

	char *my_guc_variable = "foo_bar";

where the initialization value is going to be a compile-time constant.  I don't see why we'd need to allow any difference between that constant and the one used in guc_tables.c.

On the other hand, we could insist that string values be strcmp-equal with no allowance for NULL.  But that probably results in duplicate copies of the string constant, and I'm not sure it buys anything in most places.  Allowing NULL doesn't seem like it creates any extra hazard for early references, because they'd just crash if they try to use the value while it's still NULL.

			regards, tom lane
Re: GUC values - recommended way to declare the C variables?
On Thu, Oct 13, 2022 at 11:14:57PM -0400, Tom Lane wrote:
> Could we fix the out-of-sync risk by having InitializeGUCOptions insist
> that the pre-existing value of the variable match what is in guc_tables.c?
> That may not work for string values but I think we could insist on it
> for other GUC data types.  For strings, maybe the rule could be "the
> old value must be NULL or strcmp-equal to the boot_val".

pg_strcasecmp()'d would be more flexible here?  Sometimes the character casing on the values is not entirely consistent, but no objections to use something stricter, either.
--
Michael
Re: [RFC] building postgres with meson - v13
"shiy.f...@fujitsu.com" writes:
> On Fri, Oct 14, 2022 12:40 AM Andres Freund wrote:
>> It'd be a fair amount of work, both initially and to maintain it, to generate
>> something compatible. I can see some benefit in showing some feature
>> influencing output in --configure, but compatible output doesn't seem worth it
>> to me.

> And it's ok for me that the output is not exactly the same as before.

Yeah, the output doesn't have to be exactly the same.  But it had better show all the options influencing the compilation, or else we will be looking for other ways to reconstruct that information.

			regards, tom lane
RE: [RFC] building postgres with meson - v13
On Fri, Oct 14, 2022 12:40 AM Andres Freund wrote:
>
> Hi,
>
> On 2022-10-13 09:24:51 +, shiy.f...@fujitsu.com wrote:
> > I noticed that `pg_config --configure` didn't show the options given when
> > building with meson.
>
> Yes, that was noted somewhere on this thread.
>
> > Maybe it would be better if pg_config can output this information, to be
> > consistent with the output when building with `./configure` and `make`.
> >
> > The output when building with `./configure` and `make`:
> > $ pg_config --configure
> > '--prefix=/home/postgres/install/' '--cache' 'gcc.cache' '--enable-dtrace' '--with-icu' '--enable-cassert'
>
> It'd be a fair amount of work, both initially and to maintain it, to generate
> something compatible. I can see some benefit in showing some feature
> influencing output in --configure, but compatible output doesn't seem worth it
> to me.

I agree that there are some benefits to showing that, as it helps to confirm the build options.  That can be confirmed from the compile log, but the log may not be available all the time.  And it's ok for me that the output is not exactly the same as before.

Regards,
Shi yu
Re: GUC values - recommended way to declare the C variables?
Peter Smith writes:
> I agree if constants are used in both places then there will always be
> some risk they can get out of sync again.

Yeah.

> But probably it is no problem to just add #defines (e.g. in
> logicallauncher.h?) to be commonly used for the C variable declaration
> and also in the guc_tables.

The problem is exactly that there's no great place to put those #defines, at least not without incurring a lot of fresh #include bloat.  Also, if you did it like that, then it doesn't really address Michael's desire to see the default value in the variable declaration.

I do lean towards having the data available, mainly because of the fear I mentioned upthread that some GUCs may be accessed before InitializeGUCOptions runs.  Could we fix the out-of-sync risk by having InitializeGUCOptions insist that the pre-existing value of the variable match what is in guc_tables.c?  That may not work for string values but I think we could insist on it for other GUC data types.  For strings, maybe the rule could be "the old value must be NULL or strcmp-equal to the boot_val".

			regards, tom lane
[meson] add missing pg_attribute_aligned for MSVC in meson build
Hi Andres,

Commit ec3c9cc adds pg_attribute_aligned for MSVC [1], but it was pushed one day before the meson commits, so the meson build missed this feature.

[1]: https://www.postgresql.org/message-id/caaaqye-hbtzvr3msomtk+hyw2s0e0oapzmw8icsmytma+mn...@mail.gmail.com

--
Regards
Junwang Zhao

0001-meson-add-missing-pg_attribute_aligned-for-MSVC.patch
Description: Binary data
Fix error message for MERGE foreign tables
Hi,

MERGE command does not accept foreign tables as targets.  When a foreign table is specified as a target, it shows error messages like this:

-- ERROR: cannot execute MERGE on relation "child1"
-- DETAIL: This operation is not supported for foreign tables.

However, when a partitioned table includes foreign tables as partitions and MERGE is executed on the partitioned table, the following error message shows:

-- ERROR: unexpected operation: 5

The latter error message is unclear, and should be the same as the former one.  The attached patch adds code to display the former error message in the latter case.  Any thoughts?

Best,
Tatsuhiro Nakamori

diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index d98709e5e8..b4fd54d913 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -1872,6 +1872,9 @@ postgresPlanForeignModify(PlannerInfo *root,
 											returningList,
 											&retrieved_attrs);
 			break;
+		case CMD_MERGE:
+			elog(ERROR, "MERGE not permitted on foreign tables");
+			break;
 		default:
 			elog(ERROR, "unexpected operation: %d", (int) operation);
 			break;
Re: create subscription - improved warning message
On Thu, Oct 13, 2022 at 9:07 AM Peter Smith wrote:
>
> On Thu, Oct 13, 2022 at 2:01 AM Tom Lane wrote:
> >
> > Alvaro Herrera writes:
> > > On 2022-Oct-12, Amit Kapila wrote:
> > >> Okay, then I think we can commit the last error message patch of Peter
> > >> [1] as we have an agreement on the same, and then work on this as a
> > >> separate patch. What do you think?
> >
> > > No objection.
> >
> > Yeah, the v3-0001 patch is fine.  I agree that the docs change needs
> > more work.
>
> Thanks to everybody for the feedback/suggestions. I will work on
> improving the documentation for this and post something in a day or
> so.

PSA a patch for adding examples of how to activate a subscription that was originally created in a disconnected mode.  The new docs are added as part of the "Logical Replication - Subscription" section 31.2.  The CREATE SUBSCRIPTION reference page was also updated to include links to the new docs.

Feedback welcome.

--
Kind Regards,
Peter Smith.
Fujitsu Australia.

v4-0001-create-subscriptipon-pgdocs-for-deferred-slot-cre.patch
Description: Binary data
RE: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher
On Wed, Aug 24, 2022 12:25 AM Önder Kalacı wrote:
> Hi,
>
> Thanks for the review!

Thanks for your reply.

> > 1.
> > In FilterOutNotSuitablePathsForReplIdentFull(), is "nonPartialIndexPathList" a
> > good name for the list? Indexes on only expressions are also be filtered.
> >
> > +static List *
> > +FilterOutNotSuitablePathsForReplIdentFull(List *pathlist)
> > +{
> > +	ListCell   *lc;
> > +	List	   *nonPartialIndexPathList = NIL;
>
> Yes, true. We only started filtering the non-partial ones first. Now
> changed to *suitableIndexList*, does that look right?

That looks ok to me.

> > 3.
> > It looks we should change the comment for FindReplTupleInLocalRel() in this
> > patch.
> >
> > /*
> >  * Try to find a tuple received from the publication side (in 'remoteslot') in
> >  * the corresponding local relation using either replica identity index,
> >  * primary key or if needed, sequential scan.
> >  *
> >  * Local tuple, if found, is returned in '*localslot'.
> >  */
> > static bool
> > FindReplTupleInLocalRel(EState *estate, Relation localrel,
>
> I made a small change, just adding "index". Do you expect a larger change?

I think that's sufficient.

> > 5.
> > +		if (!AttributeNumberIsValid(mainattno))
> > +		{
> > +			/*
> > +			 * There are two cases to consider. First, if the index is a primary or
> > +			 * unique key, we cannot have any indexes with expressions. So, at this
> > +			 * point we are sure that the index we deal is not these.
> > +			 */
> > +			Assert(RelationGetReplicaIndex(rel) != RelationGetRelid(idxrel) &&
> > +				   RelationGetPrimaryKeyIndex(rel) != RelationGetRelid(idxrel));
> >
> > +			/*
> > +			 * For a non-primary/unique index with an expression, we are sure that
> > +			 * the expression cannot be used for replication index search. The
> > +			 * reason is that we create relevant index paths by providing column
> > +			 * equalities. And, the planner does not pick expression indexes via
> > +			 * column equality restrictions in the query.
> > +			 */
> > +			continue;
> > +		}
> >
> > Is it possible that it is a usable index with an expression? I think indexes
> > with an expression has been filtered in
> > FilterOutNotSuitablePathsForReplIdentFull(). If it can't be a usable index with
> > an expression, maybe we shouldn't use "continue" here.
>
> Ok, I think there are some confusing comments in the code, which I updated.
> Also, added one more explicit Assert to make the code a little more
> readable.
>
> We can support indexes involving expressions but not indexes that are only
> consisting of expressions. FilterOutNotSuitablePathsForReplIdentFull()
> filters out the latter, see IndexOnlyOnExpression().
>
> So, for example, if we have an index as below, we are skipping the
> expression while building the index scan keys:
>
> CREATE INDEX people_names ON people (firstname, lastname, (id || '_' || sub_id));
>
> We can consider removing `continue`, but that'd mean we should also adjust
> the following code-block to handle indexprs. To me, that seems like an edge
> case to implement at this point, given such an index is probably not
> common. Do you think should I try to use the indexprs as well while
> building the scan key?
>
> I'm mostly trying to keep the complexity small. If you suggest this
> limitation should be lifted, I can give it a shot. I think the limitation I
> leave here is with a single sentence: *The index on the subscriber can only
> use simple column references.*

Thanks for your explanation. I get it and think it's OK.

> > 6.
> > In the following case, I got a result which is different from HEAD, could you
> > please look into it?
> >
> > -- publisher
> > CREATE TABLE test_replica_id_full (x int);
> > ALTER TABLE test_replica_id_full REPLICA IDENTITY FULL;
> > CREATE PUBLICATION tap_pub_rep_full FOR TABLE test_replica_id_full;
> >
> > -- subscriber
> > CREATE TABLE test_replica_id_full (x int, y int);
> > CREATE INDEX test_replica_id_full_idx ON test_replica_id_full(x,y);
> > CREATE SUBSCRIPTION tap_sub_rep_full_0 CONNECTION 'dbname=postgres port=5432' PUBLICATION tap_pub_rep_full;
> >
> > -- publisher
> > INSERT INTO test_replica_id_full VALUES (1);
> > UPDATE test_replica_id_full SET x = x + 1 WHERE x = 1;
> >
> > The data in subscriber:
> > on HEAD:
> > postgres=# select * from test_replica_id_full ;
> >  x | y
> > ---+---
> >  2 |
> > (1 row)
> >
> > After applying the patch:
> > postgres=# select * from test_replica_id_full ;
> >  x | y
> > ---+---
> >  1 |
> > (1 row)
Re: GUC values - recommended way to declare the C variables?
On Fri, Oct 14, 2022 at 8:26 AM Nathan Bossart wrote:
>
> On Thu, Oct 13, 2022 at 09:47:25AM +0900, Michael Paquier wrote:
> > So, the initial values of max_wal_senders and max_replication_slots
> > became out of sync with their defaults in guc_tables.c.  FWIW, I would
> > argue the opposite way: rather than removing the initializations, I
> > would fix and keep them as these references can be useful when
> > browsing the area of the code related to such GUCs, without having to
> > look at guc_tables.c for this information.
>
> Well, those initializations are only useful when they are kept in sync,
> which, as demonstrated by this patch, isn't always the case.  I don't have
> a terribly strong opinion about this, but I'd lean towards reducing the
> number of places that track the default value of GUCs.

I agree if constants are used in both places then there will always be some risk they can get out of sync again.

But probably it is no problem to just add #defines (e.g. in logicallauncher.h?) to be commonly used for the C variable declaration and also in the guc_tables.  I chose not to do that way only because it didn't seem to be the typical convention for all the other numeric GUCs I looked at, but it's fine by me if that way is preferred.

--
Kind Regards,
Peter Smith.
Fujitsu Australia
Re: Use LIMIT instead of Unique for DISTINCT when all distinct pathkeys are redundant
On Thu, Oct 13, 2022 at 6:43 PM David Rowley wrote:
> On Thu, 13 Oct 2022 at 21:17, Richard Guo wrote:
> > On Thu, Oct 13, 2022 at 2:48 PM David Rowley wrote:
> >> To stop the planner from adding that final sort, I opted to hack the
> >> LimitPath's pathkeys to say that it's already sorted by the
> >> PlannerInfo's sort_pathkeys.  That feels slightly icky, but it does
> >> seem a little wasteful to initialise a sort node on every execution of
> >> the plan to sort a single tuple.
> >
> > I don't get how this plan comes out.  It seems not correct because a Limit
> > node above an unsorted path would give us an unpredictable row.
>
> Actually, you're right.  That manual setting of the pathkeys is an
> unneeded remnant from a bug I fixed before sending out v2.  It can
> just be removed.
>
> I've attached the v3 patch.

The v3 patch looks good to me.

Thanks
Richard
Re: Support tls-exporter as channel binding for TLSv1.3
On Thu, Oct 13, 2022 at 10:30:37AM -0700, Jacob Champion wrote:
> Is the intent to backport tls-exporter client support? Or is the
> compatibility break otherwise acceptable?

Well, I'd rather say yes, thanks to the complexity avoided in the backend, as that's the most sensitive piece security-wise, even if we always require a certificate to exist in PG.  A server attempting to trick a client into downgrading would still be a problem :/  tls-exporter would be a new feature, so backporting is out of the picture.

> It seemed like there was also some general interest in proxy TLS
> termination (see also the PROXY effort, though it has stalled a bit).
> For that, I would think tls-server-end-point is an important feature.

Oh, okay.  That's an argument in favor of not doing that, then.  Perhaps we'd better revisit the introduction of tls-exporter once we know more about all that, and it looks like we would need a way to negotiate which channel binding to use (I recall that the surrounding RFCs allow some extra negotiation, vaguely, but my impression may be wrong).
--
Michael
Re: Incorrect comment regarding command completion tags
On Fri, 14 Oct 2022 at 11:38, Mark Dilger wrote:
> > On Oct 13, 2022, at 2:56 PM, David Rowley wrote:
> > In the attached, I rewrote the comment to remove mention of the
> > Asserts. I also tried to form the comment in a way that's more
> > understandable about why we always write a "0" in "INSERT 0 ".
>
> Your wording is better. +1

Thanks for having a look.  I adjusted the wording slightly, as I had written "ancient" in regards to PostgreSQL 11 and earlier.  It's probably a bit early to call a supported version of PostgreSQL ancient, so I just decided to mention the version number instead.

I pushed the resulting patch.

David
Re: New "single-call SRF" APIs are very confusingly named
Hi, On 2022-10-14 10:28:34 +0900, Michael Paquier wrote: > On Thu, Oct 13, 2022 at 12:48:20PM -0700, Andres Freund wrote: > > Maybe something like InitMaterializedSRF() w/ > > MAT_SRF_(USE_EXPECTED_DESC|BLESS) > > Or just SetMaterializedFuncCall()? I think starting any function that's not a setter with Set* is very likely to be misunderstood (SetReturning* is clearer, but long). This just reads like you're setting the materialized function call on something. > Do we always have to mention the SRF part of it once we tell about the > materialization part? Yes. The SRF is the important part. > The latter sort implies the former once a function returns multiple tuples. There are lots of other things that can be materialized. > I don't mind doing some renaming of all that even post-release, though > comes the question of keeping some compatibility macros for > compilation in case one uses these routines? Agreed that we'd need compat. I think it'd need to be a compatibility function, not just renaming via macro, so we keep ABI compatibility. Greetings, Andres Freund
Re: New "single-call SRF" APIs are very confusingly named
On Thu, Oct 13, 2022 at 12:48:20PM -0700, Andres Freund wrote: > I unfortunately just noticed this now, just after we released... Thanks for the feedback. > Even leaving the confusion with ExprSingleResult aside, calling it "Single" > still seems very non-descriptive. I assume it's named to contrast with > init_MultiFuncCall() etc. While those are also not named greatly, they're not > typically used directly but wrapped in SRF_FIRSTCALL_INIT etc. This is mentioned on the original thread here, at the point where we go through the function a single time: https://www.postgresql.org/message-id/yh8sbtuzkhq7j...@paquier.xyz > I also quite intensely dislike SRF_SINGLE_USE_EXPECTED. It sounds like it's > saying that a single use of the SRF is expected, but that's not at all what it > means: "use expectedDesc as tupdesc". Okay. Something like the USE_EXPECTED_DESC you are suggesting or USE_EXPECTED_TUPLE_DESC would be fine by me. > I'm also confused by SRF_SINGLE_BLESS - the comment says "validate tuple for > SRF". BlessTupleDesc can't really be described as validation, or am I missing > something? I'd rather keep the flag name to match the history behind this API. How about updating the comment to say "complete tuple descriptor, for a transient RECORD datatype", or something like that? > Maybe something like InitMaterializedSRF() w/ > MAT_SRF_(USE_EXPECTED_DESC|BLESS) Or just SetMaterializedFuncCall()? Do we always have to mention the SRF part of it once we tell about the materialization part? The latter sort implies the former once a function returns multiple tuples. I don't mind doing some renaming of all that even post-release, though comes the question of keeping some compatibility macros for compilation in case one uses these routines? Any existing extension code works out-of-the-box without these new routines, so the chance of somebody using the new stuff outside core sounds rather limited, but it does not seem worth taking a risk, either.
And this has been in the tree for a bit more than half a year now. -- Michael signature.asc Description: PGP signature
Re: hash_xlog_split_allocate_page: failed to acquire cleanup lock
On Thu, Oct 13, 2022 at 6:10 PM Andres Freund wrote: > My point here is a lot more mundane. The code essentially does > _hash_pageinit(), overwriting the whole page, and *then* conditionally > acquires a cleanup lock. It simply is bogus code. I understood that that was what you meant. It's easy to see why this code is broken, but to me it seems related to having too much confidence in what is possible while relying on cleanup locks. That's just my take. -- Peter Geoghegan
Re: hash_xlog_split_allocate_page: failed to acquire cleanup lock
Hi, On 2022-10-13 17:46:25 -0700, Peter Geoghegan wrote: > On Fri, Sep 30, 2022 at 12:05 PM Andres Freund wrote: > > My problem with this approach is that the whole cleanup lock is hugely > > misleading as-is. > > While nbtree VACUUM does use cleanup locks, they don't protect the > index structure itself -- it actually functions as an interlock > against concurrent TID recycling, which might otherwise confuse > in-flight index scans. That's why we need cleanup locks for VACUUM, > but not for index deletions, even though the physical modifications > that are performed to physical leaf pages are identical (the WAL > records are almost identical). Clearly the use of cleanup locks is not > really about protecting the leaf page itself -- it's about using the > physical leaf page as a proxy for the heap TIDs contained therein. A > very narrow protocol with a very specific purpose. > > More generally, cleanup locks exist to protect transient references > that point into a heap page. References held by one backend only. A > TID, or a HeapTuple C pointer, or something similar. Cleanup locks are > not intended to protect a physical data structure in the heap, either > -- just a reference/pointer that points to the structure. There are > implications for the physical page structure itself, of course, but > that seems secondary. The guarantees are often limited to "never allow > the backend holding the pin to become utterly confused". > > I am skeptical of the idea of using cleanup locks for anything more > ambitious than this. Especially in index AM code. It seems > uncomfortably close to "a buffer lock, but somehow also not a buffer > lock". My point here is a lot more mundane. The code essentially does _hash_pageinit(), overwriting the whole page, and *then* conditionally acquires a cleanup lock. It simply is bogus code. Greetings, Andres Freund
Re: hash_xlog_split_allocate_page: failed to acquire cleanup lock
On Fri, Sep 30, 2022 at 12:05 PM Andres Freund wrote: > My problem with this approach is that the whole cleanup lock is hugely > misleading as-is. While nbtree VACUUM does use cleanup locks, they don't protect the index structure itself -- it actually functions as an interlock against concurrent TID recycling, which might otherwise confuse in-flight index scans. That's why we need cleanup locks for VACUUM, but not for index deletions, even though the physical modifications that are performed to physical leaf pages are identical (the WAL records are almost identical). Clearly the use of cleanup locks is not really about protecting the leaf page itself -- it's about using the physical leaf page as a proxy for the heap TIDs contained therein. A very narrow protocol with a very specific purpose. More generally, cleanup locks exist to protect transient references that point into a heap page. References held by one backend only. A TID, or a HeapTuple C pointer, or something similar. Cleanup locks are not intended to protect a physical data structure in the heap, either -- just a reference/pointer that points to the structure. There are implications for the physical page structure itself, of course, but that seems secondary. The guarantees are often limited to "never allow the backend holding the pin to become utterly confused". I am skeptical of the idea of using cleanup locks for anything more ambitious than this. Especially in index AM code. It seems uncomfortably close to "a buffer lock, but somehow also not a buffer lock". -- Peter Geoghegan
Re: archive modules
On Thu, Oct 13, 2022 at 02:53:38PM -0400, Tom Lane wrote: > Yeah, it really does not work to use GUC hooks to enforce multi-variable > constraints. We've learned that the hard way (more than once, if memory > serves). 414c2fd is one of the most recent ones. Its thread is about the same thing. -- Michael signature.asc Description: PGP signature
Re: problems with making relfilenodes 56-bits
On Wed, 12 Oct 2022 at 23:13, Andres Freund wrote: > > Hi, > > On 2022-10-12 22:05:30 +0200, Matthias van de Meent wrote: > > On Wed, 5 Oct 2022 at 01:50, Andres Freund wrote: > > > I lightly dusted off my old varint implementation from [1] and converted the > > > RelFileLocator and BlockNumber from fixed width integers to varint ones. > > > This > > > isn't meant as a serious patch, but an experiment to see if this is a path > > > worth pursuing. > > > > > > A run of installcheck in a cluster with autovacuum=off, > > > full_page_writes=off > > > (for increased reproducibility) shows a decent saving: > > > > > > master: 241106544 - 230 MB > > > varint: 227858640 - 217 MB > > > > I think a significant part of this improvement comes from the premise > > of starting with a fresh database. tablespace OID will indeed most > > likely be low, but database OID may very well be linearly distributed > > if concurrent workloads in the cluster include updating (potentially > > unlogged) TOASTed columns and the databases are not created in one > > "big bang" but over the lifetime of the cluster. In that case DBOID > > will consume 5B for a significant fraction of databases (anything with > > OID >=2^28). > > > > My point being: I don't think that we should have different WAL > > performance in databases which is dependent on which OID was assigned > > to that database. > > To me this is raising the bar to an absurd level. Some minor space usage > increase after oid wraparound and for very large block numbers isn't a huge > issue - if you're in that situation you already have a huge amount of WAL. I didn't want to block all varlen encoding, I just want to make clear that I don't think it's great for performance testing and consistency across installations if WAL size (and thus part of your performance) is dependent on which actual database/relation/tablespace combination you're running your workload in.
With the 56-bit relfilenode, the size of a block reference would realistically differ between 7 bytes and 23 bytes: - tblspc=0=1B db=16386=3B rel=797=2B (787 = 4 * default # of data relations in a fresh DB in PG14, + 1) block=0=1B vs - tsp>=2^28 = 5B dat>=2^28 =5B rel>=2^49 =8B block>=2^28 =5B That's a difference of 16 bytes, of which only the block number can realistically be directly influenced by the user ("just don't have relations larger than X blocks"). If applied to Dilip's pgbench transaction data, that would imply a minimum per transaction wal usage of 509 bytes, and a maximum per transaction wal usage of 609 bytes. That is nearly a 20% difference in WAL size based only on the location of your data, and I'm just not comfortable with that. Users have little or zero control over the internal IDs we assign to these fields, while it would affect performance fairly significantly. (difference % between min/max wal size is unchanged (within 0.1%) after accounting for record alignment) > > 0002 - Rework XLogRecord > > This makes many fields in the xlog header optional, reducing the size > > of many xlog records by several bytes. This implements the design I > > shared in my earlier message [1]. > > > > 0003 - Rework XLogRecordBlockHeader. > > This patch could be applied on current head, and saves some bytes in > > per-block data. It potentially saves some bytes per registered > > block/buffer in the WAL record (max 2 bytes for the first block, after > > that up to 3). See the patch's commit message in the patch for > > detailed information. > > The amount of complexity these two introduce seems quite substantial to > me. Both from an maintenance and a runtime perspective. I think we'd be better > off using building blocks like variable lengths encoded values than open > coding it in many places. I guess that's true for length fields, but I don't think dynamic header field presence (the 0002 rewrite, and the omission of data_length in 0003) is that bad. 
We already have dynamic data inclusion through block ids 25x; I'm not sure why we couldn't do that more compactly with bitfields as indicators instead (hence the dynamic header size). As for complexity, I think my current patchset is mostly complex due to a lack of tooling. Note that decoding makes common use of COPY_HEADER_FIELD, which we don't really have an equivalent for in XLogRecordAssemble. I think the code for 0002 would improve significantly in readability if such construct would be available. To reduce complexity in 0003, I could drop the 'repeat id' optimization, as that reduces the complexity significantly, at the cost of not saving that 1 byte per registered block after the first. Kind regards, Matthias van de Meent
Re: Incorrect comment regarding command completion tags
> On Oct 13, 2022, at 2:56 PM, David Rowley wrote: > > I was looking at the code in EndCommand() and noticed a comment > talking about some Asserts which didn't seem to exist in the code. > The comment also talks about LastOid which looks like the name of a > variable that's nowhere to be seen. > > It looks like the Asserts did exist when the completion tag patch was > being developed [1] but they disappeared somewhere along the line and > the comment didn't get an update before 2f9661311 went in. > > In the attached, I rewrote the comment to remove mention of the > Asserts. I also tried to form the comment in a way that's more > understandable about why we always write a "0" in "INSERT 0 ". Your wording is better. +1 — Mark Dilger EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: New docs chapter on Transaction Management and related changes
On Thu, Oct 13, 2022 at 11:54:51PM +0200, Erik Rijkers wrote: > > [xact.diff] > > I think that > 'This chapter explains how the control the reliability of' > > should be: > 'This chapter explains how to control the reliability of' > > > And in these lines: > + together in a transactional manner. The commands PREPARE > + TRANSACTION, COMMIT PREPARED and > + ROLLBACK PREPARED. Two-phase transactions > > 'The commands' > > should be > 'The commands are' Thanks, updated patch attached. You can see the output at: https://momjian.us/tmp/pgsql/ -- Bruce Momjian https://momjian.us EDB https://enterprisedb.com Indecision is a decision. Inaction is an action. Mark Batterson diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml index 66312b53b8..024a3c5101 100644 --- a/doc/src/sgml/config.sgml +++ b/doc/src/sgml/config.sgml @@ -7224,12 +7224,14 @@ local0.*/var/log/postgresql %v - Virtual transaction ID (backendID/localXID) + Virtual transaction ID (backendID/localXID); see + no %x - Transaction ID (0 if none is assigned) + Transaction ID (0 if none is assigned); see + no diff --git a/doc/src/sgml/datatype.sgml b/doc/src/sgml/datatype.sgml index b030b36002..fdffba4442 100644 --- a/doc/src/sgml/datatype.sgml +++ b/doc/src/sgml/datatype.sgml @@ -4992,7 +4992,8 @@ WHERE ... xmin and xmax. Transaction identifiers are 32-bit quantities. In some contexts, a 64-bit variant xid8 is used. Unlike xid values, xid8 values increase strictly -monotonically and cannot be reused in the lifetime of a database cluster. +monotonically and cannot be reused in the lifetime of a database +cluster. See for more details. 
diff --git a/doc/src/sgml/filelist.sgml b/doc/src/sgml/filelist.sgml index de450cd661..0d6be9a2fa 100644 --- a/doc/src/sgml/filelist.sgml +++ b/doc/src/sgml/filelist.sgml @@ -104,6 +104,7 @@ + diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml index b5a2f94c4e..1e0d587932 100644 --- a/doc/src/sgml/func.sgml +++ b/doc/src/sgml/func.sgml @@ -24676,7 +24676,10 @@ SELECT collation for ('foo' COLLATE "de_DE"); Returns the current transaction's ID. It will assign a new one if the current transaction does not have one already (because it has not -performed any database updates). +performed any database updates); see for details. If executed in a +subtransaction this will return the top-level xid; see for details. @@ -24693,6 +24696,7 @@ SELECT collation for ('foo' COLLATE "de_DE"); ID is assigned yet. (It's best to use this variant if the transaction might otherwise be read-only, to avoid unnecessary consumption of an XID.) +If executed in a subtransaction this will return the top-level xid. @@ -24736,6 +24740,8 @@ SELECT collation for ('foo' COLLATE "de_DE"); Returns a current snapshot, a data structure showing which transaction IDs are now in-progress. +Only top-level xids are included in the snapshot; subxids are not +shown; see for details. @@ -24790,7 +24796,8 @@ SELECT collation for ('foo' COLLATE "de_DE"); Is the given transaction ID visible according to this snapshot (that is, was it completed before the snapshot was taken)? Note that this function will not give the correct answer for -a subtransaction ID. +a subtransaction ID (subxid); see for +details. @@ -24802,8 +24809,9 @@ SELECT collation for ('foo' COLLATE "de_DE"); wraps around every 4 billion transactions. However, the functions shown in use a 64-bit type xid8 that does not wrap around during the life -of an installation, and can be converted to xid by casting if -required. 
The data type pg_snapshot stores information about +of an installation and can be converted to xid by casting if +required; see for details. +The data type pg_snapshot stores information about transaction ID visibility at a particular moment in time. Its components are described in . pg_snapshot's textual representation is @@ -24849,7 +24857,7 @@ SELECT collation for ('foo' COLLATE "de_DE"); xmax and not in this list was already completed at the time of the snapshot, and thus is either visible or dead according to its commit status. This list does not include the transaction IDs of -subtransactions. +subtransactions (subxids). diff --git a/doc/src/sgml/glossary.sgml b/doc/src/sgml/glossary.sgml index d6d0a3a814..fe138a47bb 100644
Re: remove redundant memset() call
> On 13 Oct 2022, at 21:59, Bruce Momjian wrote: > > On Thu, Oct 13, 2022 at 09:40:42PM +0200, Daniel Gustafsson wrote: >>> On 13 Oct 2022, at 21:18, Nathan Bossart wrote: >>> >>> On Thu, Oct 13, 2022 at 03:15:13PM -0400, Bruce Momjian wrote: On Thu, Oct 13, 2022 at 12:12:35PM -0700, Zhihong Yu wrote: > the memory has been zero'ed out by palloc0(). > > memcpy is not relevant w.r.t. resetting memory. Ah, good point. >>> >>> Yeah, it looks like this was missed in ca7f8e2. >> >> Agreed, it looks like I missed that one, I can't see any reason to keep it. >> Do >> you want me to take care of it Bruce, and clean up after myself, or are you >> already on it? > > You can do it, thanks. Done now. -- Daniel Gustafsson https://vmware.com/
Re: New docs chapter on Transaction Management and related changes
On 13-10-2022 at 23:28, Bruce Momjian wrote: On Tue, Sep 13, 2022 at 03:02:34PM +0100, Simon Riggs wrote: Thanks Robert. I've tried to accommodate all of your thoughts, plus Alvaro's. New v5 attached. Happy to receive further comments. I liked this patch very much. It gives details on a lot of the internals we expose to users. Some of my changes were: * tightening the wording * restructuring the flow * splitting out user-visible details (prepared transactions) from internals, e.g., xid, vxid, subtransactions * adding references from places in our docs to these new sections [xact.diff] I think that 'This chapter explains how the control the reliability of' should be: 'This chapter explains how to control the reliability of' And in these lines: + together in a transactional manner. The commands PREPARE + TRANSACTION, COMMIT PREPARED and + ROLLBACK PREPARED. Two-phase transactions 'The commands' should be 'The commands are' thanks, Erik Rijkers I plan to apply this and backpatch it to all supported versions since these details apply to all versions. These docs should enable our users to much better understand and monitor Postgres. Updated patch attached.
Incorrect comment regarding command completion tags
I was looking at the code in EndCommand() and noticed a comment talking about some Asserts which didn't seem to exist in the code. The comment also talks about LastOid which looks like the name of a variable that's nowhere to be seen. It looks like the Asserts did exist when the completion tag patch was being developed [1] but they disappeared somewhere along the line and the comment didn't get an update before 2f9661311 went in. In the attached, I rewrote the comment to remove mention of the Asserts. I also tried to form the comment in a way that's more understandable about why we always write a "0" in "INSERT 0 ". David [1] https://www.postgresql.org/message-id/1c642743-8e46-4246-b4a0-c9a638b3e...@enterprisedb.com diff --git a/src/backend/tcop/dest.c b/src/backend/tcop/dest.c index c952cbea8b..e6934d7b66 100644 --- a/src/backend/tcop/dest.c +++ b/src/backend/tcop/dest.c @@ -179,12 +179,11 @@ EndCommand(const QueryCompletion *qc, CommandDest dest, bool force_undecorated_o * We assume the tagname is plain ASCII and therefore requires no * encoding conversion. * -* We no longer display LastOid, but to preserve the wire -* protocol, we write InvalidOid where the LastOid used to be -* written. -* -* All cases where LastOid was written also write nprocessed -* count, so just Assert that rather than having an extra test. +* In ancient versions of PostgreSQL, INSERT used to include the +* Oid of the inserted record in the completion tag. We no longer +* support tables with Oids, so to maintain compatibility in the +* wire protocol, we write a "0" for InvalidOid in the location +* where we once wrote the inserted record's Oid. */ tag = qc->commandTag; tagname = GetCommandTagName(tag);
Re: Allow WindowFuncs prosupport function to use more optimal WindowClause options
On Wed, Oct 12, 2022 at 5:35 PM David Rowley wrote: > On Wed, 12 Oct 2022 at 16:33, Vik Fearing wrote: > > Per spec, the ROW_NUMBER() window function is not even allowed to have a > > frame specified. > > > > b) The window framing clause of WDX shall not be present. > > > > Also, the specification for ROW_NUMBER() is: > > > > f) ROW_NUMBER() OVER WNS is equivalent to the : > > > > COUNT (*) OVER (WNS1 ROWS UNBOUNDED PRECEDING) > > > > > > So I don't think we need to test for anything at all and can > > indiscriminately add or replace the frame with ROWS UNBOUNDED PRECEDING. > > Thanks for digging that out. > > Just above that I see: > > RANK() OVER WNS is equivalent to: > ( COUNT (*) OVER (WNS1 RANGE UNBOUNDED PRECEDING) > - COUNT (*) OVER (WNS1 RANGE CURRENT ROW) + 1 ) > > and > > DENSE_RANK() OVER WNS is equivalent to the : > COUNT (DISTINCT ROW ( VE1, ..., VEN ) ) > OVER (WNS1 RANGE UNBOUNDED PRECEDING) > > So it looks like the same can be done for rank() and dense_rank() too. > I've added support for those in the attached. > > This also got me thinking that maybe we should be a bit more generic > with the support function node tag name. After looking at the > nodeWindowAgg.c code for a while, I wondered if we might want to add > some optimisations in the future that makes WindowAgg not bother > storing tuples for row_number(), rank() and dense_rank(). That might > save a bit of overhead from the tuple store. I imagined that we'd > want to allow the expansion of this support request so that the > support function could let the planner know if any tuples will be > accessed by the window function or not. The > SupportRequestWFuncOptimizeFrameOpts name didn't seem very fitting for > that so I adjusted it to become SupportRequestOptimizeWindowClause > instead. > > The updated patch is attached. 
> > David > Hi, + req->frameOptions = (FRAMEOPTION_ROWS | +FRAMEOPTION_START_UNBOUNDED_PRECEDING | +FRAMEOPTION_END_CURRENT_ROW); The bit combination appears multiple times in the patch. Maybe define the combination as a constant in supportnodes.h and reference it in the code. Cheers
Re: New docs chapter on Transaction Management and related changes
On Tue, Sep 13, 2022 at 03:02:34PM +0100, Simon Riggs wrote: > Thanks Robert. I've tried to accommodate all of your thoughts, plus Alvaro's. > > New v5 attached. > > Happy to receive further comments. I liked this patch very much. It gives details on a lot of the internals we expose to users. Some of my changes were: * tightening the wording * restructuring the flow * splitting out user-visible details (prepared transactions) from internals, e.g., xid, vxid, subtransactions * adding references from places in our docs to these new sections I plan to apply this and backpatch it to all supported versions since these details apply to all versions. These docs should enable our users to much better understand and monitor Postgres. Updated patch attached. -- Bruce Momjian https://momjian.us EDB https://enterprisedb.com Indecision is a decision. Inaction is an action. Mark Batterson diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml index 66312b53b8..024a3c5101 100644 --- a/doc/src/sgml/config.sgml +++ b/doc/src/sgml/config.sgml @@ -7224,12 +7224,14 @@ local0.*/var/log/postgresql %v - Virtual transaction ID (backendID/localXID) + Virtual transaction ID (backendID/localXID); see + no %x - Transaction ID (0 if none is assigned) + Transaction ID (0 if none is assigned); see + no diff --git a/doc/src/sgml/datatype.sgml b/doc/src/sgml/datatype.sgml index b030b36002..fdffba4442 100644 --- a/doc/src/sgml/datatype.sgml +++ b/doc/src/sgml/datatype.sgml @@ -4992,7 +4992,8 @@ WHERE ... xmin and xmax. Transaction identifiers are 32-bit quantities. In some contexts, a 64-bit variant xid8 is used. Unlike xid values, xid8 values increase strictly -monotonically and cannot be reused in the lifetime of a database cluster. +monotonically and cannot be reused in the lifetime of a database +cluster. See for more details. 
diff --git a/doc/src/sgml/filelist.sgml b/doc/src/sgml/filelist.sgml index de450cd661..0d6be9a2fa 100644 --- a/doc/src/sgml/filelist.sgml +++ b/doc/src/sgml/filelist.sgml @@ -104,6 +104,7 @@ + diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml index b5a2f94c4e..1e0d587932 100644 --- a/doc/src/sgml/func.sgml +++ b/doc/src/sgml/func.sgml @@ -24676,7 +24676,10 @@ SELECT collation for ('foo' COLLATE "de_DE"); Returns the current transaction's ID. It will assign a new one if the current transaction does not have one already (because it has not -performed any database updates). +performed any database updates); see for details. If executed in a +subtransaction this will return the top-level xid; see for details. @@ -24693,6 +24696,7 @@ SELECT collation for ('foo' COLLATE "de_DE"); ID is assigned yet. (It's best to use this variant if the transaction might otherwise be read-only, to avoid unnecessary consumption of an XID.) +If executed in a subtransaction this will return the top-level xid. @@ -24736,6 +24740,8 @@ SELECT collation for ('foo' COLLATE "de_DE"); Returns a current snapshot, a data structure showing which transaction IDs are now in-progress. +Only top-level xids are included in the snapshot; subxids are not +shown; see for details. @@ -24790,7 +24796,8 @@ SELECT collation for ('foo' COLLATE "de_DE"); Is the given transaction ID visible according to this snapshot (that is, was it completed before the snapshot was taken)? Note that this function will not give the correct answer for -a subtransaction ID. +a subtransaction ID (subxid); see for +details. @@ -24802,8 +24809,9 @@ SELECT collation for ('foo' COLLATE "de_DE"); wraps around every 4 billion transactions. However, the functions shown in use a 64-bit type xid8 that does not wrap around during the life -of an installation, and can be converted to xid by casting if -required. 
The data type pg_snapshot stores information about +of an installation and can be converted to xid by casting if +required; see for details. +The data type pg_snapshot stores information about transaction ID visibility at a particular moment in time. Its components are described in . pg_snapshot's textual representation is @@ -24849,7 +24857,7 @@ SELECT collation for ('foo' COLLATE "de_DE"); xmax and not in this list was already completed at the time of the snapshot, and thus is either visible or dead according to its commit status. This list does not include the
Re: GUC values - recommended way to declare the C variables?
On Thu, Oct 13, 2022 at 09:47:25AM +0900, Michael Paquier wrote: > So, the initial values of max_wal_senders and max_replication_slots > became out of sync with their defaults in guc_tables.c. FWIW, I would > argue the opposite way: rather than removing the initializations, I > would fix and keep them as these references can be useful when > browsing the area of the code related to such GUCs, without having to > look at guc_tables.c for this information. Well, those initializations are only useful when they are kept in sync, which, as demonstrated by this patch, isn't always the case. I don't have a terribly strong opinion about this, but I'd lean towards reducing the number of places that track the default value of GUCs. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
WIP: Analyze whether our docs need more granular refentries.
In reviewing another patch, I noticed that the documentation had an xref to a fairly large page of documentation (create_table.sgml), and I wondered if that link was chosen because the original author genuinely felt the entire page was relevant, or merely because a more granular link did not exist at the time, and this link had been carried forward since then while the referenced page grew in complexity. In the interest of narrowing the problem down to a manageable size, I wrote a script (attached) to find all xrefs and rank them by criteria[1] that I believe hint at the possibility that the xrefs should be more granular than they are. I intend to use the script output below as a guide for manually reviewing the references and seeing if there are opportunities to guide the reader to the relevant section of those pages. In case anyone is curious, here is a top excerpt of the script output:

file_name                      link_name                     link_count  line_count  num_refentries
-----------------------------  ----------------------------  ----------  ----------  --------------
ref/psql-ref.sgml              app-psql                              20        5215               1
ecpg.sgml                      ecpg-sql-allocate-descriptor           4       10101              17
ref/create_table.sgml          sql-createtable                       23        2437               1
ref/select.sgml                sql-select                            23        2207               1
ref/create_function.sgml       sql-createfunction                    30         935               1
ref/alter_table.sgml           sql-altertable                        12        1776               1
ref/pg_dump.sgml               app-pgdump                            11        1545               1
ref/pg_basebackup.sgml         app-pgbasebackup                      11        1008               1
ref/create_type.sgml           sql-createtype                        10        1029               1
ref/create_index.sgml          sql-createindex                        9         999               1
ref/postgres-ref.sgml          app-postgres                          10         845               1
ref/copy.sgml                  sql-copy                               7        1081               1
ref/create_role.sgml           sql-createrole                        13         511               1
ref/grant.sgml                 sql-grant                             13         507               1
ref/create_foreign_table.sgml  sql-createforeigntable                14         455               1
ref/insert.sgml                sql-insert                             8         792               1
ref/pg_ctl-ref.sgml            app-pg-ctl                             8         713               1
ref/create_trigger.sgml        sql-createtrigger                      7         777               1
ref/set.sgml                   sql-set                               15         332               1
ref/create_aggregate.sgml      sql-createaggregate                    6         805               1
ref/initdb.sgml                app-initdb                             8         588               1
ref/create_policy.sgml         sql-createpolicy                       7         655               1
dblink.sgml                    contrib-dblink-connect                 1        2136              19
ref/create_subscription.sgml   sql-createsubscription                 9         472               1

Some of these will clearly be false positives. For instance, dblink.sgml and ecpg.sgml have a lot of refentries, but they seem to lack a global "top" refentry which I assumed would be there. On the other hand, I have to wonder if the references to psql might be to a specific feature of the tool, and perhaps we can create refentries to those. [1] The criteria are: must be first refentry in file, file must be at least 200 lines long, then rank by lines*references, 2x for referencing the top refentry when others exist xref-analysis.sh Description: application/shellscript
Re: libpq support for NegotiateProtocolVersion
On Thu, Oct 13, 2022 at 10:33:01AM +0200, Peter Eisentraut wrote: > + if (their_version != conn->pversion) Shouldn't this be 'their_version < conn->pversion'? If the server supports a later protocol than what is requested but not all the requested protocol extensions, I think libpq would still report "protocol version not supported." > + appendPQExpBuffer(&conn->errorMessage, > + libpq_gettext("protocol > version not supported by server: client uses %d.%d, server supports %d.%d\n"), > + > PG_PROTOCOL_MAJOR(conn->pversion), PG_PROTOCOL_MINOR(conn->pversion), > + > PG_PROTOCOL_MAJOR(their_version), PG_PROTOCOL_MINOR(their_version)); Should this match the error in postmaster.c and provide the range of versions the server supports? The FATAL in postmaster.c is for the major version, but I believe the same information is relevant when a NegotiateProtocolVersion message is sent. ereport(FATAL, (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), errmsg("unsupported frontend protocol %u.%u: server supports %u.0 to %u.%u", > + else > + appendPQExpBuffer(&conn->errorMessage, > + libpq_gettext("protocol > extension not supported by server: %s\n"), buf.data); nitpick: s/extension/extensions What if neither the protocol version nor the requested extensions are supported? Right now, I think only the unsupported protocol version is reported in that case, but presumably we could report both pretty easily. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Re: hash_xlog_split_allocate_page: failed to acquire cleanup lock
Hi, On 2022-10-06 12:44:24 +0530, Amit Kapila wrote: > On Sat, Oct 1, 2022 at 12:35 AM Andres Freund wrote: > > My problem with this approach is that the whole cleanup lock is hugely > > misleading as-is. As I noted in > > https://www.postgresql.org/message-id/20220817193032.z35vdjhpzkgldrd3%40awork3.anarazel.de > > we take the cleanup lock *after* re-initializing the page. Thereby > > completely breaking the properties that a cleanup lock normally tries to > > guarantee. > > > > Even if that were to achieve something useful (doubtful in this case), > > it'd need a huge comment explaining what's going on. > > > > Attached are two patches. The first patch is what Robert has proposed > with some changes in comments to emphasize the fact that cleanup lock > on the new bucket is just to be consistent with the old bucket page > locking as we are initializing it just before checking for cleanup > lock. In the second patch, I removed the acquisition of cleanup lock > on the new bucket page and changed the comments/README accordingly. > > I think we can backpatch the first patch and the second patch can be > just a HEAD-only patch. Does that sound reasonable to you? Not particularly, no. I don't understand how "overwrite a page and then get a cleanup lock" can sensibly be described by this comment: > +++ b/src/backend/access/hash/hashpage.c > @@ -807,7 +807,8 @@ restart_expand: >* before changing the metapage's mapping info, in case we can't get the >* disk space. Ideally, we don't need to check for cleanup lock on new >* bucket as no other backend could find this bucket unless meta page is > - * updated. However, it is good to be consistent with old bucket > locking. > + * updated and we initialize the page just before it. However, it is > just > + * to be consistent with old bucket locking. 
>*/ > buf_nblkno = _hash_getnewbuf(rel, start_nblkno, MAIN_FORKNUM); > if (!IsBufferCleanupOK(buf_nblkno)) This is basically saying "I am breaking basic rules of locking just to be consistent", no? Greetings, Andres Freund
Re: remove redundant memset() call
On Thu, Oct 13, 2022 at 09:40:42PM +0200, Daniel Gustafsson wrote: > > On 13 Oct 2022, at 21:18, Nathan Bossart wrote: > > > > On Thu, Oct 13, 2022 at 03:15:13PM -0400, Bruce Momjian wrote: > >> On Thu, Oct 13, 2022 at 12:12:35PM -0700, Zhihong Yu wrote: > >>> the memory has been zero'ed out by palloc0(). > >>> > >>> memcpy is not relevant w.r.t. resetting memory. > >> > >> Ah, good point. > > > > Yeah, it looks like this was missed in ca7f8e2. > > Agreed, it looks like I missed that one, I can't see any reason to keep it. > Do > you want me to take care of it Bruce, and clean up after myself, or are you > already on it? You can do it, thanks. -- Bruce Momjian https://momjian.us EDB https://enterprisedb.com Indecision is a decision. Inaction is an action. Mark Batterson
Re: Lack of PageSetLSN in heap_xlog_visible
On Thu, 2022-10-13 at 12:50 +0300, Konstantin Knizhnik wrote: > /* > * We don't bump the LSN of the heap page when setting the > visibility > * map bit (unless checksums or wal_hint_bits is enabled, in > which > * case we must), because that would generate an unworkable > volume of > * full-page writes. It clearly says there that it must set the page LSN, but I don't see where that's happening. It seems to go all the way back to the original checksums commit, 96ef3b8ff1. I can reproduce a case where a replica ends up with a different page header than the primary (checksums enabled): Primary: create extension pageinspect; create table t(i int) with (autovacuum_enabled=off); insert into t values(0); Shut down and restart primary and replica. Primary: insert into t values(1); vacuum t; Crash replica and let it recover. Shut down and restart primary and replica. Primary: select * from page_header(get_raw_page('t', 0)); Replica: select * from page_header(get_raw_page('t', 0)); The LSN on the replica is lower, but the flags are the same (PD_ALL_VISIBLE set). That's a problem, right? The checksums are valid on both, though. It may violate our torn page protections for checksums, as well, but I couldn't construct a scenario for that because recovery can only create restartpoints at certain times. > But it still not clear for me that not bumping LSN in this place is > correct if wal_log_hints is set. > In this case we will have VM page with larger LSN than heap page, > because visibilitymap_set > bumps LSN of VM page. It means that in theory after recovery we may > have > page marked as all-visible in VM, > but not having PD_ALL_VISIBLE in page header. And it violates VM > constraint: I'm not quite following this scenario. If the heap page has a lower LSN than the VM page, how could we recover to a point where the VM bit is set but the heap flag isn't? And what does it have to do with wal_log_hints/checksums? -- Jeff Davis PostgreSQL Contributor Team - AWS
New "single-call SRF" APIs are very confusingly named
Hi, I unfortunately just noticed this now, just after we released... In commit 9e98583898c347e007958c8a09911be2ea4acfb9 Author: Michael Paquier Date: 2022-03-07 10:26:29 +0900 Create routine able to set single-call SRFs for Materialize mode a new helper was added: #define SRF_SINGLE_USE_EXPECTED 0x01 /* use expectedDesc as tupdesc */ #define SRF_SINGLE_BLESS 0x02 /* validate tuple for SRF */ extern void SetSingleFuncCall(FunctionCallInfo fcinfo, bits32 flags); I think the naming here is very poor. For one, "Single" here conflicts with ExprSingleResult which indicates "expression does not return a set", i.e. precisely the opposite of what SetSingleFuncCall() is used for. For another, the "Set" in SetSingleFuncCall makes it sound like it's a function setting a property. Even leaving the confusion with ExprSingleResult aside, calling it "Single" still seems very non-descriptive. I assume it's named to contrast with init_MultiFuncCall() etc. While those are also not named greatly, they're not typically used directly but wrapped in SRF_FIRSTCALL_INIT etc. I also quite intensely dislike SRF_SINGLE_USE_EXPECTED. It sounds like it's saying that a single use of the SRF is expected, but that's not at all what it means: "use expectedDesc as tupdesc". I'm also confused by SRF_SINGLE_BLESS - the comment says "validate tuple for SRF". BlessTupleDesc can't really be described as validation, or am I missing something? This IMO needs to be cleaned up. Maybe something like InitMaterializedSRF() w/ MAT_SRF_(USE_EXPECTED_DESC|BLESS) Greetings, Andres Freund
Re: remove redundant memset() call
> On 13 Oct 2022, at 21:18, Nathan Bossart wrote: > > On Thu, Oct 13, 2022 at 03:15:13PM -0400, Bruce Momjian wrote: >> On Thu, Oct 13, 2022 at 12:12:35PM -0700, Zhihong Yu wrote: >>> the memory has been zero'ed out by palloc0(). >>> >>> memcpy is not relevant w.r.t. resetting memory. >> >> Ah, good point. > > Yeah, it looks like this was missed in ca7f8e2. Agreed, it looks like I missed that one, I can't see any reason to keep it. Do you want me to take care of it Bruce, and clean up after myself, or are you already on it? -- Daniel Gustafsson https://vmware.com/
Re: remove redundant memset() call
On Thu, Oct 13, 2022 at 03:15:13PM -0400, Bruce Momjian wrote: > On Thu, Oct 13, 2022 at 12:12:35PM -0700, Zhihong Yu wrote: >> the memory has been zero'ed out by palloc0(). >> >> memcpy is not relevant w.r.t. resetting memory. > > Ah, good point. Yeah, it looks like this was missed in ca7f8e2. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Re: remove redundant memset() call
On Thu, Oct 13, 2022 at 12:12:35PM -0700, Zhihong Yu wrote: > On Thu, Oct 13, 2022 at 12:10 PM Bruce Momjian wrote: > > On Thu, Oct 13, 2022 at 10:55:08AM -0700, Zhihong Yu wrote: > > Hi, > > I was looking at combo_init in contrib/pgcrypto/px.c . > > > > There is a memset() call following palloc0() - the call is redundant. > > > > Please see the patch for the proposed change. > > > > Thanks > > > diff --git a/contrib/pgcrypto/px.c b/contrib/pgcrypto/px.c > > index 3b098c6151..d35ccca777 100644 > > --- a/contrib/pgcrypto/px.c > > +++ b/contrib/pgcrypto/px.c > > @@ -203,7 +203,6 @@ combo_init(PX_Combo *cx, const uint8 *key, unsigned > klen, > > if (klen > ks) > > klen = ks; > > keybuf = palloc0(ks); > > - memset(keybuf, 0, ks); > > memcpy(keybuf, key, klen); > > > > err = px_cipher_init(c, keybuf, klen, ivbuf); > > Uh, the memset() is ks length but the memcpy() is klen, and the above > test allows ks to be larger than klen. > > -- > Bruce Momjian https://momjian.us > EDB https://enterprisedb.com > > Indecision is a decision. Inaction is an action. Mark Batterson > > > Hi, > the memory has been zero'ed out by palloc0(). > > memcpy is not relevant w.r.t. resetting memory. Ah, good point. -- Bruce Momjian https://momjian.us EDB https://enterprisedb.com Indecision is a decision. Inaction is an action. Mark Batterson
Re: remove redundant memset() call
On Thu, Oct 13, 2022 at 12:10 PM Bruce Momjian wrote: > On Thu, Oct 13, 2022 at 10:55:08AM -0700, Zhihong Yu wrote: > > Hi, > > I was looking at combo_init in contrib/pgcrypto/px.c . > > > > There is a memset() call following palloc0() - the call is redundant. > > > > Please see the patch for the proposed change. > > > > Thanks > > > diff --git a/contrib/pgcrypto/px.c b/contrib/pgcrypto/px.c > > index 3b098c6151..d35ccca777 100644 > > --- a/contrib/pgcrypto/px.c > > +++ b/contrib/pgcrypto/px.c > > @@ -203,7 +203,6 @@ combo_init(PX_Combo *cx, const uint8 *key, unsigned > klen, > > if (klen > ks) > > klen = ks; > > keybuf = palloc0(ks); > > - memset(keybuf, 0, ks); > > memcpy(keybuf, key, klen); > > > > err = px_cipher_init(c, keybuf, klen, ivbuf); > > Uh, the memset() is ks length but the memcpy() is klen, and the above > test allows ks to be larger than klen. > > -- > Bruce Momjian https://momjian.us > EDB https://enterprisedb.com > > Indecision is a decision. Inaction is an action. Mark Batterson > > Hi, the memory has been zero'ed out by palloc0(). memcpy is not relevant w.r.t. resetting memory. Cheers
Re: remove redundant memset() call
On Thu, Oct 13, 2022 at 10:55:08AM -0700, Zhihong Yu wrote: > Hi, > I was looking at combo_init in contrib/pgcrypto/px.c . > > There is a memset() call following palloc0() - the call is redundant. > > Please see the patch for the proposed change. > > Thanks > diff --git a/contrib/pgcrypto/px.c b/contrib/pgcrypto/px.c > index 3b098c6151..d35ccca777 100644 > --- a/contrib/pgcrypto/px.c > +++ b/contrib/pgcrypto/px.c > @@ -203,7 +203,6 @@ combo_init(PX_Combo *cx, const uint8 *key, unsigned klen, > if (klen > ks) > klen = ks; > keybuf = palloc0(ks); > - memset(keybuf, 0, ks); > memcpy(keybuf, key, klen); > > err = px_cipher_init(c, keybuf, klen, ivbuf); Uh, the memset() is ks length but the memcpy() is klen, and the above test allows ks to be larger than klen. -- Bruce Momjian https://momjian.us EDB https://enterprisedb.com Indecision is a decision. Inaction is an action. Mark Batterson
Re: Suppressing useless wakeups in walreceiver
On Thu, Oct 13, 2022 at 12:37:39PM +0200, Alvaro Herrera wrote: > I think in 0001 we should put more stuff in the state struct -- > specifically these globals: > > static intrecvFile = -1; > static TimeLineID recvFileTLI = 0; > static XLogSegNo recvSegNo = 0; > > The main reason is that it seems odd to have startpointTLI in the struct > used in some places together with a file-global recvFileTLI which isn't. > The way one is passed as argument and the other as part of a struct > seemed too distracting. This should reduce the number of moving parts, > ISTM. Makes sense. Do you think the struct should be file-global so that it doesn't need to be provided as an argument to most of the static functions in this file? > One thing that confused me for a moment is that we have some state in > walrcv and some more state in 'state'. The difference is pretty obvious > once you look at the other, but it suggest to me that a better variable > name for the latter is 'localstate' to more obviously distinguish them. Sure, I'll change it to 'localstate'. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Re: archive modules
Nathan Bossart writes: > On Thu, Oct 13, 2022 at 03:25:27PM +0530, Bharath Rupireddy wrote: >> The intent here looks reasonable to me. However, why should the user >> be able to set both archive_command and archive_library in the first >> place only to later fail in LoadArchiveLibrary() per the patch? IMO, >> the check_hook() is the right way to disallow any sorts of GUC >> misconfigurations, no? > There was some discussion upthread about using the GUC hooks to enforce > this [0]. In general, it doesn't seem to be a recommended practice. Yeah, it really does not work to use GUC hooks to enforce multi-variable constraints. We've learned that the hard way (more than once, if memory serves). regards, tom lane
Re: PG upgrade 14->15 fails - database contains our own extension
I wrote: > We might be able to put in some kluge in pg_dump to make it less > likely to fail with existing DBs, but I think the true fix lies > in adding that dependency. Here's a draft patch for that. I'm unsure whether it's worth back-patching; even if we did, we couldn't guarantee that existing databases would have the additional pg_depend entries. If we do only put it in HEAD, maybe we should break compatibility to the extent of changing IsBinaryCoercible's API rather than inventing a separate call. I'm still not excited about recording additional dependencies elsewhere, but that path would leave us with cleaner code if we eventually do that. regards, tom lane diff --git a/src/backend/catalog/pg_cast.c b/src/backend/catalog/pg_cast.c index 1812bb7fcc..b50a56954d 100644 --- a/src/backend/catalog/pg_cast.c +++ b/src/backend/catalog/pg_cast.c @@ -35,13 +35,20 @@ * Caller must have already checked privileges, and done consistency * checks on the given datatypes and cast function (if applicable). * + * Since we allow binary coercibility of the datatypes to the cast + * function's input and result, there could be one or two WITHOUT FUNCTION + * casts that this one depends on. We don't record that explicitly + * in pg_cast, but we still need to make dependencies on those casts. + * * 'behavior' indicates the types of the dependencies that the new - * cast will have on its input and output types and the cast function. + * cast will have on its input and output types, the cast function, + * and the other casts if any. 
* */ ObjectAddress -CastCreate(Oid sourcetypeid, Oid targettypeid, Oid funcid, char castcontext, - char castmethod, DependencyType behavior) +CastCreate(Oid sourcetypeid, Oid targettypeid, + Oid funcid, Oid incastid, Oid outcastid, + char castcontext, char castmethod, DependencyType behavior) { Relation relation; HeapTuple tuple; @@ -102,6 +109,18 @@ CastCreate(Oid sourcetypeid, Oid targettypeid, Oid funcid, char castcontext, add_exact_object_address(&referenced, addrs); } + /* dependencies on casts required for function */ + if (OidIsValid(incastid)) + { + ObjectAddressSet(referenced, CastRelationId, incastid); + add_exact_object_address(&referenced, addrs); + } + if (OidIsValid(outcastid)) + { + ObjectAddressSet(referenced, CastRelationId, outcastid); + add_exact_object_address(&referenced, addrs); + } + record_object_address_dependencies(&myself, addrs, behavior); free_object_addresses(addrs); diff --git a/src/backend/commands/functioncmds.c b/src/backend/commands/functioncmds.c index e6fcfc23b9..1f820c93e9 100644 --- a/src/backend/commands/functioncmds.c +++ b/src/backend/commands/functioncmds.c @@ -1526,6 +1526,8 @@ CreateCast(CreateCastStmt *stmt) char sourcetyptype; char targettyptype; Oid funcid; + Oid incastid = InvalidOid; + Oid outcastid = InvalidOid; int nargs; char castcontext; char castmethod; @@ -1603,7 +1605,9 @@ CreateCast(CreateCastStmt *stmt) ereport(ERROR, (errcode(ERRCODE_INVALID_OBJECT_DEFINITION), errmsg("cast function must take one to three arguments"))); - if (!IsBinaryCoercible(sourcetypeid, procstruct->proargtypes.values[0])) + if (!IsBinaryCoercibleWithCast(sourcetypeid, + procstruct->proargtypes.values[0], + &incastid)) ereport(ERROR, (errcode(ERRCODE_INVALID_OBJECT_DEFINITION), errmsg("argument of cast function must match or be binary-coercible from source data type"))); @@ -1617,7 +1621,9 @@ CreateCast(CreateCastStmt *stmt) (errcode(ERRCODE_INVALID_OBJECT_DEFINITION), errmsg("third argument of cast function must be type %s", "boolean"))); - if
(!IsBinaryCoercible(procstruct->prorettype, targettypeid)) + if (!IsBinaryCoercibleWithCast(procstruct->prorettype, + targettypeid, + &outcastid)) ereport(ERROR, (errcode(ERRCODE_INVALID_OBJECT_DEFINITION), errmsg("return data type of cast function must match or be binary-coercible to target data type"))); @@ -1756,8 +1762,8 @@ CreateCast(CreateCastStmt *stmt) break; } - myself = CastCreate(sourcetypeid, targettypeid, funcid, castcontext, - castmethod, DEPENDENCY_NORMAL); + myself = CastCreate(sourcetypeid, targettypeid, funcid, incastid, outcastid, + castcontext, castmethod, DEPENDENCY_NORMAL); return myself; } diff --git a/src/backend/commands/typecmds.c b/src/backend/commands/typecmds.c index 33b64fd279..b7c3dded17 100644 --- a/src/backend/commands/typecmds.c +++ b/src/backend/commands/typecmds.c @@ -1705,7 +1705,9 @@ DefineRange(ParseState *pstate, CreateRangeStmt *stmt) ); /* Create cast from the range type to its multirange type */ - CastCreate(typoid, multirangeOid, castFuncOid, 'e', 'f', DEPENDENCY_INTERNAL); + CastCreate(typoid, multirangeOid, castFuncOid, InvalidOid, InvalidOid, + COERCION_CODE_EXPLICIT, COERCION_METHOD_FUNCTION, +
Re: proposal: possibility to read dumped table's name from file
Hi, On 2022-10-07 07:26:08 +0200, Pavel Stehule wrote: > I am sending version with handy written parser and meson support Given this is a new approach it seems inaccurate to have the CF entry marked ready-for-committer. I've updated it to needs-review. Greetings, Andres Freund
Re: archive modules
On Thu, Oct 13, 2022 at 03:25:27PM +0530, Bharath Rupireddy wrote: > The intent here looks reasonable to me. However, why should the user > be able to set both archive_command and archive_library in the first > place only to later fail in LoadArchiveLibrary() per the patch? IMO, > the check_hook() is the right way to disallow any sorts of GUC > misconfigurations, no? There was some discussion upthread about using the GUC hooks to enforce this [0]. In general, it doesn't seem to be a recommended practice. One basic example of the problems with this approach is the following: 1. Set archive_command and leave archive_library unset and restart the server. 2. Unset archive_command and set archive_library and call 'pg_ctl reload'. After these steps, you'll see the following log messages: 2022-10-13 10:58:42.112 PDT [1562524] LOG: received SIGHUP, reloading configuration files 2022-10-13 10:58:42.114 PDT [1562524] LOG: cannot set "archive_library" when "archive_command" is specified 2022-10-13 10:58:42.114 PDT [1562524] DETAIL: Only one of "archive_library" or "archive_command" can be specified. 2022-10-13 10:58:42.114 PDT [1562524] LOG: parameter "archive_command" changed to "" 2022-10-13 10:58:42.114 PDT [1562524] LOG: configuration file "/home/nathan/pgdata/postgresql.conf" contains errors; unaffected changes were applied [0] https://postgr.es/m/20220914200305.GA2984249%40nathanxps13 -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
remove redundant memset() call
Hi, I was looking at combo_init in contrib/pgcrypto/px.c . There is a memset() call following palloc0() - the call is redundant. Please see the patch for the proposed change. Thanks remove-redundant-memset-call.patch Description: Binary data
Re: Suppressing useless wakeups in walreceiver
On Thu, Oct 13, 2022 at 03:35:14PM +0530, Bharath Rupireddy wrote: > I think the hot standby feedback message gets sent too frequently > without honouring the wal_receiver_status_interval because the 'now' > is actually not the current time with your patch but 'now + > XLogWalRcvSendReply()'s time'. However, it's possible that I may be > wrong here. Is your concern that the next wakeup time will be miscalculated because of a stale value in 'now'? I think that's existing behavior, as we save the current time before sending the message in XLogWalRcvSendReply and XLogWalRcvSendHSFeedback. The only way to completely avoid this would be to call GetCurrentTimestamp() after the message has been sent and to use that value to calculate when the next message should be sent. However, I am skeptical there's really any practical impact. Is it really a problem if a message that should be sent every 30 seconds is sent at an interval of 29.9 seconds sometimes? I think it could go the other way, too. If 'now' is stale, we might decide not to send a feedback message until the next loop iteration, so the interval might be 30.1 seconds. I don't think it's realistic to expect complete accuracy here. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Re: Support tls-exporter as channel binding for TLSv1.3
On Wed, Oct 12, 2022 at 11:01 PM Michael Paquier wrote: > One thing that would reduce the complexity of the equation is > to drop support for tls-server-end-point in the backend in PG >= 16 as > the versions of OpenSSL we still support on HEAD would cover > completely tls-exporter. Is the intent to backport tls-exporter client support? Or is the compatibility break otherwise acceptable? It seemed like there was also some general interest in proxy TLS termination (see also the PROXY effort, though it has stalled a bit). For that, I would think tls-server-end-point is an important feature. --Jacob
Re: pg_stat_bgwriter.buffers_backend is pretty meaningless (and more?)
On Mon, Oct 10, 2022 at 7:43 PM Maciek Sakrejda wrote: > > Thanks for working on this! Like Lukas, I'm excited to see more > visibility into important parts of the system like this. Thanks for taking another look! > > On Mon, Oct 10, 2022 at 11:49 AM Melanie Plageman > wrote: > > > > I've gone ahead and implemented option 1 (commented below). > > No strong opinion on 1 versus 2, but I guess at least partly because I > don't understand the implications (I do understand the difference, > just not when it might be important in terms of stats). Can we think > of a situation where combining stats about initial additions with > pinned additions hides some behavior that might be good to understand > and hard to pinpoint otherwise? I think that it makes sense to count both the initial buffers added to the ring and subsequent shared buffers added to the ring (either when the current strategy buffer is pinned or in use or when a bulkread rejects dirty strategy buffers in favor of new shared buffers) as strategy clocksweeps because of how the statistic would be used. Clocksweeps give you an idea of how much of your working set is cached (setting aside initially reading data into shared buffers when you are warming up the db). You may use clocksweeps to determine if you need to make shared buffers larger. Distinguishing strategy buffer clocksweeps from shared buffer clocksweeps allows us to avoid enlarging shared buffers if most of the clocksweeps are to bring in blocks for the strategy operation. However, I could see an argument that discounting strategy clocksweeps done because the current strategy buffer is pinned makes the number of shared buffer clocksweeps artificially low since those other queries using the buffer would have suffered a cache miss were it not for the strategy. And, in this case, you would take strategy clocksweeps together with shared clocksweeps to make your decision. 
And if we include buffers initially added to the strategy ring in the strategy clocksweep statistic, this number may be off because those blocks may not be needed in the main shared working set. But you won't know that until you try to reuse the buffer and it is pinned. So, I think we don't have a better option than counting initial buffers added to the ring as strategy clocksweeps (as opposed to as reuses). So, in answer to your question, no, I cannot think of a scenario like that. Sitting down and thinking about that for a long time did, however, help me realize that some of my code comments were misleading (and some incorrect). I will update these in the next version once we agree on updated docs. It also made me remember that I am incorrectly counting rejected buffers as reused. I'm not sure if it is a good idea to subtract from reuses when a buffer is rejected. Waiting until after it is rejected to count the reuse will take some other code changes. Perhaps we could also count rejections in the stats? > > I took a look at the latest docs (as someone mostly familiar with > internals at only a pretty high level, so probably somewhat close to > the target audience) and have some feedback. > > + > + role="column_definition"> > + backend_type text > + > + > + Type of backend (e.g. background worker, autovacuum worker). > + > + > > Not critical, but is there a list of backend types we could > cross-reference elsewhere in the docs? The most I could find was this longer explanation (with exhaustive list of types) in pg_stat_activity docs [1]. I could duplicate what it says or I could link to the view and say "see pg_stat_activity" for a description of backend_type" or something like that (to keep them from getting out of sync as new backend_types are added. I suppose I could also add docs on backend_types, but I'm not sure where something like that would go. 
> > From the io_context column description: > > + The autovacuum daemon, explicit VACUUM, > explicit > + ANALYZE, many bulk reads, and many bulk > writes use a > + fixed amount of memory, acquiring the equivalent number of > shared > + buffers and reusing them circularly to avoid occupying an > undue portion > + of the main shared buffer pool. > + > > I don't understand how this is relevant to the io_context column. > Could you expand on that, or am I just missing something obvious? > I'm trying to explain why those other IO Contexts exist (bulkread, bulkwrite, vacuum) and why they are separate from shared buffers. Should I cut it altogether or preface it with something like: these are counted separate from shared buffers because...? > + > + role="column_definition"> > + extended bigint > + > + > + Extends of relations done by this > backend_type in > + order to write data in this io_context. > + > + > > I understand what this is, but not why this is something I might want > to know about. Unlike writes, backends largely have to
Re: Summary function for pg_buffercache
Hi, On 2022-10-12 12:27:54 -0700, Andres Freund wrote: > I intentionally put my changes into a fixup commit, in case you want to look > at the differences. I pushed the (combined) patch now. Thanks for your contribution! Greetings, Andres Freund
Re: [RFC] building postgres with meson - v13
Hi, On 2022-10-13 09:24:51 +, shiy.f...@fujitsu.com wrote: > I noticed that `pg_config --configure` didn't show the options given when > building with meson. Yes, that was noted somewhere on this thread. > Maybe it would be better if pg_config can output this information, to be > consistent with the output when building with `./configure` and `make`. > > The output when building with `./configure` and `make`: > $ pg_config --configure > '--prefix=/home/postgres/install/' '--cache' 'gcc.cache' '--enable-dtrace' > '--with-icu' '--enable-cassert' It'd be a fair amount of work, both initially and to maintain it, to generate something compatible. I can see some benefit in showing some feature influencing output in --configure, but compatible output doesn't seem worth it to me. Greetings, Andres Freund
Re: Bloom filter Pushdown Optimization for Merge Join
On Thu, Oct 13, 2022 at 7:30 AM Zhihong Yu wrote: > > > On Wed, Oct 12, 2022 at 4:35 PM Zhihong Yu wrote: > >> >> >> On Wed, Oct 12, 2022 at 3:36 PM Lyu Pan wrote: >> >>> Hello Zhihong Yu & Tomas Vondra, >>> >>> Thank you so much for your review and feedback! >>> >>> We made some updates based on previous feedback and attached the new >>> patch set. Due to time constraints, we didn't get to resolve all the >>> comments, and we'll continue to improve this patch. >>> >>> > In this prototype, the cost model is based on an assumption that there >>> is >>> > a linear relationship between the performance gain from using a >>> semijoin >>> > filter and the estimated filtering rate: >>> > % improvement to Merge Join cost = 0.83 * estimated filtering rate - >>> 0.137. >>> > >>> > How were the coefficients (0.83 and 0.137) determined ? >>> > I guess they were based on the results of running certain workload. >>> >>> Right, the coefficients (0.83 and 0.137) determined are based on some >>> preliminary testings. The current costing model is pretty naive and >>> we'll work on a more robust costing model in future work. >>> >>> >>> > I agree, in principle, although I think the current logic / formula is >>> a >>> > bit too crude and fitted to the simple data used in the test. I think >>> > this needs to be formulated as a regular costing issue, considering >>> > stuff like cost of the hash functions, and so on. >>> > >>> > I think this needs to do two things: >>> > >>> > 1) estimate the cost of building the bloom filter - This shall depend >>> on >>> > the number of rows in the inner relation, number/cost of the hash >>> > functions (which may be higher for some data types), etc. >>> > >>> > 2) estimate improvement for the probing branch - Essentially, we need >>> to >>> > estimate how much we save by filtering some of the rows, but this also >>> > neeeds to include the cost of probing the bloom filter. 
>>> > >>> > This will probably require some improvements to the lib/bloomfilter, in >>> > order to estimate the false positive rate - this may matter a lot for >>> > large data sets and small work_mem values. The bloomfilter library >>> > simply reduces the size of the bloom filter, which increases the false >>> > positive rate. At some point it'll start reducing the benefit. >>> > >>> >>> These suggestions make a lot of sense. The current costing model is >>> definitely not good enough, and we plan to work on a more robust >>> costing model as we continue to improve the patch. >>> >>> >>> > OK. Could also build the bloom filter in shared memory? >>> > >>> >>> We thought about this approach but didn't prefer this one because if >>> all worker processes share the same bloom filter in shared memory, we >>> need to frequently lock and unlock the bloom filter to avoid race >>> conditions. So we decided to have each worker process create its own >>> bloom filter. >>> >>> >>> > IMHO we shouldn't make too many conclusions from these examples. Yes, >>> it >>> > shows merge join can be improved, but for cases where a hashjoin works >>> > better so we wouldn't use merge join anyway. >>> > >>> > I think we should try constructing examples where either merge join >>> wins >>> > already (and gets further improved by the bloom filter), or would lose >>> > to hash join and the bloom filter improves it enough to win. >>> > >>> > AFAICS that requires a join of two large tables - large enough that >>> hash >>> > join would need to be batched, or pre-sorted inputs (which eliminates >>> > the explicit Sort, which is the main cost in most cases). >>> > >>> > The current patch only works with sequential scans, which eliminates >>> the >>> > second (pre-sorted) option. So let's try the first one - can we invent >>> > an example with a join of two large tables where a merge join would >>> win? >>> > >>> > Can we find such example in existing benchmarks like TPC-H/TPC-DS. 
>>> > >>> >>> Agreed. The current examples are only intended to show us that using >>> bloom filters in merge join could improve the merge join performance >>> in some cases. We are working on testing more examples that merge join >>> with bloom filter could out-perform hash join, which should be more >>> persuasive. >>> >>> >>> > The bloom filter is built by the first seqscan (on t0), and then used >>> by >>> > the second seqscan (on t1). But this only works because we always run >>> > the t0 scan to completion (because we're feeding it into Sort) before >>> we >>> > start scanning t1. >>> > >>> > But when the scan on t1 switches to an index scan, it's over - we'd be >>> > building the filter without being able to probe it, and when we finish >>> > building it we no longer need it. So this seems pretty futile. >>> > >>> > It might still improve plans like >>> > >>> >-> Merge Join >>> > Merge Cond: (t0.c1 = t1.c1) >>> > SemiJoin Filter Created Based on: (t0.c1 = t1.c1) >>> > SemiJoin Estimated Filtering Rate: 1. >>> > -> Sort
Is anybody planning to release pglogical for v15 ?
So, is anybody planning to release pglogical for v15? There are still a few things that one can do in pglogical but not in native / built-in replication ... Best Regards Hannu
Re: PG upgrade 14->15 fails - database contains our own extension
I wrote: > Hmm ... I think it's a very ancient bug that somehow David has avoided > tripping over up to now. Looking closer, I don't see how b55f2b692 could have changed pg_dump's opinion of the order to sort these three casts in; that sort ordering logic is old enough to vote. So I'm guessing that in fact this *never* worked. Perhaps this extension has never been through pg_upgrade before, or at least not with these casts? > We might be able to put in some kluge in pg_dump to make it less > likely to fail with existing DBs, but I think the true fix lies > in adding that dependency. I don't see any painless way to fix this in pg_dump, and I'm inclined not to bother trying if it's not a regression. Better to spend the effort on the backend-side fix. On the backend side, really anyplace that we consult IsBinaryCoercible during DDL is at hazard. While there aren't a huge number of such places, there's certainly more than just CreateCast. I'm trying to decide how much trouble it's worth going to there. I could be wrong, but I think that only the cast-vs-cast case is really likely to be problematic for pg_dump, given that it dumps casts pretty early now. So it might be sufficient to fix that one case. regards, tom lane
Re: PG upgrade 14->15 fails - database contains our own extension
Robert Haas writes: > CREATE CAST (integer AS public.lbuid) WITH FUNCTION > pg_catalog.int8(integer) AS IMPLICIT; > CREATE CAST (bigint AS public.lbuid) WITHOUT FUNCTION AS IMPLICIT; > CREATE CAST (public.lbuid AS bigint) WITHOUT FUNCTION AS IMPLICIT; > That's not a valid dump ordering, and if I drop the casts and try to > recreate them that way, it fails in the same way you saw. > My guess is that this is a bug in Tom's commit > b55f2b6926556115155930c4b2d006c173f45e65, "Adjust pg_dump's priority > ordering for casts." Hmm ... I think it's a very ancient bug that somehow David has avoided tripping over up to now. Namely, that we require the bigint->lbuid implicit cast to exist in order to make that WITH FUNCTION cast, but we fail to record it as a dependency during CastCreate. So pg_dump is flying blind as to the required restoration order, and if it ever worked, you were just lucky. We might be able to put in some kluge in pg_dump to make it less likely to fail with existing DBs, but I think the true fix lies in adding that dependency. (I'm pretty skeptical about it being a good idea to have a set of casts like this, but I don't suppose pg_dump is chartered to editorialize on that.) regards, tom lane
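Tom's diagnosis is that CastCreate() never records the dependency of the WITH FUNCTION cast on the implicit cast it relies on, so pg_dump's topological sort of dump objects is missing the edge that would force a valid restore order. A simplified illustration of that ordering problem, using Python's graphlib rather than pg_dump's actual dependency sorter (the object names here are made up for the example):

```python
from graphlib import TopologicalSorter

def restore_order(dependencies):
    """Return a restore order in which every object comes after the
    objects it depends on. `dependencies` maps each dump object to the
    set of objects that must be restored before it."""
    return list(TopologicalSorter(dependencies).static_order())

# With the missing edge recorded: the WITH FUNCTION cast additionally
# depends on the WITHOUT FUNCTION cast that makes bigint
# binary-coercible to lbuid.
deps = {
    "type lbuid": set(),
    "cast (bigint as lbuid)": {"type lbuid"},
    "cast (integer as lbuid) with function": {
        "type lbuid",
        "cast (bigint as lbuid)",
    },
}

order = restore_order(deps)
```

Without that last edge, the sorter has no reason to place the WITHOUT FUNCTION cast first, and whichever order falls out of its tie-breaking is what gets dumped — which is why a restore that happened to work was just luck.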
Re: Bloom filter Pushdown Optimization for Merge Join
On Wed, Oct 12, 2022 at 4:35 PM Zhihong Yu wrote: > > > On Wed, Oct 12, 2022 at 3:36 PM Lyu Pan wrote: > >> Hello Zhihong Yu & Tomas Vondra, >> >> Thank you so much for your review and feedback! >> >> We made some updates based on previous feedback and attached the new >> patch set. Due to time constraints, we didn't get to resolve all the >> comments, and we'll continue to improve this patch. >> >> > In this prototype, the cost model is based on an assumption that there >> is >> > a linear relationship between the performance gain from using a semijoin >> > filter and the estimated filtering rate: >> > % improvement to Merge Join cost = 0.83 * estimated filtering rate - >> 0.137. >> > >> > How were the coefficients (0.83 and 0.137) determined ? >> > I guess they were based on the results of running certain workload. >> >> Right, the coefficients (0.83 and 0.137) were determined based on some >> preliminary testing. The current costing model is pretty naive and >> we'll work on a more robust costing model in future work. >> >> >> > I agree, in principle, although I think the current logic / formula is a >> > bit too crude and fitted to the simple data used in the test. I think >> > this needs to be formulated as a regular costing issue, considering >> > stuff like cost of the hash functions, and so on. >> > >> > I think this needs to do two things: >> > >> > 1) estimate the cost of building the bloom filter - This shall depend on >> > the number of rows in the inner relation, number/cost of the hash >> > functions (which may be higher for some data types), etc. >> > >> > 2) estimate improvement for the probing branch - Essentially, we need to >> > estimate how much we save by filtering some of the rows, but this also >> > needs to include the cost of probing the bloom filter. 
>> > >> > This will probably require some improvements to the lib/bloomfilter, in >> > order to estimate the false positive rate - this may matter a lot for >> > large data sets and small work_mem values. The bloomfilter library >> > simply reduces the size of the bloom filter, which increases the false >> > positive rate. At some point it'll start reducing the benefit. >> > >> >> These suggestions make a lot of sense. The current costing model is >> definitely not good enough, and we plan to work on a more robust >> costing model as we continue to improve the patch. >> >> >> > OK. Could also build the bloom filter in shared memory? >> > >> >> We thought about this approach but didn't prefer this one because if >> all worker processes share the same bloom filter in shared memory, we >> need to frequently lock and unlock the bloom filter to avoid race >> conditions. So we decided to have each worker process create its own >> bloom filter. >> >> >> > IMHO we shouldn't make too many conclusions from these examples. Yes, it >> > shows merge join can be improved, but for cases where a hashjoin works >> > better so we wouldn't use merge join anyway. >> > >> > I think we should try constructing examples where either merge join wins >> > already (and gets further improved by the bloom filter), or would lose >> > to hash join and the bloom filter improves it enough to win. >> > >> > AFAICS that requires a join of two large tables - large enough that hash >> > join would need to be batched, or pre-sorted inputs (which eliminates >> > the explicit Sort, which is the main cost in most cases). >> > >> > The current patch only works with sequential scans, which eliminates the >> > second (pre-sorted) option. So let's try the first one - can we invent >> > an example with a join of two large tables where a merge join would win? >> > >> > Can we find such example in existing benchmarks like TPC-H/TPC-DS. >> > >> >> Agreed. 
The current examples are only intended to show us that using >> bloom filters in merge join could improve the merge join performance >> in some cases. We are working on testing more examples that merge join >> with bloom filter could out-perform hash join, which should be more >> persuasive. >> >> >> > The bloom filter is built by the first seqscan (on t0), and then used by >> > the second seqscan (on t1). But this only works because we always run >> > the t0 scan to completion (because we're feeding it into Sort) before we >> > start scanning t1. >> > >> > But when the scan on t1 switches to an index scan, it's over - we'd be >> > building the filter without being able to probe it, and when we finish >> > building it we no longer need it. So this seems pretty futile. >> > >> > It might still improve plans like >> > >> >-> Merge Join >> > Merge Cond: (t0.c1 = t1.c1) >> > SemiJoin Filter Created Based on: (t0.c1 = t1.c1) >> > SemiJoin Estimated Filtering Rate: 1. >> > -> Sort >> >Sort Key: t0.c1 >> >-> Seq Scan on t0 >> > -> Index Scan on t1 >> > >> > But I don't know how common/likely that actually is. I'd expect to have >> > an index on
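The interaction Tomas points out — lib/bloomfilter shrinking the filter to fit work_mem, which raises the false-positive rate and erodes the benefit — follows from the standard Bloom filter approximation. A minimal sketch of how a costing model might estimate that rate (plain Python illustrating the math, not the actual lib/bloomfilter API):

```python
import math

def bloom_fpr(m_bits: int, n_items: int, k_hashes: int) -> float:
    """Classic approximation of a Bloom filter's false-positive rate:
    p = (1 - e^(-k*n/m))^k."""
    if n_items == 0:
        return 0.0
    return (1.0 - math.exp(-k_hashes * n_items / m_bits)) ** k_hashes

def optimal_k(m_bits: int, n_items: int) -> int:
    """Hash-function count minimizing the rate: k = (m/n) * ln 2."""
    return max(1, round((m_bits / n_items) * math.log(2)))

# Halving the filter size at a fixed item count makes the rate jump,
# which is the point at which the semijoin filter stops paying off.
small = bloom_fpr(4000, 1000, optimal_k(4000, 1000))   # ~0.147
large = bloom_fpr(8000, 1000, optimal_k(8000, 1000))   # ~0.022
```

A planner cost model could scale the estimated filtering rate by (1 - p) before applying a formula like the linear one in the patch, so that an undersized filter is costed honestly.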
Re: PG upgrade 14->15 fails - database contains our own extension
On Thu, Oct 13, 2022 at 9:57 AM David Turoň wrote: > pg_restore: creating TYPE "public.lbuid" > pg_restore: creating CAST "CAST (integer AS "public"."lbuid")" > pg_restore: while PROCESSING TOC: > pg_restore: from TOC entry 3751; 2605 16393 CAST CAST (integer AS > "public"."lbuid") (no owner) > pg_restore: error: could not execute query: ERROR: return data type of cast > function must match or be binary-coercible to target data type > Command was: CREATE CAST (integer AS "public"."lbuid") WITH FUNCTION > "pg_catalog"."int8"(integer) AS IMPLICIT; I think the error is complaining that the return type of int8(integer), which is bigint, needs to be coercible WITHOUT FUNCTION to lbuid. Your extension contains such a cast, but at the point when the error occurs, it hasn't been restored yet. That suggests that either the cast didn't get included in the dump file, or it got included in the wrong order. A quick test suggests the latter. If I execute your SQL file and then dump the database, I get: CREATE CAST (integer AS public.lbuid) WITH FUNCTION pg_catalog.int8(integer) AS IMPLICIT; CREATE CAST (bigint AS public.lbuid) WITHOUT FUNCTION AS IMPLICIT; CREATE CAST (public.lbuid AS bigint) WITHOUT FUNCTION AS IMPLICIT; That's not a valid dump ordering, and if I drop the casts and try to recreate them that way, it fails in the same way you saw. My guess is that this is a bug in Tom's commit b55f2b6926556115155930c4b2d006c173f45e65, "Adjust pg_dump's priority ordering for casts." -- Robert Haas EDB: http://www.enterprisedb.com
Re: Move backup-related code to xlogbackup.c/.h
On Thu, Oct 13, 2022 at 4:43 PM Alvaro Herrera wrote: > Thanks for reviewing. > > Hm. Agree. But, that requires us to include xlogbackup.h in > > xlog_internal.h for SessionBackupState enum in > > ResetXLogBackupActivity(). Is that okay? > > It's not great, but it's not *that* bad, ISTM, mainly because > xlog_internal.h will affect less stuff than xlog.h. Moved them to xlog_internal.h without xlogbackup.h included, please see below. > > SessionBackupState and it needs to be set before we release WAL insert > > locks, see the comment [1]. > > I see. Maybe we could keep that enum in xlog.h, instead. It's not required now, please see below. > While looking at how that works: I think calling a local variable > "session_backup_state" is super confusing, seeing that we have a > file-global variable called sessionBackupState. I recommend naming the > local "newstate" or something along those lines instead. > > I wonder why does pg_backup_start_callback() not change the backup state > before your patch. This seems a gratuitous difference, or is it? If > you change that code so that it also sets the status to BACKUP_NONE, > then you can pass a bare SessionBackupState to ResetXLogBackupActivity > rather than a pointer to one, which is a very strange arrangement that > exists only so that you can have a third state (NULL) meaning "don't > change state" -- that looks quite weird. > > Alternatively, if you don't want or can't change > pg_backup_start_callback to pass a valid state value, another solution > might be to pass a separate boolean "change state". > > But I would look at having another patch before your series that changes > pg_backup_start_callback to make the code identical for the three > callers, then you can simplify the patched code. The pg_backup_start_callback() can just go ahead and reset sessionBackupState. 
However, it leads us to the complete removal of pg_backup_start_callback() itself and use do_pg_abort_backup() consistently across, saving 20 LOC attached as v5-0001. With this, the other patches would get simplified a bit too, xlogbackup.h footprint got reduced now. Please find the v5 patch-set. 0002-0004 moves the backup code to xlogbackup.c/.h. -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com From 5b2e8f86e388fbbe9f7eb276ca137b14ac0e2fa3 Mon Sep 17 00:00:00 2001 From: Bharath Rupireddy Date: Thu, 13 Oct 2022 12:18:46 + Subject: [PATCH v5] Use do_pg_abort_backup() consistently across the backup code Backup code uses another abort callback pg_backup_start_callback() when it has do_pg_abort_backup(). These two are before_shmem_exit() callbacks more or less do the same thing. The only difference between them is resetting sessionBackupState to SESSION_BACKUP_NONE. Actually, it is safe for us to reset sessionBackupState on aborts. This leads us to remove the pg_backup_start_callback() and use do_pg_abort_backup() consistently across. Author: Bharath Rupireddy per suggestion from Alvaro Herrera. 
Discussion: https://www.postgresql.org/message-id/20221013111330.564fk5tkwe3ha77l%40alvherre.pgsql --- src/backend/access/transam/xlog.c | 23 ++- 1 file changed, 2 insertions(+), 21 deletions(-) diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c index 27085b15a8..c9c1c6f287 100644 --- a/src/backend/access/transam/xlog.c +++ b/src/backend/access/transam/xlog.c @@ -679,8 +679,6 @@ static void ReadControlFile(void); static void UpdateControlFile(void); static char *str_time(pg_time_t tnow); -static void pg_backup_start_callback(int code, Datum arg); - static int get_sync_bit(int method); static void CopyXLogRecordToWAL(int write_len, bool isLogSwitch, @@ -8321,7 +8319,7 @@ do_pg_backup_start(const char *backupidstr, bool fast, List **tablespaces, WALInsertLockRelease(); /* Ensure we release forcePageWrites if fail below */ - PG_ENSURE_ERROR_CLEANUP(pg_backup_start_callback, (Datum) 0); + PG_ENSURE_ERROR_CLEANUP(do_pg_abort_backup, DatumGetBool(false)); { bool gotUniqueStartpoint = false; DIR *tblspcdir; @@ -8531,7 +8529,7 @@ do_pg_backup_start(const char *backupidstr, bool fast, List **tablespaces, state->starttime = (pg_time_t) time(NULL); } - PG_END_ENSURE_ERROR_CLEANUP(pg_backup_start_callback, (Datum) 0); + PG_END_ENSURE_ERROR_CLEANUP(do_pg_abort_backup, DatumGetBool(false)); state->started_in_recovery = backup_started_in_recovery; @@ -8541,23 +8539,6 @@ do_pg_backup_start(const char *backupidstr, bool fast, List **tablespaces, sessionBackupState = SESSION_BACKUP_RUNNING; } -/* Error cleanup callback for pg_backup_start */ -static void -pg_backup_start_callback(int code, Datum arg) -{ - /* Update backup counters and forcePageWrites on failure */ - WALInsertLockAcquireExclusive(); - - Assert(XLogCtl->Insert.runningBackups > 0); - XLogCtl->Insert.runningBackups--; - - if (XLogCtl->Insert.runningBackups == 0) - { -
Re: Git tag for v15
Alvaro Herrera writes: > If for whatever reason a problem is found before the tag has been > posted, then a new Git commit is chosen and a new tarball published. > The tag will then match the new commit, not the original obviously. > I don't know what would happen if a problem were to be found *after* the > tag has been pushed; normally that would just mean the fix would have to > wait until the next minor version in that branch. It would have to be > something really serious in order for this process to be affected. > As far as I know, this has never happened. I don't think it's happened since we adopted Git, anyway. The plan if we did have to re-wrap at that point would be to increment the version number(s), since not doing so would probably create too much confusion about which tarballs were the correct ones. regards, tom lane
Re: Move backup-related code to xlogbackup.c/.h
On Thu, Oct 13, 2022 at 7:13 AM Alvaro Herrera wrote: > On 2022-Oct-13, Bharath Rupireddy wrote: > > Hm. Agree. But, that requires us to include xlogbackup.h in > > xlog_internal.h for SessionBackupState enum in > > ResetXLogBackupActivity(). Is that okay? > > It's not great, but it's not *that* bad, ISTM, mainly because > xlog_internal.h will affect less stuff than xlog.h. This is unfortunately a lot less true than I would like. I count 75 places where we #include "access/xlog.h" and 53 where we #include "access/xlog_internal.h". And many of those are in frontend code. I feel like the contents of xlog_internal.h are a bit too eclectic. Maybe stuff that has to do with the on-disk directory structure, like XLOGDIR and XLOG_FNAME_LEN, as well as stuff that has to do with where bytes are located, like XLByteToSeg, should move to another file. Besides that, which is the biggest part of the file, there's also stuff that has to do with the page and record format generally (like XLOG_PAGE_MAGIC and SizeOfXLogShortPHD) and stuff that is used for certain specific WAL record types (like xl_parameter_change and xl_restore_point) and some random rmgr-related things (like RmgrData and the stuff that follows) and the usual assortment of random GUCs and global variables (like RecoveryTargetAction and ArchiveRecoveryRequested). Maybe it doesn't make sense to split this up into a thousand tiny little header files, but I think some rethinking would be a good idea, because it really doesn't make much sense to me to mix stuff that has to do with file-naming conventions, which a bunch of frontend code needs to know about, together with a bunch of backend-only things. -- Robert Haas EDB: http://www.enterprisedb.com
Re: Transparent column encryption
If memory serves me correctly, if you statically link openssl this will work. If you are using ssl in a DLL, I believe that the DLL has its own "c-library" and its own heap. On Thu, Oct 13, 2022 at 9:43 AM Peter Eisentraut < peter.eisentr...@enterprisedb.com> wrote: > On 06.10.22 17:19, Andres Freund wrote: > >>> psql error: stderr: 'OPENSSL_Uplink(7FFC165CBD50,08): no > OPENSSL_Applink' > >> What in the world is that about? What scant information I could find > >> suggests that it has something to do with building a "release" build > against > >> an "debug" build of the openssl library, or vice versa. But this patch > >> doesn't introduce any use of openssl that we haven't seen before. > > It looks to me that one needs to compile, in some form, > openssl/applink.c and > > link it to the application. No idea why that'd be required now and not > > earlier. > > I have figured this out. The problem is that on Windows you can't > reliably pass stdio FILE * handles between the application and OpenSSL. > To give the helpful places I found some Google juice, I'll mention them > here: > > - https://github.com/edenhill/librdkafka/pull/3602 > - https://github.com/edenhill/librdkafka/issues/3554 > - https://www.mail-archive.com/openssl-users@openssl.org/msg91029.html > > > >
PG upgrade 14->15 fails - database contains our own extension
Hi community, I have a problem with pg_upgrade. Tested from 14.5 to 15.0 rc2 when the database contains our extension with one new type. Using pg_dump & restore works well. We made a workaround extension for some usage in a JavaScript library that contains a new type that represents bigint as text. So something like auto conversion from SELECT (2^32)::text -> bigint when data is stored and (2^32) -> text when data is retrieved. I'm not sure if this is a PostgreSQL bug or we have something wrong in our extension with the cast. That is why I am writing here. Here is the output of pg_upgrade: (our extension name is lbuid, our type public.lbuid) command: "/usr/pgsql-15/bin/pg_dump" --host /tmp/pg_upgrade_log --port 50432 --username postgres --schema-only --quote-all-identifiers --binary-upgrade --format=custom --file="/home/pgsql/data_new/pg_upgrade_output.d/20221013T104924.054/dump/pg_upgrade_dump_16385.custom" 'dbname=lbstat' >> "/home/pgsql/data_new/pg_upgrade_output.d/20221013T104924.054/log/pg_upgrade_dump_16385.log" 2>&1 command: "/usr/pgsql-15/bin/pg_restore" --host /tmp/pg_upgrade_log --port 50432 --username postgres --create --exit-on-error --verbose --dbname template1 "/home/pgsql/data_new/pg_upgrade_output.d/20221013T104924.054/dump/pg_upgrade_dump_16385.custom" >> "/home/pgsql/data_new/pg_upgrade_output.d/20221013T104924.054/log/pg_upgrade_dump_16385.log" 2>&1 pg_restore: connecting to database for restore pg_restore: creating DATABASE "lbstat" pg_restore: connecting to new database "lbstat" pg_restore: creating DATABASE PROPERTIES "lbstat" pg_restore: connecting to new database "lbstat" pg_restore: creating pg_largeobject "pg_largeobject" pg_restore: creating SCHEMA "public" pg_restore: creating COMMENT "SCHEMA "public"" pg_restore: creating EXTENSION "lbuid" pg_restore: creating COMMENT "EXTENSION "lbuid"" pg_restore: creating SHELL TYPE "public.lbuid" pg_restore: creating FUNCTION "public.lbuid_in("cstring")" pg_restore: creating FUNCTION 
"public.lbuid_out("public"."lbuid")" pg_restore: creating TYPE "public.lbuid" pg_restore: creating CAST "CAST (integer AS "public"."lbuid")" pg_restore: while PROCESSING TOC: pg_restore: from TOC entry 3751; 2605 16393 CAST CAST (integer AS "public"."lbuid") (no owner) pg_restore: error: could not execute query: ERROR: return data type of cast function must match or be binary-coercible to target data type Command was: CREATE CAST (integer AS "public"."lbuid") WITH FUNCTION "pg_catalog"."int8"(integer) AS IMPLICIT; -- For binary upgrade, handle extension membership the hard way ALTER EXTENSION "lbuid" ADD CAST (integer AS "public"."lbuid"); The hint is good, but this cast is already a member of our extension: lbstat=# ALTER EXTENSION lbuid ADD CAST (integer AS public.lbuid); ERROR: cast from integer to lbuid is already a member of extension "lbuid" The database contains this: CREATE EXTENSION lbuid ; CREATE TABLE test(id lbuid); INSERT INTO test VALUES ('1344456644646645456'); Tested on our distribution based on CentOS 7. When I drop this cast from the extension manually and then manually restore after pg_upgrade, the operation completes without failure. The extension files are attached. Thanks. Best regards. David T. (See attached file: lbuid.control)(See attached file: lbuid--0.1.0.sql) -- - Ing. David TUROŇ LinuxBox.cz, s.r.o. 28. rijna 168, 709 01 Ostrava tel.:+420 591 166 224 fax:+420 596 621 273 mobil: +420 732 589 152 www.linuxbox.cz mobil servis: +420 737 238 656 email servis: ser...@linuxbox.cz - lbuid.control Description: Binary data lbuid--0.1.0.sql Description: Binary data
Re: Transparent column encryption
On 06.10.22 17:19, Andres Freund wrote: psql error: stderr: 'OPENSSL_Uplink(7FFC165CBD50,08): no OPENSSL_Applink' What in the world is that about? What scant information I could find suggests that it has something to do with building a "release" build against an "debug" build of the openssl library, or vice versa. But this patch doesn't introduce any use of openssl that we haven't seen before. It looks to me that one needs to compile, in some form, openssl/applink.c and link it to the application. No idea why that'd be required now and not earlier. I have figured this out. The problem is that on Windows you can't reliably pass stdio FILE * handles between the application and OpenSSL. To give the helpful places I found some Google juice, I'll mention them here: - https://github.com/edenhill/librdkafka/pull/3602 - https://github.com/edenhill/librdkafka/issues/3554 - https://www.mail-archive.com/openssl-users@openssl.org/msg91029.html
Re: Tracking last scan time
Hi On Wed, 12 Oct 2022 at 23:52, Andres Freund wrote: > Hi, > > On 2022-10-12 12:50:31 -0700, Andres Freund wrote: > > I think this should have at least a basic test in > src/test/regress/sql/stats.sql. If > > I can write one in a few minutes I'll go for that, otherwise will reply > > detailing difficulties. > > Took a bit longer (+lunch). Attached. > > > In the attached 0001, the patch to make > GetCurrentTransactionStopTimestamp() > set xactStopTimestamp, I added a few comment updates and an Assert() to > ensure > that CurrentTransactionState->state is > TRANS_(DEFAULT|COMMIT|ABORT|PREPARE). I > am worried that otherwise we might end up with someone ending up using it > in a > place before the end of the transaction, which'd then end up recording the > wrong timestamp in the commit/abort record. > > > For 0002, the commit adding lastscan, I added catversion/stats version > bumps > (because I was planning to commit it already...), a commit message, and > that > minor docs change mentioned earlier. > > > 0003 adds the tests mentioned above. I plan to merge them with 0002, but > left > them separate for easier review for now. > > To be able to compare timestamps with > not just >= we need to make sure > that > two subsequent timestamps differ. The attached achieves this by sleeping > for > 100ms between those points - we do that in other places already. I'd > started > out with 10ms, which I am fairly sure would suffice, but then decided to > copy > the existing 100ms sleeps. > > I verified tests pass under valgrind, debug_discard_caches and after I make > pgstat_report_stat() only flush when force is passed in. > Thanks for that. It looks good to me, bar one comment (repeated 3 times in the sql and expected files): fetch timestamps from before the next test "from " should be removed. -- Dave Page Blog: https://pgsnake.blogspot.com Twitter: @pgsnake EDB: https://www.enterprisedb.com
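The point about comparing timestamps with strict `>` rather than `>=` is that two successive clock reads can return identical values at the recorded granularity, so the tests sleep between the two capture points. The idea can be sketched generically (plain Python, not the regress-test SQL):

```python
import time

def strictly_increasing_timestamps(min_gap_s: float = 0.1):
    """Capture two wall-clock timestamps guaranteed to differ by
    sleeping in between, mirroring the 100ms sleeps the tests use so
    that strict `>` comparisons hold."""
    t1 = time.time()
    time.sleep(min_gap_s)
    t2 = time.time()
    return t1, t2
```

Without the sleep, a `t2 > t1` assertion can fail intermittently on fast machines or coarse clocks, which is exactly the flakiness the test's deliberate pauses avoid.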
RE: possible typo for CREATE PUBLICATION description
Hi, Alvaro-san On Thursday, October 13, 2022 8:38 PM Alvaro Herrera wrote: > On 2022-Oct-13, osumi.takami...@fujitsu.com wrote: > > > I noticed a possible typo in the doc for create publication. > > This applies to PG15 as well. > > Kindly have a look at the attached patch for it. > > Yeah, looks good. It actually applies all the way back to 10, so I pushed it > to > all branches. I included Peter's wording changes as well. > > Thanks You are right and thank you so much for taking care of this ! Best Regards, Takamichi Osumi
Re: possible typo for CREATE PUBLICATION description
Hello On 2022-Oct-13, osumi.takami...@fujitsu.com wrote: > I noticed a possible typo in the doc for create publication. > This applies to PG15 as well. > Kindly have a look at the attached patch for it. Yeah, looks good. It actually applies all the way back to 10, so I pushed it to all branches. I included Peter's wording changes as well. Thanks -- Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/ "Ellos andaban todos desnudos como su madre los parió, y también las mujeres, aunque no vi más que una, harto moza, y todos los que yo vi eran todos mancebos, que ninguno vi de edad de más de XXX años" (Cristóbal Colón)
Re: Git tag for v15
On 2022-Oct-12, Matthias van de Meent wrote: > On Wed, 12 Oct 2022 at 20:08, Nemo wrote: > > Since v15 doesn't seem to be announced yet - this is confusing me. It > > doesn't impact us anywhere (yet), since we haven't added the v15 release > > cycle to our page yet, but I'd like to check if this is an incorrect tag > > or? If not, what's the correct source for us to use for automation purposes? > > Tags are usually published a few days in advance of the official > release, so that all packagers have the time to bundle and prepare the > release in their repositories. The official release is planned for > tomorrow the 13th, as mentioned in [0]. To be more precise, our cadence goes pretty much every time like this (both yearly major releases as well as quarterly minor): - a Git commit is determined for a release on Monday (before noon EDT/EST); a tarball produced from that commit is posted to registered packagers - By Tuesday evening EDT/EST, if no packagers have reported problems, the tag corresponding to the given commit is pushed to the repo - By Thursday noon EDT/EST the announcement is made and the packages are made available. Thus packagers have all of Monday and Tuesday to report problems. They typically don't, since the buildfarm alerts us soon enough to any portability problems. Packaging issues are normally found (and dealt with) during beta. If for whatever reason a problem is found before the tag has been posted, then a new Git commit is chosen and a new tarball published. The tag will then match the new commit, not the original obviously. I don't know what would happen if a problem were to be found *after* the tag has been pushed; normally that would just mean the fix would have to wait until the next minor version in that branch. It would have to be something really serious in order for this process to be affected. As far as I know, this has never happened. 
As far as Postgres is concerned, you could automate things so that a tag detected on a Tuesday is marked as released on the immediately following Thursday (noon EDT/EST). If you did this, you'd get it right 99% of the time, and the only way to achieve 100% would be to have a bot that follows the quarterly announcements in pgsql-announce. -- Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/
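The automation Álvaro sketches — a tag detected on a Tuesday is marked as released on the immediately following Thursday — is plain calendar arithmetic. A minimal sketch (this assumes the usual wrap schedule holds; the rare exceptions would still need the pgsql-announce feed):

```python
from datetime import date, timedelta

THURSDAY = 3  # date.weekday(): Monday == 0

def expected_release_date(tag_date: date) -> date:
    """Map the date a release tag appears (normally a Tuesday) to the
    immediately following Thursday, when the announcement is made.
    A tag seen on a Thursday maps to that same day."""
    days_ahead = (THURSDAY - tag_date.weekday()) % 7
    return tag_date + timedelta(days=days_ahead)
```

For example, a tag seen on Tuesday 2022-10-11 maps to Thursday 2022-10-13, the day PostgreSQL 15 was announced.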
Re: Allow tests to pass in OpenSSL FIPS mode
On 2022-Oct-13, Peter Eisentraut wrote: > Right, that's the rest of my original patch. I'll come back with an updated > version of that. However, there are some changes in brin_multi.out that are quite surprising and suggest that we might have bugs in brin: +WARNING: unexpected number of results 31 for (macaddr8col,>,macaddr8,b1:d1:0e:7b:af:a4:42:12,33) +WARNING: unexpected number of results 17 for (macaddr8col,>=,macaddr8,d9:35:91:bd:f7:86:0e:1e,15) +WARNING: unexpected number of results 11 for (macaddr8col,<=,macaddr8,23:e8:46:63:86:07:ad:cb,13) +WARNING: unexpected number of results 4 for (macaddr8col,<,macaddr8,13:16:8e:6a:2e:6c:84:b4,6) -- Álvaro HerreraBreisgau, Deutschland — https://www.EnterpriseDB.com/ "La victoria es para quien se atreve a estar solo"
Re: Move backup-related code to xlogbackup.c/.h
On 2022-Oct-13, Bharath Rupireddy wrote: > Hm. Agree. But, that requires us to include xlogbackup.h in > xlog_internal.h for SessionBackupState enum in > ResetXLogBackupActivity(). Is that okay? It's not great, but it's not *that* bad, ISTM, mainly because xlog_internal.h will affect less stuff than xlog.h. > SessionBackupState and it needs to be set before we release WAL insert > locks, see the comment [1]. I see. Maybe we could keep that enum in xlog.h, instead. While looking at how that works: I think calling a local variable "session_backup_state" is super confusing, seeing that we have a file-global variable called sessionBackupState. I recommend naming the local "newstate" or something along those lines instead. I wonder why does pg_backup_start_callback() not change the backup state before your patch. This seems a gratuitous difference, or is it? If you change that code so that it also sets the status to BACKUP_NONE, then you can pass a bare SessionBackupState to ResetXLogBackupActivity rather than a pointer to one, which is a very strange arrangement that exists only so that you can have a third state (NULL) meaning "don't change state" -- that looks quite weird. Alternatively, if you don't want or can't change pg_backup_start_callback to pass a valid state value, another solution might be to pass a separate boolean "change state". But I would look at having another patch before your series that changes pg_backup_start_callback to make the code identical for the three callers, then you can simplify the patched code. > Should we just remove the > SessionBackupState enum and convert SESSION_BACKUP_NONE and > SESSION_BACKUP_RUNNING to just macros in xlogbackup.h and use integer > type to pass the state across? I don't know what's better here. Do you > have any thoughts on this? No, please, no passing of unadorned magic numbers. -- Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/
Re: hash_xlog_split_allocate_page: failed to acquire cleanup lock
On Wed, Oct 12, 2022 at 4:16 PM vignesh C wrote: > > On Thu, 6 Oct 2022 at 12:44, Amit Kapila wrote: > > > > On Sat, Oct 1, 2022 at 12:35 AM Andres Freund wrote: > > > > > > This issue does occasionally happen in CI, as e.g. noted in this thread: > > > https://www.postgresql.org/message-id/20220930185345.GD6256%40telsasoft.com > > > > > > On 2022-08-18 15:17:47 +0530, Amit Kapila wrote: > > > > I agree with you that getting rid of the clean-up lock on the new > > > > bucket is a more invasive patch and should be done separately if > > > > required. Yesterday, I have done a brief analysis and I think that is > > > > possible but it doesn't seem to be a good idea to backpatch it. > > > > > > My problem with this approach is that the whole cleanup lock is hugely > > > misleading as-is. As I noted in > > > https://www.postgresql.org/message-id/20220817193032.z35vdjhpzkgldrd3%40awork3.anarazel.de > > > we take the cleanup lock *after* re-initializing the page. Thereby > > > completely breaking the properties that a cleanup lock normally tries to > > > guarantee. > > > > > > Even if that were to achieve something useful (doubtful in this case), > > > it'd need a huge comment explaining what's going on. > > > > > > > Attached are two patches. The first patch is what Robert has proposed > > with some changes in comments to emphasize the fact that cleanup lock > > on the new bucket is just to be consistent with the old bucket page > > locking as we are initializing it just before checking for cleanup > > lock. In the second patch, I removed the acquisition of cleanup lock > > on the new bucket page and changed the comments/README accordingly. > > > > I think we can backpatch the first patch and the second patch can be > > just a HEAD-only patch. Does that sound reasonable to you? > > Thanks for the patches. > I have verified that the issue is fixed using a manual test up to > REL_10_STABLE version and found it to be working fine. > Thanks for the verification. 
I am planning to push the first patch (and backpatch it) next week (by next Tuesday) unless we have more comments or Robert intends to push it. -- With Regards, Amit Kapila.
Re: Use LIMIT instead of Unique for DISTINCT when all distinct pathkeys are redundant
On Thu, 13 Oct 2022 at 21:17, Richard Guo wrote:
> On Thu, Oct 13, 2022 at 2:48 PM David Rowley wrote:
>> To stop the planner from adding that final sort, I opted to hack the
>> LimitPath's pathkeys to say that it's already sorted by the
>> PlannerInfo's sort_pathkeys. That feels slightly icky, but it does
>> seem a little wasteful to initialise a sort node on every execution of
>> the plan to sort a single tuple.
>
> I don't get how this plan comes out. It seems not correct because a Limit
> node above an unsorted path would give us an unpredictable row.

Actually, you're right. That manual setting of the pathkeys is an unneeded remnant from a bug I fixed before sending out v2. It can just be removed. I've attached the v3 patch.

David

diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 5d0fd6e072..839b034b29 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -4780,11 +4780,44 @@ create_final_distinct_paths(PlannerInfo *root, RelOptInfo *input_rel,
 			if (pathkeys_contained_in(needed_pathkeys, path->pathkeys))
 			{
-				add_path(distinct_rel, (Path *)
-						 create_upper_unique_path(root, distinct_rel,
-												  path,
-												  list_length(root->distinct_pathkeys),
-												  numDistinctRows));
+				/*
+				 * distinct_pathkeys may have become empty if all of the
+				 * pathkeys were determined to be redundant.  If all of the
+				 * pathkeys are redundant then each DISTINCT target must only
+				 * allow a single value, therefore all resulting tuples must
+				 * be identical.  We can uniquify these tuples simply by just
+				 * taking the first tuple.  All we do here is add a path to do
+				 * LIMIT 1 on path.  When doing DISTINCT ON we may still have
+				 * a non-NIL sort_pathkeys list, so we must still only do this
+				 * with paths which are contained in needed_pathkeys.
+				 */
+				if (root->distinct_pathkeys == NIL)
+				{
+					Node	   *limitCount;
+
+					limitCount = (Node *) makeConst(INT8OID, -1, InvalidOid,
+													sizeof(int64),
+													Int64GetDatum(1), false,
+													FLOAT8PASSBYVAL);
+
+					/*
+					 * If the query already has a LIMIT clause, then we could
+					 * end up with a duplicate LimitPath in the final plan.
+					 * That does not seem worth troubling over too much.
+					 */
+					add_path(distinct_rel, (Path *)
+							 create_limit_path(root, distinct_rel, path, NULL,
+											   limitCount, LIMIT_OPTION_COUNT,
+											   0, 1));
+				}
+				else
+				{
+					add_path(distinct_rel, (Path *)
+							 create_upper_unique_path(root, distinct_rel,
+													  path,
+													  list_length(root->distinct_pathkeys),
+													  numDistinctRows));
+				}
 			}
 		}
@@ -4805,13 +4838,33 @@ create_final_distinct_paths(PlannerInfo *root, RelOptInfo *input_rel,
 			path = (Path *) create_sort_path(root, distinct_rel,
Re: Suppressing useless wakeups in walreceiver
I think in 0001 we should put more stuff in the state struct -- specifically these globals:

static int recvFile = -1;
static TimeLineID recvFileTLI = 0;
static XLogSegNo recvSegNo = 0;

The main reason is that it seems odd to have startpointTLI in the struct, used in some places, together with a file-global recvFileTLI which isn't. The way one is passed as an argument and the other as part of a struct seemed too distracting. This should reduce the number of moving parts, ISTM.

One thing that confused me for a moment is that we have some state in walrcv and some more state in 'state'. The difference is pretty obvious once you compare the two, but it suggests to me that a better variable name for the latter is 'localstate', to more obviously distinguish them.

I was tempted to suggest that LogstreamResult would also be good to have in the new struct, but that might be going a bit too far for a first cut.

-- Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/
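To make the shape of that suggestion concrete, here is a minimal C sketch (not from the patch; the struct and function names, and the stand-in typedefs, are hypothetical) of folding the three file-global variables into a per-walreceiver "localstate" struct that is passed around explicitly:

```c
#include <assert.h>

typedef unsigned int TimeLineID;        /* stand-in for the real typedef */
typedef unsigned long long XLogSegNo;   /* stand-in for the real typedef */

/*
 * Hypothetical consolidated state: the fields mirror the globals quoted
 * above plus startpointTLI, which the struct already carries today.
 */
typedef struct WalRcvLocalState
{
	TimeLineID	startpointTLI;	/* already lives in the struct today */
	int			recvFile;		/* was: static int recvFile = -1 */
	TimeLineID	recvFileTLI;	/* was: static TimeLineID recvFileTLI = 0 */
	XLogSegNo	recvSegNo;		/* was: static XLogSegNo recvSegNo = 0 */
} WalRcvLocalState;

static void
WalRcvLocalStateInit(WalRcvLocalState *localstate, TimeLineID startTLI)
{
	localstate->startpointTLI = startTLI;
	localstate->recvFile = -1;	/* same initial values the globals had */
	localstate->recvFileTLI = 0;
	localstate->recvSegNo = 0;
}
```

With this shape, functions that previously touched the globals would take a `WalRcvLocalState *` argument instead, removing the mixed global/struct passing the message complains about.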
Re: Make finding openssl program a configure or meson option
On 12.10.22 03:08, Michael Paquier wrote: On Tue, Oct 11, 2022 at 05:06:22PM +0200, Peter Eisentraut wrote: Various test suites use the "openssl" program as part of their setup. There isn't a way to override which openssl program is to be used, other than by fiddling with the path, perhaps. This has gotten increasingly problematic with some of the work I have been doing, because different versions of openssl have different capabilities and do different things by default. This patch checks for an openssl binary in configure and meson setup, with appropriate ways to override it. This is similar to how "lz4" and "zstd" are handled, for example. The meson build system actually already did this, but the result was only used in some places. This is now applied more uniformly. openssl-env allows the use of the environment variable of the same name. This reminds me a bit of the recent interferences with GZIP, for example. Sorry, what is "openssl-env"? I can't find that anywhere. This patch is missing one addition of set_single_env() in vcregress.pl, and one update of install-windows.sgml where all the supported environment variables for commands are listed. Ok, I'll add that.
Re: Allow tests to pass in OpenSSL FIPS mode
On 12.10.22 03:18, Michael Paquier wrote: On Tue, Oct 11, 2022 at 01:51:50PM +0200, Peter Eisentraut wrote: Let's make a small start on this. The attached patch moves the tests of the md5() function to a separate test file. That would ultimately make it easier to maintain a variant expected file for FIPS mode, where that function will fail (similar to how we have done it for the pgcrypto tests). Makes sense to me. This slice looks fine. Committed. I think that the other md5() computations done in the main regression test suite could just be switched to use one of the sha*() functions, as they just want to put their hands on text values. It looks like a few of them have some expectations about the output size and generate_series(), though, but this could be tweaked by making the series shorter, for example. Right, that's the rest of my original patch. I'll come back with an updated version of that.
Re: Move backup-related code to xlogbackup.c/.h
On Thu, Oct 13, 2022 at 3:12 PM Alvaro Herrera wrote:
> As I see it, xlog.h is a header that exports XLOG manipulations to the
> outside world (everything that produces WAL, as well as stuff that
> controls operation); xlog_internal is the header that exports xlog*.c
> internal stuff for other xlog*.c files and specialized frontends to use.
> These new functions are internal to xlogbackup.c and xlog.c, so IMO they
> belong in xlog_internal.h.
>
> Stuff that is used from xlog.c only by xlogrecovery.c should also appear
> in xlog_internal.h only, not xlog.h, so I suggest not to take that as
> precedent. Also, that file (xlogrecovery.c) is pretty new so we haven't
> had time to nail down the .h layout yet.

Hm. Agreed. But that requires us to include xlogbackup.h in xlog_internal.h for the SessionBackupState enum in ResetXLogBackupActivity(). Is that okay? The SessionBackupState needs to be set before we release WAL insert locks; see the comment [1]. Should we just remove the SessionBackupState enum, convert SESSION_BACKUP_NONE and SESSION_BACKUP_RUNNING to plain macros in xlogbackup.h, and use an integer type to pass the state across? I don't know what's better here. Do you have any thoughts on this?

[1]
 * You might think that WALInsertLockRelease() can be called before
 * cleaning up session-level lock because session-level lock doesn't need
 * to be protected with WAL insertion lock. But since
 * CHECK_FOR_INTERRUPTS() can occur in it, session-level lock must be
 * cleaned up before it.
 */

-- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: Suppressing useless wakeups in walreceiver
On Tue, Oct 11, 2022 at 11:22 PM Nathan Bossart wrote: > > On Tue, Oct 11, 2022 at 09:34:25AM +0530, Bharath Rupireddy wrote: > > now = t1; > > XLogWalRcvSendReply() /* say it ran for a minute or so for whatever reasons > > */ > > XLogWalRcvSendHSFeedback() /* with patch walrecevier sends hot standby > > feedback more often without properly honouring > > wal_receiver_status_interval because the 'now' isn't actually the > > current time as far as that function is concerned, it is > > t1 + XLogWalRcvSendReply()'s time. */ > > > > Well, is this really a problem? I'm not sure about that. Let's hear from > > others. > > For this example, the feedback message would just be sent in the next loop > iteration instead. I think the hot standby feedback message gets sent too frequently without honouring the wal_receiver_status_interval because the 'now' is actually not the current time with your patch but 'now + XLogWalRcvSendReply()'s time'. However, it's possible that I may be wrong here. /* * Send feedback at most once per wal_receiver_status_interval. */ if (!TimestampDifferenceExceeds(sendTime, now, wal_receiver_status_interval * 1000)) return; As said upthread [1], I think the best way to move forward is to separate the GetCurrentTimestamp() calls optimizations into 0003. Do you have any further thoughts on review comments posted upthread [1]? [1] https://www.postgresql.org/message-id/CALj2ACV5q2dbVRKwu2PL2s_YY0owZFTRxLdX%3Dt%2BdZ1iag15khA%40mail.gmail.com -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
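To illustrate why the freshness of 'now' matters for the check quoted above, here is a small self-contained C model (hypothetical names; a simplification, not PostgreSQL code) of the interval gate. The direction of the effect is debated in the thread; the model just shows that whether the feedback is sent at all in a given iteration depends on whether 'now' is the timestamp captured before or after the long-running call:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef int64_t TimestampTz;	/* microseconds, like the real type */

/* Simplified stand-in for TimestampDifferenceExceeds(). */
static bool
timestamp_difference_exceeds(TimestampTz start, TimestampTz stop, int msec)
{
	return (stop - start) >= (int64_t) msec * 1000;
}

/* The gate XLogWalRcvSendHSFeedback() applies before sending feedback. */
static bool
would_send_feedback(TimestampTz sendTime, TimestampTz now, int interval_s)
{
	return timestamp_difference_exceeds(sendTime, now, interval_s * 1000);
}
```

If 'now' was captured at t1 and XLogWalRcvSendReply() then ran for a minute, the gate still evaluates against t1, so the decision can differ from what the real wall-clock elapsed time would produce.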
Re: Fast COPY FROM based on batch insert
On Thu, Oct 13, 2022 at 1:38 PM Andrey Lepikhov wrote: > On 10/12/22 07:56, Etsuro Fujita wrote: > > On Tue, Oct 11, 2022 at 3:06 PM Andrey Lepikhov > > wrote: > >> I reviewed the patch one more time. Only one question: bistate and > >> ri_FdwRoutine are strongly bounded. Maybe to add some assertion on > >> (ri_FdwRoutine XOR bistate) ? Just to prevent possible errors in future. > > > > You mean the bistate member of CopyMultiInsertBuffer? > Yes > > > > We do not use that member at all for foreign tables, so the patch > > avoids initializing that member in CopyMultiInsertBufferInit() when > > called for a foreign table. So we have bistate = NULL for foreign > > tables (and bistate != NULL for plain tables), as you mentioned above. > > I think it is a good idea to add such assertions. How about adding > > them to CopyMultiInsertBufferFlush() and > > CopyMultiInsertBufferCleanup() like the attached? In the attached I > > updated comments a bit further as well. > Yes, quite enough. I have committed the patch after tweaking comments a little bit further. Best regards, Etsuro Fujita
Re: archive modules
On Mon, Oct 10, 2022 at 1:17 PM Peter Eisentraut wrote:
> On 05.10.22 20:57, Nathan Bossart wrote:
> >> What we are talking about here is, arguably, a misconfiguration, so it
> >> should result in an error.
> >
> > Okay. What do you think about something like the attached?

The intent here looks reasonable to me. However, why should the user be able to set both archive_command and archive_library in the first place, only to fail later in LoadArchiveLibrary() per the patch? IMO, a check_hook() is the right way to disallow any sort of GUC misconfiguration, no? FWIW, I'm attaching a small patch that uses check_hook().

> That looks like the right solution to me.
>
> Let's put that into PG 16, and maybe we can consider backpatching it.

+1 to backpatch to PG 15 where the archive modules feature was introduced.

-- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com

From c5969071dfb9064bbf9e22513bf2cef57bcfd84f Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy
Date: Thu, 13 Oct 2022 09:37:40 +
Subject: [PATCH v1] Handle misconfigurations of archive_command and
 archive_library

The parameters archive_command and archive_library are mutually
exclusive. This patch errors out if the user is trying to set both
of them at a time.
---
 doc/src/sgml/config.sgml            | 11 +++
 src/backend/access/transam/xlog.c   | 16
 src/backend/postmaster/pgarch.c     | 18 +-
 src/backend/utils/misc/guc_tables.c |  4 ++--
 src/include/utils/guc_hooks.h       |  4
 5 files changed, 46 insertions(+), 7 deletions(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 66312b53b8..62e7d61da7 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -3597,9 +3597,10 @@ include_dir 'conf.d'
         This parameter can only be set in the postgresql.conf
-        file or on the server command line. It is ignored unless
-        archive_mode was enabled at server start and
-        archive_library is set to an empty string.
+        file or on the server command line. It is not allowed to set both
+        archive_command and archive_library
+        at the same time, doing so will cause an error. This parameter is ignored
+        unless archive_mode was enabled at server start.
         If archive_command is an empty string (the default) while
         archive_mode is enabled (and archive_library
         is set to an empty string), WAL archiving is temporarily
@@ -3625,7 +3626,9 @@
         The library to use for archiving completed WAL file segments. If set to an
         empty string (the default), archiving via shell is enabled, and
         is used. Otherwise, the specified
-        shared library is used for archiving. For more information, see
+        shared library is used for archiving. It is not allowed to set both
+        archive_library and archive_command
+        at the same time, doing so will cause an error. For more information, see
         and .
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 27085b15a8..64714c6940 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -4462,6 +4462,22 @@ show_archive_command(void)
 		return "(disabled)";
 }

+/*
+ * GUC check_hook for archive_command
+ */
+bool
+check_archive_command(char **newval, void **extra, GucSource source)
+{
+	if (*newval && strcmp(*newval, "") != 0 && XLogArchiveLibrary[0] != '\0')
+	{
+		GUC_check_errmsg("cannot set \"archive_command\" when \"archive_library\" is specified");
+		GUC_check_errdetail("Only one of \"archive_command\" or \"archive_library\" can be specified.");
+		return false;
+	}
+
+	return true;
+}
+
 /*
  * GUC show_hook for in_hot_standby
  */
diff --git a/src/backend/postmaster/pgarch.c b/src/backend/postmaster/pgarch.c
index 3868cd7bd3..f8c05754a1 100644
--- a/src/backend/postmaster/pgarch.c
+++ b/src/backend/postmaster/pgarch.c
@@ -44,7 +44,7 @@
 #include "storage/procsignal.h"
 #include "storage/shmem.h"
 #include "storage/spin.h"
-#include "utils/guc.h"
+#include "utils/guc_hooks.h"
 #include "utils/memutils.h"
 #include "utils/ps_status.h"
@@ -146,6 +146,22 @@
 static int	ready_file_comparator(Datum a, Datum b, void *arg);
 static void LoadArchiveLibrary(void);
 static void call_archive_module_shutdown_callback(int code, Datum arg);

+/*
+ * GUC check_hook for check_archive_library
+ */
+bool
+check_archive_library(char **newval, void **extra, GucSource source)
+{
+	if (*newval && strcmp(*newval, "") != 0 && XLogArchiveCommand[0] != '\0')
+	{
+		GUC_check_errmsg("cannot set \"archive_library\" when \"archive_command\" is specified");
+		GUC_check_errdetail("Only one of \"archive_library\" or \"archive_command\" can be specified.");
+		return false;
+	}
+
+	return true;
+}
+
 /* Report shared memory space needed by PgArchShmemInit */
Lack of PageSetLSN in heap_xlog_visible
Hi hackers! heap_xlog_visible does not bump the heap page LSN when setting the all-visible flag in it. There is a long comment explaining why:

/*
 * We don't bump the LSN of the heap page when setting the visibility
 * map bit (unless checksums or wal_hint_bits is enabled, in which
 * case we must), because that would generate an unworkable volume of
 * full-page writes. This exposes us to torn page hazards, but since
 * we're not inspecting the existing page contents in any way, we
 * don't care.
 *
 * However, all operations that clear the visibility map bit *do* bump
 * the LSN, and those operations will only be replayed if the XLOG LSN
 * follows the page LSN. Thus, if the page LSN has advanced past our
 * XLOG record's LSN, we mustn't mark the page all-visible, because
 * the subsequent update won't be replayed to clear the flag.
 */

But it is still not clear to me that not bumping the LSN in this place is correct if wal_log_hints is set. In that case we will have a VM page with a larger LSN than the heap page, because visibilitymap_set bumps the LSN of the VM page. It means that in theory, after recovery, we may have a page marked as all-visible in the VM but not having PD_ALL_VISIBLE in its page header. And that violates the VM constraint:

 * When we *set* a visibility map during VACUUM, we must write WAL. This may
 * seem counterintuitive, since the bit is basically a hint: if it is clear,
 * it may still be the case that every tuple on the page is visible to all
 * transactions; we just don't know that for certain. The difficulty is that
 * there are two bits which are typically set together: the PD_ALL_VISIBLE bit
 * on the page itself, and the visibility map bit. If a crash occurs after the
 * visibility map page makes it to disk and before the updated heap page makes
 * it to disk, redo must set the bit on the heap page. Otherwise, the next
 * insert, update, or delete on the heap page will fail to realize that the
 * visibility map bit must be cleared, possibly causing index-only scans to
 * return wrong answers.
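The LSN interlock described in that comment can be modelled in a few lines of C (a deliberate simplification with made-up names, not the actual redo code): a record is applied only if the page LSN has not advanced past it, and only the clearing operation bumps the page LSN:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t XLogRecPtr;

typedef struct
{
	XLogRecPtr	lsn;			/* page LSN */
	bool		all_visible;	/* stand-in for PD_ALL_VISIBLE */
} FakePage;

/*
 * Setting replay: apply only if the page has not advanced past the record.
 * The page LSN is deliberately not bumped -- the behavior under discussion.
 */
static void
redo_set_all_visible(FakePage *heap, XLogRecPtr rec_lsn)
{
	if (heap->lsn < rec_lsn)
		heap->all_visible = true;
}

/* Clearing replay: per the comment, these operations *do* bump the LSN. */
static void
redo_clear_all_visible(FakePage *heap, XLogRecPtr rec_lsn)
{
	if (heap->lsn < rec_lsn)
	{
		heap->all_visible = false;
		heap->lsn = rec_lsn;
	}
}
```

The model shows the ordering the comment relies on: because clears bump the LSN, a set record whose LSN is behind the page LSN must be skipped, or a later clear could be lost on replay.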
Re: Move backup-related code to xlogbackup.c/.h
On 2022-Oct-13, Bharath Rupireddy wrote: > On Wed, Oct 12, 2022 at 1:04 PM Alvaro Herrera > wrote: > > You added some commentary that these functions are tailor-made for > > internal operations, and then declared them in the most public header > > function that xlog has? I think at the bare minimum, these prototypes > > should be in xlog_internal.h, not xlog.h. > > I removed such comments. These are functions used by xlogbackup.c to > call back into xlog.c similar to the call back functions defined in > xlog.h for xlogrecovery.c. And, most of the XLogCtl set/get sort of > function declarations are in xlog.h. So, I'm retaining them in xlog.h. As I see it, xlog.h is a header that exports XLOG manipulations to the outside world (everything that produces WAL, as well as stuff that controls operation); xlog_internal is the header that exports xlog*.c internal stuff for other xlog*.c files and specialized frontends to use. These new functions are internal to xlogbackup.c and xlog.c, so IMO they belong in xlog_internal.h. Stuff that is used from xlog.c only by xlogrecovery.c should also appear in xlog_internal.h only, not xlog.h, so I suggest not to take that as precedent. Also, that file (xlogrecovery.c) is pretty new so we haven't had time to nail down the .h layout yet. -- Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/
RE: [RFC] building postgres with meson - v13
Hi, I noticed that `pg_config --configure` didn't show the options given when building with meson. For example:

meson setup build -Dcache=gcc.cache -Ddtrace=enabled -Dicu=enabled -Dcassert=true -Dprefix=/home/postgres/install_meson/
meson compile -C build
meson install -C build

$ pg_config --configure

The options I specified (like dtrace) are not shown. I found they actually work in compilation. When specifying `-Ddtrace=enabled`, there is a log like this, and with `-Ddtrace=disabled` there is no such log:

[120/1834] /usr/bin/dtrace -C -h -s ../src/include/utils/../../backend/utils/probes.d -o src/include/utils/probes.h.tmp

Maybe it would be better if pg_config could output this information, to be consistent with the output when building with `./configure` and `make`:

$ pg_config --configure
'--prefix=/home/postgres/install/' '--cache' 'gcc.cache' '--enable-dtrace' '--with-icu' '--enable-cassert'

Regards, Shi yu
Re: Refactor to introduce pg_strcoll().
On 07.10.22 01:15, Jeff Davis wrote:

+ * Call ucol_strcollUTF8(), ucol_strcoll(), strcoll(), strcoll_l(), wcscoll(),
+ * or wcscoll_l() as appropriate for the given locale, platform, and database
+ * encoding. Arguments must be NUL-terminated. If the locale is not specified,
+ * use the database collation.
+ *
+ * If the collation is deterministic, break ties with memcmp(), and then with
+ * the string length.
+ */
+int
+pg_strcoll(const char *arg1, int len1, const char *arg2, int len2,
+		   pg_locale_t locale)

It's a bit confusing that arguments must be NUL-terminated, but the length is still specified. Maybe another sentence to explain that would be helpful. The length arguments ought to be of type size_t, I think.
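For illustration, here is a self-contained C sketch of the tie-break order described in the quoted comment (names are hypothetical; a case-insensitive comparison stands in for the collation-aware ucol_strcoll / strcoll step, so it is not the actual implementation):

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/*
 * Case-insensitive compare, standing in for the collation-aware step
 * (imagine a case-insensitive locale, where "ABC" and "abc" are equal).
 */
static int
collation_compare(const char *a, const char *b)
{
	for (;; a++, b++)
	{
		int			ca = tolower((unsigned char) *a);
		int			cb = tolower((unsigned char) *b);

		if (ca != cb || ca == '\0')
			return ca - cb;
	}
}

static int
pg_strcoll_sketch(const char *arg1, int len1,
				  const char *arg2, int len2, bool deterministic)
{
	int			result = collation_compare(arg1, arg2);

	if (result == 0 && deterministic)
	{
		int			minlen = (len1 < len2) ? len1 : len2;

		/* break collation-level ties with memcmp(), then string length */
		result = memcmp(arg1, arg2, minlen);
		if (result == 0)
			result = len1 - len2;
	}
	return result;
}
```

The deterministic tie-break is what guarantees a total order over byte-distinct strings that the collation itself considers equal.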
Question about pull_up_sublinks_qual_recurse
Hi: When I was working on another task, the following case caught my attention.

create table t1(a int, b int, c int);
create table t2(a int, b int, c int);
create table t3(a int, b int, c int);

explain (costs off) select * from t1 where exists (select 1 from t2 where exists (select 1 from t3 where t3.c = t1.c and t2.b = t3.b) and t2.a = t1.a);

I got a plan like this:

           QUERY PLAN
---------------------------------
 Hash Semi Join
   Hash Cond: (t1.a = t2.a)
   Join Filter: (hashed SubPlan 2)
   ->  Seq Scan on t1
   ->  Hash
         ->  Seq Scan on t2
   SubPlan 2
     ->  Seq Scan on t3
(8 rows)

Note we CAN'T pull up the inner sublink, which produced the SubPlan 2. The reason I traced is that after we pull up the outer sublink, we get:

select * from t1 semi join t2 on t2.a = t1.a AND exists (select 1 from t3 where t3.c = t1.c and t2.b = t3.b);

Later we try to pull up the EXISTS sublink into t1 OR t2 *separately*; since this subselect references both t1 *AND* t2, we CAN'T pull up the sublink. I am wondering why we have to pull it up into t1 OR t2 rather than into JoinExpr(t1, t2); I think the latter is better. So I changed the code like this, got the plan I wanted, and 'make installcheck' didn't find any exception.

                  QUERY PLAN
----------------------------------------------
 Hash Semi Join
   Hash Cond: ((t2.b = t3.b) AND (t1.c = t3.c))
   ->  Hash Semi Join
         Hash Cond: (t1.a = t2.a)
         ->  Seq Scan on t1
         ->  Hash
               ->  Seq Scan on t2
   ->  Hash
         ->  Seq Scan on t3
(9 rows)

@@ -553,10 +553,10 @@ pull_up_sublinks_qual_recurse(PlannerInfo *root, Node *node,
 			 */
 			j->quals = pull_up_sublinks_qual_recurse(root, j->quals,
-													 &j->larg,
-													 available_rels1,
-													 &j->rarg,
-													 child_rels);
+													 jtlink1,
+													 bms_union(available_rels1, child_rels),
+													 NULL,
+													 NULL);
 			/* Return NULL representing constant TRUE */
 			return NULL;
 		}

Any feedback is welcome.

-- Best Regards Andy Fan
libpq support for NegotiateProtocolVersion
We have the NegotiateProtocolVersion protocol message [0], but libpq doesn't actually handle it. Say I increase the protocol number in libpq:

- conn->pversion = PG_PROTOCOL(3, 0);
+ conn->pversion = PG_PROTOCOL(3, 1);

Then I get

psql: error: connection to server on socket "/tmp/.s.PGSQL.65432" failed: expected authentication request from server, but received v

And the same for a protocol option (_pq_.something). Over in the column encryption patch, I'm proposing to add such a protocol option, and the above is currently the behavior when the server doesn't support it. The attached patch adds explicit handling of this protocol message to libpq. So the output in the above case would then be:

psql: error: connection to server on socket "/tmp/.s.PGSQL.65432" failed: protocol version not supported by server: client uses 3.1, server supports 3.0

Or to test a protocol option:

@@ -2250,6 +2291,8 @@ build_startup_packet(const PGconn *conn, char *packet,
 	if (conn->client_encoding_initial && conn->client_encoding_initial[0])
 		ADD_STARTUP_OPTION("client_encoding", conn->client_encoding_initial);

+	ADD_STARTUP_OPTION("_pq_.foobar", "1");
+
 	/* Add any environment-driven GUC settings needed */
 	for (next_eo = options; next_eo->envName; next_eo++)
 	{

Result:

psql: error: connection to server on socket "/tmp/.s.PGSQL.65432" failed: protocol extension not supported by server: _pq_.foobar

[0]: https://www.postgresql.org/docs/devel/protocol-message-formats.html#PROTOCOL-MESSAGE-FORMATS-NEGOTIATEPROTOCOLVERSION

From 93df3a8d0a15b718669e4b21d8455a91ca1896fd Mon Sep 17 00:00:00 2001
From: Peter Eisentraut
Date: Thu, 13 Oct 2022 10:22:49 +0200
Subject: [PATCH] libpq: Handle NegotiateProtocolVersion message

Before, receiving a NegotiateProtocolVersion message would result in a
confusing error message like

expected authentication request from server, but received v

This adds proper handling of this protocol message and produces an
on-topic error message from it.
---
 src/interfaces/libpq/fe-connect.c   | 18 ++---
 src/interfaces/libpq/fe-protocol3.c | 41 +
 src/interfaces/libpq/libpq-int.h    |  1 +
 3 files changed, 57 insertions(+), 3 deletions(-)

diff --git a/src/interfaces/libpq/fe-connect.c b/src/interfaces/libpq/fe-connect.c
index 746e9b4f1efc..1c0d8243a6ca 100644
--- a/src/interfaces/libpq/fe-connect.c
+++ b/src/interfaces/libpq/fe-connect.c
@@ -3246,10 +3246,11 @@ PQconnectPoll(PGconn *conn)
 				/*
 				 * Validate message type: we expect only an authentication
-				 * request or an error here.  Anything else probably means
-				 * it's not Postgres on the other end at all.
+				 * request, NegotiateProtocolVersion, or an error here.
+				 * Anything else probably means it's not Postgres on the other
+				 * end at all.
 				 */
-				if (!(beresp == 'R' || beresp == 'E'))
+				if (!(beresp == 'R' || beresp == 'v' || beresp == 'E'))
 				{
 					appendPQExpBuffer(&conn->errorMessage,
 									  libpq_gettext("expected authentication request from server, but received %c\n"),
@@ -3405,6 +3406,17 @@ PQconnectPoll(PGconn *conn)
 					goto error_return;
 				}
+				else if (beresp == 'v')
+				{
+					if (pqGetNegotiateProtocolVersion3(conn))
+					{
+						/* We'll come back when there is more data */
+						return PGRES_POLLING_READING;
+					}
+					/* OK, we read the message; mark data consumed */
+					conn->inStart = conn->inCursor;
+					goto error_return;
+				}

 				/* It is an authentication request. */
 				conn->auth_req_received = true;
diff --git a/src/interfaces/libpq/fe-protocol3.c b/src/interfaces/libpq/fe-protocol3.c
index f001137b7692..2f6c1494c1b5 100644
--- a/src/interfaces/libpq/fe-protocol3.c
+++ b/src/interfaces/libpq/fe-protocol3.c
@@ -1397,6 +1397,47 @@ reportErrorPosition(PQExpBuffer msg, const char *query, int loc, int encoding)
 }

+/*
+ * Attempt to read a NegotiateProtocolVersion message.
+ * Entry: 'v' message type and length have already been consumed.
+ * Exit: returns 0 if successfully consumed message.
+ *
Re: Use LIMIT instead of Unique for DISTINCT when all distinct pathkeys are redundant
On Thu, Oct 13, 2022 at 2:48 PM David Rowley wrote: > On Thu, 13 Oct 2022 at 16:47, Richard Guo wrote: > > Currently in the patch the optimization is done before we check for > > presorted paths or do the explicit sort of the cheapest path. How about > > we move this optimization into the branch where we've found any > > presorted paths? Maybe something like: > > I've attached a patch to that effect, but it turns out a bit more > complex than what you imagined. We still need to handle the case for > when there's no path that has the required pathkeys and we must add a > SortPath to the cheapest path. That requires adding some similar code > to add the LimitPath after the foreach loop over the pathlist is over. Thanks for the new patch. Previously I considered we just apply this optimization for adequately-presorted paths so that we can just fetch the first row from that path. But yes we can also do this optimization for explicit-sort case so that we can get the result from a top-1 heapsort, just like the new patch does. I was also getting some weird plans like: > > regression=# explain select distinct on (four) four,hundred from tenk1 > where four=0 order by 1,2; > QUERY PLAN > -- > Sort (cost=0.20..0.20 rows=1 width=8) >Sort Key: hundred >-> Limit (cost=0.00..0.19 rows=1 width=8) > -> Seq Scan on tenk1 (cost=0.00..470.00 rows=2500 width=8) >Filter: (four = 0) > > To stop the planner from adding that final sort, I opted to hack the > LimitPath's pathkeys to say that it's already sorted by the > PlannerInfo's sort_pathkeys. That feels slightly icky, but it does > seem a little wasteful to initialise a sort node on every execution of > the plan to sort a single tuple. I don't get how this plan comes out. It seems not correct because Limit node above an unsorted path would give us an unpredictable row. 
I tried this query without the hack to LimitPath's pathkeys and I get the plans below, with and without index scan:

explain (costs off) select distinct on (four) four,hundred from tenk1 where four=0 order by 1,2;
                     QUERY PLAN
-----------------------------------------------------
 Result
   ->  Limit
         ->  Index Scan using tenk1_hundred on tenk1
               Filter: (four = 0)

explain (costs off) select distinct on (four) four,hundred from tenk1 where four=0 order by 1,2;
           QUERY PLAN
--------------------------------
 Limit
   ->  Sort
         Sort Key: hundred
         ->  Seq Scan on tenk1
               Filter: (four = 0)

Thanks
Richard
Re: ExecRTCheckPerms() and many prunable partitions
On Wed, Oct 12, 2022 at 10:50 PM Peter Eisentraut wrote: > On 06.10.22 15:29, Amit Langote wrote: > > I tried in the attached 0004. ModifyTable gets a new member > > extraUpdatedColsBitmaps, which is List of Bitmapset "nodes". > > > > Actually, List of Bitmapsets turned out to be something that doesn't > > just-work with our Node infrastructure, which I found out thanks to > > -DWRITE_READ_PARSE_PLAN_TREES. So, I had to go ahead and add > > first-class support for copy/equal/write/read support for Bitmapsets, > > such that writeNode() can write appropriately labeled versions of them > > and nodeRead() can read them as Bitmapsets. That's done in 0003. I > > didn't actually go ahead and make*all* Bitmapsets in the plan trees > > to be Nodes, but maybe 0003 can be expanded to do that. We won't need > > to make gen_node_support.pl emit *_BITMAPSET_FIELD() blurbs then; can > > just use *_NODE_FIELD(). > > Seeing that on 64-bit platforms we have a 4-byte padding gap in the > Bitmapset struct, sticking a node tag in there seems pretty sensible. > So turning Bitmapset into a proper Node and then making the other > adjustments you describe makes sense to me. > > Making a new thread about this might be best. > > (I can't currently comment on the rest of the patch set. So I don't > know if you'll really end up needing lists of bitmapsets. But from here > it looks like turning bitmapsets into nodes might be a worthwhile change > just by itself.) Ok, thanks. I'll start a new thread about it. -- Thanks, Amit Langote EDB: http://www.enterprisedb.com
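For what it's worth, the padding-gap observation can be checked with a few lines of C (simplified stand-ins for the real types; the layout assumes a typical 64-bit ABI where bitmapword is an 8-byte-aligned 64-bit integer):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t bitmapword;	/* 64-bit words, as on 64-bit builds */
typedef int NodeTag;			/* 4-byte stand-in for the NodeTag enum */

/*
 * Today's layout: the 8-byte alignment of words[] leaves a 4-byte padding
 * gap after nwords on 64-bit platforms.
 */
typedef struct BitmapsetPlain
{
	int			nwords;
	bitmapword	words[];	/* FLEXIBLE_ARRAY_MEMBER */
} BitmapsetPlain;

/* Proposed layout: a 4-byte node tag slots into that gap for free. */
typedef struct BitmapsetNode
{
	NodeTag		type;
	int			nwords;
	bitmapword	words[];
} BitmapsetNode;
```

On such an ABI both structs occupy the same number of bytes before the flexible array, which is why adding the tag is effectively free.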
Re: SI-read predicate locks on materialized views
On Fri, Sep 30, 2022 at 10:12:13AM +0900, Yugo NAGATA wrote: > Thank you for comment. Do you think it can be marked as Ready for Commiter? Matviews have been discarded from needing predicate locks since 3bf3ab8 and their introduction, where there was no concurrent flavor of refresh yet. Shouldn't this patch have at least an isolation test to show the difference in terms of read-write conflicts with some serializable transactions and REFRESH CONCURRENTLY? -- Michael signature.asc Description: PGP signature
Re: pg_receivewal fail to streams when the partial file to write is not fully initialized present in the wal receiver directory
On Thu, Sep 22, 2022 at 7:16 AM Bharath Rupireddy wrote: > > PSA v8 patch. I simplified the code a bit by using a fixed temporary file name (thanks Michael Paquier for the suggestion) much like the core does in XLogFileInitInternal(). If there's any left-over temp file from a crash or failure in dir_open_for_write(), that gets deleted in the next call. Please review the v9 patch further. -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com v9-0001-Make-WAL-file-initialization-by-pg_receivewal-ato.patch Description: Binary data
RE: Add mssing test to test_decoding/meson.build
Dear Michael, > Thanks, applied. This was an oversight of 7f13ac8, and the CI accepts > the test. I confirmed your commit. Great thanks! Best Regards, Hayato Kuroda FUJITSU LIMITED
Re: Pluggable toaster
Hi hackers, Fixed warning that failed build in previous patchset. Here's rebased patch: v23-0001-toaster-interface.patch - Pluggable TOAST API interface with reference (original) TOAST mechanics - new API is introduced but reference TOAST is still left unchanged; v23-0002-toaster-default.patch - Default TOAST mechanics is re-implemented using TOAST API and is plugged in as Default Toaster; v23-0003-toaster-docs.patch - Pluggable TOAST API documentation package For TOAST API explanation please check /src/backend/access/README.toastapi Actual GitHub branch resides at https://github.com/postgrespro/postgres/tree/toasterapi_clean Please note that because the development goes on, the actual branch contains much more than is provided here. -- Regards, Nikita Malakhov Postgres Professional https://postgrespro.ru/ v23-0003-toaster-docs.patch.gz Description: GNU Zip compressed data v23-0001-toaster-interface.patch.gz Description: GNU Zip compressed data v23-0002-toaster-default.patch.gz Description: GNU Zip compressed data
Re: possible typo for CREATE PUBLICATION description
On Thu, Oct 13, 2022 at 6:16 PM osumi.takami...@fujitsu.com wrote:
> Hi, I noticed a possible typo in the doc for create publication. This applies to PG15 as well. Kindly have a look at the attached patch for it.

Your typo correction LGTM. FWIW, maybe other parts of that paragraph can be tidied too, e.g. the words "actually" and "So" didn't seem needed IMO.

BEFORE:
For an INSERT ... ON CONFLICT command, the publication will publish the operation that actually results from the command. So depending on the outcome, it may be published as either INSERT or UPDATE, or it may not be published at all.

SUGGESTION:
For an INSERT ... ON CONFLICT command, the publication will publish the operation that results from the command. Depending on the outcome, it may be published as either INSERT or UPDATE, or it may not be published at all.

-- Kind Regards, Peter Smith. Fujitsu Australia
possible typo for CREATE PUBLICATION description
Hi, I noticed a possible typo in the doc for create publication. This applies to PG15 as well. Kindly have a look at the attached patch for it. Best Regards, Takamichi Osumi v1-0001-modify-a-typo.patch Description: v1-0001-modify-a-typo.patch
Re: TRAP: FailedAssertion("prev_first_lsn < cur_txn->first_lsn", File: "reorderbuffer.c", Line: 927, PID: 568639)
On Wed, Oct 12, 2022 at 11:18 AM Masahiko Sawada wrote:
> Summarizing this issue, the assertion check in AssertTXNLsnOrder()
> fails as reported because the current logical decoding cannot properly
> handle the case where the decoding restarts from NEW_CID. Since we
> don't make the association between the top-level transaction and its
> subtransaction while decoding NEW_CID (i.e., in
> SnapBuildProcessNewCid()), two transactions are created in
> ReorderBuffer as top-txns and have the same LSN. This failure happens
> on all supported versions.
>
> To fix the problem, one idea is to make the association between
> top-txn and sub-txn at that point by calling ReorderBufferAssignChild(),
> as Tomas proposed. On the other hand, since we don't guarantee to make
> the association between the top-level transaction and its
> sub-transactions until we try to decode the actual contents of the
> transaction, it makes sense to me that instead of trying to solve this
> by making the association, we change the code that assumes the
> association has been made.
>
> I've attached a patch for this idea. With the patch, we skip the
> assertion checks in AssertTXNLsnOrder() until we reach the LSN at
> which we start decoding the contents of the transaction, i.e.,
> start_decoding_at in SnapBuild. The minor concern is that the
> assertion check could then miss some faulty cases where two unrelated
> top-transactions have the same LSN; with this patch, such a case would
> pass. Therefore, for transactions that we skipped checking, we do the
> check once we reach that LSN.
>
> --- a/src/backend/replication/logical/decode.c
> +++ b/src/backend/replication/logical/decode.c
> @@ -113,6 +113,15 @@ LogicalDecodingProcessRecord(LogicalDecodingContext *ctx, XLogReaderState *record
>  			 buf.origptr);
>  	}
>
> +#ifdef USE_ASSERT_CHECKING
> +	/*
> +	 * Check the order of transaction LSNs when we reach the start decoding
> +	 * LSN. See the comments in AssertTXNLsnOrder() for details.
> +	 */
> +	if (SnapBuildGetStartDecodingAt(ctx->snapshot_builder) == buf.origptr)
> +		AssertTXNLsnOrder(ctx->reorder);
> +#endif
> +
>  	rmgr = GetRmgr(XLogRecGetRmid(record));

I am not able to think of how/when this check will be useful. Because we skipped the assert checking only for records that are prior to the start_decoding_at point, ordering for those records should have been checked before the restart. The start_decoding_at point will be either (a) the confirmed_flush location, or (b) an LSN sent by the client, and any record prior to that must have been processed before the restart. Now, say we have commit records for multiple transactions which are after start_decoding_at but all of whose changes are before start_decoding_at; then we won't check their ordering at commit time, but OTOH, we would have checked their ordering before the restart. Isn't that sufficient?

-- With Regards, Amit Kapila.
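The idea being debated above - tolerate duplicate first_lsn values from a NEW_CID restart until decoding reaches start_decoding_at, and only then enforce strict ordering - can be reduced to a toy model. This is not the ReorderBuffer code; the struct, the function, and the flat array standing in for the toplevel-by-lsn list are all illustrative assumptions:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t XLogRecPtr;

/* Toy stand-in for a top-level transaction in the toplevel-by-lsn list. */
typedef struct TxnEntry
{
	XLogRecPtr	first_lsn;
} TxnEntry;

/*
 * Assert strictly increasing first_lsn values, but only once decoding has
 * reached start_decoding_at.  Before that point, duplicate LSNs (as created
 * when decoding restarts from a NEW_CID record) are tolerated, mirroring
 * the deferred-check idea from the patch.
 */
static void
check_txn_lsn_order(const TxnEntry *txns, size_t n,
					XLogRecPtr cur_lsn, XLogRecPtr start_decoding_at)
{
	size_t		i;

	if (cur_lsn < start_decoding_at)
		return;					/* skip the check until the start point */

	for (i = 1; i < n; i++)
		assert(txns[i - 1].first_lsn < txns[i].first_lsn);
}
```

Amit's question then amounts to asking whether the second arm of this check (running it exactly when cur_lsn reaches start_decoding_at) can ever catch anything that the pre-restart run had not already verified.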