Re: Allow non-superuser to cancel superuser tasks.
At Fri, 12 Apr 2024 09:10:35 +0900, Michael Paquier wrote in > On Thu, Apr 11, 2024 at 04:55:59PM +0500, Kirill Reshke wrote: > > The test doesn't fail because pg_terminate_backend actually meets his > > point: autovac is killed. But while dying, autovac also receives > > segfault. Thats because of injections points: > > > > (gdb) bt > > #0 0x56361c3379ea in tas (lock=0x7fbcb9632224 > access memory at address 0x7fbcb9632224>) at > > ../../../../src/include/storage/s_lock.h:228 > > #1 ConditionVariableCancelSleep () at condition_variable.c:238 ... > > #3 0x56361c330a40 in CleanupProcSignalState (status=, arg=) at procsignal.c:240 > > #4 0x56361c328801 in shmem_exit (code=code@entry=1) at ipc.c:276 > > #9 0x56361c3378d7 in ConditionVariableTimedSleep > > (cv=0x7fbcb9632224, timeout=timeout@entry=-1, ... > I can see this stack trace as well. Capturing a bit more than your > own stack, this is crashing in the autovacuum worker while waiting on > a condition variable when processing a ProcessInterrupts(). > > That may point to a legit bug with condition variables in this > context, actually? From what I can see, issuing a signal on a backend > process waiting with a condition variable is able to process the > interrupt correctly. ProcSignalInit sets up CleanupProcSignalState to be called via on_shmem_exit. If the CV is allocated in a dsm segment, shmem_exit should have detached the region for the CV. CV cleanup code should be invoked via before_shmem_exit. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: NLS doesn't work for pg_combinebackup
At Tue, 9 Apr 2024 15:00:27 +0900, Michael Paquier wrote in > I've checked the whole tree, and the two you are pointing at are the > only incorrect paths. So applied, thanks! Thank you for cross-checking and committing! regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: NLS doesn't work for pg_combinebackup
At Mon, 08 Apr 2024 16:27:02 +0900 (JST), Kyotaro Horiguchi wrote in > Hello. > > I noticed that NLS doesn't work for pg_combinebackup. The cause is > that the tool forgets to call set_pglocale_pgservice(). > > This issue is fixed by the following chage. > > diff --git a/src/bin/pg_combinebackup/pg_combinebackup.c > b/src/bin/pg_combinebackup/pg_combinebackup.c > index 1b07ca3fb6..2788c78fdd 100644 > --- a/src/bin/pg_combinebackup/pg_combinebackup.c > +++ b/src/bin/pg_combinebackup/pg_combinebackup.c > @@ -154,6 +154,7 @@ main(int argc, char *argv[]) > > pg_logging_init(argv[0]); > progname = get_progname(argv[0]); > + set_pglocale_pgservice(argv[0], PG_TEXTDOMAIN("pg_combinebackup")); > handle_help_version_opts(argc, argv, progname, help); > > memset(, 0, sizeof(opt)); Forgot to mention, but pg_walsummary has the same issue. diff --git a/src/bin/pg_walsummary/pg_walsummary.c b/src/bin/pg_walsummary/pg_walsummary.c index 5e41b376d7..daf6cd14ce 100644 --- a/src/bin/pg_walsummary/pg_walsummary.c +++ b/src/bin/pg_walsummary/pg_walsummary.c @@ -67,6 +67,7 @@ main(int argc, char *argv[]) pg_logging_init(argv[0]); progname = get_progname(argv[0]); + set_pglocale_pgservice(argv[0], PG_TEXTDOMAIN("pg_walsummary")); handle_help_version_opts(argc, argv, progname, help); /* process command-line options */ regards. -- Kyotaro Horiguchi NTT Open Source Software Center
NLS doesn't work for pg_combinebackup
Hello. I noticed that NLS doesn't work for pg_combinebackup. The cause is that the tool forgets to call set_pglocale_pgservice(). This issue is fixed by the following chage. diff --git a/src/bin/pg_combinebackup/pg_combinebackup.c b/src/bin/pg_combinebackup/pg_combinebackup.c index 1b07ca3fb6..2788c78fdd 100644 --- a/src/bin/pg_combinebackup/pg_combinebackup.c +++ b/src/bin/pg_combinebackup/pg_combinebackup.c @@ -154,6 +154,7 @@ main(int argc, char *argv[]) pg_logging_init(argv[0]); progname = get_progname(argv[0]); + set_pglocale_pgservice(argv[0], PG_TEXTDOMAIN("pg_combinebackup")); handle_help_version_opts(argc, argv, progname, help); memset(, 0, sizeof(opt)); regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: remaining sql/json patches
At Fri, 22 Mar 2024 11:44:08 +0900, Amit Langote wrote in > Thanks for the heads up. > > My bad, will push a fix shortly. No problem. Thank you for the prompt correction. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: remaining sql/json patches
At Wed, 20 Mar 2024 21:53:52 +0900, Amit Langote wrote in > I'll push 0001 tomorrow. This patch (v44-0001-Add-SQL-JSON-query-functions.patch) introduced the following new erro message: +errmsg("can only specify constant, non-aggregate" + " function, or operator expression for" + " DEFAULT"), I believe that our convention here is to write an error message in a single string literal, not split into multiple parts, for better grep'ability. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: Inconsistent printf placeholders
Thank you for looking this. At Tue, 19 Mar 2024 10:50:23 +0100, Peter Eisentraut wrote in > On 15.03.24 08:20, Kyotaro Horiguchi wrote: > > diff --git a/src/backend/access/transam/twophase.c > > b/src/backend/access/transam/twophase.c > > @@ -1369,8 +1369,8 @@ ReadTwoPhaseFile(TransactionId xid, bool > > missing_ok) > > errmsg("could not read file \"%s\": > > %m", path))); > > else > > ereport(ERROR, > > - (errmsg("could not read file \"%s\": > > - read %d of %lld", > > - path, r, (long long > > - int) stat.st_size))); > > + (errmsg("could not read file \"%s\": read %zd of %zu", > > + path, r, (Size) stat.st_size))); > > } > > pgstat_report_wait_end(); > > This might be worse, because stat.st_size is of type off_t, which > could be smaller than Size/size_t. I think you were trying to mention that off_t could be wider than size_t and you're right in that point. I thought that it is safe since we are trying to read the whole content of the file at once here into palloc'ed memory. However, on second thought, if st_size is out of the range of ssize_t, and palloc accepts that size, at least on Linux, read(2) reads only 0x7000 bytes and raches the error reporting. Addition to that, this size was closer to the memory allocation size limit than I had thought. As the result, I removed the change. However, I kept the change of the type of variable "r" and corresponding placeholder %zd. > > diff --git a/src/backend/libpq/be-secure-gssapi.c > > b/src/backend/libpq/be-secure-gssapi.c > > index bc04e78abb..68645b4519 100644 > > --- a/src/backend/libpq/be-secure-gssapi.c > > +++ b/src/backend/libpq/be-secure-gssapi.c > > @@ -572,9 +572,9 @@ secure_open_gssapi(Port *port) > > if (input.length > PQ_GSS_RECV_BUFFER_SIZE) > > { > > ereport(COMMERROR, > > - (errmsg("oversize GSSAPI packet sent > > - by the client (%zu > %d)", > > + (errmsg("oversize GSSAPI packet sent by the client (%zu > %zu)", > > (size_t) input.length, > > - > > PQ_GSS_RECV_BUFFER_SIZE))); > > + (size_t) PQ_GSS_RECV_BUFFER_SIZE))); > > return -1; > > } > > > > Might be better to add that cast to the definition of > PQ_GSS_RECV_BUFFER_SIZE instead, so that all code can benefit. As far as I see, the only exceptional uses I found were a comparison with int values, and being passed as an OM_uint32 (to one of the parameters of gss_wrap_size_limit()). Therefore, I agree that it is beneficial. By the way, we currently define Size as the same as size_t (since 1998). Is it correct to consider Size as merely for backward compatibility and we should use size_t for new code? I used size_t in the modified part in the attached patch. > > diff --git a/src/backend/replication/repl_gram.y > > b/src/backend/replication/repl_gram.y > > index 7474f5bd67..baa76280b9 100644 > > --- a/src/backend/replication/repl_gram.y > > +++ b/src/backend/replication/repl_gram.y > > @@ -312,11 +312,6 @@ timeline_history: > > { > > TimeLineHistoryCmd *cmd; > > - if ($2 <= 0) > > - ereport(ERROR, > > - > > (errcode(ERRCODE_SYNTAX_ERROR), > > -errmsg("invalid > > -timeline %u", > > -$2))); > > - ... > I don't think this is correct. It loses the check for == 0. Ugh. It's my mistake. So we need to consider unifying the messages again. In walsummaryfuncs.c, %lld is required, but it's silly for the uses in repl_gram.y. Finally, I chose not to change anything here. > > diff --git a/src/backend/tsearch/to_tsany.c > > b/src/backend/tsearch/to_tsany.c > > index 88cba58cba..9d21178107 100644 > > --- a/src/backend/tsearch/to_tsany.c > > +++ b/src/backend/tsearch/to_tsany.c > > @@ -191,7 +191,8 @@ make_tsvector(Pa
Re: Add new error_action COPY ON_ERROR "log"
At Fri, 15 Mar 2024 16:57:25 +0900, Michael Paquier wrote in > On Wed, Mar 13, 2024 at 07:32:37PM +0530, Bharath Rupireddy wrote: > > I'm attaching the v8 patch set implementing the above idea. With this, > > [1] is sent to the client, [2] is sent to the server log. This > > approach not only reduces the duplicate info in the NOTICE and CONTEXT > > messages, but also makes it easy for users to get all the necessary > > info in the NOTICE message without having to set extra parameters to > > get CONTEXT message. > > In terms of eliminating the information duplication while allowing > clients to know the value, the attribute and the line involved in the > conversion failure, the approach of tweaking the context information > has merits, I guess. > > How do others feel about this kind of tuning? If required, I think that we have already included the minimum information necessary for the primary diagnosis, including locations, within the primary messages. Here is an example: > LOG: 0: invalid record length at 0/18049F8: expected at least 24, got 0 And I believe that CONTEXT, if it exists, is augmentation information to the primary messages. The objective of the precedent for the use of relname_only was somewhat different, but this use also seems legit. In short, I think the distribution between message types (primary and context) is fine as it is in the latest patch. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: Inconsistent printf placeholders
At Fri, 15 Mar 2024 16:20:27 +0900 (JST), Kyotaro Horiguchi wrote in > I checked for that kind of msgids in a bit more intensive way. The > numbers are the line numbers in ja.po of backend. I didn't check the > same for other modules. > > > ###: invalid timeline %lld > >@ backup/walsummaryfuncs.c:95 > > invalid timeline %u > >@ repl_gram.y:318 repl_gram.y:359 "The numbers are the line numbers in ja.po" is wrong. Correctly, it should be written as: The information consists of two lines: the first of them is the msgid, and the second indicates the locations where they appear. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: Inconsistent printf placeholders
At Fri, 15 Mar 2024 16:01:28 +1300, David Rowley wrote in > On Fri, 15 Mar 2024 at 15:27, Kyotaro Horiguchi > wrote: > > I have considered only the two messages. Actually, buffile.c and md.c > > are already like that. The attached aligns the messages in > > pg_combinebackup.c and reconstruct.c with the precedents. > > This looks like a worthy cause to make translator work easier. > > I don't want to widen the goalposts or anything, but just wondering if > you'd searched for any others that could get similar treatment? > > I only just had a quick look at the following. > > $ cat src/backend/po/fr.po | grep -E "^msgid\s" | sed -E > 's/%[a-zA-Z]+/\%/g' | sort | uniq -d -c > 31 msgid "" > 2 msgid "could not accept SSL connection: %" > 2 msgid "could not initialize LDAP: %" > 2 msgid "could not look up local user ID %: %" ... > I've not looked at how hard it would be with any of the above to > determine how hard it would be to make the formats consistent. The > 3rd last one seems similar enough that it might be worth doing > together with this? I checked for that kind of msgids in a bit more intensive way. The numbers are the line numbers in ja.po of backend. I didn't check the same for other modules. > ###: invalid timeline %lld >@ backup/walsummaryfuncs.c:95 > invalid timeline %u >@ repl_gram.y:318 repl_gram.y:359 In the first case, the %lld can be more than 2^31. In the second case, %u is uint32. However, the bigger issue is checking if the uint32 value is negative: repl_gram.c: 147 > if ((yyvsp[0].uintval) <= 0) > ereport(ERROR, > (errcode(ERRCODE_SYNTAX_ERROR), >errmsg("invalid timeline %u", > (yyvsp[0].uintval; which cannot happen. I think we can simply remove the useless error check. > ###: could not read file \"%s\": read %d of %zu >@ ../common/controldata_utils.c:116 ../common/controldata_utils.c:119 > access/transam/xlog.c:3417 access/transam/xlog.c:4278 > replication/logical/origin.c:750 replication/logical/origin.c:789 > replication/logical/snapbuild.c:2040 replication/slot.c:2218 > replication/slot.c:2259 replication/walsender.c:660 > utils/cache/relmapper.c:833 > could not read file \"%s\": read %d of %lld >@ access/transam/twophase.c:1372 > could not read file \"%s\": read %zd of %zu >@ backup/basebackup.c:2102 > ###: oversize GSSAPI packet sent by the client (%zu > %zu) >@ libpq/be-secure-gssapi.c:351 > oversize GSSAPI packet sent by the client (%zu > %d) >@ libpq/be-secure-gssapi.c:575 > ###: compressed segment file \"%s\" has incorrect uncompressed size %d, > skipping >@ pg_receivewal.c:362 > compressed segment file \"%s\" has incorrect uncompressed size %zu, > skipping >@ pg_receivewal.c:448 > ###: could not read file \"%s\": read only %zd of %lld bytes >@ pg_combinebackup.c:1304 > could not read file \"%s\": read only %d of %u bytes >@ reconstruct.c:514 We can "fix" them the same way. I debated whether to use ssize_t for read() and replace all instances of size_t with Size. However, in the end, I decided to only keep it consistent with the surrounding code. > ###: index %d out of valid range, 0..%d >@ utils/adt/varlena.c:3220 utils/adt/varlena.c:3287 > index %lld out of valid range, 0..%lld >@ utils/adt/varlena.c:3251 utils/adt/varlena.c:3323 > ###: string is too long for tsvector (%d bytes, max %d bytes) >@ tsearch/to_tsany.c:194 utils/adt/tsvector.c:277 > utils/adt/tsvector_op.c:1126 > string is too long for tsvector (%ld bytes, max %ld bytes) >@ utils/adt/tsvector.c:223 We can unify them and did in the attached, but I'm not sure that's sensible.. > ###: could not look up local user ID %d: %s >@ ../port/user.c:43 ../port/user.c:79 > could not look up local user ID %ld: %s >@ libpq/auth.c:1886 Both of the above use uid_t (defined as int) as user ID and it is explicitly cast to "int" in the first case and to long in the second, which seems mysterious. Although I'm not sure if there's a possibility of uid_t being widened in the future, I unified them to the latter way for robustness. > ###: error while cloning relation \"%s.%s\" (\"%s\" to \"%s\"): %m >@ file.c:44 > error while cloning relation \"%s.%s\" (\"%s\" to \"%s\"): %s >@ file.c:65 > > The latter seems to be changed to %m by reassiging save_
Re: Typos in reorderbuffer.c.
At Thu, 14 Mar 2024 11:23:38 +0530, Amit Kapila wrote in > On Thu, Mar 14, 2024 at 9:58 AM Kyotaro Horiguchi > wrote: > > > > While examining reorderbuffer.c, I found several typos. I'm not sure > > if fixing them is worthwhile, but I've attached a fix just in case. > > > > LGTM. I'll push this in some time. Thanks! -- Kyotaro Horiguchi NTT Open Source Software Center
Re: Inconsistent printf placeholders
Thank you for the suggestions. At Thu, 14 Mar 2024 13:45:41 +0100, Daniel Gustafsson wrote in > I've only skimmed this so far but +1 on keeping the messages the same where > possible to reduce translation work. Adding a comment on the message where > the > casting is done to indicate that it is for translation might reduce the risk > of > it "getting fixed" down the line. Added a comment "/* cast xxx to avoid extra translatable messages */. At Thu, 14 Mar 2024 14:02:46 +0100, Peter Eisentraut wrote in > If you want to make them uniform, then I suggest the error messages Yeah. Having the same messages with only the placeholders changed is not very pleasing during translation. If possible, I would like to align them. > should both be "%zd of %zu bytes", which are the actual types read() > deals with. I have considered only the two messages. Actually, buffile.c and md.c are already like that. The attached aligns the messages in pg_combinebackup.c and reconstruct.c with the precedents. regards. -- Kyotaro Horiguchi NTT Open Source Software Center diff --git a/src/bin/pg_combinebackup/pg_combinebackup.c b/src/bin/pg_combinebackup/pg_combinebackup.c index 6f0814d9ac..feb4d5dcf4 100644 --- a/src/bin/pg_combinebackup/pg_combinebackup.c +++ b/src/bin/pg_combinebackup/pg_combinebackup.c @@ -1301,8 +1301,9 @@ slurp_file(int fd, char *filename, StringInfo buf, int maxlen) if (rb < 0) pg_fatal("could not read file \"%s\": %m", filename); else - pg_fatal("could not read file \"%s\": read only %zd of %lld bytes", - filename, rb, (long long int) st.st_size); + /* cast st_size to avoid extra translatable messages */ + pg_fatal("could not read file \"%s\": read only %zd of %zu bytes", + filename, rb, (size_t) st.st_size); } /* Adjust buffer length for new data and restore trailing-\0 invariant */ diff --git a/src/bin/pg_combinebackup/reconstruct.c b/src/bin/pg_combinebackup/reconstruct.c index 41f06bb26b..a4badb90e2 100644 --- a/src/bin/pg_combinebackup/reconstruct.c +++ b/src/bin/pg_combinebackup/reconstruct.c @@ -504,15 +504,16 @@ make_rfile(char *filename, bool missing_ok) static void read_bytes(rfile *rf, void *buffer, unsigned length) { - int rb = read(rf->fd, buffer, length); + ssize_t rb = read(rf->fd, buffer, length); if (rb != length) { if (rb < 0) pg_fatal("could not read file \"%s\": %m", rf->filename); else - pg_fatal("could not read file \"%s\": read only %d of %u bytes", - rf->filename, rb, length); + /* cast length to avoid extra translatable messages */ + pg_fatal("could not read file \"%s\": read only %zd of %zu bytes", + rf->filename, rb, (size_t) length); } }
Typos in reorderbuffer.c.
Hello. While examining reorderbuffer.c, I found several typos. I'm not sure if fixing them is worthwhile, but I've attached a fix just in case. regards. -- Kyotaro Horiguchi NTT Open Source Software Center diff --git a/src/backend/replication/logical/reorderbuffer.c b/src/backend/replication/logical/reorderbuffer.c index 001f901ee6..92cf39ff74 100644 --- a/src/backend/replication/logical/reorderbuffer.c +++ b/src/backend/replication/logical/reorderbuffer.c @@ -32,7 +32,7 @@ * * In order to cope with large transactions - which can be several times as * big as the available memory - this module supports spooling the contents - * of a large transactions to disk. When the transaction is replayed the + * of large transactions to disk. When the transaction is replayed the * contents of individual (sub-)transactions will be read from disk in * chunks. * @@ -3078,10 +3078,10 @@ ReorderBufferImmediateInvalidation(ReorderBuffer *rb, uint32 ninvalidations, * least once for every xid in XLogRecord->xl_xid (other places in records * may, but do not have to be passed through here). * - * Reorderbuffer keeps some datastructures about transactions in LSN order, + * Reorderbuffer keeps some data structures about transactions in LSN order, * for efficiency. To do that it has to know about when transactions are seen * first in the WAL. As many types of records are not actually interesting for - * logical decoding, they do not necessarily pass though here. + * logical decoding, they do not necessarily pass through here. */ void ReorderBufferProcessXid(ReorderBuffer *rb, TransactionId xid, XLogRecPtr lsn) @@ -3513,11 +3513,11 @@ ReorderBufferLargestTXN(ReorderBuffer *rb) * snapshot because we don't decode such transactions. Also, we do not select * the transaction which doesn't have any streamable change. * - * Note that, we skip transactions that contains incomplete changes. There + * Note that, we skip transactions that contain incomplete changes. There * is a scope of optimization here such that we can select the largest * transaction which has incomplete changes. But that will make the code and * design quite complex and that might not be worth the benefit. If we plan to - * stream the transactions that contains incomplete changes then we need to + * stream the transactions that contain incomplete changes then we need to * find a way to partially stream/truncate the transaction changes in-memory * and build a mechanism to partially truncate the spilled files. * Additionally, whenever we partially stream the transaction we need to @@ -4701,8 +4701,8 @@ ReorderBufferToastAppendChunk(ReorderBuffer *rb, ReorderBufferTXN *txn, } /* - * Rejigger change->newtuple to point to in-memory toast tuples instead to - * on-disk toast tuples that may not longer exist (think DROP TABLE or VACUUM). + * Rejigger change->newtuple to point to in-memory toast tuples instead of + * on-disk toast tuples that may no longer exist (think DROP TABLE or VACUUM). * * We cannot replace unchanged toast tuples though, so those will still point * to on-disk toast data. @@ -4713,7 +4713,7 @@ ReorderBufferToastAppendChunk(ReorderBuffer *rb, ReorderBufferTXN *txn, * at unexpected times. * * We simply subtract size of the change before rejiggering the tuple, and - * then adding the new size. This makes it look like the change was removed + * then add the new size. This makes it look like the change was removed * and then added back, except it only tweaks the accounting info. * * In particular it can't trigger serialization, which would be pointless
Inconsistent printf placeholders
Hello. A recent commit 6612185883 introduced two error messages that are identical in text but differ in their placeholders. - pg_fatal("could not read file \"%s\": read only %d of %d bytes", -filename, (int) rb, (int) st.st_size); + pg_fatal("could not read file \"%s\": read only %zd of %lld bytes", +filename, rb, (long long int) st.st_size); ... - pg_fatal("could not read file \"%s\": read only %d of %d bytes", + pg_fatal("could not read file \"%s\": read only %d of %u bytes", rf->filename, rb, length); I'd be happy if the two messages kept consistency. I suggest aligning types instead of making the messages different, as attached. regards. -- Kyotaro Horiguchi NTT Open Source Software Center diff --git a/src/bin/pg_combinebackup/reconstruct.c b/src/bin/pg_combinebackup/reconstruct.c index 41f06bb26b..e3b8e84289 100644 --- a/src/bin/pg_combinebackup/reconstruct.c +++ b/src/bin/pg_combinebackup/reconstruct.c @@ -504,15 +504,15 @@ make_rfile(char *filename, bool missing_ok) static void read_bytes(rfile *rf, void *buffer, unsigned length) { - int rb = read(rf->fd, buffer, length); + ssize_t rb = read(rf->fd, buffer, length); if (rb != length) { if (rb < 0) pg_fatal("could not read file \"%s\": %m", rf->filename); else - pg_fatal("could not read file \"%s\": read only %d of %u bytes", - rf->filename, rb, length); + pg_fatal("could not read file \"%s\": read only %zd of %lld bytes", + rf->filename, rb, (long long int) length); } }
Re: Infinite loop in XLogPageRead() on standby
At Mon, 11 Mar 2024 16:43:32 +0900 (JST), Kyotaro Horiguchi wrote in > Oh, I once saw the fix work, but seems not to be working after some > point. The new issue was a corruption of received WAL records on the > first standby, and it may be related to the setting. I identified the cause of the second issue. When I tried to replay the issue, the second standby accidentally received the old timeline's last page-spanning record till the end while the first standby was promoting (but it had not been read by recovery). In addition to that, on the second standby, there's a time window where the timeline increased but the first segment of the new timeline is not available yet. In this case, the second standby successfully reads the page-spanning record in the old timeline even after the second standby noticed that the timeline ID has been increased, thanks to the robustness of XLogFileReadAnyTLI(). I think the primary change to XLogPageRead that I suggested is correct (assuming the use of wal_segment_size instead of the constant). However, still XLogFileReadAnyTLI() has a chance to read the segment from the old timeline after the second standby notices a timeline switch, leading to the second issue. The second issue was fixed by preventing XLogFileReadAnyTLI from reading segments from older timelines than those suggested by the latest timeline history. (In other words, disabling the "AnyTLI" part). I recall that there was a discussion for commit 4bd0ad9e44, about the objective of allowing reading segments from older timelines than the timeline history suggests. In my faint memory, we concluded to postpone making the decision to remove the feature due to uncertainity about the objective. If there's no clear reason to continue using XLogFileReadAnyTLI(), I suggest we stop its use and instead adopt XLogFileReadOnTLHistory(), which reads segments that align precisely with the timeline history. Of course, regardless of the changes above, if recovery on the second standby had reached the end of the page-spanning record before redirection to the first standby, it would need pg_rewind to connect to the first standby. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: Infinite loop in XLogPageRead() on standby
At Wed, 6 Mar 2024 11:34:29 +0100, Alexander Kukushkin wrote in > Hmm, I think you meant to use wal_segment_size, because 0x10 is just > 1MB. As a result, currently it works for you by accident. Oh, I once saw the fix work, but seems not to be working after some point. The new issue was a corruption of received WAL records on the first standby, and it may be related to the setting. > > Thus, I managed to reproduce precisely the same situation as you > > described utilizing your script with modifications and some core > > tweaks, and with the change above, I saw that the behavior was > > fixed. However, for reasons unclear to me, it shows another issue, and > > I am running out of time and need more caffeine. I'll continue > > investigating this tomorrow. > > > > Thank you for spending your time on it! You're welcome, but I aplogize for the delay in the work.. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: Infinite loop in XLogPageRead() on standby
At Tue, 5 Mar 2024 09:36:44 +0100, Alexander Kukushkin wrote in > Please find attached the patch fixing the problem and the updated TAP test > that addresses Nit. Record-level retries happen when the upper layer detects errors. In my previous mail, I cited code that is intended to prevent this at segment boundaries. However, the resulting code applies to all page boundaries, since we judged that the difference doen't significanty affects the outcome. > * Check the page header immediately, so that we can retry immediately if > * it's not valid. This may seem unnecessary, because ReadPageInternal() > * validates the page header anyway, and would propagate the failure up to So, the following (tentative) change should also work. xlogrecovery.c: @@ -3460,8 +3490,10 @@ retry: * responsible for the validation. */ if (StandbyMode && + targetPagePtr % 0x10 == 0 && !XLogReaderValidatePageHeader(xlogreader, targetPagePtr, readBuf)) { Thus, I managed to reproduce precisely the same situation as you described utilizing your script with modifications and some core tweaks, and with the change above, I saw that the behavior was fixed. However, for reasons unclear to me, it shows another issue, and I am running out of time and need more caffeine. I'll continue investigating this tomorrow. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: initdb's -c option behaves wrong way?
At Mon, 4 Mar 2024 09:39:39 +0100, Daniel Gustafsson wrote in > > > > On 4 Mar 2024, at 02:01, Tom Lane wrote: > > > > Daniel Gustafsson writes: > >> I took the liberty to add this to the upcoming CF to make sure we don't > >> lose > >> track of it. > > > > Thanks for doing that, because the cfbot pointed out a problem: > > I should have written pg_strncasecmp not strncasecmp. If this > > version tests cleanly, I'll push it. > > +1, LGTM. Thank you for fixing this, Tom! regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: Infinite loop in XLogPageRead() on standby
At Fri, 01 Mar 2024 12:37:55 +0900 (JST), Kyotaro Horiguchi wrote in > Anyway, our current policy here is to avoid record-rereads beyond > source switches. However, fixing this seems to require that source > switches cause record rereads unless some additional information is > available to know if replay LSN needs to back up. It seems to me that the error messages are related to commit 0668719801. XLogPageRead: > * Check the page header immediately, so that we can retry immediately if > * it's not valid. This may seem unnecessary, because ReadPageInternal() > * validates the page header anyway, and would propagate the failure up to > * ReadRecord(), which would retry. However, there's a corner case with > * continuation records, if a record is split across two pages such that > * we would need to read the two pages from different sources. For ... > if (StandbyMode && > !XLogReaderValidatePageHeader(xlogreader, targetPagePtr, > readBuf)) > { > /* >* Emit this error right now then retry this page immediately. > Use >* errmsg_internal() because the message was already translated. >*/ > if (xlogreader->errormsg_buf[0]) > ereport(emode_for_corrupt_record(emode, > xlogreader->EndRecPtr), > (errmsg_internal("%s", > xlogreader->errormsg_buf))); This code intends to prevent a page header error from causing a record reread, when a record is required to be read from multiple sources. We could restrict this to only fire at segment boundaries. At segment boundaries, we won't let LSN back up by using XLP_FIRST_IS_CONTRECORD. Having thought up to this point, I now believe that we should completely prevent LSN from going back in any case. One drawback is that the fix cannot be back-patched. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: Infinite loop in XLogPageRead() on standby
At Fri, 01 Mar 2024 12:04:31 +0900 (JST), Kyotaro Horiguchi wrote in > At Fri, 01 Mar 2024 10:29:12 +0900 (JST), Kyotaro Horiguchi > wrote in > > After reading this, I came up with a possibility that walreceiver > > recovers more quickly than the calling interval to > > WaitForWALtoBecomeAvailable(). If walreceiver disconnects after a call > > to the function WaitForWAL...(), and then somehow recovers the > > connection before the next call, the function doesn't notice the > > disconnection and returns XLREAD_SUCCESS wrongly. If this assumption > > is correct, the correct fix might be for us to return XLREAD_FAIL when > > reconnection happens after the last call to the WaitForWAL...() > > function. > > That's my stupid. The function performs reconnection by itself. Anyway, our current policy here is to avoid record-rereads beyond source switches. However, fixing this seems to require that source switches cause record rereads unless some additional information is available to know if replay LSN needs to back up. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: Infinite loop in XLogPageRead() on standby
At Fri, 01 Mar 2024 10:29:12 +0900 (JST), Kyotaro Horiguchi wrote in > After reading this, I came up with a possibility that walreceiver > recovers more quickly than the calling interval to > WaitForWALtoBecomeAvailable(). If walreceiver disconnects after a call > to the function WaitForWAL...(), and then somehow recovers the > connection before the next call, the function doesn't notice the > disconnection and returns XLREAD_SUCCESS wrongly. If this assumption > is correct, the correct fix might be for us to return XLREAD_FAIL when > reconnection happens after the last call to the WaitForWAL...() > function. That's my stupid. The function performs reconnection by itself. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: Infinite loop in XLogPageRead() on standby
At Fri, 1 Mar 2024 08:17:04 +0900, Michael Paquier wrote in > On Thu, Feb 29, 2024 at 05:44:25PM +0100, Alexander Kukushkin wrote: > > On Thu, 29 Feb 2024 at 08:18, Kyotaro Horiguchi > > wrote: > >> In the first place, it's important to note that we do not guarantee > >> that an async standby can always switch its replication connection to > >> the old primary or another sibling standby. This is due to the > >> variations in replication lag among standbys. pg_rewind is required to > >> adjust such discrepancies. > > > > Sure, I know. But in this case the async standby received and flushed > > absolutely the same amount of WAL as the promoted one. > > Ugh. If it can happen transparently to the user without the user > knowing directly about it, that does not sound good to me. I did not > look very closely at monitoring tools available out there, but if both > standbys flushed the same WAL locations a rewind should not be > required. It is not something that monitoring tools would be able to > detect because they just look at LSNs. > > >> In the repro script, the replication connection of the second standby > >> is switched from the old primary to the first standby after its > >> promotion. After the switching, replication is expected to continue > >> from the beginning of the last replayed segment. > > > > Well, maybe, but apparently the standby is busy trying to decode a record > > that spans multiple pages, and it is just infinitely waiting for the next > > page to arrive. Also, the restart "fixes" the problem, because indeed it is > > reading the file from the beginning. > > What happens if the continuation record spawns across multiple segment > files boundaries in this case? We would go back to the beginning of > the segment where the record spawning across multiple segments began, > right? (I may recall this part of contrecords incorrectly, feel free > to correct me if necessary.) After reading this, I came up with a possibility that walreceiver recovers more quickly than the calling interval to WaitForWALtoBecomeAvailable(). If walreceiver disconnects after a call to the function WaitForWAL...(), and then somehow recovers the connection before the next call, the function doesn't notice the disconnection and returns XLREAD_SUCCESS wrongly. If this assumption is correct, the correct fix might be for us to return XLREAD_FAIL when reconnection happens after the last call to the WaitForWAL...() function. > >> But with the script, > >> the second standby copies the intentionally broken file, which differs > >> from the data that should be received via streaming. > > > > As I already said, this is a simple way to emulate the primary crash while > > standbys receiving WAL. > > It could easily happen that the record spans on multiple pages is not fully > > received and flushed. > > I think that's OK to do so at test level to force a test in the > backend, FWIW, because that's cheaper, and 039_end_of_wal.pl has > proved that this can be designed to be cheap and stable across the > buildfarm fleet. Yeah, I agree that it clearly illustrates the final state after the issue happened, but if my assumption above is correct, the test doesn't manifest the real issue. > For anything like that, the result had better have solid test > coverage, where perhaps we'd better refactor some of the routines of > 039_end_of_wal.pl into a module to use them here, rather than > duplicate the code. The other test has a few assumptions with the > calculation of page boundaries, and I'd rather not duplicate that > across the tree. I agree to the point. > Seeing that Alexander is a maintainer of Patroni, which is very > probably used by his employer across a large set of PostgreSQL > instances, if he says that he's seen that in the field, that's good > enough for me to respond to the problem, especially if reconnecting a > standby to a promoted node where both flushed the same LSN. Now the > level of response also depends on the invasiness of the change, and we > need a very careful evaluation here. I don't mean to say that we should respond with DNF to this "issue" at all. I simply wanted to make the real issue clear. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: Infinite loop in XLogPageRead() on standby
At Thu, 29 Feb 2024 14:05:15 +0900, Michael Paquier wrote in > On Wed, Feb 28, 2024 at 11:19:41AM +0100, Alexander Kukushkin wrote: > > I spent some time debugging an issue with standby not being able to > > continue streaming after failover. > > > > The problem happens when standbys received only the first part of the WAL > > record that spans multiple pages. > > In this case the promoted standby discards the first part of the WAL record > > and writes END_OF_RECOVERY instead. If in addition to that someone will > > call pg_switch_wal(), then there are chances that SWITCH record will also > > fit to the page where the discarded part was settling, As a result the > > other standby (that wasn't promoted) will infinitely try making attempts to > > decode WAL record span on multiple pages by reading the next page, which is > > filled with zero bytes. And, this next page will never be written, because > > the new primary will be writing to the new WAL file after pg_switch_wal(). In the first place, it's important to note that we do not guarantee that an async standby can always switch its replication connection to the old primary or another sibling standby. This is due to the variations in replication lag among standbys. pg_rewind is required to adjust such discrepancies. I might be overlooking something, but I don't understand how this occurs without purposefully tweaking WAL files. The repro script pushes an incomplete WAL file to the archive as a non-partial segment. This shouldn't happen in the real world. In the repro script, the replication connection of the second standby is switched from the old primary to the first standby after its promotion. After the switching, replication is expected to continue from the beginning of the last replayed segment. But with the script, the second standby copies the intentionally broken file, which differs from the data that should be received via streaming. A similar problem to the issue here was seen at segment boundaries, before we introduced the XLP_FIRST_IS_OVERWRITE_CONTRECORD flag, which prevents overwriting a WAL file that is already archived. However, in this case, the second standby won't see the broken record because it cannot be in a non-partial segment in the archive, and the new primary streams END_OF_RECOVERY instead of the broken record. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: SLRU optimization - configurable buffer pool and partitioning the SLRU lock
At Tue, 27 Feb 2024 18:33:18 +0100, Alvaro Herrera wrote in > Here's the complete set, with these two names using the singular. The commit by the second patch added several GUC descriptions: > Sets the size of the dedicated buffer pool used for the commit timestamp > cache. Some of them, commit_timestamp_buffers, transaction_buffers, subtransaction_buffers use 0 to mean auto-tuning based on shared-buffer size. I think it's worth adding an extra_desc such as "0 to automatically determine this value based on the shared buffer size". regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: MultiXact\SLRU buffers configuration
At Sat, 3 Feb 2024 22:32:45 +0500, "Andrey M. Borodin" wrote in > Here's the test draft. This test reliably reproduces sleep on CV when waiting > next multixact to be filled into "members" SLRU. By the way, I raised a question about using multiple CVs simultaneously [1]. That is, I suspect that the current CV implementation doesn't allow us to use multiple condition variables at the same time, because all CVs use the same PCPROC member cvWaitLink to accommodate different waiter sets. If this assumption is correct, we should resolve the issue before spreading more uses of CVs. [1] https://www.postgresql.org/message-id/20240227.150709.1766217736683815840.horikyota.ntt%40gmail.com regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: A failure in t/001_rep_changes.pl
At Fri, 23 Feb 2024 15:50:21 +0530, vignesh C wrote in > By any chance do you have the log files when this failure occurred, if > so please share it. In my understanding, within a single instance, no two proclists can simultaneously share the same waitlink member of PGPROC. On the other hand, a publisher uses two condition variables for slots and WAL waiting, which work on the same PGPROC member cvWaitLink. I suspect this issue arises from the configuration. However, although it is unlikly related to this specific issue, a similar problem can arise in instances that function both as logical publisher and physical primary. Regardless of this issue, I think we should provide separate waitlink members for condition variables that can possibly be used simultaneously. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: About a recently-added message
At Thu, 22 Feb 2024 10:51:07 +0530, Amit Kapila wrote in > > Do you think some additional tests for the rest of the messages are > > worth the trouble? > > > > We have discussed this during development and didn't find it worth > adding tests for all misconfigured parameters. However, in the next > patch where we are planning to add a slot sync worker that will > automatically sync slots, we are adding a test for one more parameter. > I am not against adding tests for all the parameters but it didn't > appeal to add more test cycles for this. Thanks for the explanation. I'm fine with that. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: About a recently-added message
At Thu, 22 Feb 2024 09:36:43 +0900 (JST), Kyotaro Horiguchi wrote in > Yes, I'm happy with all of the changes. The proposed patch appears to > cover all instances related to slotsync.c, and it looks fine to > me. Thanks! I'd like to raise another potential issue outside the patch. The patch needed to change only one test item even though it changed nine messages. This means eigh out of nine messages that the patch changed are not covered by our test. I doubt all of them are worth additional test items; however, I think we want to increase coverage. Do you think some additional tests for the rest of the messages are worth the trouble? regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: About a recently-added message
At Wed, 21 Feb 2024 14:57:42 +0530, Amit Kapila wrote in > On Tue, Feb 20, 2024 at 3:21 PM shveta malik wrote: > > > > okay, attached v2 patch with changed error msgs and double quotes > > around logical. > > > > Horiguchi-San, does this address all your concerns related to > translation with these new messages? Yes, I'm happy with all of the changes. The proposed patch appears to cover all instances related to slotsync.c, and it looks fine to me. Thanks! I found that logica.c is also using the policy that I complained about, but it is a separate issue. ./logical.c 122:errmsg("logical decoding requires wal_level >= logical"))); regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: Have pg_basebackup write "dbname" in "primary_conninfo"?
At Tue, 20 Feb 2024 19:56:10 +0530, Robert Haas wrote in > It seems like maybe somebody should look into why this is happening, > and perhaps fix it. GetConnection()@streamutil.c wants to ensure conninfo has a fallback database name ("replication"). However, the function seems to be ignoring the case where neither dbname nor connection string is given, which is the first case Kuroda-san raised. The second case is the intended behavior of the function. > /* pg_recvlogical uses dbname only; others use connection_string only. > */ > Assert(dbname == NULL || connection_string == NULL); And the function incorrectly assumes that the connection string requires "dbname=replication". >* Merge the connection info inputs given in form of connection string, >* options and default values (dbname=replication, replication=true, > etc.) But the name is a pseudo database name only used by pg_hba.conf (maybe) , which cannot be used as an actual database name (without quotation marks, or unless it is actually created). The function should not add the fallback database name because the connection string for physical replication connection doesn't require the dbname parameter. (attached patch) About the proposed patch, pg_basebackup cannot verify the validity of the dbname. It could be problematic. Although I haven't looked the original thread, it seems that the dbname is used only by pg_sync_replication_slots(). If it is true, couldn't we make the SQL function require a database name to make a connection, instead of requiring it in physical-replication conninfo? regards. -- Kyotaro Horiguchi NTT Open Source Software Center diff --git a/src/bin/pg_basebackup/streamutil.c b/src/bin/pg_basebackup/streamutil.c index 56d1b15951..da63a04de4 100644 --- a/src/bin/pg_basebackup/streamutil.c +++ b/src/bin/pg_basebackup/streamutil.c @@ -78,7 +78,7 @@ GetConnection(void) /* * Merge the connection info inputs given in form of connection string, -* options and default values (dbname=replication, replication=true, etc.) +* options and default values (replication=true, etc.) */ i = 0; if (connection_string) @@ -96,14 +96,6 @@ GetConnection(void) keywords = pg_malloc0((argcount + 1) * sizeof(*keywords)); values = pg_malloc0((argcount + 1) * sizeof(*values)); - /* -* Set dbname here already, so it can be overridden by a dbname in the -* connection string. -*/ - keywords[i] = "dbname"; - values[i] = "replication"; - i++; - for (conn_opt = conn_opts; conn_opt->keyword != NULL; conn_opt++) { if (conn_opt->val != NULL && conn_opt->val[0] != '\0')
Re: Do away with zero-padding assumption before WALRead()
At Mon, 19 Feb 2024 11:02:39 +0530, Bharath Rupireddy wrote in > > After all, the patch looks good to me. > > Thanks. It was committed - > https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=73f0a1326608ac3a7d390706fdeec59fe4dc42c0. Yeah. I realied that after I had already sent the mail.. No harm done:p regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: A new message seems missing a punctuation
At Mon, 19 Feb 2024 10:31:33 +0530, Robert Haas wrote in > On Mon, Feb 19, 2024 at 10:10 AM Kyotaro Horiguchi > wrote: > > A recent commit (7a424ece48) added the following message: > > > > > could not sync slot information as remote slot precedes local slot: > > > remote slot "%s": LSN (%X/%X), catalog xmin (%u) local slot: LSN > > > (%X/%X), catalog xmin (%u) > > > > Since it is a bit overloaded but doesn't have a separator between > > "catalog xmin (%u)" and "local slot:", it is somewhat cluttered. Don't > > we need something, for example a semicolon there to improve > > readability and reduce clutter? > > I think maybe we should even revise the message even more. In most > places we do not just print out a whole bunch of values, but use a > sentence to connect them. Mmm. Something like this?: "could not sync slot information: local slot LSN (%X/%X) or xmin(%u) behind remote (%X/%X, %u)" Or I thought the values could be moved to DETAILS: line. By the way, the code around the message is as follows. > /* > * The remote slot didn't catch up to locally reserved position. > * > * We do not drop the slot because the restart_lsn can be ahead of the > * current location when recreating the slot in the next cycle. It may > * take more time to create such a slot. Therefore, we keep this slot > * and attempt the synchronization in the next cycle. > * > * XXX should this be changed to elog(DEBUG1) perhaps? > */ > ereport(LOG, > errmsg("could not sync slot information as remote slot precedes > local slot:" > " remote slot \"%s\": LSN (%X/%X), > catalog xmin (%u) local slot: LSN (%X/%X), catalog xmin (%u)", If we change this message to DEBUG1, a timeout feature to show a WARNING message might be needed for DBAs to notice that something bad is ongoing. However, it doesn't seem appropriate as a LOG message to me. Perhaps, a LOG message should be more like: > "LOG: waiting for local slot to catch up to remote" > "DETAIL: remote slot LSN (%X/%X); " regards. -- Kyotaro Horiguchi NTT Open Source Software Center
A new message seems missing a punctuation
A recent commit (7a424ece48) added the following message: > could not sync slot information as remote slot precedes local slot: > remote slot "%s": LSN (%X/%X), catalog xmin (%u) local slot: LSN > (%X/%X), catalog xmin (%u) Since it is a bit overloaded but doesn't have a separator between "catalog xmin (%u)" and "local slot:", it is somewhat cluttered. Don't we need something, for example a semicolon there to improve readability and reduce clutter? regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: Do away with zero-padding assumption before WALRead()
At Mon, 19 Feb 2024 11:56:22 +0900 (JST), Kyotaro Horiguchi wrote in > Yeah, perhaps I was overly concerned. The removed comment made me > think that someone could add code relying on the incorrect assumption > that the remaining bytes beyond the returned count are cleared out. On > the flip side, SimpleXLogPageRead always reads a whole page and > returns XLOG_BLCKSZ. However, as you know, the returned buffer doesn't > contain random garbage bytes. Therefore, it's safe as long as the Forgot to mention that there is a case involving non-initialized pages, but it doesn't affect the correctness of the description you pointed out. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: Do away with zero-padding assumption before WALRead()
At Fri, 16 Feb 2024 19:50:00 +0530, Bharath Rupireddy wrote in > On Fri, Feb 16, 2024 at 7:10 AM Kyotaro Horiguchi > wrote: > > 1. It's useless to copy the whole page regardless of the 'count'. It's > > enough to copy only up to the 'count'. The patch looks good in this > > regard. > > Yes, it's not needed to copy the whole page. Per my understanding > about page transfers between disk and OS page cache - upon OS page > cache miss, the whole page (of disk block size) gets fetched from disk > even if we just read 'count' bytes (< disk block size). Right, but with a possibly-different block size. Anyway that behavior doesn't affect the result of this change. (Could affect performance hereafter if it were not the case, though..) > > 2. Maybe we need a comment that states the page_read callback > > functions leave garbage bytes beyond the returned count, due to > > partial copying without clearing the unused portion. > > Isn't the comment around page_read callback at > https://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/include/access/xlogreader.h;h=2e9e5f43eb2de1ca9ba81afe76d21357065c61aa;hb=d57b7cc3338e9d9aa1d7c5da1b25a17c5a72dcce#l78 > enough? > > "The callback shall return the number of bytes read (never more than > XLOG_BLCKSZ), or -1 on failure." Yeah, perhaps I was overly concerned. The removed comment made me think that someone could add code relying on the incorrect assumption that the remaining bytes beyond the returned count are cleared out. On the flip side, SimpleXLogPageRead always reads a whole page and returns XLOG_BLCKSZ. However, as you know, the returned buffer doesn't contain random garbage bytes. Therefore, it's safe as long as the caller doesn't access beyond the returned count. As a result, the description you pointed out seems to be enough. After all, the patch looks good to me. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: Do away with zero-padding assumption before WALRead()
At Tue, 13 Feb 2024 11:47:06 +0530, Bharath Rupireddy wrote in > Hi, > > I noticed an assumption [1] at WALRead() call sites expecting the > flushed WAL page to be zero-padded after the flush LSN. I think this > can't always be true as the WAL can get flushed after determining the > flush LSN before reading it from the WAL file using WALRead(). I've > hacked the code up a bit to check if that's true - Good catch! The comment seems wrong also to me. The subsequent bytes can be written simultaneously, and it's very normal that there are unflushed bytes are in OS's page buffer. The objective of the comment seems to be to declare that there's no need to clear out the remaining bytes, here. I agree that it's not a problem for now. However, I think we need two fixes here. 1. It's useless to copy the whole page regardless of the 'count'. It's enough to copy only up to the 'count'. The patch looks good in this regard. 2. Maybe we need a comment that states the page_read callback functions leave garbage bytes beyond the returned count, due to partial copying without clearing the unused portion. > I'm wondering why the WALRead() callers are always reading XLOG_BLCKSZ > despite knowing exactly how much to read. Is it to tell the OS to > explicitly fetch the whole page from the disk? If yes, the OS will do > that anyway because the page transfers from disk to OS page cache are > always in terms of disk block sizes, no? If I understand your question correctly, I guess that the whole-page copy was expected to clear out the remaining bytes, as I mentioned earlier. > Although, there's no immediate problem with it right now, the > assumption is going to be incorrect when reading WAL from WAL buffers > using WALReadFromBuffers - > https://www.postgresql.org/message-id/CALj2ACV=C1GZT9XQRm4iN1NV1T=hla_hsgwnx2y5-g+mswd...@mail.gmail.com. > > If we have no reason, can the WALRead() callers just read how much > they want like walsender for physical replication? Attached a patch > for the change. > > Thoughts? regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: About a recently-added message
At Thu, 15 Feb 2024 09:22:23 +0530, shveta malik wrote in > On Thu, Feb 15, 2024 at 8:26 AM Amit Kapila wrote: > > > > On Wed, Feb 14, 2024 at 7:51 PM Euler Taveira wrote: > > > > > > On Wed, Feb 14, 2024, at 8:45 AM, Amit Kapila wrote: > > > > > > Now, I am less clear about whether to quote "logical" or not in the > > > above message. Do you have any suggestions? > > > > > > > > > The possible confusion comes from the fact that the sentence contains > > > "must be" > > > in the middle of a comparison expression. For pg_createsubscriber, we are > > > using > > > > > > publisher requires wal_level >= logical > > > > > > I suggest to use something like > > > > > > slot synchronization requires wal_level >= logical > > > > > > > This sounds like a better idea but shouldn't we directly use this in > > 'errmsg' instead of a separate 'errhint'? Also, if change this, then > > we should make a similar change for other messages in the same > > function. > > +1 on changing the msg(s) suggested way. Please find the patch for the > same. It also removes double quotes around the variable names Thanks for the discussion. With a translator hat on, I would be happy if I could determine whether a word requires translation with minimal background information. In this case, a translator needs to know which values wal_level can take. It's relatively easy in this case, but I'm not sure if this is always the case. Therefore, I would be slightly happier if "logical" were double-quoted. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: About a recently-added message
At Wed, 14 Feb 2024 16:26:52 +0900 (JST), Kyotaro Horiguchi wrote in > > "wal_level" must be >= logical. .. > > wal_level must be set to "replica" or "logical" at server start. .. > I suggest making the quoting policy consistent. Just after this, I found another inconsistency regarding quotation. > 'dbname' must be specified in "%s". The use of single quotes doesn't seem to comply with our standard. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
About a recently-added message
A recent commit added the following message: > "wal_level" must be >= logical. The use of the term "logical" here is a bit confusing, as it's unclear whether it's meant to be a natural language word or a token. (I believe it to be a token.) On the contrary, we already have the following message: > wal_level must be set to "replica" or "logical" at server start. This has the conflicting policy about quotation of variable names and enum values. I suggest making the quoting policy consistent. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: A comment in DropRole() contradicts the actual behavior
At Thu, 8 Feb 2024 16:39:23 +0900, Michael Paquier wrote in > On Thu, Feb 08, 2024 at 09:00:01AM +0300, Alexander Lakhin wrote: > > Hello, > > > > Please look at errors, which produced by the following script, starting > > from 6566133c5: > > for i in `seq 100`; do (echo "CREATE USER u; DROP USER u;"); done | psql > > >psql-1.log 2>&1 & > > for i in `seq 100`; do (echo "CREATE USER u; DROP USER u;"); done | psql > > >psql-2.log 2>&1 & > > wait > > > > ERROR: could not find tuple for role 16387 > > ERROR: could not find tuple for role 16390 > > ERROR: could not find tuple for role 16394 > > ... > > > > Maybe these errors are expected, but then I'm confused by the comment in > > DropRole(): > > Well, these errors should never happen, IMHO, so it seems to me that > the comment is right and that the code lacks locking, contrary to what > the comment tells. The function acquires a lock, but it does not perform an existence check until it first attempts to fetch the tuple, believing in its presence. However, I doubt performing an additional existence check right after acquiring the lock is worthwhile. By the way, I saw the following error with the provided script: > ERROR: duplicate key value violates unique constraint > "pg_authid_rolname_index" > DETAIL: Key (rolname)=(u) already exists. > STATEMENT: CREATE USER u; This seems to be another instance of a similar thinko. I vaguely think that we should just regard the absence as a concurrent drop and either adjust or remove the message, then fix the comment. The situation is slightly different for the duplication case. We shouldn't ignore it but rather need to adjust the error message. As long as these behaviors don't lead to inconsistencies. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: InstallXLogFileSegment() vs concurrent WAL flush
At Fri, 2 Feb 2024 14:42:46 +0100, Thomas Munro wrote in > On Fri, Feb 2, 2024 at 12:56 PM Yugo NAGATA wrote: > > On Fri, 2 Feb 2024 11:18:18 +0100 > > Thomas Munro wrote: > > > One simple way to address that would be to make XLogFileInitInternal() > > > wait for InstallXLogFileSegment() to finish. It's a little > > > > Or, can we make sure the rename is durable by calling fsync before > > returning the fd, as a patch attached here? > > Right, yeah, that works too. I'm not sure which way is better. I'm not sure I like issuing spurious syncs unconditionally. Therefore, I prefer Thomas' approach in that regard. 0002 would be beneficial, considering the case of a very large max_wal_size, and the code seems to be the minimal required. I don't think it matters that the lock attempts occur uselessly until the first segment installation. That being said, we could avoid it by initializing last_known_installed_segno properly. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: Change COPY ... ON_ERROR ignore to ON_ERROR ignore_row
At Mon, 5 Feb 2024 17:22:56 +0900, Yugo NAGATA wrote in > On Mon, 05 Feb 2024 11:28:59 +0900 > torikoshia wrote: > > > > Based on this, I've made a patch. > > > based on COPY Synopsis: ON_ERROR 'error_action' > > > on_error 'null', the keyword NULL should be single quoted. > > > > As you mentioned, single quotation seems a little odd.. > > > > I'm not sure what is the best name and syntax for this feature, but > > since current error_action are verbs('stop' and 'ignore'), I feel 'null' > > might not be appropriate. > > I am not in favour of using 'null' either, so I suggested to use > "set_to_null" or more generic syntax like "set_to (col, val)" in my > previous post[1], although I'm not convinced what is the best either. > > [1] > https://www.postgresql.org/message-id/20240129172858.ccb6c77c3be95a295e2b2b44%40sraoss.co.jp Tom sugggested using a separate option, and I agree with the suggestion. Taking this into consideration, I imagined something like the following, for example. Although I'm not sure we are actually going to do whole-tuple replacement, the action name in this example has the suffix '-column'. COPY (on_error 'replace-colomn', replacement 'null') .. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: More new SQL/JSON item methods
Sorry for a minor correction, but.. At Thu, 01 Feb 2024 14:53:57 +0900 (JST), Kyotaro Horiguchi wrote in > Ah.. Understood. "NaN or Infinity" cannot be used in those > cases. Additionally, for jpiBoolean and jpiBigint, we lack the text > representation of the value. This "Additionally" was merely left in by mistake. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: More new SQL/JSON item methods
At Thu, 1 Feb 2024 09:22:22 +0530, Jeevan Chalke wrote in > On Thu, Feb 1, 2024 at 7:24 AM Kyotaro Horiguchi > wrote: > > > At Thu, 01 Feb 2024 10:49:57 +0900 (JST), Kyotaro Horiguchi < > > horikyota@gmail.com> wrote in > > > By the way, while playing with this feature, I noticed the following > > > error message: > > > > > > > select jsonb_path_query('1.1' , '$.boolean()'); > > > > ERROR: numeric argument of jsonpath item method .boolean() is out of > > range for type boolean > > > > > > The error message seems a bit off to me. For example, "argument '1.1' > > > is invalid for type [bB]oolean" seems more appropriate for this > > > specific issue. (I'm not ceratin about our policy on the spelling of > > > Boolean..) > > > > Or, following our general convention, it would be spelled as: > > > > 'invalid argument for type Boolean: "1.1"' > > > > jsonpath way: Hmm. I see. > ERROR: argument of jsonpath item method .boolean() is invalid for type > boolean > > or, if we add input value, then > > ERROR: argument "1.1" of jsonpath item method .boolean() is invalid for > type boolean > > And this should work for all the error types, like out of range, not valid, > invalid input, etc, etc. Also, we don't need separate error messages for > string input as well, which currently has the following form: > > "string argument of jsonpath item method .%s() is not a valid > representation.." Agreed. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: More new SQL/JSON item methods
At Thu, 1 Feb 2024 09:19:40 +0530, Jeevan Chalke wrote in > > As Tom suggested, given that similar situations have already been > > disregarded elsewhere, worrying about excessively long input strings > > in this specific instance won't notably improve safety in total. > > > > > Also, for non-string input, we need to convert numeric to string just for > > > the error message, which seems overkill. > > > > As I suggested and you seem to agree, using literally "Nan or > > Infinity" would be sufficient. > > > > I am more concerned about .bigint() and .integer(). We can have errors when > the numeric input is out of range, but not NaN or Infinity. At those > places, we need to convert numeric to string to put that value into the > error. > Do you mean we should still put "Nan or Infinity" there? > > This is the case: > select jsonb_path_query('12345678901', '$.integer()'); > ERROR: numeric argument of jsonpath item method .integer() is out of > range for type integer Ah.. Understood. "NaN or Infinity" cannot be used in those cases. Additionally, for jpiBoolean and jpiBigint, we lack the text representation of the value. By a quick grepping, I found that the following functions call numeric_out to convert the jbvNumeric values back into text representation. JsonbValueAstext, populate_scalar, iterate_jsonb_values, executeItemOptUnrwapTarget, jsonb_put_escaped_value The function iterate_jsonb_values(), in particular, iterates over a values array, calling numeric_out on each iteration. The following functions re-converts the converted numeric into another type. jsonb_int[248]() converts the numeric value into int2 using numeric_int[248](). jsonb_float[48]() converts it into float4 using numeric_float[48](). Given these facts, it seems more efficient for jbvNumber to retain the original scalar value, converting it only when necessary. If needed, we could also add a numeric struct member as a cache for better performance. I'm not sure we refer the values more than once, though. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: More new SQL/JSON item methods
At Thu, 01 Feb 2024 10:49:57 +0900 (JST), Kyotaro Horiguchi wrote in > By the way, while playing with this feature, I noticed the following > error message: > > > select jsonb_path_query('1.1' , '$.boolean()'); > > ERROR: numeric argument of jsonpath item method .boolean() is out of range > > for type boolean > > The error message seems a bit off to me. For example, "argument '1.1' > is invalid for type [bB]oolean" seems more appropriate for this > specific issue. (I'm not ceratin about our policy on the spelling of > Boolean..) Or, following our general convention, it would be spelled as: 'invalid argument for type Boolean: "1.1"' regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: More new SQL/JSON item methods
Thank you for the fix! At Tue, 30 Jan 2024 13:46:17 +0530, Jeevan Chalke wrote in > On Mon, Jan 29, 2024 at 11:03 AM Tom Lane wrote: > > > Kyotaro Horiguchi writes: > > > Having said that, I'm a bit concerned about the case where an overly > > > long string is given there. However, considering that we already have > > > "invalid input syntax for type xxx: %x" messages (including for the > > > numeric), this concern might be unnecessary. > > > > Yeah, we have not worried about that in the past. > > > > > Another concern is that the input value is already a numeric, not a > > > string. This situation occurs when the input is NaN of +-Inf. Although > > > numeric_out could be used, it might cause another error. Therefore, > > > simply writing out as "argument NaN and Infinity.." would be better. > > > > Oh! I'd assumed that we were discussing a string that we'd failed to > > convert to numeric. If the input is already numeric, then either > > the error is unreachable or what we're really doing is rejecting > > special values such as NaN on policy grounds. I would ask first > > if that policy is sane at all. (I'd lean to "not" --- if we allow > > it in the input JSON, why not in the output?) If it is sane, the > > error message needs to be far more specific. > > > > regards, tom lane > > > > *Consistent error message related to type:* ... > What if we use phrases like "for type double precision" at both places, > like: > numeric argument of jsonpath item method .%s() is out of range for type > double precision > string argument of jsonpath item method .%s() is not a valid representation > for type double precision > > With this, the rest will be like: > for type numeric > for type bigint > for type integer > > Suggestions? FWIW, I prefer consistently using "for type hoge". > --- > > *Showing input string in the error message:* > > argument "...input string here..." of jsonpath item method .%s() is out of > range for type numeric > > If we add the input string in the error, then for some errors, it will be > too big, for example: > > -ERROR: numeric argument of jsonpath item method .double() is out of range > for type double precision > +ERROR: argument > "10" > of jsonpath item method .double() is out of range for type double precision As Tom suggested, given that similar situations have already been disregarded elsewhere, worrying about excessively long input strings in this specific instance won't notably improve safety in total. > Also, for non-string input, we need to convert numeric to string just for > the error message, which seems overkill. As I suggested and you seem to agree, using literally "Nan or Infinity" would be sufficient. > On another note, irrespective of these changes, is it good to show the > given input in the error messages? Error messages are logged and may leak > some details. > > I think the existing way seems ok. In my opinion, it is quite common to include the error-causing value in error messages. Also, we have already many functions that impliy the possibility for revealing input values when converting text representation into internal format, such as with int4in. However, I don't stick to that way. > --- > > *NaN and Infinity restrictions:* > > I am not sure why NaN and Infinity are not allowed in conversion to double > precision (.double() method). I have used the same restriction for > .decimal() and .number(). However, as you said, we should have error > messages more specific. I tried that in the attached patch; please have > your views. I have the following wordings for that error message: > "NaN or Infinity is not allowed for jsonpath item method .%s()" > > Suggestions... They seem good to *me*. By the way, while playing with this feature, I noticed the following error message: > select jsonb_path_query('1.1' , '$.boolean()'); > ERROR: numeric argument of jsonpath item method .boolean() is out of range > for type boolean The error message seems a bit off to me. For example, "argument '1.1' is invalid for type [bB]oolean" seems more appropriate for this specific issue. (I'm not ceratin about our policy on the spelling of Boolean..) regards. -- Kyotaro Horiguchi NTT Open Source Software Center
possible inter-gcc-version incompatibility regarding fat objects
Hello. I would like to share some information regarding a recent issue we encoutered. While compiling an extension against the RPM package postgresql16-devel-16.1-2PGDG.rhel9.x86_64.rpm we faced a problem that appears to be caused by the combination of LTO and constant folding/propagation, leading to a SEGV. This crash didn't occur when we gave --fno-lto instead of --flto=auto to the compiler. To put this case briefly: postgresql16-devel-16.1-2PGDG.rhel9.x86_64.rpm + gcc 11.3 => building the extension completes but the binary may crash with SEGV --fno-lto prevents this crash from occurring. The PG package is built with gcc-11.4, and the compiler in our build environment for the extension was gcc-11.3. A quick search suggests that the LTO bytecode format was optimized in gcc-11.4. Conversely, when we built the extension with postgresql-16-16.0 and gc-11.4, the build fails. This failure seems to stem LTO binary format incompatibility. > Error: bad register name `%' > Error: open CFI at the end of file; missing .cfi_endproc directive > /usr/bin/ld: error: lto-wrapper failed To put this case briefly: postgresql16-devel-16.0-1PGDG.rhel9.x86_64.rpm + gcc 11.4 => build failure regarding LTO Given these issues, it seems there might be some issues with including fat objects in the -devel package. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: Network failure may prevent promotion
Thank you fixing the issue. At Tue, 23 Jan 2024 11:43:43 +0200, Heikki Linnakangas wrote i n > There's an existing AmWalReceiverProcess() macro too. Let's use that. Mmm. I sought an Is* function becuase "IsLogicalWorker()" is placed on the previous line. Our convention regarding those functions (macros) and variables seems inconsistent. However, I can't say for sure that we should unify all of them. > (See also > https://www.postgresql.org/message-id/f3ecd4cb-85ee-4e54-8278-5fabfb3a4ed0%40iki.fi > for refactoring in this area) > > Here's a patch set summarizing the changes so far. They should be > squashed, but I kept them separate for now to help with review: > > 1. revert the revert of 728f86fec6. > 2. your walrcv_shutdown_deblocking_v2-2.patch > 3. Also replace libpqrcv_PQexec() and libpqrcv_PQgetResult() with the > wrappers from libpq-be-fe-helpers.h Both replacements look fine. I didn't find another instance of similar code. > 4. Replace IsWalReceiver() with AmWalReceiverProcess() Just look fine. > >> - pqsignal(SIGTERM, SignalHandlerForShutdownRequest); /* request > >> - shutdown */ > >> + pqsignal(SIGTERM, WalRcvShutdownSignalHandler); /* request shutdown > >> */ > >> > >> Can't we just use die(), instead? > > There was a comment explaining the problems associated with exiting > > within a signal handler; > > - * Currently, only SIGTERM is of interest. We can't just exit(1) within > > - * the > > - * SIGTERM signal handler, because the signal might arrive in the middle > > - * of > > - * some critical operation, like while we're holding a spinlock. > > - * Instead, the > > And I think we should keep the considerations it suggests. The patch > > removes the comment itself, but it does so because it implements our > > standard process exit procedure, which incorporates points suggested > > by the now-removed comment. > > die() doesn't call exit(1). Unless DoingCommandRead is set, but it > never is in the walreceiver. It looks just like the new > WalRcvShutdownSignalHandler() function. Am I missing something? Ugh.. Doesn't the name 'die()' suggest exit()? I agree that die() can be used instad. > Hmm, but doesn't bgworker_die() have that problem with exit(1)ing in > the signal handler? I noticed that but ignored for this time. > I also wonder if we should replace SignalHandlerForShutdownRequest() > completely with die(), in all processes? The difference is that > SignalHandlerForShutdownRequest() uses ShutdownRequestPending, while > die() uses ProcDiePending && InterruptPending to indicate that the > signal was received. Or do some of the processes want to check for > ShutdownRequestPending only at specific places, and don't want to get > terminated at the any random CHECK_FOR_INTERRUPTS()? At least, pg_log_backend_memory_context() causes a call to ProcessInterrupts via "ereport(LOG_SERVER_ONLY" which can lead to an exit due to ProcDiePending. In this regard, checkpointer clearly requires the distinction. Rather than merely consolidating the notification variables and striving to annihilate CFI calls in the execution path, I believe we need a shutdown mechanism that CFI doesn't react to. However, as for the method to achieve this, whether we should keep the notification variables separate as they are now, or whether it would be better to introduce a variable that causes CFI to ignore ProcDiePending, is a matter I think is open to discussion. Attached patches are the rebased version of v3 (0003 is updated) and additional 0005 that makes use of die() instead of walreceiver's custom function. regards. -- Kyotaro Horiguchi NTT Open Source Software Center >From daac3aa06bd33f5e8f04a406c8a59fd4cc97 Mon Sep 17 00:00:00 2001 From: Heikki Linnakangas Date: Tue, 23 Jan 2024 11:01:03 +0200 Subject: [PATCH v4 1/5] Revert "Revert "libpqwalreceiver: Convert to libpq-be-fe-helpers.h"" This reverts commit 21ef4d4d897563adb2f7920ad53b734950f1e0a4. --- .../libpqwalreceiver/libpqwalreceiver.c | 55 +++ 1 file changed, 8 insertions(+), 47 deletions(-) diff --git a/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c b/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c index 2439733b55..dbee2f8f0e 100644 --- a/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c +++ b/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c @@ -24,6 +24,7 @@ #include "common/connect.h" #include "funcapi.h" #include "libpq-fe.h" +#include "libpq/libpq-be-fe-helpers.h" #include "mb/pg_wchar.h" #include "miscadmin.h" #include "pgstat.h" @@ -136,7 +137,6 @@ libpqrcv_connect(const char *conninfo, bool logical, bool must_use_password, c
Re: More new SQL/JSON item methods
At Sun, 28 Jan 2024 22:47:02 -0500, Tom Lane wrote in Kyotaro Horiguchi writes: > They seem to be suggesting that PostgreSQL has the types "decimal" and > "number". I know of the former, but I don't think PostgreSQL has the > latter type. Perhaps the "number" was intended to refer to "numeric"? Probably. But I would write just "type numeric". We do not generally acknowledge "decimal" as a separate type, because for us it's only an alias for numeric (there is not a pg_type entry for it). Also, that leads to the thought that "numeric argument ... is out of range for type numeric" seems either redundant or contradictory depending on how you look at it. So I suggest wording like argument "...input string here..." of jsonpath item method .%s() is out of range for type numeric > (And I think it is largely helpful if the given string were shown in > the error message, but it would be another issue.) Agreed, so I suggest the above. Having said that, I'm a bit concerned about the case where an overly long string is given there. However, considering that we already have "invalid input syntax for type xxx: %x" messages (including for the numeric), this concern might be unnecessary. Another concern is that the input value is already a numeric, not a string. This situation occurs when the input is NaN of +-Inf. Although numeric_out could be used, it might cause another error. Therefore, simply writing out as "argument NaN and Infinity.." would be better. Once we resolve these issues, another question arises regarding on of the messages. In the case of NaN of Infinity, the message will be as the follows: "argument NaN or Infinity of jsonpath item method .%s() is out of range for type numeric" This message appears quite strange and puzzling. I suspect that the intended message was somewhat different. > The same commit has introduced the following set of messages: >> %s format is not recognized: "%s" >> date format is not recognized: "%s" >> time format is not recognized: "%s" >> time_tz format is not recognized: "%s" >> timestamp format is not recognized: "%s" >> timestamp_tz format is not recognized: "%s" > I believe that the first line was intended to cover all the others:p +1 regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: More new SQL/JSON item methods
I have two possible issues in a recent commit. Commit 66ea94e8e6 has introduced the following messages: (Aplogizies in advance if the commit is not related to this thread.) jsonpath_exec.c:1287 > if (numeric_is_nan(num) || numeric_is_inf(num)) > RETURN_ERROR(ereport(ERROR, > (errcode(ERRCODE_NON_NUMERIC_SQL_JSON_ITEM), > errmsg("numeric argument of jsonpath item method .%s() is out > of range for type decimal or number", >jspOperationName(jsp->type); :1387 > noerr = DirectInputFunctionCallSafe(numeric_in, numstr, ... > if (!noerr || escontext.error_occurred) > RETURN_ERROR(ereport(ERROR, > (errcode(ERRCODE_NON_NUMERIC_SQL_JSON_ITEM), > errmsg("string argument of jsonpath item method .%s() is not a > valid representation of a decimal or number", They seem to be suggesting that PostgreSQL has the types "decimal" and "number". I know of the former, but I don't think PostgreSQL has the latter type. Perhaps the "number" was intended to refer to "numeric"? (And I think it is largely helpful if the given string were shown in the error message, but it would be another issue.) The same commit has introduced the following set of messages: > %s format is not recognized: "%s" > date format is not recognized: "%s" > time format is not recognized: "%s" > time_tz format is not recognized: "%s" > timestamp format is not recognized: "%s" > timestamp_tz format is not recognized: "%s" I believe that the first line was intended to cover all the others:p They are attached to this message separately. regards. -- Kyotaro Horiguchi NTT Open Source Software Center diff --git a/src/backend/utils/adt/jsonpath_exec.c b/src/backend/utils/adt/jsonpath_exec.c index 22f598cd35..c10926a8a2 100644 --- a/src/backend/utils/adt/jsonpath_exec.c +++ b/src/backend/utils/adt/jsonpath_exec.c @@ -1287,7 +1287,7 @@ executeItemOptUnwrapTarget(JsonPathExecContext *cxt, JsonPathItem *jsp, if (numeric_is_nan(num) || numeric_is_inf(num)) RETURN_ERROR(ereport(ERROR, (errcode(ERRCODE_NON_NUMERIC_SQL_JSON_ITEM), - errmsg("numeric argument of jsonpath item method .%s() is out of range for type decimal or number", + errmsg("numeric argument of jsonpath item method .%s() is out of range for type decimal or numeric", jspOperationName(jsp->type); if (jsp->type == jpiDecimal) @@ -1312,14 +1312,14 @@ executeItemOptUnwrapTarget(JsonPathExecContext *cxt, JsonPathItem *jsp, if (!noerr || escontext.error_occurred) RETURN_ERROR(ereport(ERROR, (errcode(ERRCODE_NON_NUMERIC_SQL_JSON_ITEM), - errmsg("string argument of jsonpath item method .%s() is not a valid representation of a decimal or number", + errmsg("string argument of jsonpath item method .%s() is not a valid representation of a decimal or numeric", jspOperationName(jsp->type); num = DatumGetNumeric(datum); if (numeric_is_nan(num) || numeric_is_inf(num)) RETURN_ERROR(ereport(ERROR, (errcode(ERRCODE_NON_NUMERIC_SQL_JSON_ITEM), - errmsg("string argument of jsonpath item method .%s() is not a valid representation of a decimal or number", + errmsg("string argument of jsonpath item method .%s() is not a valid representation of a decimal or numeric", jspOperationName(jsp->type); res = jperOk; @@ -1401,7 +1401,7 @@ executeItemOptUnwrapTarget(JsonPathExecContext *cxt, JsonPathItem *jsp, if (!noerr || escontext.error_occurred) RETURN_ERROR(ereport(ERROR, (errcode(ERRCODE_NON_NUMERIC_SQL_JSON_ITEM), - errmsg("string argument of jsonpath item method .%s() is not a valid representation of a decimal or number", + errmsg("string argument of jsonpath item method .%s() is not a valid representation of a decimal or numeric", jspOperationName(jsp->type); num = DatumGetNumeric(numdatum); diff --git a/src/test/regress/expected/jsonb_jsonpath.out b/src/test/regress/expected/jsonb_jsonpath.out index eea2af30c8..9d8ce48a25 100644 --- a/src/test/regress/expected/jsonb_jsonpath.out +++ b/src/test/regress/expected/jsonb_jsonpath.out @@ -2144,7 +2144,7 @@ select jsonb_path_query('"1.23"', '$.decimal()'); (1 row) select jsonb_path_query('"1.23aaa"
Re: Network failure may prevent promotion
Thank you for looking this! At Tue, 23 Jan 2024 15:07:10 +0900, Fujii Masao wrote in > Regarding the patch, here are the review comments. > > +/* > + * Is current process a wal receiver? > + */ > +bool > +IsWalReceiver(void) > +{ > + return WalRcv != NULL; > +} > > This looks wrong because WalRcv can be non-NULL in processes other > than walreceiver. Mmm. Sorry for the silly mistake. We can use B_WAL_RECEIVER instead. I'm not sure if the new function IsWalReceiver() is required. The expression "MyBackendType == B_WAL_RECEIVER" is quite descriptive. However, the function does make ProcessInterrupts() more aligned regarding process types. > - pqsignal(SIGTERM, SignalHandlerForShutdownRequest); /* request shutdown */ > + pqsignal(SIGTERM, WalRcvShutdownSignalHandler); /* request shutdown */ > > Can't we just use die(), instead? There was a comment explaining the problems associated with exiting within a signal handler; - * Currently, only SIGTERM is of interest. We can't just exit(1) within the - * SIGTERM signal handler, because the signal might arrive in the middle of - * some critical operation, like while we're holding a spinlock. Instead, the And I think we should keep the considerations it suggests. The patch removes the comment itself, but it does so because it implements our standard process exit procedure, which incorporates points suggested by the now-removed comment. -- Kyotaro Horiguchi NTT Open Source Software Center diff --git a/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c b/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c index 201c36cb22..db779dc6ca 100644 --- a/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c +++ b/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c @@ -749,7 +749,7 @@ libpqrcv_PQgetResult(PGconn *streamConn) if (rc & WL_LATCH_SET) { ResetLatch(MyLatch); - ProcessWalRcvInterrupts(); + CHECK_FOR_INTERRUPTS(); } /* Consume whatever data is available from the socket */ @@ -1053,7 +1053,7 @@ libpqrcv_processTuples(PGresult *pgres, WalRcvExecResult *walres, { char *cstrs[MaxTupleAttributeNumber]; - ProcessWalRcvInterrupts(); + CHECK_FOR_INTERRUPTS(); /* Do the allocations in temporary context. */ oldcontext = MemoryContextSwitchTo(rowcontext); diff --git a/src/backend/replication/walreceiver.c b/src/backend/replication/walreceiver.c index 728059518e..e491f7d4c5 100644 --- a/src/backend/replication/walreceiver.c +++ b/src/backend/replication/walreceiver.c @@ -147,39 +147,34 @@ static void XLogWalRcvSendReply(bool force, bool requestReply); static void XLogWalRcvSendHSFeedback(bool immed); static void ProcessWalSndrMessage(XLogRecPtr walEnd, TimestampTz sendTime); static void WalRcvComputeNextWakeup(WalRcvWakeupReason reason, TimestampTz now); +static void WalRcvShutdownSignalHandler(SIGNAL_ARGS); -/* - * Process any interrupts the walreceiver process may have received. - * This should be called any time the process's latch has become set. - * - * Currently, only SIGTERM is of interest. We can't just exit(1) within the - * SIGTERM signal handler, because the signal might arrive in the middle of - * some critical operation, like while we're holding a spinlock. Instead, the - * signal handler sets a flag variable as well as setting the process's latch. - * We must check the flag (by calling ProcessWalRcvInterrupts) anytime the - * latch has become set. Operations that could block for a long time, such as - * reading from a remote server, must pay attention to the latch too; see - * libpqrcv_PQgetResult for example. - */ void -ProcessWalRcvInterrupts(void) +WalRcvShutdownSignalHandler(SIGNAL_ARGS) { - /* - * Although walreceiver interrupt handling doesn't use the same scheme as - * regular backends, call CHECK_FOR_INTERRUPTS() to make sure we receive - * any incoming signals on Win32, and also to make sure we process any - * barrier events. - */ - CHECK_FOR_INTERRUPTS(); + int save_errno = errno; - if (ShutdownRequestPending) + /* Don't joggle the elbow of proc_exit */ + if (!proc_exit_inprogress) { - ereport(FATAL, -(errcode(ERRCODE_ADMIN_SHUTDOWN), - errmsg("terminating walreceiver process due to administrator command"))); + InterruptPending = true; + ProcDiePending = true; } + + SetLatch(MyLatch); + + errno = save_errno; + } +/* + * Is current process a wal receiver? + */ +bool +IsWalReceiver(void) +{ + return MyBackendType == B_WAL_RECEIVER; +} /* Main entry point for walreceiver process */ void @@ -277,7 +272,7 @@ WalReceiverMain(void) pqsignal(SIGHUP, SignalHandlerForConfigReload); /* set flag to read config * file */ pqsignal(SIGINT, SIG_IGN); - pqsignal(SIGTERM, SignalHandlerForShutdownRequest); /* request shutdown */ + pqsignal(SIGTERM, WalRcvShutdownSignalHandler); /* request shutdown */ /* SIGQUIT handler was already set up by InitP
Fix some errdetail's message format
We recently made corrections to the capitalization of DETAIL messages. For example; - GUC_check_errdetail("invalid list syntax in parameter %s", + GUC_check_errdetail("Invalid list syntax in parameter %s", But still it is missing the period at the end. There are several patterns to this issue, but this time, I have only fixed the ones that are simple and obvious as follows: a. GUC_check_errdetail("LITERAL"), errdetail("LITERAL") without a period. b. psprintf()'ed string that is passed to errdetail_internal() I didn't touch the following patterns: c. errdetail_internal("%s") d. errdetail("Whatever: %s") e. errdetail("%s versus %s") and alikes f. errdetail_internal("%s", pchomp(PQerrorMessage())) g. complex message compilation The attached patch contains the following fix: - GUC_check_errdetail("timestamp out of range: \"%s\"", str); + GUC_check_errdetail("Timestamp out of range: \"%s\".", str); But I'm not quite confident about whether capitalizing the type name here is correct. regards. -- Kyotaro Horiguchi NTT Open Source Software Center diff --git a/contrib/cube/cubescan.l b/contrib/cube/cubescan.l index a30fbfc311..b45dc08f0c 100644 --- a/contrib/cube/cubescan.l +++ b/contrib/cube/cubescan.l @@ -82,7 +82,7 @@ cube_yyerror(NDBOX **result, Size scanbuflen, (errcode(ERRCODE_INVALID_TEXT_REPRESENTATION), errmsg("invalid input syntax for cube"), /* translator: %s is typically "syntax error" */ - errdetail("%s at end of input", message))); + errdetail("%s at end of input.", message))); } else { @@ -90,7 +90,7 @@ cube_yyerror(NDBOX **result, Size scanbuflen, (errcode(ERRCODE_INVALID_TEXT_REPRESENTATION), errmsg("invalid input syntax for cube"), /* translator: first %s is typically "syntax error" */ - errdetail("%s at or near \"%s\"", message, yytext))); + errdetail("%s at or near \"%s\".", message, yytext))); } } diff --git a/contrib/dblink/dblink.c b/contrib/dblink/dblink.c index 19a362526d..2720e91c14 100644 --- a/contrib/dblink/dblink.c +++ b/contrib/dblink/dblink.c @@ -2633,7 +2633,7 @@ dblink_security_check(PGconn *conn, remoteConn *rconn, const char *connstr) ereport(ERROR, (errcode(ERRCODE_S_R_E_PROHIBITED_SQL_STATEMENT_ATTEMPTED), errmsg("password or GSSAPI delegated credentials required"), - errdetail("Non-superusers may only connect using credentials they provide, eg: password in connection string or delegated GSSAPI credentials"), + errdetail("Non-superusers may only connect using credentials they provide, eg: password in connection string or delegated GSSAPI credentials."), errhint("Ensure provided credentials match target server's authentication method."))); } diff --git a/contrib/seg/segscan.l b/contrib/seg/segscan.l index 4ad529eccc..f5a72c2496 100644 --- a/contrib/seg/segscan.l +++ b/contrib/seg/segscan.l @@ -79,7 +79,7 @@ seg_yyerror(SEG *result, struct Node *escontext, const char *message) (errcode(ERRCODE_SYNTAX_ERROR), errmsg("bad seg representation"), /* translator: %s is typically "syntax error" */ - errdetail("%s at end of input", message))); + errdetail("%s at end of input.", message))); } else { @@ -87,7 +87,7 @@ seg_yyerror(SEG *result, struct Node *escontext, const char *message) (errcode(ERRCODE_SYNTAX_ERROR), errmsg("bad seg representation"), /* translator: first %s is typically "syntax error" */ - errdetail("%s at or near \"%s\"", message, yytext))); + errdetail("%s at or near \"%s\".", message, yytext))); } } diff --git a/src/backend/access/transam/xlogrecovery.c b/src/backend/access/transam/xlogrecovery.c index 1b48d7171a..2d90bbfee8 100644 --- a/src/backend/access/transam/xlogrecovery.c +++ b/src/backend/access/transam/xlogrecovery.c @@ -4882,7 +4882,7 @@ check_recovery_target_time(char **newval, void **extra, GucSource source) if (tm2timestamp(tm, fsec, , ) != 0) { -GUC_check_errdetail("timestamp out of range: \"%s\"", str); +GUC_check_errdetail("Timestamp out of range: \"%s\".", str); return false; } } diff --git a/src/backend/commands/extension.c b/src/backend/commands/extension.c index 226f85d0e3..564f79dab9 100644 --- a/src/backend/commands/extension.c +++ b/src/backend/commands/extension.c @@ -2946,7 +2946,7 @@ AlterExtensionNamespace(const char *extensionName, const char *newschema, Oid *o (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), errmsg("extension \"%s\" does not sup
Re: Network failure may prevent promotion
At Mon, 22 Jan 2024 13:29:10 -0800, Andres Freund wrote in > Hi, > > On 2024-01-19 12:28:05 +0900, Michael Paquier wrote: > > On Thu, Jan 18, 2024 at 03:42:28PM +0200, Heikki Linnakangas wrote: > > > Given that commit 728f86fec6 that introduced this issue was not strictly > > > required, perhaps we should just revert it for v16. > > > > Is there a point in keeping 728f86fec6 as well on HEAD? That does not > > strike me as wise to keep that in the tree for now. If it needs to be > > reworked, looking at this problem from scratch would be a safer > > approach. > > IDK, I think we'll introduce this type of bug over and over if we don't fix it > properly. Just to clarify my position, I thought that 728f86fec6 was heading the right direction. Considering the current approach to signal handling in walreceiver, I believed that it would be better to further generalize in this direction rather than reverting. That's why I proposed that patch. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: In-placre persistance change of a relation
At Mon, 22 Jan 2024 15:36:31 +1100, Peter Smith wrote in > 2024-01 Commitfest. > > Hi, This patch has a CF status of "Needs Review" [1], but it seems > there was a CFbot test failure last time it was run [2]. Please have a > look and post an updated version if necessary. Thanks! I have added the necessary includes to the header file this patch adds. With this change, "make headerscheck" now passes. However, when I run "make cpluspluscheck" in my environment, it generates a large number of errors in other areas, but I didn't find one related to this patch. regards. -- Kyotaro Horiguchi NTT Open Source Software Center >From 9a2b6fbda882587c127d3e50bccf89508837d1a5 Mon Sep 17 00:00:00 2001 From: Kyotaro Horiguchi Date: Mon, 15 Jan 2024 15:57:53 +0900 Subject: [PATCH v32 1/3] Export wal_sync_method related functions Export several functions related to wal_sync_method for use in subsequent commits. Since PG_O_DIRECT cannot be used in those commits, the new function XLogGetSyncBit() will mask PG_O_DIRECT. --- src/backend/access/transam/xlog.c | 73 +-- src/include/access/xlog.h | 2 + 2 files changed, 52 insertions(+), 23 deletions(-) diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c index 478377c4a2..c5f51849ee 100644 --- a/src/backend/access/transam/xlog.c +++ b/src/backend/access/transam/xlog.c @@ -8403,21 +8403,29 @@ assign_wal_sync_method(int new_wal_sync_method, void *extra) } } +/* + * Exported version of get_sync_bit() + * + * Do not expose PG_O_DIRECT for uses outside xlog.c. + */ +int +XLogGetSyncBit(void) +{ + return get_sync_bit(wal_sync_method) & ~PG_O_DIRECT; +} + /* - * Issue appropriate kind of fsync (if any) for an XLOG output file. + * Issue appropriate kind of fsync (if any) according to wal_sync_method. * - * 'fd' is a file descriptor for the XLOG file to be fsync'd. - * 'segno' is for error reporting purposes. + * 'fd' is a file descriptor for the file to be fsync'd. */ -void -issue_xlog_fsync(int fd, XLogSegNo segno, TimeLineID tli) +const char * +XLogFsyncFile(int fd) { - char *msg = NULL; + const char *msg = NULL; instr_time start; - Assert(tli != 0); - /* * Quick exit if fsync is disabled or write() has already synced the WAL * file. @@ -8425,7 +8433,7 @@ issue_xlog_fsync(int fd, XLogSegNo segno, TimeLineID tli) if (!enableFsync || wal_sync_method == WAL_SYNC_METHOD_OPEN || wal_sync_method == WAL_SYNC_METHOD_OPEN_DSYNC) - return; + return NULL; /* Measure I/O timing to sync the WAL file */ if (track_wal_io_timing) @@ -8460,19 +8468,6 @@ issue_xlog_fsync(int fd, XLogSegNo segno, TimeLineID tli) break; } - /* PANIC if failed to fsync */ - if (msg) - { - char xlogfname[MAXFNAMELEN]; - int save_errno = errno; - - XLogFileName(xlogfname, tli, segno, wal_segment_size); - errno = save_errno; - ereport(PANIC, -(errcode_for_file_access(), - errmsg(msg, xlogfname))); - } - pgstat_report_wait_end(); /* @@ -8486,7 +8481,39 @@ issue_xlog_fsync(int fd, XLogSegNo segno, TimeLineID tli) INSTR_TIME_ACCUM_DIFF(PendingWalStats.wal_sync_time, end, start); } - PendingWalStats.wal_sync++; + if (msg != NULL) + PendingWalStats.wal_sync++; + + return msg; +} + +/* + * Issue appropriate kind of fsync (if any) for an XLOG output file. + * + * 'fd' is a file descriptor for the XLOG file to be fsync'd. + * 'segno' is for error reporting purposes. + */ +void +issue_xlog_fsync(int fd, XLogSegNo segno, TimeLineID tli) +{ + const char *msg; + + Assert(tli != 0); + + msg = XLogFsyncFile(fd); + + /* PANIC if failed to fsync */ + if (msg) + { + char xlogfname[MAXFNAMELEN]; + int save_errno = errno; + + XLogFileName(xlogfname, tli, segno, wal_segment_size); + errno = save_errno; + ereport(PANIC, +(errcode_for_file_access(), + errmsg(msg, xlogfname))); + } } /* diff --git a/src/include/access/xlog.h b/src/include/access/xlog.h index 301c5fa11f..2a0d65b537 100644 --- a/src/include/access/xlog.h +++ b/src/include/access/xlog.h @@ -217,6 +217,8 @@ extern void xlog_redo(struct XLogReaderState *record); extern void xlog_desc(StringInfo buf, struct XLogReaderState *record); extern const char *xlog_identify(uint8 info); +extern int XLogGetSyncBit(void); +extern const char *XLogFsyncFile(int fd); extern void issue_xlog_fsync(int fd, XLogSegNo segno, TimeLineID tli); extern bool RecoveryInProgress(void); -- 2.39.3 >From c464013071dedc15b838e573ae828f150b3b60f7 Mon Sep 17 00:00:00 2001 From: Kyotaro Horiguchi Date: Thu, 31 Aug 2023 11:49:10 +0900 Subject: [PATCH v32 2/3] Introduce undo log implementation This patch adds a simple implementation of UNDO log feature. --- src/backend/access/transam/Makefile| 1 + src/backend/access/transam/meson.build | 1 + src/backend/access/transam/rmgr.c | 4 +- src/backend/access/transam/simpleundolog.c | 362 +
Re: Make mesage at end-of-recovery less scary.
At Wed, 17 Jan 2024 14:32:00 +0900, Michael Paquier wrote in > On Tue, Jan 16, 2024 at 02:46:02PM +0300, Aleksander Alekseev wrote: > >> For now, let me explain the basis for this patch. The fundamental > >> issue is that these warnings that always appear are, in practice, not > >> a problem in almost all cases. Some of those who encounter them for > >> the first time may feel uneasy and reach out with inquiries. On the > >> other hand, those familiar with these warnings tend to ignore them and > >> only pay attention to details when actual issues arise. Therefore, the > >> intention of this patch is to label them as "no issue" unless a > >> problem is blatantly evident, in order to prevent unnecessary concern. > > > > I agree and don't mind affecting the error message per se. > > > > However I see that the actual logic of how WAL is processed is being > > changed. If we do this, at very least it requires thorough thinking. I > > strongly suspect that the proposed code is wrong and/or not safe > > and/or less safe than it is now for the reasons named above. > > FWIW, that pretty much sums up my feeling regarding this patch, > because an error, basically any error, would hurt back very badly. > Sure, the error messages we generate now when reaching the end of WAL > can sound scary, and they are (I suspect that's not really the case > for anybody who has history doing support with PostgreSQL because a > bunch of these messages are old enough to vote, but I can understand > that anybody would freak out the first time they see that). > > However, per the recent issues we've had in this area, like > cd7f19da3468 but I'm more thinking about 6b18b3fe2c2f and > bae868caf222, I am of the opinion that the header validation, the > empty page case in XLogReaderValidatePageHeader() and the record read > changes are risky enough that I am not convinced that the gains are > worth the risks taken. > > The error stack in the WAL reader is complicated enough that making it > more complicated as the patch proposes does not sound like not a good > tradeoff to me to make the reports related to the end of WAL cleaner > for the end-user. I agree that we should do something, but the patch > does not seem like a good step towards this goal. Perhaps somebody > would be more excited about this proposal than I am, of course. Thank you both for the comments. The criticism seems valid. The approach to identifying the end-of-WAL state in this patch is quite heuristic, and its validity or safety can certainly be contested. On the other hand, if we seek perfection in this area of judgment, we may need to have the WAL format itself more robust. In any case, since the majority of the feedback on this patch seems to be negative, I am going to withdraw it if no supportive opinions emerge during this commit-fest. The attached patch addresses the errors reported by CF-bot. regards. -- Kyotaro Horiguchi NTT Open Source Software Center >From 933d10fa6c7b71e4684f5ba38e85177afaa56f58 Mon Sep 17 00:00:00 2001 From: Kyotaro Horiguchi Date: Tue, 7 Mar 2023 14:55:58 +0900 Subject: [PATCH v29] Make End-Of-Recovery error less scary When recovery in any type ends, we see a bit scary error message like "invalid record length" that suggests something serious is happening. Actually if recovery meets a record with length = 0, that usually means it finished applying all available WAL records. Make this message less scary as "reached end of WAL". Instead, raise the error level for other kind of WAL failure to WARNING. To make sure that the detection is correct, this patch checks if all trailing bytes in the same page are zeroed in that case. --- src/backend/access/transam/xlogreader.c | 147 ++ src/backend/access/transam/xlogrecovery.c | 96 ++ src/backend/replication/walreceiver.c | 7 +- src/bin/pg_waldump/pg_waldump.c | 13 +- src/bin/pg_waldump/t/001_basic.pl | 5 +- src/include/access/xlogreader.h | 1 + src/test/recovery/t/035_recovery.pl | 130 +++ src/test/recovery/t/039_end_of_wal.pl | 47 +-- 8 files changed, 383 insertions(+), 63 deletions(-) create mode 100644 src/test/recovery/t/035_recovery.pl diff --git a/src/backend/access/transam/xlogreader.c b/src/backend/access/transam/xlogreader.c index 7190156f2f..94861969eb 100644 --- a/src/backend/access/transam/xlogreader.c +++ b/src/backend/access/transam/xlogreader.c @@ -48,6 +48,8 @@ static int ReadPageInternal(XLogReaderState *state, XLogRecPtr pageptr, int reqLen); static void XLogReaderInvalReadState(XLogReaderState *state); static XLogPageReadResult XLogDecodeNextRecord(XLogReaderState *state, bool nonblocking); +static bool ValidXLogRecordLe
Re: Make mesage at end-of-recovery less scary.
At Mon, 22 Jan 2024 16:09:28 +1100, Peter Smith wrote in > 2024-01 Commitfest. > > Hi, This patch has a CF status of "Needs Review" [1], but it seems > there were CFbot test failures last time it was run [2]. Please have a > look and post an updated version if necessary. > > == > [1] https://commitfest.postgresql.org/46/2490/ > [2] > https://cirrus-ci.com/github/postgresql-cfbot/postgresql/commitfest/46/2490 Thanks for noticing of that. Will repost a new version. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: Network failure may prevent promotion
At Sun, 31 Dec 2023 20:07:41 +0900 (JST), Kyotaro Horiguchi wrote in > We've noticed that when walreceiver is waiting for a connection to > complete, standby does not immediately respond to promotion > requests. In PG14, upon receiving a promotion request, walreceiver > terminates instantly, but in PG16, it waits for connection > timeout. This behavior is attributed to commit 728f86fec65, where a > part of libpqrcv_connect was simply replaced with a call to > libpqsrc_connect_params. This behavior can be verified by simply > dropping packets from the standby to the primary. Apologize for the inconvenience on my part, but I need to fix this behavior. To continue this discussion, I'm providing a repro script here. With the script, the standby is expected to promote immediately, emitting the following log lines: standby.log: > 2024-01-18 16:25:22.245 JST [31849] LOG: received promote request > 2024-01-18 16:25:22.245 JST [31850] FATAL: terminating walreceiver process > due to administrator command > 2024-01-18 16:25:22.246 JST [31849] LOG: redo is not required > 2024-01-18 16:25:22.246 JST [31849] LOG: selected new timeline ID: 2 > 2024-01-18 16:25:22.274 JST [31849] LOG: archive recovery complete > 2024-01-18 16:25:22.275 JST [31847] LOG: checkpoint starting: force > 2024-01-18 16:25:22.277 JST [31846] LOG: database system is ready to accept > connections > 2024-01-18 16:25:22.280 JST [31847] LOG: checkpoint complete: wrote 3 > buffers (0.0%); 0 WAL file(s) added, 0 removed, 0 recycled; write=0.001 s, > sync=0.001 s, total=0.005 s; sync files=2, longest=0.001 s, average=0.001 s; > distance=0 kB, estimate=0 kB; lsn=0/1548E98, redo lsn=0/1548E40 > 2024-01-18 16:25:22.356 JST [31846] LOG: received immediate shutdown request > 2024-01-18 16:25:22.361 JST [31846] LOG: database system is shut down After 728f86fec65 was introduced, promotion does not complete with the same operation, as follows. The patch attached to the previous mail fixes this behavior to the old behavior above. > 2024-01-18 16:47:53.314 JST [34515] LOG: received promote request > 2024-01-18 16:48:03.947 JST [34512] LOG: received immediate shutdown request > 2024-01-18 16:48:03.952 JST [34512] LOG: database system is shut down The attached script requires that sudo is executable. And there's another point to note. The script attempts to establish a replication connection to $primary_address:$primary_port. To packet-filter can work, it must be a remote address that is accessible when no packet-filter setting is set up. The firewall-cmd setting, need to be configured to block this connection. If simply an inaccessible IP address is set, the process will fail immediately with a "No route to host" error before the first packet is sent out, and it will not be blocked as intended. regards. -- Kyotaro Horiguchi NTT Open Source Software Center #! /bin/perl use Cwd; # This IP address must be a valid and accessible remote address, # otherwise replication connection will immediately fail with 'No # route to host', while we want to wait for replication connection to # complete. $primary_addr = '192.168.56.1'; $primary_port = 5432; $fwzone = 'public'; $rootdir = cwd(); $standby_dir = "$rootdir/standby"; $standby_port = 5432; $standby_logfile= "standby.log"; $rich_rule = "'rule family=\"ipv4\" destination address=\"$primary_addr\" port port=\"$primary_port\" protocol=\"tcp\" drop'"; $add_cmd = "sudo firewall-cmd --zone=$fwzone --add-rich-rule=$rich_rule"; $del_cmd = "sudo firewall-cmd --zone=$fwzone --remove-rich-rule=$rich_rule"; # Remove old entities system('killall -9 postgres'); system("rm -rf $standby_dir $standby_logfile"); # Setup packet drop system($add_cmd) == 0 or die "failed to exec \"$add_cmd\": $!\n"; # Setup a standby. system("initdb -D $standby_dir -c primary_conninfo='host=$primary_addr port=$primary_port'"); system("touch $standby_dir/standby.signal"); # Start it. system("pg_ctl start -D $standby_dir -o \"-p $standby_port\" -l standby.log"); sleep 1; # Try promoting standby, waiting for 10 seconds. system("pg_ctl promote -t 10 -D $standby_dir"); # Stop servers system("pg_ctl stop -m i -D $standby_dir"); # Remove packet-drop setting system($del_cmd) == 0 or die "failed to exec \"$del_cmd\": $!\n";
Re: initdb's -c option behaves wrong way?
Thank you for upicking this up. At Wed, 17 Jan 2024 23:47:41 +0100, Daniel Gustafsson wrote in > > On 17 Jan 2024, at 21:33, Tom Lane wrote: > > > > I wrote: > >> However ... I don't like the patch much. It seems to have left > >> the code in a rather random state. Why, for example, didn't you > >> keep all the code that constructs the "newline" value together? > > > > After thinking about it a bit more, I don't see why you didn't just > > s/strncmp/strncasecmp/ and call it good. The messiness seems to be > > a result of your choice to replace the GUC's case as shown in the > > file with the case used on the command line, which is not better > > IMO. We don't change our mind about the canonical spelling of a > > GUC because somebody varied the case in a SET command. > > The original patch was preserving the case of the file throwing away the case > from the commandline. The attached is a slimmed down version which only moves > the assembly of newline to the different cases (replace vs. new) keeping the > rest of the code intact. Keeping it in one place gets sort of messy too since > it needs to use different values for a replacement and a new variable. Not > sure if there is a cleaner approach? Just to clarify, the current code breaks out after processing the first matching line. I haven't changed that behavior. The reason I moved the rewrite processing code out of the loop was I wanted to avoid adding more lines that are executed only once into the loop. However, it is in exchange of additional complexity to remember the original spelling of the parameter name. Personally, I believe separating the search and rewrite processing is better, but I'm not particularly attached to the approach, so either way is fine with me. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)
At Wed, 17 Jan 2024 14:38:54 +0900, torikoshia wrote in > Hi, > > Thanks for applying! > > > + errmsg_plural("%zd row were skipped due to data type > > incompatibility", > > Sorry, I just noticed it, but 'were' should be 'was' here? > > >> BTW I'm thinking we should add a column to pg_stat_progress_copy that > >> counts soft errors. I'll suggest this in another thread. > > Please do! > > I've started it here: > > https://www.postgresql.org/message-id/d12fd8c99adcae2744212cb23feff...@oss.nttdata.com Switching topics, this commit (9e2d870119) adds the following help message: > "COPY { %s [ ( %s [, ...] ) ] | ( %s ) }\n" > "TO { '%s' | PROGRAM '%s' | STDOUT }\n" > ... > "SAVE_ERROR_TO '%s'\n" > ... > _("location"), On the other hand, SAVE_ERROR_TO takes 'error' or 'none', which indicate "immediately error out" and 'just ignore the failure' respectively, but these options hardly seem to denote a 'location', and appear more like an 'action'. I somewhat suspect that this parameter name intially conceived with the assupmtion that it would take file names or similar parameters. I'm not sure if others will agree, but I think the parameter name might not be the best choice. For instance, considering the addition of the third value 'log', something like on_error_action (error, ignore, log) would be more intuitively understandable. What do you think? regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: Make mesage at end-of-recovery less scary.
Thank you for the comments. At Fri, 12 Jan 2024 15:03:26 +0300, Aleksander Alekseev wrote in > ``` > +p = (char *) record; > +pe = p + XLOG_BLCKSZ - (RecPtr & (XLOG_BLCKSZ - 1)); > + > +while (p < pe && *p == 0) > +p++; > + > +if (p == pe) > ``` > > Just as a random thought: perhaps we should make this a separate > function, as a part of src/port/. It seems to me that this code could > benefit from using vector instructions some day, similarly to > memcmp(), memset() etc. Surprisingly there doesn't seem to be a > standard C function for this. Alternatively one could argue that one > cycle doesn't make much code to reuse and that the C compiler will > place SIMD instructions for us. However a counter-counter argument > would be that we could use a macro or even better an inline function > and have the same effect except getting a slightly more readable code. Creating a function with a name like memcmp_byte() should be straightforward, but implementing it with SIMD right away seems a bit challenging. Similar operations are already being performed elsewhere in the code, probably within the stats collector, where memcmp is used with a statically allocated area that's filled with zeros. If we can achieve a performance equivalent to memcmp with this new function, then it definitely seems worth pursuing. > ``` > - * This is just a convenience subroutine to avoid duplicated code in > + * This is just a convenience subroutine to avoid duplicate code in > ``` > > This change doesn't seem to be related to the patch. Personally I > don't mind it though. Ah, I'm sorry. That was something I mistakenly thought I had written at the last moment and made modifications to. > All in all I find v28 somewhat scary. It does much more than "making > one message less scary" as it was initially intended and what bugged > me personally, and accordingly touches many more places including > xlogreader.c, xlogrecovery.c, etc. > > Particularly I have mixed feeling about this: > > ``` > +/* > + * Consider it as end-of-WAL if all subsequent bytes of this page > + * are zero. We don't bother checking the subsequent pages since > + * they are not zeroed in the case of recycled segments. > + */ > ``` > > If I understand correctly, if somehow several FS blocks end up being > zeroed (due to OS bug, bit rot, restoring from a corrupted for > whatever reason backup, hardware failures, ...) there is non-zero > chance that PG will interpret this as a normal situation. To my > knowledge this is not what we typically do - typically PG would report > an error and ask a human to figure out what happened. Of course the > possibility of such a scenario is small, but I don't think that as > DBMS developers we can ignore it. For now, let me explain the basis for this patch. The fundamental issue is that these warnings that always appear are, in practice, not a problem in almost all cases. Some of those who encounter them for the first time may feel uneasy and reach out with inquiries. On the other hand, those familiar with these warnings tend to ignore them and only pay attention to details when actual issues arise. Therefore, the intention of this patch is to label them as "no issue" unless a problem is blatantly evident, in order to prevent unnecessary concern. > Does anyone agree or maybe I'm making things up? regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: Error "initial slot snapshot too large" in create replication slot
At Fri, 12 Jan 2024 11:28:09 -0500, Robert Haas wrote in > However, I wonder whether this whole area is in need of a bigger > rethink. There seem to be a number of situations in which the split > into xip and subxip arrays is not very convenient, and also some > situations where it's quite important. Sometimes we want to record > what's committed, and sometimes what isn't. It's all a bit messy and > inconsistent. The necessity of limiting snapshot size is annoying, > too. I have no real idea what can be done about all of this, but what > strikes me is that the current system has grown up incrementally: we > started with a data structure designed for the original use case, and > now by gradually adding new use cases things have gotten complicated. > If we were designing things over from scratch, maybe we'd do it > differently and end up with something less messy. And maybe someone > can imagine a redesign that is likewise less messy. > > But on the other hand, maybe not. Perhaps we can't really do any > better than what we have. Then the question becomes whether this case > is important enough to justify additional code complexity. I don't > think I've personally seen users run into this problem so I have no > special reason to think that it's important, but if it's causing > issues for other people then maybe it is. Thank you for the deep insights. I have understood your points. As I can't think of any further simple modifications on this line, I will withdraw this CF entry. At the moment, I also lack a fundamental, comprehensive solution, but should if I or anyone else come up with such a solution in the future, I believe it would worth a separate discussion. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: In-placre persistance change of a relation
At Tue, 9 Jan 2024 15:07:20 +0530, vignesh C wrote in > CFBot shows compilation issues at [1] with: Thanks! The reason for those errors was that I didn't consider Meson at the time. Additionally, the signature change of reindex_index() caused the build failure. I fixed both issues. While addressing these issues, I modified the simpleundolog module to honor wal_sync_method. Previously, the sync method for undo logs was determined independently, separate from xlog.c. However, I'm still not satisfied with the method for handling PG_O_DIRECT. In this version, I have added the changes to enable the use of wal_sync_method outside of xlog.c as the first part of the patchset. regards. -- Kyotaro Horiguchi NTT Open Source Software Center >From 40749357f24adf89dc79db9b34f5c053288489bb Mon Sep 17 00:00:00 2001 From: Kyotaro Horiguchi Date: Mon, 15 Jan 2024 15:57:53 +0900 Subject: [PATCH v31 1/3] Export wal_sync_method related functions Export several functions related to wal_sync_method for use in subsequent commits. Since PG_O_DIRECT cannot be used in those commits, the new function XLogGetSyncBit() will mask PG_O_DIRECT. --- src/backend/access/transam/xlog.c | 73 +-- src/include/access/xlog.h | 2 + 2 files changed, 52 insertions(+), 23 deletions(-) diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c index 478377c4a2..c5f51849ee 100644 --- a/src/backend/access/transam/xlog.c +++ b/src/backend/access/transam/xlog.c @@ -8403,21 +8403,29 @@ assign_wal_sync_method(int new_wal_sync_method, void *extra) } } +/* + * Exported version of get_sync_bit() + * + * Do not expose PG_O_DIRECT for uses outside xlog.c. + */ +int +XLogGetSyncBit(void) +{ + return get_sync_bit(wal_sync_method) & ~PG_O_DIRECT; +} + /* - * Issue appropriate kind of fsync (if any) for an XLOG output file. + * Issue appropriate kind of fsync (if any) according to wal_sync_method. * - * 'fd' is a file descriptor for the XLOG file to be fsync'd. - * 'segno' is for error reporting purposes. + * 'fd' is a file descriptor for the file to be fsync'd. */ -void -issue_xlog_fsync(int fd, XLogSegNo segno, TimeLineID tli) +const char * +XLogFsyncFile(int fd) { - char *msg = NULL; + const char *msg = NULL; instr_time start; - Assert(tli != 0); - /* * Quick exit if fsync is disabled or write() has already synced the WAL * file. @@ -8425,7 +8433,7 @@ issue_xlog_fsync(int fd, XLogSegNo segno, TimeLineID tli) if (!enableFsync || wal_sync_method == WAL_SYNC_METHOD_OPEN || wal_sync_method == WAL_SYNC_METHOD_OPEN_DSYNC) - return; + return NULL; /* Measure I/O timing to sync the WAL file */ if (track_wal_io_timing) @@ -8460,19 +8468,6 @@ issue_xlog_fsync(int fd, XLogSegNo segno, TimeLineID tli) break; } - /* PANIC if failed to fsync */ - if (msg) - { - char xlogfname[MAXFNAMELEN]; - int save_errno = errno; - - XLogFileName(xlogfname, tli, segno, wal_segment_size); - errno = save_errno; - ereport(PANIC, -(errcode_for_file_access(), - errmsg(msg, xlogfname))); - } - pgstat_report_wait_end(); /* @@ -8486,7 +8481,39 @@ issue_xlog_fsync(int fd, XLogSegNo segno, TimeLineID tli) INSTR_TIME_ACCUM_DIFF(PendingWalStats.wal_sync_time, end, start); } - PendingWalStats.wal_sync++; + if (msg != NULL) + PendingWalStats.wal_sync++; + + return msg; +} + +/* + * Issue appropriate kind of fsync (if any) for an XLOG output file. + * + * 'fd' is a file descriptor for the XLOG file to be fsync'd. + * 'segno' is for error reporting purposes. + */ +void +issue_xlog_fsync(int fd, XLogSegNo segno, TimeLineID tli) +{ + const char *msg; + + Assert(tli != 0); + + msg = XLogFsyncFile(fd); + + /* PANIC if failed to fsync */ + if (msg) + { + char xlogfname[MAXFNAMELEN]; + int save_errno = errno; + + XLogFileName(xlogfname, tli, segno, wal_segment_size); + errno = save_errno; + ereport(PANIC, +(errcode_for_file_access(), + errmsg(msg, xlogfname))); + } } /* diff --git a/src/include/access/xlog.h b/src/include/access/xlog.h index 301c5fa11f..2a0d65b537 100644 --- a/src/include/access/xlog.h +++ b/src/include/access/xlog.h @@ -217,6 +217,8 @@ extern void xlog_redo(struct XLogReaderState *record); extern void xlog_desc(StringInfo buf, struct XLogReaderState *record); extern const char *xlog_identify(uint8 info); +extern int XLogGetSyncBit(void); +extern const char *XLogFsyncFile(int fd); extern void issue_xlog_fsync(int fd, XLogSegNo segno, TimeLineID tli); extern bool RecoveryInProgress(void); -- 2.39.3 >From 5c120b94c407b971485ab52133399305e5e81a88 Mon Sep 17 00:00:00 2001 From: Kyotaro Horiguchi Date: Thu, 31 Aug 2023 11:49:10 +0900 Subject: [PATCH v31 2/3] Introduce undo log implementation This patch adds a simple implementation of UNDO log feature. --- src/backend/access/transam/Makefile| 1 + src/backend/access/transam/meson.build | 1 + src/backend/access/transam/rmgr.c
Re: initdb --no-locale=C doesn't work as specified when the environment is not C
At Wed, 10 Jan 2024 18:16:03 -0500, Tom Lane wrote in > Kyotaro Horiguchi writes: > > It seems somewhat intentional that only lc_messages references the > > environment at boot time. On the other hand, previously, in the > > absence of a specified locale, initdb would embed the environmental > > value in the configuration file, as it seems to be documented. Given > > that initdb is always used for cluster creation, it's unlikey that > > systems depend on this boot-time default for their operation. > > Yeah, after further reflection there doesn't seem to be a lot of value > in leaving these entries commented-out, even in the cases where that's > technically correct. Let's just go back to the old behavior of always > uncommenting them; that stood for years without complaints. So I > committed your latest patch as-is. I'm glad you understand. Thank you for commiting. > > By the way, the lines around lc_* in the sample file seem to have > > somewhat inconsistent indentations. Wouldnt' it be preferable to fix > > this? (The attached doesn't that.) > > They look all right if you assume the tab width is 8, which seems to > be what is used elsewhere in the file. I think there's been some > prior discussion about whether to ban use of tabs at all in these > sample files, so as to reduce confusion about how wide the tabs are. > But I'm not touching that question today. Ah, I see, I understood. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: Error "initial slot snapshot too large" in create replication slot
Thank you for the comments. This will help move the discussion forward. At Fri, 5 Jan 2024 15:17:11 -0500, Robert Haas wrote in > On Thu, Mar 23, 2023 at 11:02 PM Kyotaro Horiguchi > wrote: > > [ new patch ] > > Well, I guess nobody is too excited about fixing this, because it's > been another 10 months with no discussion. Andres doesn't even seem to > think this is as much a bug as it is a limitation, for all that it's > filed in the CF application under bug fixes. I kind of wonder if we > should just close this entry in the CF, but I'll hold off on that for > now. Perhaps you are correct. Ultimately, this issue is largely theoretical, and I don't believe anyone would be inconvenienced by imposing this contraint. > * note: all ids in xip[] satisfy xmin <= xip[i] < xmax > */ > > I have to say that I don't like this at all. It's bad enough that we > already use the xip/subxip arrays in two different ways depending on > the situation. Increasing that to three different ways seems painful. > How is anyone supposed to keep track of how the array is being used at > which point in the code? I understand. So, summirizing the current state briefly, I believe it as follows: a. snapbuild lacks a means to differentiate between top and sub xids during snapshot building. b. Abusing takenDuringRecovery could lead to potential issues, so it has been rejected. c. Dynamic sizing of xip is likely to have a significant impact on performance (as mentioned in the comments of GetSnapshotData). d. (new!) Adding a third storing method is not favored. As a method to satisfy these prerequisites, I think one direction could be to split takenDuringRecovery into flags indicating the storage method and creation timing. I present this as a last-ditch effort. It's a rough proposal, and further validation should be necessary. If this direction is also not acceptable, I'll proceed with removing the CF entry. > If we are going to do that, I suspect it needs comment updates in more > places than what the patch does currently. For instance: > > + if (newxcnt < xiplen) > + newxip[newxcnt++] = xid; > + else > + newsubxip[newsubxcnt++] = xid; > > Just imagine coming across this code in 5 or 10 years and finding that > it had no comment explaining anything. Yikes! ^^; > Aside from the details of the patch, and perhaps more seriously, I'm > not really clear that we have consensus on an approach. A few > different proposals seem to have been floated, and it doesn't seem > like anybody hates anybody else's idea completely, but it doesn't > really seem like everyone agrees on what to do, either. I don't fully agree with that.It's not so much that I dislike other proposals, but rather that we haven't been able to find a definitive solution that stands out. regards. -- Kyotaro Horiguchi NTT Open Source Software Center >From cd310711c1565b686416e8bf5a3a5105002e35c3 Mon Sep 17 00:00:00 2001 From: Kyotaro Horiguchi Date: Fri, 12 Jan 2024 11:36:41 +0900 Subject: [PATCH v9] Lift initial snapshot limit for logical replication The replication command CREATE_REPLICATION_SLOT often fails with "snapshot too large" error when numerous subtransactions are active. This issue stems from SnapBuildInitialSnapshot attempting to generate a snapshot that includes all active XIDs, including subxids, stored within the xip array. This patch mitigates the constraint by utilizing the same storing method as takenDuringRecovery. --- src/backend/replication/logical/snapbuild.c | 22 + src/backend/storage/ipc/procarray.c | 1 + src/backend/utils/time/snapmgr.c| 15 +- src/include/utils/snapshot.h| 3 ++- 4 files changed, 27 insertions(+), 14 deletions(-) diff --git a/src/backend/replication/logical/snapbuild.c b/src/backend/replication/logical/snapbuild.c index a0b7947d2f..528bdbb662 100644 --- a/src/backend/replication/logical/snapbuild.c +++ b/src/backend/replication/logical/snapbuild.c @@ -388,6 +388,7 @@ SnapBuildFreeSnapshot(Snapshot snap) Assert(snap->curcid == FirstCommandId); Assert(!snap->suboverflowed); Assert(!snap->takenDuringRecovery); + Assert(!snap->mixed); Assert(snap->regd_count == 0); /* slightly more likely, so it's checked even without c-asserts */ @@ -464,6 +465,7 @@ SnapBuildSnapDecRefcount(Snapshot snap) Assert(snap->curcid == FirstCommandId); Assert(!snap->suboverflowed); Assert(!snap->takenDuringRecovery); + Assert(!snap->mixed); Assert(snap->regd_count == 0); @@ -550,6 +552,7 @@ SnapBuildBuildSnapshot(SnapBuild *builder) snapshot->suboverflowed = false; snapshot->takenDuringRecovery = false; + snapshot->mixed = false; snapshot->copied = false; snapshot->curcid = FirstCommandId; snapshot->active_count = 0; @@ -572,8 +575,8 @@ SnapBuildInitia
Re: pg_ctl start may return 0 even if the postmaster has been already started on Windows
Thanks for restarting this thread. At Tue, 9 Jan 2024 09:40:23 +0900, Michael Paquier wrote in > On Fri, Jan 05, 2024 at 02:58:55PM -0500, Robert Haas wrote: > > I'm not a Windows expert, but my guess is that 0001 is a very good > > idea. I hope someone who is a Windows expert will comment on that. > > I am +1 on 0001. It is just something we've never anticipated when > these wrappers around cmd in pg_ctl were written. Thanks for committing it. > > 0002 seems problematic to me. One potential issue is that it would > > break if someone renamed postgres.exe to something else -- although > > that's probably not really a serious problem. > > We do a find_other_exec_or_die() on "postgres" with what could be a > custom execution path. So we're actually sure that the binary will be > there in the start path, no? I don't like much the hardcoded > dependency to .exe here. The patch doesn't care of the path for postgres.exe. If you are referring to the code you cited below, it's for another reason. I'll describe that there. > > A bigger issue is that > > it seems like it would break if someone used pg_ctl to start several > > instances in different data directories on the same machine. If I'm > > understanding correctly, that currently mostly works, and this would > > break it. > > Not having the guarantee that a single shell_pid is associated to a > single postgres.exe would be a problem. Now the patch includes this > code: > + if (ppe.th32ParentProcessID == shell_pid && > + strcmp("postgres.exe", ppe.szExeFile) == 0) > + { > + if (pm_pid != ppe.th32ProcessID && pm_pid != 0) > + multiple_children = true; > + pm_pid = ppe.th32ProcessID; > + } > > Which is basically giving this guarantee? multiple_children should > never happen once the autorun part is removed. Is that right? The patch indeed ensures the relationship between the parent pg_ctl.exe and postgres.exe. However, for some reason, in my Windows 11 environment with the /D option specified, I observed that another cmd.exe is spawned as the second child process of the parent cmd.exe. This is why there is a need to verify the executable file name. I have no idea how the second cmd.exe is being spawned. > +* The launcher shell might start other cmd.exe instances or programs > +* besides postgres.exe. Veryfying the program file name is essential. > > With the autorun part of cmd.exe removed, what's still relevant here? Yes, if the strcmp() is commented out, multiple_children sometimes becomes true.. > s/Veryfying/Verifying/. Oops! > Perhaps 0002 should make more efforts in documenting things like > th32ProcessID and th32ParentProcessID. Is it correct to understand that you are requesting changes as follows? --- a/src/bin/pg_ctl/pg_ctl.c +++ b/src/bin/pg_ctl/pg_ctl.c @@ -1995,11 +1995,14 @@ pgwin32_find_postmaster_pid(pid_t shell_pid) * * Check for duplicate processes to ensure reliability. * -* The launcher shell might start other cmd.exe instances or programs -* besides postgres.exe. Verifying the program file name is essential. -* -* The launcher shell process isn't checked in this function. It will be -* checked by the caller. +* The ppe entry to be examined is identified by th32ParentProcessID, which +* should correspond to the cmd.exe process that executes the postgres.exe +* binary. Additionally, th32ProcessID in the same entry should be the PID +* of the launched postgres.exe. However, even though we have launched the +* parent cmd.exe with the /D option specified, it is sometimes observed +* that another cmd.exe is launched for unknown reasons. Therefore, it is +* crucial to verify the program file name to avoid returning the wrong + * PID. */ regards. -- Kyotaro Horiguchi NTT Open Source Software Center >From f9f2486f18502357fb76f2f1da00e19db40b2bc9 Mon Sep 17 00:00:00 2001 From: Kyotaro Horiguchi Date: Tue, 24 Oct 2023 14:46:50 +0900 Subject: [PATCH v6 1/2] Improve pg_ctl postmaster process check on Windows Currently pg_ctl on Windows does not verify that it actually executed a postmaster process due to lack of process ID knowledge. This can lead to false positives in cases where another pg_ctl instance starts a different server simultaneously. This patch adds the capability to identify the process ID of the launched postmaster on Windows, similar to other OS versions, ensuring more reliable detection of concurrent server startups. --- src/bin/pg_ctl/pg_ctl.c | 105 1 file changed, 96 insertions(+), 9
Re: Make mesage at end-of-recovery less scary.
At Fri, 5 Jan 2024 16:02:24 +0530, vignesh C wrote in > On Wed, 22 Nov 2023 at 13:01, Kyotaro Horiguchi > wrote: > > > > Anyway, this requires rebsaing, and done. > > Few tests are failing at [1], kindly post an updated patch: Thanks! The errors occurred in a part of the tests for end-of-WAL detection added in the master branch. These failures were primarily due to changes in the message contents introduced by this patch. During the revision, I discovered an issue with the handling of empty pages that appear in the middle of reading continuation records. In the previous version, such empty pages were mistakenly identified as indicating a clean end-of-WAL (that is a LOG). However, they should actually be handled as a WARNING, since the record curently being read is broken at the empty pages. The following changes have been made in this version: 1. Adjusting the test to align with the error message changes introduced by this patch. 2. Adding tests for the newly added messages. 3. Correcting the handling of empty pages encountered during the reading of continuation records. (XLogReaderValidatePageHeader) 4. Revising code comments. 5. Changing the term "log segment" to "WAL segment". (XLogReaderValidatePageHeader) regards. -- Kyotaro Horiguchi NTT Open Source Software Center >From 47cc39a212d7fd6857f30c35c76bcdd0d26bbc3f Mon Sep 17 00:00:00 2001 From: Kyotaro Horiguchi Date: Tue, 7 Mar 2023 14:55:58 +0900 Subject: [PATCH v28] Make End-Of-Recovery error less scary When recovery in any type ends, we see a bit scary error message like "invalid record length" that suggests something serious is happening. Actually if recovery meets a record with length = 0, that usually means it finished applying all available WAL records. Make this message less scary as "reached end of WAL". Instead, raise the error level for other kind of WAL failure to WARNING. To make sure that the detection is correct, this patch checks if all trailing bytes in the same page are zeroed in that case. --- src/backend/access/transam/xlogreader.c | 147 ++ src/backend/access/transam/xlogrecovery.c | 96 ++ src/backend/replication/walreceiver.c | 7 +- src/bin/pg_waldump/pg_waldump.c | 13 +- src/include/access/xlogreader.h | 1 + src/test/recovery/t/035_recovery.pl | 130 +++ src/test/recovery/t/039_end_of_wal.pl | 47 +-- 7 files changed, 380 insertions(+), 61 deletions(-) create mode 100644 src/test/recovery/t/035_recovery.pl diff --git a/src/backend/access/transam/xlogreader.c b/src/backend/access/transam/xlogreader.c index 7190156f2f..94861969eb 100644 --- a/src/backend/access/transam/xlogreader.c +++ b/src/backend/access/transam/xlogreader.c @@ -48,6 +48,8 @@ static int ReadPageInternal(XLogReaderState *state, XLogRecPtr pageptr, int reqLen); static void XLogReaderInvalReadState(XLogReaderState *state); static XLogPageReadResult XLogDecodeNextRecord(XLogReaderState *state, bool nonblocking); +static bool ValidXLogRecordLength(XLogReaderState *state, XLogRecPtr RecPtr, + XLogRecord *record); static bool ValidXLogRecordHeader(XLogReaderState *state, XLogRecPtr RecPtr, XLogRecPtr PrevRecPtr, XLogRecord *record, bool randAccess); static bool ValidXLogRecord(XLogReaderState *state, XLogRecord *record, @@ -149,6 +151,7 @@ XLogReaderAllocate(int wal_segment_size, const char *waldir, pfree(state); return NULL; } + state->EndOfWAL = false; state->errormsg_buf[0] = '\0'; /* @@ -553,6 +556,7 @@ XLogDecodeNextRecord(XLogReaderState *state, bool nonblocking) /* reset error state */ state->errormsg_buf[0] = '\0'; decoded = NULL; + state->EndOfWAL = false; state->abortedRecPtr = InvalidXLogRecPtr; state->missingContrecPtr = InvalidXLogRecPtr; @@ -636,16 +640,12 @@ restart: Assert(pageHeaderSize <= readOff); /* - * Read the record length. + * Verify the record header. * * NB: Even though we use an XLogRecord pointer here, the whole record - * header might not fit on this page. xl_tot_len is the first field of the - * struct, so it must be on this page (the records are MAXALIGNed), but we - * cannot access any other fields until we've verified that we got the - * whole header. + * header might not fit on this page. */ record = (XLogRecord *) (state->readBuf + RecPtr % XLOG_BLCKSZ); - total_len = record->xl_tot_len; /* * If the whole record header is on this page, validate it immediately. @@ -664,19 +664,19 @@ restart: } else { - /* There may be no next page if it's too small. */ - if (total_len < SizeOfXLogRecord) - { - report_invalid_record(state, - "invalid record length at %X/%X: expected at least %u, got %u", - LSN_FORMAT_ARGS(RecPtr), - (uint32) SizeOfXLogRecord, total_len); + /* + * xl_tot_len is
Network failure may prevent promotion
(Apology for resubmitting due to poor subject of the previous mail) --- Hello. We've noticed that when walreceiver is waiting for a connection to complete, standby does not immediately respond to promotion requests. In PG14, upon receiving a promotion request, walreceiver terminates instantly, but in PG16, it waits for connection timeout. This behavior is attributed to commit 728f86fec65, where a part of libpqrcv_connect was simply replaced with a call to libpqsrc_connect_params. This behavior can be verified by simply dropping packets from the standby to the primary. By a simple thought, in walreceiver, libpqsrv_connect_internal could just call ProcessWalRcvInterrupts() instead of CHECK_FOR_INTERRUPTS(), but this approach is quite ugly. Since ProcessWalRcvInterrupts() originally calls CHECK_FOR_INTERRUPTS() and there are no standalone calls to CHECK_FOR_INTERRUPTS() within walreceiver, I think it might be better to use ProcDiePending instead of ShutdownRequestPending. I added a subset function of die() as the SIGTERM handler in walsender in a crude patch attached. What do you think about the issue, and the approach? If there are no issues or objections with this method, I will continue to refine this patch. For now, I plan to register it for the upcoming commitfest. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
libpqsrv_connect_params should call ProcessWalRcvInterrupts
Hello. We've noticed that when walreceiver is waiting for a connection to complete, standby does not immediately respond to promotion requests. In PG14, upon receiving a promotion request, walreceiver terminates instantly, but in PG16, it waits for connection timeout. This behavior is attributed to commit 728f86fec65, where a part of libpqrcv_connect was simply replaced with a call to libpqsrc_connect_params. This behavior can be verified by simply dropping packets from the standby to the primary. By a simple thought, in walreceiver, libpqsrv_connect_internal could just call ProcessWalRcvInterrupts() instead of CHECK_FOR_INTERRUPTS(), but this approach is quite ugly. Since ProcessWalRcvInterrupts() originally calls CHECK_FOR_INTERRUPTS() and there are no standalone calls to CHECK_FOR_INTERRUPTS() within walreceiver, I think it might be better to use ProcDiePending instead of ShutdownRequestPending. I added a subset function of die() as the SIGTERM handler in walsender in a crude patch attached. What do you think about the issue, and the approach? If there are no issues or objections with this method, I will continue to refine this patch. For now, I plan to register it for the upcoming commitfest. regards. -- Kyotaro Horiguchi NTT Open Source Software Center diff --git a/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c b/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c index 693b3669ba..e503799bd8 100644 --- a/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c +++ b/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c @@ -738,7 +738,7 @@ libpqrcv_PQgetResult(PGconn *streamConn) if (rc & WL_LATCH_SET) { ResetLatch(MyLatch); - ProcessWalRcvInterrupts(); + CHECK_FOR_INTERRUPTS(); } /* Consume whatever data is available from the socket */ @@ -1042,7 +1042,7 @@ libpqrcv_processTuples(PGresult *pgres, WalRcvExecResult *walres, { char *cstrs[MaxTupleAttributeNumber]; - ProcessWalRcvInterrupts(); + CHECK_FOR_INTERRUPTS(); /* Do the allocations in temporary context. */ oldcontext = MemoryContextSwitchTo(rowcontext); diff --git a/src/backend/replication/walreceiver.c b/src/backend/replication/walreceiver.c index 26ded928a7..c53a8e6c89 100644 --- a/src/backend/replication/walreceiver.c +++ b/src/backend/replication/walreceiver.c @@ -147,39 +147,34 @@ static void XLogWalRcvSendReply(bool force, bool requestReply); static void XLogWalRcvSendHSFeedback(bool immed); static void ProcessWalSndrMessage(XLogRecPtr walEnd, TimestampTz sendTime); static void WalRcvComputeNextWakeup(WalRcvWakeupReason reason, TimestampTz now); +static void WalRcvShutdownSignalHandler(SIGNAL_ARGS); -/* - * Process any interrupts the walreceiver process may have received. - * This should be called any time the process's latch has become set. - * - * Currently, only SIGTERM is of interest. We can't just exit(1) within the - * SIGTERM signal handler, because the signal might arrive in the middle of - * some critical operation, like while we're holding a spinlock. Instead, the - * signal handler sets a flag variable as well as setting the process's latch. - * We must check the flag (by calling ProcessWalRcvInterrupts) anytime the - * latch has become set. Operations that could block for a long time, such as - * reading from a remote server, must pay attention to the latch too; see - * libpqrcv_PQgetResult for example. - */ void -ProcessWalRcvInterrupts(void) +WalRcvShutdownSignalHandler(SIGNAL_ARGS) { - /* - * Although walreceiver interrupt handling doesn't use the same scheme as - * regular backends, call CHECK_FOR_INTERRUPTS() to make sure we receive - * any incoming signals on Win32, and also to make sure we process any - * barrier events. - */ - CHECK_FOR_INTERRUPTS(); + int save_errno = errno; - if (ShutdownRequestPending) + /* Don't joggle the elbow of proc_exit */ + if (!proc_exit_inprogress) { - ereport(FATAL, -(errcode(ERRCODE_ADMIN_SHUTDOWN), - errmsg("terminating walreceiver process due to administrator command"))); + InterruptPending = true; + ProcDiePending = true; } + + SetLatch(MyLatch); + + errno = save_errno; + } +/* + * Is current process a wal receiver? + */ +bool +IsWalReceiver(void) +{ + return WalRcv != NULL; +} /* Main entry point for walreceiver process */ void @@ -277,7 +272,7 @@ WalReceiverMain(void) pqsignal(SIGHUP, SignalHandlerForConfigReload); /* set flag to read config * file */ pqsignal(SIGINT, SIG_IGN); - pqsignal(SIGTERM, SignalHandlerForShutdownRequest); /* request shutdown */ + pqsignal(SIGTERM, WalRcvShutdownSignalHandler); /* request shutdown */ /* SIGQUIT handler was already set up by InitPostmasterChild */ pqsignal(SIGALRM, SIG_IGN); pqsignal(SIGPIPE, SIG_IGN); @@ -456,7 +451,7 @@ WalReceiverMain(void) errmsg("cannot continue WAL streaming, recovery has already ended"))); /* Process any requests or s
Re: A typo in a messsage?
At Wed, 27 Dec 2023 13:34:50 +0700, John Naylor wrote in > > Shouldn't the "is" following "LSN" be "in"? > > Pushed. At Wed, 27 Dec 2023 13:35:24 +0700, John Naylor wrote in >> I think "crc" should be in all uppercase in general and a brief >> grep'ing told me that it is almost always or consistently used in >> uppercase in our tree. > I pushed this also. Thanks for pushing them! regads. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: pg_basebackup has an accidentaly separated help message
At Tue, 26 Dec 2023 19:04:53 +0900, Michael Paquier wrote in > On Mon, Dec 25, 2023 at 05:07:28PM +0900, Kyotaro Horiguchi wrote: > > Yes. So, it turns out that they're found after they have been > > committed. > > No problem. I've just applied what you had. I hope this makes your > life a bit easier ;) Thanks for committing this! regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: pg_basebackup has an accidentaly separated help message
At Mon, 25 Dec 2023 15:42:41 +0900, Michael Paquier wrote in > On Mon, Dec 25, 2023 at 02:39:16PM +0900, Kyotaro Horiguchi wrote: > > The attached patch contains both of the above fixes. > > Good catches, let's fix them. You have noticed that while translating > these new messages, I guess? Yes. So, it turns out that they're found after they have been committed. Because handling a large volume of translations all at once is daunting, I am maintaining translations locally to avoid that. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Should "CRC" be in uppercase?
A new function check_control_file() in pg_combinebackup.c has the following message. > pg_fatal("%s: crc is incorrect", controlpath); I think "crc" should be in all uppercase in general and a brief grep'ing told me that it is almost always or consistently used in uppercase in our tree. regards. -- Kyotaro Horiguchi NTT Open Source Software Center diff --git a/src/bin/pg_combinebackup/pg_combinebackup.c b/src/bin/pg_combinebackup/pg_combinebackup.c index b6ae6f2aef..049c60cbf8 100644 --- a/src/bin/pg_combinebackup/pg_combinebackup.c +++ b/src/bin/pg_combinebackup/pg_combinebackup.c @@ -538,7 +538,7 @@ check_control_files(int n_backups, char **backup_dirs) /* Control file contents not meaningful if CRC is bad. */ if (!crc_ok) - pg_fatal("%s: crc is incorrect", controlpath); + pg_fatal("%s: CRC is incorrect", controlpath); /* Can't interpret control file if not current version. */ if (control_file->pg_control_version != PG_CONTROL_VERSION)
Re: pg_basebackup has an accidentaly separated help message
> > printf(_(" -i, --incremental=OLDMANIFEST\n")); > > printf(_(" take incremental backup\n")); > > I'd suggest merging these lines as follows (and the attached patch). > > > + printf(_(" -i, --incremental=OLDMANIFEST\n" > > +" take incremental backup\n")); Sorry, but I found another instance of this. > printf(_(" -T, --tablespace-mapping=OLDDIR=NEWDIR\n")); > printf(_("relocate tablespace in OLDDIR to > NEWDIR\n")); The attached patch contains both of the above fixes. regards. -- Kyotaro Horiguchi NTT Open Source Software Center diff --git a/src/bin/pg_basebackup/pg_basebackup.c b/src/bin/pg_basebackup/pg_basebackup.c index 5795b91261..28d2ee435b 100644 --- a/src/bin/pg_basebackup/pg_basebackup.c +++ b/src/bin/pg_basebackup/pg_basebackup.c @@ -396,8 +396,8 @@ usage(void) printf(_("\nOptions controlling the output:\n")); printf(_(" -D, --pgdata=DIRECTORY receive base backup into directory\n")); printf(_(" -F, --format=p|t output format (plain (default), tar)\n")); - printf(_(" -i, --incremental=OLDMANIFEST\n")); - printf(_(" take incremental backup\n")); + printf(_(" -i, --incremental=OLDMANIFEST\n" + " take incremental backup\n")); printf(_(" -r, --max-rate=RATEmaximum transfer rate to transfer data directory\n" " (in kB/s, or use suffix \"k\" or \"M\")\n")); printf(_(" -R, --write-recovery-conf\n" diff --git a/src/bin/pg_combinebackup/pg_combinebackup.c b/src/bin/pg_combinebackup/pg_combinebackup.c index b6ae6f2aef..49e97fcca8 100644 --- a/src/bin/pg_combinebackup/pg_combinebackup.c +++ b/src/bin/pg_combinebackup/pg_combinebackup.c @@ -669,8 +669,8 @@ help(const char *progname) printf(_(" -n, --dry-run don't actually do anything\n")); printf(_(" -N, --no-sync do not wait for changes to be written safely to disk\n")); printf(_(" -o, --output output directory\n")); - printf(_(" -T, --tablespace-mapping=OLDDIR=NEWDIR\n")); - printf(_("relocate tablespace in OLDDIR to NEWDIR\n")); + printf(_(" -T, --tablespace-mapping=OLDDIR=NEWDIR\n" + "relocate tablespace in OLDDIR to NEWDIR\n")); printf(_(" --manifest-checksums=SHA{224,256,384,512}|CRC32C|NONE\n" "use algorithm for manifest checksums\n")); printf(_(" --no-manifest suppress generation of backup manifest\n"));
pg_basebackup has an accidentaly separated help message
Hello. pg_basebackup.c: got the following message lines: > printf(_(" -i, --incremental=OLDMANIFEST\n")); > printf(_(" take incremental backup\n")); I'd suggest merging these lines as follows (and the attached patch). > + printf(_(" -i, --incremental=OLDMANIFEST\n" > + " take incremental backup\n")); regards. -- Kyotaro Horiguchi NTT Open Source Software Center diff --git a/src/bin/pg_basebackup/pg_basebackup.c b/src/bin/pg_basebackup/pg_basebackup.c index 5795b91261..28d2ee435b 100644 --- a/src/bin/pg_basebackup/pg_basebackup.c +++ b/src/bin/pg_basebackup/pg_basebackup.c @@ -396,8 +396,8 @@ usage(void) printf(_("\nOptions controlling the output:\n")); printf(_(" -D, --pgdata=DIRECTORY receive base backup into directory\n")); printf(_(" -F, --format=p|t output format (plain (default), tar)\n")); - printf(_(" -i, --incremental=OLDMANIFEST\n")); - printf(_(" take incremental backup\n")); + printf(_(" -i, --incremental=OLDMANIFEST\n" + " take incremental backup\n")); printf(_(" -r, --max-rate=RATEmaximum transfer rate to transfer data directory\n" " (in kB/s, or use suffix \"k\" or \"M\")\n")); printf(_(" -R, --write-recovery-conf\n"
A typo in a messsage?
I found the following message introduced by a recent commit. > errdetail("The first unsummarized LSN is this range is %X/%X.", Shouldn't the "is" following "LSN" be "in"? diff --git a/src/backend/backup/basebackup_incremental.c b/src/backend/backup/basebackup_incremental.c index 42bbe564e2..22b861ce52 100644 --- a/src/backend/backup/basebackup_incremental.c +++ b/src/backend/backup/basebackup_incremental.c @@ -575,7 +575,7 @@ PrepareForIncrementalBackup(IncrementalBackupInfo *ib, tle->tli, LSN_FORMAT_ARGS(tli_start_lsn), LSN_FORMAT_ARGS(tli_end_lsn)), -errdetail("The first unsummarized LSN is this range is %X/%X.", +errdetail("The first unsummarized LSN in this range is %X/%X.", LSN_FORMAT_ARGS(tli_missing_lsn; } regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: gai_strerror() is not thread-safe on Windows
At Thu, 7 Dec 2023 09:43:37 +1300, Thomas Munro wrote in > On Tue, Dec 5, 2023 at 3:43 PM Kyotaro Horiguchi > wrote: > > Windows' gai_strerror outputs messages that correspond to the language > > environment. Similarly, I think that the messages that the messages > > returned by our version should be translatable. > > Hmm, that is a good point. Wow, POSIX has given us a terrible > interface here, in terms of resource management. Let's see what glibc > does: > > https://github.com/lattera/glibc/blob/master/sysdeps/posix/gai_strerror.c > https://github.com/lattera/glibc/blob/master/sysdeps/posix/gai_strerror-strs.h It is quite a sight for sore eyes... > It doesn't look like it knows about locales at all. And a test > program seems to confirm: .. > setlocale(LC_MESSAGES, "ja_JP.UTF-8"); > printf("%s\n", gai_strerror(EAI_MEMORY)); > > That prints: > > Memory allocation failure > > FreeBSD tries harder, and prints: > > メモリ割り当て失敗 > > We can see that it has a thread-local variable that holds a copy of > that localised string until the next call to gai_strerror() in the > same thread: > > https://github.com/freebsd/freebsd-src/blob/main/lib/libc/net/gai_strerror.c > https://github.com/freebsd/freebsd-src/blob/main/lib/libc/nls/ja_JP.UTF-8.msg > > FreeBSD's message catalogues would provide a read-made source of > translations, bu... hmm, if glibc doesn't bother and the POSIX > interface is unhelpful and Windows' own implementation is so willfully > unusable, I don't really feel inclined to build a whole thread-local > cache thing on our side just to support this mess. I agree, I wouldn't want to do it either. > So I think we should just hard-code the error messages in English and > move on. However, English is my language so perhaps I should abstain > and leave it to others to decide how important that is. I also think that would be a good way. > > These messages may add extra line-end periods to the parent (or > > cotaining) messages when appended. This looks as follows. > > > > (auth.c:517 : errdetail_log() : sub (detail) message) > > > Could not translate client host name "hoge" to IP address: An address > > > incompatible with the requested protocol was used.. > > > > (hba.c:1562 : errmsg() : main message) > > > invalid IP address "192.0.2.1": This is usually a temporary error during > > > hostname resolution and means that the local server did not receive a > > > response from an authoritative server. > > > > When I first saw the first version, I thought it would be better to > > use Windows' own messages, just like you did. However, considering the > > content of the message above, wouldn't it be better to adhere to > > Linux-style messages overall? > > Yeah, I agree that either the glibc or the FreeBSD messages would be > better than those now that I've seen them. They are short and sweet. > > > A slightly subtler point is that the second example seems to have a > > misalignment between the descriptions before and after the colon, but > > do you think it's not something to be concerned about to this extent? > > I didn't understand what you meant here. If it was just a temporary error that couldn't be resolved, it doesn't mean that the IP address is invalid. If such a cause is possible, then probabyly an error message saying "failed to resolve" would be more appropriate. However, I wrote it meaning that there is no need to go to great length to ensure consistency with this message. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: gai_strerror() is not thread-safe on Windows
At Tue, 5 Dec 2023 08:26:54 +1300, Thomas Munro wrote in > On second thoughts, I guess it would make more sense to use the exact > messages Windows' own implementation would return instead of whatever > we had in the past (probably cribbed from some other OS or just made > up?). I asked CI to spit those out[1]. Updated patch attached. Will > add to CF. > > [1] https://cirrus-ci.com/task/5816802207334400?logs=main#L15 Windows' gai_strerror outputs messages that correspond to the language environment. Similarly, I think that the messages that the messages returned by our version should be translatable. These messages may add extra line-end periods to the parent (or cotaining) messages when appended. This looks as follows. (auth.c:517 : errdetail_log() : sub (detail) message) > Could not translate client host name "hoge" to IP address: An address > incompatible with the requested protocol was used.. (hba.c:1562 : errmsg() : main message) > invalid IP address "192.0.2.1": This is usually a temporary error during > hostname resolution and means that the local server did not receive a > response from an authoritative server. When I first saw the first version, I thought it would be better to use Windows' own messages, just like you did. However, considering the content of the message above, wouldn't it be better to adhere to Linux-style messages overall? A slightly subtler point is that the second example seems to have a misalignment between the descriptions before and after the colon, but do you think it's not something to be concerned about to this extent? regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: GUC names in messages
At Thu, 30 Nov 2023 14:59:21 +0900, Michael Paquier wrote in > > Patch 0003 now uses a "%s%s%s" format specifier with GUC_FORMAT macro > > in guc.c, as recently suggested by Michael [1]. > > I cannot think about a better idea as these strings need to be > translated so they need three %s. In this patch, the quotation marks cannot be changed from double quotes. After a brief review of the use of quotation marks in various languages, it's observed that French uses guillemets (« »), German uses lower qutation marks („ “), Spanish uses angular quotation marks (« ») or alternatively, lower quotetaion marks. Japanese commonly uses corner brackets (「」), but can also adopt double or single quotation marks in certain contexts. I took a look at the backend's fr.po file for a trial, and it indeed seems that guillemets are being used. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
pg_dump/nls.mk is missing a file
Hello. Upon reviewing my translation, I discovered that filter.c was not included in the nls.mk of pg_dump. Additional it appears that two '.h' files have been included for a long time, but they seem unnecessary as their removal does not affect the results. The attached patch is intended to correct these issues. For Meson, on the other hand, I believe there is nothing in particular that needs to be done. regards. -- Kyotaro Horiguchi NTT Open Source Software Center diff --git a/src/bin/pg_dump/nls.mk b/src/bin/pg_dump/nls.mk index cd91737f48..ca7ce74275 100644 --- a/src/bin/pg_dump/nls.mk +++ b/src/bin/pg_dump/nls.mk @@ -15,13 +15,12 @@ GETTEXT_FILES= $(FRONTEND_COMMON_GETTEXT_FILES) \ compress_zstd.c \ pg_dump.c \ common.c \ + filter.c \ pg_dump_sort.c \ pg_restore.c \ pg_dumpall.c \ parallel.c \ - parallel.h \ pg_backup_utils.c \ - pg_backup_utils.h \ ../../common/compression.c \ ../../common/exec.c \ ../../common/fe_memutils.c \
Re: about help message for new pg_dump's --filter option
At Thu, 30 Nov 2023 10:20:40 +0900 (JST), Kyotaro Horiguchi wrote in > Hello. > > Recently, a new --filter option was added to pg_dump. I might be > wrong, but the syntax of the help message for this feels off. Is the > word 'on' not necessary after 'based'? > > > --filter=FILENAMEinclude or exclude objects and data from dump > > based expressions in FILENAME Hmm. A similar message is spelled as "based on expression". Thus, the attached correct this message in this line. regards. -- Kyotaro Horiguchi NTT Open Source Software Center diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c index 57c6836b88..eea320b731 100644 --- a/src/bin/pg_dump/pg_dump.c +++ b/src/bin/pg_dump/pg_dump.c @@ -1119,7 +1119,7 @@ help(const char *progname) " including child and partition tables\n")); printf(_(" --extra-float-digits=NUM override default setting for extra_float_digits\n")); printf(_(" --filter=FILENAMEinclude or exclude objects and data from dump\n" - " based expressions in FILENAME\n")); + " based on expressions in FILENAME\n")); printf(_(" --if-exists use IF EXISTS when dropping objects\n")); printf(_(" --include-foreign-data=PATTERN\n" " include data of foreign tables on foreign\n"
Extra periods in pg_dump messages
Sorry for the sequential mails. In the bleeding-edge version of pg_dump, when a conditionspecifying an index, for example, is described in an object filter file, the following message is output. However, there is a period at the end of the line. Shouldn't this be removed? $ pg_dump --filter=/tmp/hoge.filter pg_dump: error: invalid format in filter read from "/tmp/hoge.filter" on line 1: include filter for "index" is not allowed. The attached patch includes modifications related to the calls to pg_log_filter_error(). regards. -- Kyotaro Horiguchi NTT Open Source Software Center diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c index 57c6836b88..ce50566c3a 100644 --- a/src/bin/pg_dump/pg_dump.c +++ b/src/bin/pg_dump/pg_dump.c @@ -18812,7 +18812,7 @@ read_dump_filters(const char *filename, DumpOptions *dopt) case FILTER_OBJECT_TYPE_TABLE_DATA: case FILTER_OBJECT_TYPE_TABLE_DATA_AND_CHILDREN: case FILTER_OBJECT_TYPE_TRIGGER: - pg_log_filter_error(, _("%s filter for \"%s\" is not allowed."), + pg_log_filter_error(, _("%s filter for \"%s\" is not allowed"), "include", filter_object_type_name(objtype)); exit_nicely(1); @@ -18851,7 +18851,7 @@ read_dump_filters(const char *filename, DumpOptions *dopt) case FILTER_OBJECT_TYPE_TRIGGER: case FILTER_OBJECT_TYPE_EXTENSION: case FILTER_OBJECT_TYPE_FOREIGN_DATA: - pg_log_filter_error(, _("%s filter for \"%s\" is not allowed."), + pg_log_filter_error(, _("%s filter for \"%s\" is not allowed"), "exclude", filter_object_type_name(objtype)); exit_nicely(1); diff --git a/src/bin/pg_dump/pg_dumpall.c b/src/bin/pg_dump/pg_dumpall.c index 1b974cf7e8..92389353a4 100644 --- a/src/bin/pg_dump/pg_dumpall.c +++ b/src/bin/pg_dump/pg_dumpall.c @@ -1969,7 +1969,7 @@ read_dumpall_filters(const char *filename, SimpleStringList *pattern) { if (comtype == FILTER_COMMAND_TYPE_INCLUDE) { - pg_log_filter_error(, _("%s filter for \"%s\" is not allowed."), + pg_log_filter_error(, _("%s filter for \"%s\" is not allowed"), "include", filter_object_type_name(objtype)); exit_nicely(1); @@ -1989,7 +1989,7 @@ read_dumpall_filters(const char *filename, SimpleStringList *pattern) case FILTER_OBJECT_TYPE_SCHEMA: case FILTER_OBJECT_TYPE_TABLE: case FILTER_OBJECT_TYPE_TABLE_AND_CHILDREN: -pg_log_filter_error(, _("unsupported filter object.")); +pg_log_filter_error(, _("unsupported filter object")); exit_nicely(1); break; diff --git a/src/bin/pg_dump/pg_restore.c b/src/bin/pg_dump/pg_restore.c index 1459e02263..c3beacdec1 100644 --- a/src/bin/pg_dump/pg_restore.c +++ b/src/bin/pg_dump/pg_restore.c @@ -535,7 +535,7 @@ read_restore_filters(const char *filename, RestoreOptions *opts) case FILTER_OBJECT_TYPE_DATABASE: case FILTER_OBJECT_TYPE_EXTENSION: case FILTER_OBJECT_TYPE_FOREIGN_DATA: - pg_log_filter_error(, _("%s filter for \"%s\" is not allowed."), + pg_log_filter_error(, _("%s filter for \"%s\" is not allowed"), "include", filter_object_type_name(objtype)); exit_nicely(1); @@ -581,7 +581,7 @@ read_restore_filters(const char *filename, RestoreOptions *opts) case FILTER_OBJECT_TYPE_TABLE: case FILTER_OBJECT_TYPE_TABLE_AND_CHILDREN: case FILTER_OBJECT_TYPE_TRIGGER: - pg_log_filter_error(, _("%s filter for \"%s\" is not allowed."), + pg_log_filter_error(, _("%s filter for \"%s\" is not allowed"), "exclude", filter_object_type_name(objtype)); exit_nicely(1);
about help message for new pg_dump's --filter option
Hello. Recently, a new --filter option was added to pg_dump. I might be wrong, but the syntax of the help message for this feels off. Is the word 'on' not necessary after 'based'? > --filter=FILENAMEinclude or exclude objects and data from dump > based expressions in FILENAME regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: initdb --no-locale=C doesn't work as specified when the environment is not C
At Wed, 22 Nov 2023 11:04:01 -0500, Tom Lane wrote in > Kyotaro Horiguchi writes: > > Commit 3e51b278db leaves lc_* conf lines as commented-out when > > their value is "C". This leads to the following behavior. > > Hmm ... I see a contributing factor here: this bit in > postgresql.conf.sample is a lie: > > #lc_messages = 'C'# locale for system error message > # strings > > A look in guc_tables.c shows that the actual default is '' (empty > string), which means "use the environment", and that matches how the > variable is documented in config.sgml. Somebody --- quite possibly me > --- was misled by the contents of postgresql.conf.sample into thinking > that the lc_xxx GUCs all default to C, when that's only true for the > others. It seems somewhat intentional that only lc_messages references the environment at boot time. On the other hand, previously, in the absence of a specified locale, initdb would embed the environmental value in the configuration file, as it seems to be documented. Given that initdb is always used for cluster creation, it's unlikey that systems depend on this boot-time default for their operation. > I think that a more correct fix for this would treat lc_messages > differently from the other lc_xxx GUCs. Maybe just eliminate the > hack about not substituting "C" for that one? For example, the --no-locale option for initdb is supposed to set all categories to 'C'. That approach would lead to the postgres referencing the runtime environment for all categories except lc_messages, which I believe contradicts the documentation. In my biew, if lc_messages is exempted from that hack, then all other categories should be similarly excluded as in the second approach among the attached in the previous mail. > In any case, we need to fix this mistake in postgresql.conf.sample. If you are not particularly concerned about the presence of quotation marks, I think it would be fine to go with the second approach and make the necessary modification to the configuration file accordingly. With the attached patch, initdb --no-locale generates the following lines in the configuration file. > lc_messages = C # locale for system error > message > # strings > lc_monetary = C # locale for monetary formatting > lc_numeric = C# locale for number formatting > lc_time = C # locale for time formatting By the way, the lines around lc_* in the sample file seem to have somewhat inconsistent indentations. Wouldnt' it be preferable to fix this? (The attached doesn't that.) regards. -- Kyotaro Horiguchi NTT Open Source Software Center diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample index e48c066a5b..133dd3da7d 100644 --- a/src/backend/utils/misc/postgresql.conf.sample +++ b/src/backend/utils/misc/postgresql.conf.sample @@ -726,7 +726,7 @@ # encoding # These settings are initialized by initdb, but they can be changed. -#lc_messages = 'C' # locale for system error message +#lc_messages = '' # locale for system error message # strings #lc_monetary = 'C' # locale for monetary formatting #lc_numeric = 'C' # locale for number formatting diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c index 0c6f5ceb0a..646e8f29f6 100644 --- a/src/bin/initdb/initdb.c +++ b/src/bin/initdb/initdb.c @@ -1226,25 +1226,17 @@ setup_config(void) conflines = replace_guc_value(conflines, "shared_buffers", repltok, false); - /* -* Hack: don't replace the LC_XXX GUCs when their value is 'C', because -* replace_guc_value will decide not to quote that, which looks strange. -*/ - if (strcmp(lc_messages, "C") != 0) - conflines = replace_guc_value(conflines, "lc_messages", - lc_messages, false); + conflines = replace_guc_value(conflines, "lc_messages", + lc_messages, false); - if (strcmp(lc_monetary, "C") != 0) - conflines = replace_guc_value(conflines, "lc_monetary", - lc_monetary, false); + conflines = replace_guc_value(conflines, "lc_monetary", +
Re: Make mesage at end-of-recovery less scary.
Anyway, this requires rebsaing, and done. Thanks for John (Naylor) for pointing this out. regards. -- Kyotaro Horiguchi NTT Open Source Software Center >From e56f1f24523e3e562a4db166dfeaadc79fd7b27a Mon Sep 17 00:00:00 2001 From: Kyotaro Horiguchi Date: Tue, 7 Mar 2023 14:55:58 +0900 Subject: [PATCH v27] Make End-Of-Recovery error less scary When recovery in any type ends, we see a bit scary error message like "invalid record length" that suggests something serious is happening. Actually if recovery meets a record with length = 0, that usually means it finished applying all available WAL records. Make this message less scary as "reached end of WAL". Instead, raise the error level for other kind of WAL failure to WARNING. To make sure that the detection is correct, this patch checks if all trailing bytes in the same page are zeroed in that case. --- src/backend/access/transam/xlogreader.c | 134 ++ src/backend/access/transam/xlogrecovery.c | 94 +++ src/backend/replication/walreceiver.c | 7 +- src/bin/pg_waldump/pg_waldump.c | 13 ++- src/include/access/xlogreader.h | 1 + src/test/recovery/t/035_recovery.pl | 130 + 6 files changed, 327 insertions(+), 52 deletions(-) create mode 100644 src/test/recovery/t/035_recovery.pl diff --git a/src/backend/access/transam/xlogreader.c b/src/backend/access/transam/xlogreader.c index e0baa86bd3..ce65f99c60 100644 --- a/src/backend/access/transam/xlogreader.c +++ b/src/backend/access/transam/xlogreader.c @@ -48,6 +48,8 @@ static int ReadPageInternal(XLogReaderState *state, XLogRecPtr pageptr, int reqLen); static void XLogReaderInvalReadState(XLogReaderState *state); static XLogPageReadResult XLogDecodeNextRecord(XLogReaderState *state, bool nonblocking); +static bool ValidXLogRecordLength(XLogReaderState *state, XLogRecPtr RecPtr, + XLogRecord *record); static bool ValidXLogRecordHeader(XLogReaderState *state, XLogRecPtr RecPtr, XLogRecPtr PrevRecPtr, XLogRecord *record, bool randAccess); static bool ValidXLogRecord(XLogReaderState *state, XLogRecord *record, @@ -149,6 +151,7 @@ XLogReaderAllocate(int wal_segment_size, const char *waldir, pfree(state); return NULL; } + state->EndOfWAL = false; state->errormsg_buf[0] = '\0'; /* @@ -525,6 +528,7 @@ XLogDecodeNextRecord(XLogReaderState *state, bool nonblocking) /* reset error state */ state->errormsg_buf[0] = '\0'; decoded = NULL; + state->EndOfWAL = false; state->abortedRecPtr = InvalidXLogRecPtr; state->missingContrecPtr = InvalidXLogRecPtr; @@ -608,16 +612,12 @@ restart: Assert(pageHeaderSize <= readOff); /* - * Read the record length. + * Verify the record header. * * NB: Even though we use an XLogRecord pointer here, the whole record - * header might not fit on this page. xl_tot_len is the first field of the - * struct, so it must be on this page (the records are MAXALIGNed), but we - * cannot access any other fields until we've verified that we got the - * whole header. + * header might not fit on this page. */ record = (XLogRecord *) (state->readBuf + RecPtr % XLOG_BLCKSZ); - total_len = record->xl_tot_len; /* * If the whole record header is on this page, validate it immediately. @@ -636,19 +636,19 @@ restart: } else { - /* There may be no next page if it's too small. */ - if (total_len < SizeOfXLogRecord) - { - report_invalid_record(state, - "invalid record length at %X/%X: expected at least %u, got %u", - LSN_FORMAT_ARGS(RecPtr), - (uint32) SizeOfXLogRecord, total_len); + /* + * xl_tot_len is the first field of the struct, so it must be on this + * page (the records are MAXALIGNed), but we cannot access any other + * fields until we've verified that we got the whole header. + */ + if (!ValidXLogRecordLength(state, RecPtr, record)) goto err; - } - /* We'll validate the header once we have the next page. */ + gotheader = false; } + total_len = record->xl_tot_len; + /* * Try to find space to decode this record, if we can do so without * calling palloc. If we can't, we'll try again below after we've @@ -1091,25 +1091,81 @@ XLogReaderInvalReadState(XLogReaderState *state) state->readLen = 0; } +/* + * Validate record length of an XLOG record header. + * + * This is substantially a part of ValidXLogRecordHeader. But XLogReadRecord + * needs this separate from the function in case of a partial record header. + * + * Returns true if the xl_tot_len header field has a seemingly valid value, + * which means the caller can proceed reading to the following part of the + * record. + */ +static bool +ValidXLogRecordLength(XLogReaderState *state, XLogRecPtr RecPtr, + XLogRecord *record) +{ + if (record->xl_tot_len == 0) + { + char *p; + char *pe; + + /* + * We are almost
initdb --no-locale=C doesn't work as specified when the environment is not C
Commit 3e51b278db leaves lc_* conf lines as commented-out when their value is "C". This leads to the following behavior. $ echo LANG ja_JP.UTF8 $ initdb --no-locale hoge $ grep lc_ hoge/postgresql.conf #lc_messages = 'C' # locale for system error message #lc_monetary = 'C' # locale for monetary formatting #lc_numeric = 'C' # locale for number formatting #lc_time = 'C' # locale for time formatting In this scenario, the postmaster ends up emitting log massages in Japanese, which contradicts the documentation. https://www.postgresql.org/docs/devel/app-initdb.html > --locale=locale > Sets the default locale for the database cluster. If this option is > not specified, the locale is inherited from the environment that > initdb runs in. Locale support is described in Section 24.1. > .. > --lc-messages=locale > Like --locale, but only sets the locale in the specified category. Here's a somewhat amusing case: $ echo LANG ja_JP.UTF8 $ initdb --lc_messages=C $ grep lc_ hoge/postgresql.conf #lc_messages = 'C' # locale for system error message lc_monetary = 'ja_JP.UTF8' # locale for monetary formatting lc_numeric = 'ja_JP.UTF8' # locale for number formatting lc_time = 'ja_JP.UTF8' # locale for time formatting Hmm. it seems that initdb replaces the values of all categories *except the specified one*. This behavior seems incorrect to me. initdb should replace the value when explicitly specified in the command line. If you use -c lc_messages=C, it does perform the expected behavior to some extent, but I believe this is a separate matter. I have doubts about not replacing these lines for purely cosmetic reasons. In this mail, I've attached three possible solutions for the original issue: the first one enforces replacement only when specified on the command line, the second one simply always performs replacement, and the last one addresses the concern about the absence of quotes around "C" by allowing explicit specification. (FWIW, I prefer the last one.) What do you think about these? regards. -- Kyotaro Horiguchi NTT Open Source Software Center diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c index 0c6f5ceb0a..56a8c5b60e 100644 --- a/src/bin/initdb/initdb.c +++ b/src/bin/initdb/initdb.c @@ -144,6 +144,10 @@ static char *lc_monetary = NULL; static char *lc_numeric = NULL; static char *lc_time = NULL; static char *lc_messages = NULL; +static bool lc_monetary_specified = false; +static bool lc_numeric_specified = false; +static bool lc_time_specified = false; +static bool lc_messages_specified = false; static char locale_provider = COLLPROVIDER_LIBC; static char *icu_locale = NULL; static char *icu_rules = NULL; @@ -1230,19 +1234,19 @@ setup_config(void) * Hack: don't replace the LC_XXX GUCs when their value is 'C', because * replace_guc_value will decide not to quote that, which looks strange. */ - if (strcmp(lc_messages, "C") != 0) + if (strcmp(lc_messages, "C") != 0 || lc_messages_specified) conflines = replace_guc_value(conflines, "lc_messages", lc_messages, false); - if (strcmp(lc_monetary, "C") != 0) + if (strcmp(lc_monetary, "C") != 0 || lc_monetary_specified) conflines = replace_guc_value(conflines, "lc_monetary", lc_monetary, false); - if (strcmp(lc_numeric, "C") != 0) + if (strcmp(lc_numeric, "C") != 0 || lc_numeric_specified) conflines = replace_guc_value(conflines, "lc_numeric", lc_numeric, false); - if (strcmp(lc_time, "C") != 0) + if (strcmp(lc_time, "C") != 0 || lc_time_specified) conflines = replace_guc_value(conflines, "lc_time", lc_time, false); @@ -2374,6 +2378,15 @@ setlocales(void) icu_locale = locale; } + /* +* At this point, null means that the value is not specifed in the command +* line. +*/ + if (lc_numeric != NULL) lc_numeric_specified = true; + if (lc_time != NULL) lc_time_specified = true; + if (lc_monetary != NULL) lc_monetary_specified = true; + if (lc_messages != NULL) lc_messages_specified = true; + /* * canonicalize locale names, and obtain any missing values from our * current environment diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c index 0c6f5ceb0a..646e8f2
Re: A recent message added to pg_upgade
At Thu, 9 Nov 2023 12:00:59 +0530, Amit Kapila wrote in > I have also proposed that as one of the alternatives but didn't get > many votes. And, I think if the user is passing a special value of > max_slot_wal_keep_size during the upgrade, it has to be a special > case, and rejecting it upfront doesn't seem unreasonable to me. Oops. Sorry, and understood. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: A recent message added to pg_upgade
At Thu, 9 Nov 2023 09:53:07 +0530, Amit Kapila wrote in > Michael, Horiguchi-San, and others, do you have any thoughts on what > is the best way to proceed? As I previously mentioned, I believe that if rejection is to be the course of action, it would be best to proceed with it sooner rather than later. On the other hand, I am concerned about the need for users to perform extra steps depending on the source cluster conrfiguration. Therefore, another possible approach could be to simply ignore the given settings in the assignment hook rather than rejecting by the check hook, and forcibuly apply -1. What do you think about this third approach? I haven't checked this with pg_upgrade, but a standalone postmaster would emit the following messages. > postgres -b -c max_slot_wal_keep_size=-1 > LOG: "max_slot_wal_keep_size" is foced to set to -1 during binary upgrade > mode. regards. -- Kyotaro Horiguchi NTT Open Source Software Center diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c index b541be8eec..319d4f5a81 100644 --- a/src/backend/access/transam/xlog.c +++ b/src/backend/access/transam/xlog.c @@ -2063,6 +2063,31 @@ check_wal_segment_size(int *newval, void **extra, GucSource source) return true; } +/* + * GUC check_hook for max_slot_wal_keep_size + * + * If WALs needed by logical replication slots are deleted, these slots become + * inoperable. During a binary upgrade, pg_upgrade sets this variable to -1 via + * the command line in an attempt to prevent such deletions, but users have + * ways to override it. To ensure the successful completion of the upgrade, + * it's essential to keep this variable unaltered. See + * InvalidatePossiblyObsoleteSlot() and start_postmaster() in pg_upgrade for + * more details. + */ +bool +check_max_slot_wal_keep_size(int *newval, void **extra, GucSource source) +{ + if (IsBinaryUpgrade && *newval != -1) + { + ereport(LOG, + errmsg("\"%s\" is foced to set to -1 during binary upgrade mode.", + "max_slot_wal_keep_size")); + *newval = -1; + } + + return true; +} + /* * At a checkpoint, how many WAL segments to recycle as preallocated future * XLOG segments? Returns the highest segment that should be preallocated. diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c index 99823df3c7..5c3d2b1082 100644 --- a/src/backend/replication/slot.c +++ b/src/backend/replication/slot.c @@ -1424,18 +1424,12 @@ InvalidatePossiblyObsoleteSlot(ReplicationSlotInvalidationCause cause, SpinLockRelease(>mutex); /* -* The logical replication slots shouldn't be invalidated as -* max_slot_wal_keep_size GUC is set to -1 during the upgrade. -* -* The following is just a sanity check. +* check_max_slot_wal_keep_size() ensures max_slot_wal_keep_size is set +* to -1, so, slot invalidation for logical slots shouldn't happen +* during an upgrade. At present, only logical slots really require +* this. */ - if (*invalidated && SlotIsLogical(s) && IsBinaryUpgrade) - { - ereport(ERROR, - errcode(ERRCODE_INVALID_PARAMETER_VALUE), - errmsg("replication slots must not be invalidated during the upgrade"), - errhint("\"max_slot_wal_keep_size\" must be set to -1 during the upgrade")); - } + Assert (!(*invalidated && SlotIsLogical(s) && IsBinaryUpgrade)); if (active_pid != 0) { diff --git a/src/backend/utils/misc/guc_tables.c b/src/backend/utils/misc/guc_tables.c index 7605eff9b9..818ffdb2ae 100644 --- a/src/backend/utils/misc/guc_tables.c +++ b/src/backend/utils/misc/guc_tables.c @@ -2845,7 +2845,7 @@ struct config_int ConfigureNamesInt[] = }, _slot_wal_keep_size_mb, -1, -1, MAX_KILOBYTES, - NULL, NULL, NULL + check_max_slot_wal_keep_size, NULL, NULL }, { diff --git a/src/include/utils/guc_hooks.h b/src/include/utils/guc_hooks.h index 2a191830a8..3d74483f44 100644 --- a/src/include/utils/guc_hooks.h +++ b/src/include/utils/guc_hooks.h @@ -84,6 +84,8 @@ extern bool check_maintenance_io_concurrency(int *newval, void **extra, extern void assign_maintenance_io_concurrency(int newval, void *extra); extern bool check_max_connections(int *newval, void **extra, GucSource source); extern bool check_max_wal_senders(int *newval, void **extra, GucSource source); +extern bool check_max_slot_wal_
Re: GUC names in messages
At Thu, 9 Nov 2023 12:55:44 +1100, Peter Smith wrote in > The most common pattern there is "You might need to increase %s.". .. > Here is a patch to modify those other similar variations so they share > that common wording. > > PSA. I'm uncertain whether the phrases "Consider doing something" and "You might need to do something" are precisely interchangeable. However, (for me..) it appears that either phrase could be applied for all messages that the patch touches. In short, I'm fine with the patch. By the way, I was left scratching my head after seeing the following message. >ereport(PANIC, >(errcode(ERRCODE_CONFIGURATION_LIMIT_EXCEEDED), > - errmsg("could not find free replication state, increase > max_replication_slots"))); Being told to increase max_replication_slots in a PANIC message feels somewhat off to me. Just looking at the message, it seems unconvincing to increase "slots" because there is a lack of "state". So, I poked around in the code and found the following comment: > ReplicationOriginShmemSize(void) > { > ... > /* > * XXX: max_replication_slots is arguably the wrong thing to use, as here > * we keep the replay state of *remote* transactions. But for now it seems > * sufficient to reuse it, rather than introduce a separate GUC. > */ I haven't read the related code, but if my understanding based on this comment is correct, wouldn't it mean that a lack of storage space for the state at the location outputting the message indicates a bug in the program, not a user configuration error? In other words, isn't this message something that at least shouldn't be a user-facing message, and might it be more appropriate to use an Assert instead? regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: Add new option 'all' to pg_stat_reset_shared()
At Mon, 6 Nov 2023 14:00:13 +0530, Bharath Rupireddy wrote in > On Mon, Nov 6, 2023 at 11:39 AM torikoshia wrote: > > Since each stats, except wal_prefetch was reset acquiring LWLock, > > attached PoC patch makes the call atomic by using these LWlocks. > > > > If this is the right direction, I'll try to make wal_prefetch also take > > LWLock. I must say, I have reservations about this approach. The main concern is the duplication of reset code, which has been efficiently encapsulated for individual targets, into this location. This practice degrades the maintainability and clarity of the code. > Well, that's a total of ~17 LWLocks this new function takes to make > the stats reset atomic. I'm not sure if this atomicity is worth the > effort which can easily be misused - what if someone runs something > like SELECT pg_stat_reset_shared() FROM generate_series(1, > 10n); to cause heavy lock acquisition and release cycles? ... > IMV, atomicity is not something that applies for the stats reset > operation because stats are approximate numbers by nature after all. > If the pg_stat_reset_shared() resets stats for only a bunch of stats > types and fails, it's the basic application programming style that > when a query fails it's the application that needs to have a retry > mechanism. FWIW, the atomicity doesn't apply today if someone wants to > reset stats in a loop for all stats types. The infrequent use of this feature, coupled with the fact that there is no inherent need for these counters to be reset simultaneoulsy, leads me to think that there is little point in cnetralizing the locks. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: Intermittent failure with t/003_logical_slots.pl test on windows
At Mon, 6 Nov 2023 19:42:21 +0530, Nisha Moond wrote in > > Appending '2>&1 test: > > The command still results in NULL and ends up failing as no data is > > returned. Which means even no error message is returned. The error log Thanks for confirmation. So, at least the child process was launced successfully in the cmd.exe's view. Upon a quick check on my end with Windows' _popen, I have obseved the following: - Once a child process is started, it seems to go undetected as an error by _popen or subsequent fgets calls if the process ends abnormally, with a non-zero exit status or even with a SEGV. - After the child process has flushed data to stdout, it is possible to read from the pipe even if the child process crashes or ends thereafter. - Even if fgets is called before the program starts, it will correctly block until the program outputs something. Specifically, when I used popen("sleep 5 & target.exe") and immediately performed fgets on the pipe, I was able to read the output of target.exe as the first line. Therefore, based on the information available, it is conceivable that the child process was killed by something right after it started, or the program terminated on its own without any error messages. By the way, in the case of aforementioned SEGV, Application Errors corresponding to it were identifiable in the Event Viewer. Additionally, regarding the exit statuses, they can be captured by using a wrapper batch file (.bat) that records %ERRORLEVEL% after running the target program. This may yield insights, aothough its effectiveness is not guaranteed. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: "box" type description
At Wed, 1 Nov 2023 11:36:01 -0400, Bruce Momjian wrote in > On Wed, Mar 31, 2021 at 01:43:47PM +0200, Christoph Berg wrote: > > Re: Kyotaro Horiguchi > > I like that because it points to the "point" syntax so users can > > figure out how to spell a box. > > I liked Horiguchi-san's patch from 2021, but once I started looking > further, I found a number of improvements that can be made in the \dTS > output beyond Horiguchi-san's changes: > > * boolean outputs 't'/'f', not 'true'/'false' > * Added "format" ... for output > * tid output format was at the start, not the end > * I didn't add space between point x,y because the output has no space > * I spelled out "point" instead of "pt" > * "line" has two very different input formats so I listed both >(more different than others like boolean) > * I didn't see the need to say "datatype" for LSN and UUID > * I improved the txid_snapshot description > * There was no description for table_am_handler > > Patch attached. Thank you for continuing this. The additional changes looks fine. Upon reviewing the table again in this line, further potential improvements and issues have been found. For example: character, varchar: don't follow the rule. - 'char(length)' blank-padded string, fixed storage length + blank-padded string, fixed storage length, format 'char(length)' interval: doesn't follow the rule. - @ , time interval + time interval, format '[@] ' (I think '@' is not necessary here..) pg_snapshot: The description given is just "snapshot", which seems overly simplistic. txid_snapshot: The description reads "transaction snapshot". Is this really accurate, especially in contrast with pg_snapshot? pg_brin_bloom_summary, pg_brin_minmax_multi_summary, pg_mcv_list and many: I'm uncertain whether these types, which lack an input syntax (but have an output format), qualify as pseudo-types. Nevertheless, I believe it would be beneficial to describe that those types differ from ordinary types. Should we consider refining these descriptions in the table? regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: Intermittent failure with t/003_logical_slots.pl test on windows
At Tue, 31 Oct 2023 18:11:48 +0530, vignesh C wrote in > Few others are also facing this problem with similar code like in: > https://stackoverflow.com/questions/15882799/fgets-returning-error-for-file-returned-by-popen I'm inclined to believe that the pipe won't enter the EOF state until the target command terminates (then the top-level cmd.exe). The target command likely terminated prematurely due to an error before priting any output. If we append "2>&1" to the command line, we can capture the error message through the returned pipe if any. Such error messages will cause the subsequent code to fail with an error such as "unexpected string: 'the output'". I'm not sure, but if this is permissive, the returned error messages could potentially provide insight into the underlying issue, paving the way for a potential solution. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: A recent message added to pg_upgade
Thanks you for the comments! At Wed, 1 Nov 2023 18:08:19 +1100, Peter Smith wrote in > Hi, here are some minor review comments for the v3 patch. > > == > src/backend/access/transam/xlog.c > I asked ChatGPT to suggest alternative wording for that comment, and > it came up with something that I felt was a slight improvement. > > SUGGESTION > ... > If WALs needed by logical replication slots are deleted, these slots > become inoperable. During a binary upgrade, pg_upgrade sets this > variable to -1 via the command line in an attempt to prevent such > deletions, but users have ways to override it. To ensure the > successful completion of the upgrade, it's essential to keep this > variable unaltered. > ... > > ~~~ ChatGPT seems to tend to generate sentences in a slightly different from our usual writing. While I tried to retain the original phrasing in the patch, I don't mind using the suggested version. Used as is. > 2. > + GUC_check_errdetail("\"max_slot_wal_keep_size\" must be set to -1 > during binary upgrade mode."); > Some of the other GUC_check_errdetail()'s do not include the GUC name > in the translatable message text. Isn't that a preferred style? > SUGGESTION > GUC_check_errdetail("\"%s\" must be set to -1 during binary upgrade mode.", > "max_slot_wal_keep_size"); I believe that that style was adopted to minimize translatable messages by consolidting identical ones that only differ in variable names. I see both versions in the tree. I didn't find necessity to adopt this approach for this specific message, especially since I'm skeptical about adding new messages that end with "must be set to -1 during binary upgrade mode". (pg_upgrade sets synchronous_commit, fsync and full_page_writes to "off".) However, some unique messages are in this style, so I'm fine with using that style. Revised accordingly. > == > src/backend/replication/slot.c > > 3. InvalidatePossiblyObsoleteSlot > + Assert (!*invalidated || !SlotIsLogical(s) || !IsBinaryUpgrade); > > IMO new Assert became trickier to understand than the original condition. > YMMV. > > SUGGESTION > Assert(!(*invalidated && SlotIsLogical(s) && IsBinaryUpgrade)); Yeah, I also liked that style and considered using it, but I didn't feel it was too hard to read in this particular case, so I ended up using the current way. Just like with the point of other comments, I'm not particularly attached to this style. Thus if someone find it difficult to read, I have no issue with changing it. Revised as suggested. regards. -- Kyotaro Horiguchi NTT Open Source Software Center diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c index b541be8eec..46833f6ecd 100644 --- a/src/backend/access/transam/xlog.c +++ b/src/backend/access/transam/xlog.c @@ -2063,6 +2063,29 @@ check_wal_segment_size(int *newval, void **extra, GucSource source) return true; } +/* + * GUC check_hook for max_slot_wal_keep_size + * + * If WALs needed by logical replication slots are deleted, these slots become + * inoperable. During a binary upgrade, pg_upgrade sets this variable to -1 via + * the command line in an attempt to prevent such deletions, but users have + * ways to override it. To ensure the successful completion of the upgrade, + * it's essential to keep this variable unaltered. See + * InvalidatePossiblyObsoleteSlot() and start_postmaster() in pg_upgrade for + * more details. + */ +bool +check_max_slot_wal_keep_size(int *newval, void **extra, GucSource source) +{ + if (IsBinaryUpgrade && *newval != -1) + { + GUC_check_errdetail("\"%s\" must be set to -1 during binary upgrade mode.", + "max_slot_wal_keep_size"); + return false; + } + return true; +} + /* * At a checkpoint, how many WAL segments to recycle as preallocated future * XLOG segments? Returns the highest segment that should be preallocated. diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c index 99823df3c7..5c3d2b1082 100644 --- a/src/backend/replication/slot.c +++ b/src/backend/replication/slot.c @@ -1424,18 +1424,12 @@ InvalidatePossiblyObsoleteSlot(ReplicationSlotInvalidationCause cause, SpinLockRelease(>mutex); /* -* The logical replication slots shouldn't be invalidated as -* max_slot_wal_keep_size GUC is set to -1 during the upgrade. -* -* The following is just a sanity check. +* check_max_slot_wal_keep_size() ensures max_slot_wal_keep_size is set +* to -1, so, slot invalidation for logical slots shouldn't happen +* d
Re: Requiring recovery.signal or standby.signal when recovering with a backup_label
At Wed, 1 Nov 2023 08:39:17 +0900, Michael Paquier wrote in > See in StartupXLOG(), around the comment "complain if we did not roll > forward far enough to reach". This complains if archive recovery has > been requested *or* if we retrieved a backup end LSN from the > backup_label. Please note that backupStartPoint is not reset even when reaching the backup end point during crash recovery. If backup_label enforces archive recovery, I think this point won't be an issue as you mentioned. For the record, my earlier proposal aimed to detect reaching the end point even during crash recovery. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Wrong function name in pgstatfuncs.c
Since pgstatfuncs.c was refactored, the comments for synthesized function names are significant to find the function body. I happened to find a misspelling among the function name comments. "pg_stat_get_mods_since_analyze" should be "pg_stat_get_mod_since_analyze". Upon checking the file using a rudimentary script, I found no other similar mistakes in the same file. (FWIW, I also feel that these macros might be going a bit too far by synthesizing even the function names.) regards. -- Kyotaro Horiguchi NTT Open Source Software Center diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c index 28ee97968b..1fb8b31863 100644 --- a/src/backend/utils/adt/pgstatfuncs.c +++ b/src/backend/utils/adt/pgstatfuncs.c @@ -78,7 +78,7 @@ PG_STAT_GET_RELENTRY_INT64(ins_since_vacuum) /* pg_stat_get_live_tuples */ PG_STAT_GET_RELENTRY_INT64(live_tuples) -/* pg_stat_get_mods_since_analyze */ +/* pg_stat_get_mod_since_analyze */ PG_STAT_GET_RELENTRY_INT64(mod_since_analyze) /* pg_stat_get_numscans */
Re: A recent message added to pg_upgade
At Mon, 30 Oct 2023 14:55:01 +0530, Bharath Rupireddy wrote in > > I get it. I agree to go with just the assert because the GUC > > check_hook kinda tightens the screws against setting > > max_slot_wal_keep_size to a value other than -1 during the binary > > upgrade, Thanks for being on the same page. > A few comments on your inhibit_m_s_w_k_s_during_upgrade_2.txt: > > 1. > + * All WAL files on the publisher node must be retained during an upgrade to > + * maintain connections from downstreams. While pg_upgrade explicitly sets > > It's not just the publisher, anyone using logical slots. Also, no > downstream please. If you want, you can repurpose the comment that's > added by 29d0a77f. > > /* > * Use max_slot_wal_keep_size as -1 to prevent the WAL removal by the > * checkpointer process. If WALs required by logical replication slots > * are removed, the slots are unusable. This setting prevents the > * invalidation of slots during the upgrade. We set this option when > * cluster is PG17 or later because logical replication slots can only be > * migrated since then. Besides, max_slot_wal_keep_size is added in PG13. > */ It is helpful. Thanks! > 2. > At present, only logical slots really require > + * this. > > Can we remove the above comment as the code with SlotIsLogical(s) > explains it all? max_slot_wal_keep_size affects both logical and physical slots. Therefore, if we are interested only one of the two types of slots, it's important to clarify the rationale. Regardless of any potential exntension to physical slots, I believe it's essential to clarify the rationale. I couldn't determine from that extensive thread whether there's a possible extension to physical slots. Could you inform me if such an extension can happen and, if not, provide the reason? > 3. > +GUC_check_errdetail("\"max_slot_wal_keep_size\" must be set > to -1 during the upgrade."); > +return false; > > How about we be explicit like the following which helps users reason > about this restriction instead of them looking at the comments/docs? > > GUC_check_errcode(ERRCODE_INVALID_PARAMETER_VALUE); > GUC_check_errmsg(""\"max_slot_wal_keep_size\" must be set > to -1 when in binary upgrade mode"); > GUC_check_errdetail("A value of -1 prevents the removal of > WAL required for logical slots upgrade."); > return false; I don't quite see the reason to provide such a detailed explanation just for this error. Additionally, since this check is performed regardless of the presence or absense of logical slots, I think the errdetail message might potentially confuse those whosee it. Adding "binary" looks fine as is and done in the attached. > 4. I think a test case to hit the error in the check hook in > 003_logical_slots.pl will help a lot here - not only covers the code > but also helps demonstrate how one can reach the error. Yeah, of course. I was planning to add tests once the direction of the discussion became clear. I will add them in the next version. > 5. I think the check_hook is better defined in xlog.c the place where > it's actually being declared and in action. IMV, there's no reason for > it to be in slot.c as it doesn't deal with any slot related > variables/structs. This also avoids an unnecessary "utils/guc_hooks.h" > inclusion in slot.c. > +bool > +check_max_slot_wal_keep_size(int *newval, void **extra, GucSource source) > +{ Sounds reasonable. Moved. I simply moved it to xlog.c, but the function comment was thoroughly written only for this moved function, making it somewhat stand out.. > 5. A possible problem with this check_hook approach is that it doesn't > let anyone setting max_slot_wal_keep_size to a value other than -1 > during pg_ugprade even if someone doesn't have logical slots or > doesn't want to upgrade logical slots in which case the WAL file > growth during pg_upgrade may be huge (transiently) unless the > pg_resetwal step of pg_upgrade removes it at the end. While I doubt anyone wishes to set the variable to a specific value during upgrade, think there are individuals who might be reluctant to edit the config file due to unclear reasons. While we could consider an alternative - checking for logical slots during binary upgrade- it's debatable if the effort is justified. (I haven't verified its feasibility, however.) regards. -- Kyotaro Horiguchi NTT Open Source Software Center diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c index b541be8eec..d83b55da68 100644 --- a/src/backend/access/transam/xlog.c +++ b/src/backend/access/transam/xlog.c @@ -2063,6 +2063,27 @@ check_wal_segment_
Re: Add new option 'all' to pg_stat_reset_shared()
At Mon, 30 Oct 2023 14:15:53 +0530, Bharath Rupireddy wrote in > > > I imagine there are cases where people want to initialize all of them > > > at the same time in addition to initializing one at a time. > > > > FWIW, I fairly often wanted it, but forgot about that:p > > Isn't calling pg_stat_reset_shared() for all stats types helping you > out? Is there any problem with it? Can you be more specific about the > use-case? Oh. Sorry, it's my bad. pg_stat_reset_shared() is sufficient. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: Add new option 'all' to pg_stat_reset_shared()
At Mon, 30 Oct 2023 16:35:19 +0900, torikoshia wrote in > Hi, > > After 96f052613f3, we have below 6 types of parameter for > pg_stat_reset_shared(). > > "archiver", "bgwriter", "checkpointer", "io", "recovery_prefetch", > "wal" > > How about adding a new option 'all' to delete all targets above? > > I imagine there are cases where people want to initialize all of them > at the same time in addition to initializing one at a time. FWIW, I fairly often wanted it, but forgot about that:p regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: A recent message added to pg_upgade
At Mon, 30 Oct 2023 12:38:47 +0530, Bharath Rupireddy wrote in > Will the check_hook approach work correctly? I haven't checked that by > myself, but I see InitializeGUCOptions() getting called before > IsBinaryUpgrade is set to true and the passed-in config options ('c') > are parsed. I'm not sure about the wanted behavior exactly, but the fast you pointed doesn't matter because the check is required after parsing the command line options. On the other hand, I'm not sure about the behavior that a setting in postgresql.conf is rejected. > If the check_hook approach works correctly, I think we must add a test > hitting the error in check_max_slot_wal_keep_size for the > IsBinaryUpgrade case. And, I agree with Amit to have a detailed > messaging with GUC_check_errmsg/GUC_check_errdetail. Also, IMV, > leaving the error message in InvalidatePossiblyObsoleteSlot() there > (if required with a better wording as discussed initially in this > thread) does no harm. Actually, it acts as another safety net given > that max_slot_wal_keep_size GUC is reloadable via SIGHUP. The error message, which is deemed impossible, adds an additional message translation. In another thread, we are discussing the reduction of translatable messages. Therefore, I suggest using elog() for the condition at the very least. Whether it should be elog() or Assert() remains open for discussion, as I don't have a firm stance on it. regards. -- Kyotaro Horiguchi NTT Open Source Software Center