from:"Kyotaro Horiguchi"

Re: Allow non-superuser to cancel superuser tasks.

2024-04-11 Thread Kyotaro Horiguchi

At Fri, 12 Apr 2024 09:10:35 +0900, Michael Paquier  wrote 
in 
> On Thu, Apr 11, 2024 at 04:55:59PM +0500, Kirill Reshke wrote:
> > The test doesn't fail because pg_terminate_backend actually meets his
> > point: autovac is killed. But while dying, autovac also receives
> > segfault. Thats because of injections points:
> > 
> > (gdb) bt
> > #0  0x56361c3379ea in tas (lock=0x7fbcb9632224  > access memory at address 0x7fbcb9632224>) at
> > ../../../../src/include/storage/s_lock.h:228
> > #1  ConditionVariableCancelSleep () at condition_variable.c:238
...
> > #3  0x56361c330a40 in CleanupProcSignalState (status=, arg=) at procsignal.c:240
> > #4  0x56361c328801 in shmem_exit (code=code@entry=1) at ipc.c:276
> > #9  0x56361c3378d7 in ConditionVariableTimedSleep
> > (cv=0x7fbcb9632224, timeout=timeout@entry=-1,
...
> I can see this stack trace as well.  Capturing a bit more than your
> own stack, this is crashing in the autovacuum worker while waiting on
> a condition variable when processing a ProcessInterrupts().
> 
> That may point to a legit bug with condition variables in this
> context, actually?  From what I can see, issuing a signal on a backend
> process waiting with a condition variable is able to process the
> interrupt correctly.

ProcSignalInit sets up CleanupProcSignalState to be called via
on_shmem_exit. If the CV is allocated in a dsm segment, shmem_exit
should have detached the region for the CV.  CV cleanup code should be
invoked via before_shmem_exit.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

Re: NLS doesn't work for pg_combinebackup

2024-04-09 Thread Kyotaro Horiguchi

At Tue, 9 Apr 2024 15:00:27 +0900, Michael Paquier  wrote 
in 
> I've checked the whole tree, and the two you are pointing at are the
> only incorrect paths.  So applied, thanks!

Thank you for cross-checking and committing!

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

Re: NLS doesn't work for pg_combinebackup

2024-04-08 Thread Kyotaro Horiguchi

At Mon, 08 Apr 2024 16:27:02 +0900 (JST), Kyotaro Horiguchi 
 wrote in 
> Hello.
> 
> I noticed that NLS doesn't work for pg_combinebackup. The cause is
> that the tool forgets to call set_pglocale_pgservice().
> 
> This issue is fixed by the following chage.
> 
> diff --git a/src/bin/pg_combinebackup/pg_combinebackup.c 
> b/src/bin/pg_combinebackup/pg_combinebackup.c
> index 1b07ca3fb6..2788c78fdd 100644
> --- a/src/bin/pg_combinebackup/pg_combinebackup.c
> +++ b/src/bin/pg_combinebackup/pg_combinebackup.c
> @@ -154,6 +154,7 @@ main(int argc, char *argv[])
>  
>   pg_logging_init(argv[0]);
>   progname = get_progname(argv[0]);
> + set_pglocale_pgservice(argv[0], PG_TEXTDOMAIN("pg_combinebackup"));
>   handle_help_version_opts(argc, argv, progname, help);
>  
>   memset(, 0, sizeof(opt));

Forgot to mention, but pg_walsummary has the same issue.

diff --git a/src/bin/pg_walsummary/pg_walsummary.c 
b/src/bin/pg_walsummary/pg_walsummary.c
index 5e41b376d7..daf6cd14ce 100644
--- a/src/bin/pg_walsummary/pg_walsummary.c
+++ b/src/bin/pg_walsummary/pg_walsummary.c
@@ -67,6 +67,7 @@ main(int argc, char *argv[])
 
pg_logging_init(argv[0]);
progname = get_progname(argv[0]);
+   set_pglocale_pgservice(argv[0], PG_TEXTDOMAIN("pg_walsummary"));
handle_help_version_opts(argc, argv, progname, help);
 
/* process command-line options */


regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

NLS doesn't work for pg_combinebackup

2024-04-08 Thread Kyotaro Horiguchi

Hello.

I noticed that NLS doesn't work for pg_combinebackup. The cause is
that the tool forgets to call set_pglocale_pgservice().

This issue is fixed by the following chage.

diff --git a/src/bin/pg_combinebackup/pg_combinebackup.c 
b/src/bin/pg_combinebackup/pg_combinebackup.c
index 1b07ca3fb6..2788c78fdd 100644
--- a/src/bin/pg_combinebackup/pg_combinebackup.c
+++ b/src/bin/pg_combinebackup/pg_combinebackup.c
@@ -154,6 +154,7 @@ main(int argc, char *argv[])
 
pg_logging_init(argv[0]);
progname = get_progname(argv[0]);
+   set_pglocale_pgservice(argv[0], PG_TEXTDOMAIN("pg_combinebackup"));
handle_help_version_opts(argc, argv, progname, help);
 
memset(, 0, sizeof(opt));


regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

Re: remaining sql/json patches

2024-03-21 Thread Kyotaro Horiguchi

At Fri, 22 Mar 2024 11:44:08 +0900, Amit Langote  
wrote in 
> Thanks for the heads up.
> 
> My bad, will push a fix shortly.

No problem. Thank you for the prompt correction.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

Re: remaining sql/json patches

2024-03-21 Thread Kyotaro Horiguchi

At Wed, 20 Mar 2024 21:53:52 +0900, Amit Langote  
wrote in 
> I'll push 0001 tomorrow.

This patch (v44-0001-Add-SQL-JSON-query-functions.patch) introduced the 
following new erro message:

+errmsg("can only specify 
constant, non-aggregate"
+   " function, or 
operator expression for"
+   " DEFAULT"),

I believe that our convention here is to write an error message in a
single string literal, not split into multiple parts, for better
grep'ability.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

Re: Inconsistent printf placeholders

2024-03-21 Thread Kyotaro Horiguchi

Thank you for looking this.

At Tue, 19 Mar 2024 10:50:23 +0100, Peter Eisentraut  
wrote in 
> On 15.03.24 08:20, Kyotaro Horiguchi wrote:
> > diff --git a/src/backend/access/transam/twophase.c
> > b/src/backend/access/transam/twophase.c
> > @@ -1369,8 +1369,8 @@ ReadTwoPhaseFile(TransactionId xid, bool
> > missing_ok)
> >  errmsg("could not read file \"%s\":
> >  %m", path)));
> > else
> > ereport(ERROR,
> > -   (errmsg("could not read file \"%s\":
> > -   read %d of %lld",
> > -   path, r, (long long
> > -   int) stat.st_size)));
> > + (errmsg("could not read file \"%s\": read %zd of %zu",
> > + path, r, (Size) stat.st_size)));
> > }
> > pgstat_report_wait_end();
> 
> This might be worse, because stat.st_size is of type off_t, which
> could be smaller than Size/size_t.

I think you were trying to mention that off_t could be wider than
size_t and you're right in that point. I thought that it is safe since
we are trying to read the whole content of the file at once here into
palloc'ed memory.

However, on second thought, if st_size is out of the range of ssize_t,
and palloc accepts that size, at least on Linux, read(2) reads only
0x7000 bytes and raches the error reporting. Addition to that,
this size was closer to the memory allocation size limit than I had
thought.

As the result, I removed the change. However, I kept the change of the
type of variable "r" and corresponding placeholder %zd.

> > diff --git a/src/backend/libpq/be-secure-gssapi.c
> > b/src/backend/libpq/be-secure-gssapi.c
> > index bc04e78abb..68645b4519 100644
> > --- a/src/backend/libpq/be-secure-gssapi.c
> > +++ b/src/backend/libpq/be-secure-gssapi.c
> > @@ -572,9 +572,9 @@ secure_open_gssapi(Port *port)
> > if (input.length > PQ_GSS_RECV_BUFFER_SIZE)
> > {
> > ereport(COMMERROR,
> > -   (errmsg("oversize GSSAPI packet sent
> > -   by the client (%zu > %d)",
> > + (errmsg("oversize GSSAPI packet sent by the client (%zu > %zu)",
> > (size_t) input.length,
> > -   
> > PQ_GSS_RECV_BUFFER_SIZE)));
> > + (size_t) PQ_GSS_RECV_BUFFER_SIZE)));
> > return -1;
> > }
> >   
> 
> Might be better to add that cast to the definition of
> PQ_GSS_RECV_BUFFER_SIZE instead, so that all code can benefit.

As far as I see, the only exceptional uses I found were a comparison
with int values, and being passed as an OM_uint32 (to one of the
parameters of gss_wrap_size_limit()). Therefore, I agree that it is
beneficial. By the way, we currently define Size as the same as size_t
(since 1998). Is it correct to consider Size as merely for backward
compatibility and we should use size_t for new code?  I used size_t in
the modified part in the attached patch.

> > diff --git a/src/backend/replication/repl_gram.y
> > b/src/backend/replication/repl_gram.y
> > index 7474f5bd67..baa76280b9 100644
> > --- a/src/backend/replication/repl_gram.y
> > +++ b/src/backend/replication/repl_gram.y
> > @@ -312,11 +312,6 @@ timeline_history:
> > {
> > TimeLineHistoryCmd *cmd;
> >   - if ($2 <= 0)
> > -   ereport(ERROR,
> > -   
> > (errcode(ERRCODE_SYNTAX_ERROR),
> > -errmsg("invalid
> > -timeline %u",
> > -$2)));
> > -
...
> I don't think this is correct.  It loses the check for == 0.

Ugh. It's my mistake. So we need to consider unifying the messages
again. In walsummaryfuncs.c, %lld is required, but it's silly for the
uses in repl_gram.y. Finally, I chose not to change anything here.

> > diff --git a/src/backend/tsearch/to_tsany.c
> > b/src/backend/tsearch/to_tsany.c
> > index 88cba58cba..9d21178107 100644
> > --- a/src/backend/tsearch/to_tsany.c
> > +++ b/src/backend/tsearch/to_tsany.c
> > @@ -191,7 +191,8 @@ make_tsvector(Pa

Re: Add new error_action COPY ON_ERROR "log"

2024-03-17 Thread Kyotaro Horiguchi

At Fri, 15 Mar 2024 16:57:25 +0900, Michael Paquier  wrote 
in 
> On Wed, Mar 13, 2024 at 07:32:37PM +0530, Bharath Rupireddy wrote:
> > I'm attaching the v8 patch set implementing the above idea. With this,
> > [1] is sent to the client, [2] is sent to the server log. This
> > approach not only reduces the duplicate info in the NOTICE and CONTEXT
> > messages, but also makes it easy for users to get all the necessary
> > info in the NOTICE message without having to set extra parameters to
> > get CONTEXT message.
> 
> In terms of eliminating the information duplication while allowing
> clients to know the value, the attribute and the line involved in the
> conversion failure, the approach of tweaking the context information
> has merits, I guess.
> 
> How do others feel about this kind of tuning?

If required, I think that we have already included the minimum
information necessary for the primary diagnosis, including locations,
within the primary messages. Here is an example:

> LOG:  0: invalid record length at 0/18049F8: expected at least 24, got 0

And I believe that CONTEXT, if it exists, is augmentation information
to the primary messages. The objective of the precedent for the use of
relname_only was somewhat different, but this use also seems legit.

In short, I think the distribution between message types (primary and
context) is fine as it is in the latest patch.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

Re: Inconsistent printf placeholders

2024-03-15 Thread Kyotaro Horiguchi

At Fri, 15 Mar 2024 16:20:27 +0900 (JST), Kyotaro Horiguchi 
 wrote in 
> I checked for that kind of msgids in a bit more intensive way. The
> numbers are the line numbers in ja.po of backend. I didn't check the
> same for other modules.
> 
> > ###: invalid timeline %lld
> >@ backup/walsummaryfuncs.c:95
> >  invalid timeline %u
> >@ repl_gram.y:318 repl_gram.y:359

"The numbers are the line numbers in ja.po" is wrong. Correctly, it
should be written as:

The information consists of two lines: the first of them is the msgid,
and the second indicates the locations where they appear.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

Re: Inconsistent printf placeholders

2024-03-15 Thread Kyotaro Horiguchi

At Fri, 15 Mar 2024 16:01:28 +1300, David Rowley  wrote 
in 
> On Fri, 15 Mar 2024 at 15:27, Kyotaro Horiguchi  
> wrote:
> > I have considered only the two messages. Actually, buffile.c and md.c
> > are already like that. The attached aligns the messages in
> > pg_combinebackup.c and reconstruct.c with the precedents.
> 
> This looks like a worthy cause to make translator work easier.
> 
> I don't want to widen the goalposts or anything, but just wondering if
> you'd searched for any others that could get similar treatment?
> 
> I only just had a quick look at the following.
> 
> $ cat src/backend/po/fr.po | grep -E "^msgid\s" | sed -E
> 's/%[a-zA-Z]+/\%/g' | sort | uniq -d -c
>  31 msgid ""
>   2 msgid "could not accept SSL connection: %"
>   2 msgid "could not initialize LDAP: %"
>   2 msgid "could not look up local user ID %: %"
...
> I've not looked at how hard it would be with any of the above to
> determine how hard it would be to make the formats consistent.  The
> 3rd last one seems similar enough that it might be worth doing
> together with this?


I checked for that kind of msgids in a bit more intensive way. The
numbers are the line numbers in ja.po of backend. I didn't check the
same for other modules.

> ###: invalid timeline %lld
>@ backup/walsummaryfuncs.c:95
>  invalid timeline %u
>@ repl_gram.y:318 repl_gram.y:359

In the first case, the %lld can be more than 2^31.

In the second case, %u is uint32. However, the bigger issue is
checking if the uint32 value is negative:

repl_gram.c: 147
>   if ((yyvsp[0].uintval) <= 0)
>   ereport(ERROR,
>   (errcode(ERRCODE_SYNTAX_ERROR),
>errmsg("invalid timeline %u", 
> (yyvsp[0].uintval;

which cannot happen. I think we can simply remove the useless error
check.

> ###: could not read file \"%s\": read %d of %zu
>@ ../common/controldata_utils.c:116 ../common/controldata_utils.c:119 
> access/transam/xlog.c:3417 access/transam/xlog.c:4278 
> replication/logical/origin.c:750 replication/logical/origin.c:789 
> replication/logical/snapbuild.c:2040 replication/slot.c:2218 
> replication/slot.c:2259 replication/walsender.c:660 
> utils/cache/relmapper.c:833
>  could not read file \"%s\": read %d of %lld
>@ access/transam/twophase.c:1372
>  could not read file \"%s\": read %zd of %zu
>@ backup/basebackup.c:2102
> ###: oversize GSSAPI packet sent by the client (%zu > %zu)
>@ libpq/be-secure-gssapi.c:351
>  oversize GSSAPI packet sent by the client (%zu > %d)
>@ libpq/be-secure-gssapi.c:575
> ###: compressed segment file \"%s\" has incorrect uncompressed size %d, 
> skipping
>@ pg_receivewal.c:362
>  compressed segment file \"%s\" has incorrect uncompressed size %zu, 
> skipping
>@ pg_receivewal.c:448
> ###: could not read file \"%s\": read only %zd of %lld bytes
>@ pg_combinebackup.c:1304
>  could not read file \"%s\": read only %d of %u bytes
>@ reconstruct.c:514


We can "fix" them the same way.  I debated whether to use ssize_t for
read() and replace all instances of size_t with Size. However, in the
end, I decided to only keep it consistent with the surrounding code.

> ###: index %d out of valid range, 0..%d
>@ utils/adt/varlena.c:3220 utils/adt/varlena.c:3287
>  index %lld out of valid range, 0..%lld
>@ utils/adt/varlena.c:3251 utils/adt/varlena.c:3323
> ###: string is too long for tsvector (%d bytes, max %d bytes)
>@ tsearch/to_tsany.c:194 utils/adt/tsvector.c:277 
> utils/adt/tsvector_op.c:1126
>  string is too long for tsvector (%ld bytes, max %ld bytes)
>@ utils/adt/tsvector.c:223

We can unify them and did in the attached, but I'm not sure that's
sensible..

> ###: could not look up local user ID %d: %s
>@ ../port/user.c:43 ../port/user.c:79
>  could not look up local user ID %ld: %s
>@ libpq/auth.c:1886

Both of the above use uid_t (defined as int) as user ID and it is
explicitly cast to "int" in the first case and to long in the second,
which seems mysterious. Although I'm not sure if there's a possibility
of uid_t being widened in the future, I unified them to the latter way
for robustness.

> ###: error while cloning relation \"%s.%s\" (\"%s\" to \"%s\"): %m
>@ file.c:44
>  error while cloning relation \"%s.%s\" (\"%s\" to \"%s\"): %s
>@ file.c:65
> 
> The latter seems to be changed to %m by reassiging save_

Re: Typos in reorderbuffer.c.

2024-03-14 Thread Kyotaro Horiguchi

At Thu, 14 Mar 2024 11:23:38 +0530, Amit Kapila  wrote 
in 
> On Thu, Mar 14, 2024 at 9:58 AM Kyotaro Horiguchi
>  wrote:
> >
> > While examining reorderbuffer.c, I found several typos. I'm not sure
> > if fixing them is worthwhile, but I've attached a fix just in case.
> >
> 
> LGTM. I'll push this in some time.

Thanks!

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

Re: Inconsistent printf placeholders

2024-03-14 Thread Kyotaro Horiguchi

Thank you for the suggestions.

At Thu, 14 Mar 2024 13:45:41 +0100, Daniel Gustafsson  wrote 
in 
> I've only skimmed this so far but +1 on keeping the messages the same where
> possible to reduce translation work.  Adding a comment on the message where 
> the
> casting is done to indicate that it is for translation might reduce the risk 
> of
> it "getting fixed" down the line.

Added a comment "/* cast xxx to avoid extra translatable messages */.

At Thu, 14 Mar 2024 14:02:46 +0100, Peter Eisentraut  
wrote in 
> If you want to make them uniform, then I suggest the error messages

Yeah. Having the same messages with only the placeholders changed is
not very pleasing during translation. If possible, I would like to
align them.

> should both be "%zd of %zu bytes", which are the actual types read()
> deals with.

I have considered only the two messages. Actually, buffile.c and md.c
are already like that. The attached aligns the messages in
pg_combinebackup.c and reconstruct.c with the precedents.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
diff --git a/src/bin/pg_combinebackup/pg_combinebackup.c b/src/bin/pg_combinebackup/pg_combinebackup.c
index 6f0814d9ac..feb4d5dcf4 100644
--- a/src/bin/pg_combinebackup/pg_combinebackup.c
+++ b/src/bin/pg_combinebackup/pg_combinebackup.c
@@ -1301,8 +1301,9 @@ slurp_file(int fd, char *filename, StringInfo buf, int maxlen)
 		if (rb < 0)
 			pg_fatal("could not read file \"%s\": %m", filename);
 		else
-			pg_fatal("could not read file \"%s\": read only %zd of %lld bytes",
-	 filename, rb, (long long int) st.st_size);
+			/* cast st_size to avoid extra translatable messages */
+			pg_fatal("could not read file \"%s\": read only %zd of %zu bytes",
+	 filename, rb, (size_t) st.st_size);
 	}
 
 	/* Adjust buffer length for new data and restore trailing-\0 invariant */
diff --git a/src/bin/pg_combinebackup/reconstruct.c b/src/bin/pg_combinebackup/reconstruct.c
index 41f06bb26b..a4badb90e2 100644
--- a/src/bin/pg_combinebackup/reconstruct.c
+++ b/src/bin/pg_combinebackup/reconstruct.c
@@ -504,15 +504,16 @@ make_rfile(char *filename, bool missing_ok)
 static void
 read_bytes(rfile *rf, void *buffer, unsigned length)
 {
-	int			rb = read(rf->fd, buffer, length);
+	ssize_t		rb = read(rf->fd, buffer, length);
 
 	if (rb != length)
 	{
 		if (rb < 0)
 			pg_fatal("could not read file \"%s\": %m", rf->filename);
 		else
-			pg_fatal("could not read file \"%s\": read only %d of %u bytes",
-	 rf->filename, rb, length);
+			/* cast length to avoid extra translatable messages */
+			pg_fatal("could not read file \"%s\": read only %zd of %zu bytes",
+	 rf->filename, rb, (size_t) length);
 	}
 }

Typos in reorderbuffer.c.

2024-03-13 Thread Kyotaro Horiguchi

Hello.

While examining reorderbuffer.c, I found several typos. I'm not sure
if fixing them is worthwhile, but I've attached a fix just in case.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
diff --git a/src/backend/replication/logical/reorderbuffer.c b/src/backend/replication/logical/reorderbuffer.c
index 001f901ee6..92cf39ff74 100644
--- a/src/backend/replication/logical/reorderbuffer.c
+++ b/src/backend/replication/logical/reorderbuffer.c
@@ -32,7 +32,7 @@
  *
  *	  In order to cope with large transactions - which can be several times as
  *	  big as the available memory - this module supports spooling the contents
- *	  of a large transactions to disk. When the transaction is replayed the
+ *	  of large transactions to disk. When the transaction is replayed the
  *	  contents of individual (sub-)transactions will be read from disk in
  *	  chunks.
  *
@@ -3078,10 +3078,10 @@ ReorderBufferImmediateInvalidation(ReorderBuffer *rb, uint32 ninvalidations,
  * least once for every xid in XLogRecord->xl_xid (other places in records
  * may, but do not have to be passed through here).
  *
- * Reorderbuffer keeps some datastructures about transactions in LSN order,
+ * Reorderbuffer keeps some data structures about transactions in LSN order,
  * for efficiency. To do that it has to know about when transactions are seen
  * first in the WAL. As many types of records are not actually interesting for
- * logical decoding, they do not necessarily pass though here.
+ * logical decoding, they do not necessarily pass through here.
  */
 void
 ReorderBufferProcessXid(ReorderBuffer *rb, TransactionId xid, XLogRecPtr lsn)
@@ -3513,11 +3513,11 @@ ReorderBufferLargestTXN(ReorderBuffer *rb)
  * snapshot because we don't decode such transactions.  Also, we do not select
  * the transaction which doesn't have any streamable change.
  *
- * Note that, we skip transactions that contains incomplete changes. There
+ * Note that, we skip transactions that contain incomplete changes. There
  * is a scope of optimization here such that we can select the largest
  * transaction which has incomplete changes.  But that will make the code and
  * design quite complex and that might not be worth the benefit.  If we plan to
- * stream the transactions that contains incomplete changes then we need to
+ * stream the transactions that contain incomplete changes then we need to
  * find a way to partially stream/truncate the transaction changes in-memory
  * and build a mechanism to partially truncate the spilled files.
  * Additionally, whenever we partially stream the transaction we need to
@@ -4701,8 +4701,8 @@ ReorderBufferToastAppendChunk(ReorderBuffer *rb, ReorderBufferTXN *txn,
 }
 
 /*
- * Rejigger change->newtuple to point to in-memory toast tuples instead to
- * on-disk toast tuples that may not longer exist (think DROP TABLE or VACUUM).
+ * Rejigger change->newtuple to point to in-memory toast tuples instead of
+ * on-disk toast tuples that may no longer exist (think DROP TABLE or VACUUM).
  *
  * We cannot replace unchanged toast tuples though, so those will still point
  * to on-disk toast data.
@@ -4713,7 +4713,7 @@ ReorderBufferToastAppendChunk(ReorderBuffer *rb, ReorderBufferTXN *txn,
  * at unexpected times.
  *
  * We simply subtract size of the change before rejiggering the tuple, and
- * then adding the new size. This makes it look like the change was removed
+ * then add the new size. This makes it look like the change was removed
  * and then added back, except it only tweaks the accounting info.
  *
  * In particular it can't trigger serialization, which would be pointless

Inconsistent printf placeholders

2024-03-13 Thread Kyotaro Horiguchi

Hello.

A recent commit 6612185883 introduced two error messages that are
identical in text but differ in their placeholders.

-   pg_fatal("could not read file \"%s\": read only %d of 
%d bytes",
-filename, (int) rb, (int) st.st_size);
+   pg_fatal("could not read file \"%s\": read only %zd of 
%lld bytes",
+filename, rb, (long long int) 
st.st_size);
...
-   pg_fatal("could not read file \"%s\": read only %d of 
%d bytes",
+   pg_fatal("could not read file \"%s\": read only %d of 
%u bytes",
 rf->filename, rb, length);

I'd be happy if the two messages kept consistency. I suggest aligning
types instead of making the messages different, as attached.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
diff --git a/src/bin/pg_combinebackup/reconstruct.c b/src/bin/pg_combinebackup/reconstruct.c
index 41f06bb26b..e3b8e84289 100644
--- a/src/bin/pg_combinebackup/reconstruct.c
+++ b/src/bin/pg_combinebackup/reconstruct.c
@@ -504,15 +504,15 @@ make_rfile(char *filename, bool missing_ok)
 static void
 read_bytes(rfile *rf, void *buffer, unsigned length)
 {
-	int			rb = read(rf->fd, buffer, length);
+	ssize_t		rb = read(rf->fd, buffer, length);
 
 	if (rb != length)
 	{
 		if (rb < 0)
 			pg_fatal("could not read file \"%s\": %m", rf->filename);
 		else
-			pg_fatal("could not read file \"%s\": read only %d of %u bytes",
-	 rf->filename, rb, length);
+			pg_fatal("could not read file \"%s\": read only %zd of %lld bytes",
+	 rf->filename, rb, (long long int) length);
 	}
 }

Re: Infinite loop in XLogPageRead() on standby

2024-03-12 Thread Kyotaro Horiguchi

At Mon, 11 Mar 2024 16:43:32 +0900 (JST), Kyotaro Horiguchi 
 wrote in 
> Oh, I once saw the fix work, but seems not to be working after some
> point. The new issue was a corruption of received WAL records on the
> first standby, and it may be related to the setting.

I identified the cause of the second issue. When I tried to replay the
issue, the second standby accidentally received the old timeline's
last page-spanning record till the end while the first standby was
promoting (but it had not been read by recovery). In addition to that,
on the second standby, there's a time window where the timeline
increased but the first segment of the new timeline is not available
yet. In this case, the second standby successfully reads the
page-spanning record in the old timeline even after the second standby
noticed that the timeline ID has been increased, thanks to the
robustness of XLogFileReadAnyTLI().

I think the primary change to XLogPageRead that I suggested is correct
(assuming the use of wal_segment_size instead of the
constant). However, still XLogFileReadAnyTLI() has a chance to read
the segment from the old timeline after the second standby notices a
timeline switch, leading to the second issue. The second issue was
fixed by preventing XLogFileReadAnyTLI from reading segments from
older timelines than those suggested by the latest timeline
history. (In other words, disabling the "AnyTLI" part).

I recall that there was a discussion for commit 4bd0ad9e44, about the
objective of allowing reading segments from older timelines than the
timeline history suggests. In my faint memory, we concluded to
postpone making the decision to remove the feature due to uncertainity
about the objective. If there's no clear reason to continue using
XLogFileReadAnyTLI(), I suggest we stop its use and instead adopt
XLogFileReadOnTLHistory(), which reads segments that align precisely
with the timeline history.

Of course, regardless of the changes above, if recovery on the second
standby had reached the end of the page-spanning record before
redirection to the first standby, it would need pg_rewind to connect
to the first standby.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

Re: Infinite loop in XLogPageRead() on standby

2024-03-11 Thread Kyotaro Horiguchi

At Wed, 6 Mar 2024 11:34:29 +0100, Alexander Kukushkin  
wrote in 
> Hmm, I think you meant to use wal_segment_size, because 0x10 is just
> 1MB. As a result, currently it works for you by accident.

Oh, I once saw the fix work, but seems not to be working after some
point. The new issue was a corruption of received WAL records on the
first standby, and it may be related to the setting.

> > Thus, I managed to reproduce precisely the same situation as you
> > described utilizing your script with modifications and some core
> > tweaks, and with the change above, I saw that the behavior was
> > fixed. However, for reasons unclear to me, it shows another issue, and
> > I am running out of time and need more caffeine. I'll continue
> > investigating this tomorrow.
> >
> 
> Thank you for spending your time on it!

You're welcome, but I aplogize for the delay in the work..

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

Re: Infinite loop in XLogPageRead() on standby

2024-03-06 Thread Kyotaro Horiguchi

At Tue, 5 Mar 2024 09:36:44 +0100, Alexander Kukushkin  
wrote in 
> Please find attached the patch fixing the problem and the updated TAP test
> that addresses Nit.

Record-level retries happen when the upper layer detects errors. In my
previous mail, I cited code that is intended to prevent this at
segment boundaries. However, the resulting code applies to all page
boundaries, since we judged that the difference doen't significanty
affects the outcome.

> * Check the page header immediately, so that we can retry immediately if
> * it's not valid. This may seem unnecessary, because ReadPageInternal()
> * validates the page header anyway, and would propagate the failure up to

So, the following (tentative) change should also work.

xlogrecovery.c:
@@ -3460,8 +3490,10 @@ retry:
 * responsible for the validation.
 */
if (StandbyMode &&
+   targetPagePtr % 0x10 == 0 &&
!XLogReaderValidatePageHeader(xlogreader, targetPagePtr, 
readBuf))
{

Thus, I managed to reproduce precisely the same situation as you
described utilizing your script with modifications and some core
tweaks, and with the change above, I saw that the behavior was
fixed. However, for reasons unclear to me, it shows another issue, and
I am running out of time and need more caffeine. I'll continue
investigating this tomorrow.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

Re: initdb's -c option behaves wrong way?

2024-03-04 Thread Kyotaro Horiguchi

At Mon, 4 Mar 2024 09:39:39 +0100, Daniel Gustafsson  wrote in 
> 
> 
> > On 4 Mar 2024, at 02:01, Tom Lane  wrote:
> > 
> > Daniel Gustafsson  writes:
> >> I took the liberty to add this to the upcoming CF to make sure we don't 
> >> lose
> >> track of it.
> > 
> > Thanks for doing that, because the cfbot pointed out a problem:
> > I should have written pg_strncasecmp not strncasecmp.  If this
> > version tests cleanly, I'll push it.
> 
> +1, LGTM.

Thank you for fixing this, Tom!

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

Re: Infinite loop in XLogPageRead() on standby

2024-02-29 Thread Kyotaro Horiguchi

At Fri, 01 Mar 2024 12:37:55 +0900 (JST), Kyotaro Horiguchi 
 wrote in 
> Anyway, our current policy here is to avoid record-rereads beyond
> source switches. However, fixing this seems to require that source
> switches cause record rereads unless some additional information is
> available to know if replay LSN needs to back up.

It seems to me that the error messages are related to commit 0668719801.

XLogPageRead:
> * Check the page header immediately, so that we can retry immediately if
> * it's not valid. This may seem unnecessary, because ReadPageInternal()
> * validates the page header anyway, and would propagate the failure up to
> * ReadRecord(), which would retry. However, there's a corner case with
> * continuation records, if a record is split across two pages such that
> * we would need to read the two pages from different sources. For
...
>   if (StandbyMode &&
>   !XLogReaderValidatePageHeader(xlogreader, targetPagePtr, 
> readBuf))
>   {
>   /*
>* Emit this error right now then retry this page immediately. 
> Use
>* errmsg_internal() because the message was already translated.
>*/
>   if (xlogreader->errormsg_buf[0])
>   ereport(emode_for_corrupt_record(emode, 
> xlogreader->EndRecPtr),
>   (errmsg_internal("%s", 
> xlogreader->errormsg_buf)));

This code intends to prevent a page header error from causing a record
reread, when a record is required to be read from multiple sources. We
could restrict this to only fire at segment boundaries. At segment
boundaries, we won't let LSN back up by using XLP_FIRST_IS_CONTRECORD.

Having thought up to this point, I now believe that we should
completely prevent LSN from going back in any case. One drawback is
that the fix cannot be back-patched.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

Re: Infinite loop in XLogPageRead() on standby

2024-02-29 Thread Kyotaro Horiguchi

At Fri, 01 Mar 2024 12:04:31 +0900 (JST), Kyotaro Horiguchi 
 wrote in 
> At Fri, 01 Mar 2024 10:29:12 +0900 (JST), Kyotaro Horiguchi 
>  wrote in 
> > After reading this, I came up with a possibility that walreceiver
> > recovers more quickly than the calling interval to
> > WaitForWALtoBecomeAvailable(). If walreceiver disconnects after a call
> > to the function WaitForWAL...(), and then somehow recovers the
> > connection before the next call, the function doesn't notice the
> > disconnection and returns XLREAD_SUCCESS wrongly. If this assumption
> > is correct, the correct fix might be for us to return XLREAD_FAIL when
> > reconnection happens after the last call to the WaitForWAL...()
> > function.
> 
> That's my stupid. The function performs reconnection by itself.

Anyway, our current policy here is to avoid record-rereads beyond
source switches. However, fixing this seems to require that source
switches cause record rereads unless some additional information is
available to know if replay LSN needs to back up.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

Re: Infinite loop in XLogPageRead() on standby

2024-02-29 Thread Kyotaro Horiguchi

At Fri, 01 Mar 2024 10:29:12 +0900 (JST), Kyotaro Horiguchi 
 wrote in 
> After reading this, I came up with a possibility that walreceiver
> recovers more quickly than the calling interval to
> WaitForWALtoBecomeAvailable(). If walreceiver disconnects after a call
> to the function WaitForWAL...(), and then somehow recovers the
> connection before the next call, the function doesn't notice the
> disconnection and returns XLREAD_SUCCESS wrongly. If this assumption
> is correct, the correct fix might be for us to return XLREAD_FAIL when
> reconnection happens after the last call to the WaitForWAL...()
> function.

That's my stupid. The function performs reconnection by itself.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

Re: Infinite loop in XLogPageRead() on standby

2024-02-29 Thread Kyotaro Horiguchi

At Fri, 1 Mar 2024 08:17:04 +0900, Michael Paquier  wrote 
in 
> On Thu, Feb 29, 2024 at 05:44:25PM +0100, Alexander Kukushkin wrote:
> > On Thu, 29 Feb 2024 at 08:18, Kyotaro Horiguchi 
> > wrote:
> >> In the first place, it's important to note that we do not guarantee
> >> that an async standby can always switch its replication connection to
> >> the old primary or another sibling standby. This is due to the
> >> variations in replication lag among standbys. pg_rewind is required to
> >> adjust such discrepancies.
> > 
> > Sure, I know. But in this case the async standby received and flushed
> > absolutely the same amount of WAL as the promoted one.
> 
> Ugh.  If it can happen transparently to the user without the user
> knowing directly about it, that does not sound good to me.  I did not
> look very closely at monitoring tools available out there, but if both
> standbys flushed the same WAL locations a rewind should not be
> required.  It is not something that monitoring tools would be able to
> detect because they just look at LSNs.
> 
> >> In the repro script, the replication connection of the second standby
> >> is switched from the old primary to the first standby after its
> >> promotion. After the switching, replication is expected to continue
> >> from the beginning of the last replayed segment.
> > 
> > Well, maybe, but apparently the standby is busy trying to decode a record
> > that spans multiple pages, and it is just infinitely waiting for the next
> > page to arrive. Also, the restart "fixes" the problem, because indeed it is
> > reading the file from the beginning.
> 
> What happens if the continuation record spawns across multiple segment
> files boundaries in this case?  We would go back to the beginning of
> the segment where the record spawning across multiple segments began,
> right?  (I may recall this part of contrecords incorrectly, feel free
> to correct me if necessary.)

After reading this, I came up with a possibility that walreceiver
recovers more quickly than the calling interval to
WaitForWALtoBecomeAvailable(). If walreceiver disconnects after a call
to the function WaitForWAL...(), and then somehow recovers the
connection before the next call, the function doesn't notice the
disconnection and returns XLREAD_SUCCESS wrongly. If this assumption
is correct, the correct fix might be for us to return XLREAD_FAIL when
reconnection happens after the last call to the WaitForWAL...()
function.

> >> But with the script,
> >> the second standby copies the intentionally broken file, which differs
> >> from the data that should be received via streaming.
> > 
> > As I already said, this is a simple way to emulate the primary crash while
> > standbys receiving WAL.
> > It could easily happen that the record spans on multiple pages is not fully
> > received and flushed.
> 
> I think that's OK to do so at test level to force a test in the
> backend, FWIW, because that's cheaper, and 039_end_of_wal.pl has
> proved that this can be designed to be cheap and stable across the
> buildfarm fleet.

Yeah, I agree that it clearly illustrates the final state after the
issue happened, but if my assumption above is correct, the test
doesn't manifest the real issue.

> For anything like that, the result had better have solid test
> coverage, where perhaps we'd better refactor some of the routines of
> 039_end_of_wal.pl into a module to use them here, rather than
> duplicate the code.  The other test has a few assumptions with the
> calculation of page boundaries, and I'd rather not duplicate that
> across the tree.

I agree to the point.

> Seeing that Alexander is a maintainer of Patroni, which is very
> probably used by his employer across a large set of PostgreSQL
> instances, if he says that he's seen that in the field, that's good
> enough for me to respond to the problem, especially if reconnecting a
> standby to a promoted node where both flushed the same LSN.  Now the
> level of response also depends on the invasiness of the change, and we
> need a very careful evaluation here.

I don't mean to say that we should respond with DNF to this "issue" at
all. I simply wanted to make the real issue clear.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

Re: Infinite loop in XLogPageRead() on standby

2024-02-28 Thread Kyotaro Horiguchi

At Thu, 29 Feb 2024 14:05:15 +0900, Michael Paquier  wrote 
in 
> On Wed, Feb 28, 2024 at 11:19:41AM +0100, Alexander Kukushkin wrote:
> > I spent some time debugging an issue with standby not being able to
> > continue streaming after failover.
> >
> > The problem happens when standbys received only the first part of the WAL
> > record that spans multiple pages.
> > In this case the promoted standby discards the first part of the WAL record
> > and writes END_OF_RECOVERY instead. If in addition to that someone will
> > call pg_switch_wal(), then there are chances that SWITCH record will also
> > fit to the page where the discarded part was settling, As a result the
> > other standby (that wasn't promoted) will infinitely try making attempts to
> > decode WAL record span on multiple pages by reading the next page, which is
> > filled with zero bytes. And, this next page will never be written, because
> > the new primary will be writing to the new WAL file after pg_switch_wal().

In the first place, it's important to note that we do not guarantee
that an async standby can always switch its replication connection to
the old primary or another sibling standby. This is due to the
variations in replication lag among standbys. pg_rewind is required to
adjust such discrepancies.

I might be overlooking something, but I don't understand how this
occurs without purposefully tweaking WAL files. The repro script
pushes an incomplete WAL file to the archive as a non-partial
segment. This shouldn't happen in the real world.

In the repro script, the replication connection of the second standby
is switched from the old primary to the first standby after its
promotion. After the switching, replication is expected to continue
from the beginning of the last replayed segment. But with the script,
the second standby copies the intentionally broken file, which differs
from the data that should be received via streaming. A similar problem
to the issue here was seen at segment boundaries, before we introduced
the XLP_FIRST_IS_OVERWRITE_CONTRECORD flag, which prevents overwriting
a WAL file that is already archived. However, in this case, the second
standby won't see the broken record because it cannot be in a
non-partial segment in the archive, and the new primary streams
END_OF_RECOVERY instead of the broken record.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

Re: SLRU optimization - configurable buffer pool and partitioning the SLRU lock

2024-02-28 Thread Kyotaro Horiguchi

At Tue, 27 Feb 2024 18:33:18 +0100, Alvaro Herrera  
wrote in 
> Here's the complete set, with these two names using the singular.

The commit by the second patch added several GUC descriptions:

> Sets the size of the dedicated buffer pool used for the commit timestamp 
> cache.

Some of them, commit_timestamp_buffers, transaction_buffers,
subtransaction_buffers use 0 to mean auto-tuning based on
shared-buffer size. I think it's worth adding an extra_desc such as "0
to automatically determine this value based on the shared buffer
size".

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

Re: MultiXact\SLRU buffers configuration

2024-02-28 Thread Kyotaro Horiguchi

At Sat, 3 Feb 2024 22:32:45 +0500, "Andrey M. Borodin"  
wrote in 
> Here's the test draft. This test reliably reproduces sleep on CV when waiting 
> next multixact to be filled into "members" SLRU.

By the way, I raised a question about using multiple CVs
simultaneously [1]. That is, I suspect that the current CV
implementation doesn't allow us to use multiple condition variables at
the same time, because all CVs use the same PCPROC member cvWaitLink
to accommodate different waiter sets. If this assumption is correct,
we should resolve the issue before spreading more uses of CVs.

[1] 
https://www.postgresql.org/message-id/20240227.150709.1766217736683815840.horikyota.ntt%40gmail.com

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

Re: A failure in t/001_rep_changes.pl

2024-02-26 Thread Kyotaro Horiguchi

At Fri, 23 Feb 2024 15:50:21 +0530, vignesh C  wrote in 
> By any chance do you have the log files when this failure occurred, if
> so please share it.

In my understanding, within a single instance, no two proclists can
simultaneously share the same waitlink member of PGPROC.

On the other hand, a publisher uses two condition variables for slots
and WAL waiting, which work on the same PGPROC member cvWaitLink. I
suspect this issue arises from the configuration. However, although it
is unlikly related to this specific issue, a similar problem can arise
in instances that function both as logical publisher and physical
primary.

Regardless of this issue, I think we should provide separate waitlink
members for condition variables that can possibly be used
simultaneously.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

Re: About a recently-added message

2024-02-21 Thread Kyotaro Horiguchi

At Thu, 22 Feb 2024 10:51:07 +0530, Amit Kapila  wrote 
in 
> > Do you think some additional tests for the rest of the messages are
> > worth the trouble?
> >
> 
> We have discussed this during development and didn't find it worth
> adding tests for all misconfigured parameters. However, in the next
> patch where we are planning to add a slot sync worker that will
> automatically sync slots, we are adding a test for one more parameter.
> I am not against adding tests for all the parameters but it didn't
> appeal to add more test cycles for this.

Thanks for the explanation. I'm fine with that.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

Re: About a recently-added message

2024-02-21 Thread Kyotaro Horiguchi

At Thu, 22 Feb 2024 09:36:43 +0900 (JST), Kyotaro Horiguchi 
 wrote in 
> Yes, I'm happy with all of the changes.  The proposed patch appears to
> cover all instances related to slotsync.c, and it looks fine to
> me. Thanks!

I'd like to raise another potential issue outside the patch. The patch
needed to change only one test item even though it changed nine
messages. This means eigh out of nine messages that the patch changed
are not covered by our test. I doubt all of them are worth additional
test items; however, I think we want to increase coverage.

Do you think some additional tests for the rest of the messages are
worth the trouble?

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

Re: About a recently-added message

2024-02-21 Thread Kyotaro Horiguchi

At Wed, 21 Feb 2024 14:57:42 +0530, Amit Kapila  wrote 
in 
> On Tue, Feb 20, 2024 at 3:21 PM shveta malik  wrote:
> >
> > okay, attached v2 patch with changed error msgs and double quotes
> > around logical.
> >
> 
> Horiguchi-San, does this address all your concerns related to
> translation with these new messages?

Yes, I'm happy with all of the changes.  The proposed patch appears to
cover all instances related to slotsync.c, and it looks fine to
me. Thanks!

I found that logica.c is also using the policy that I complained
about, but it is a separate issue.

./logical.c122:errmsg("logical decoding requires wal_level >= 
logical")));

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

Re: Have pg_basebackup write "dbname" in "primary_conninfo"?

2024-02-20 Thread Kyotaro Horiguchi

At Tue, 20 Feb 2024 19:56:10 +0530, Robert Haas  wrote 
in 
> It seems like maybe somebody should look into why this is happening,
> and perhaps fix it.

GetConnection()@streamutil.c wants to ensure conninfo has a fallback
database name ("replication"). However, the function seems to be
ignoring the case where neither dbname nor connection string is given,
which is the first case Kuroda-san raised. The second case is the
intended behavior of the function.

>   /* pg_recvlogical uses dbname only; others use connection_string only. 
> */
>   Assert(dbname == NULL || connection_string == NULL);

And the function incorrectly assumes that the connection string
requires "dbname=replication".

>* Merge the connection info inputs given in form of connection string,
>* options and default values (dbname=replication, replication=true, 
> etc.)

But the name is a pseudo database name only used by pg_hba.conf
(maybe) , which cannot be used as an actual database name (without
quotation marks, or unless it is actually created). The function
should not add the fallback database name because the connection
string for physical replication connection doesn't require the dbname
parameter. (attached patch)

About the proposed patch, pg_basebackup cannot verify the validity of
the dbname. It could be problematic.

Although I haven't looked the original thread, it seems that the
dbname is used only by pg_sync_replication_slots(). If it is true,
couldn't we make the SQL function require a database name to make a
connection, instead of requiring it in physical-replication conninfo?

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
diff --git a/src/bin/pg_basebackup/streamutil.c 
b/src/bin/pg_basebackup/streamutil.c
index 56d1b15951..da63a04de4 100644
--- a/src/bin/pg_basebackup/streamutil.c
+++ b/src/bin/pg_basebackup/streamutil.c
@@ -78,7 +78,7 @@ GetConnection(void)
 
/*
 * Merge the connection info inputs given in form of connection string,
-* options and default values (dbname=replication, replication=true, 
etc.)
+* options and default values (replication=true, etc.)
 */
i = 0;
if (connection_string)
@@ -96,14 +96,6 @@ GetConnection(void)
keywords = pg_malloc0((argcount + 1) * sizeof(*keywords));
values = pg_malloc0((argcount + 1) * sizeof(*values));
 
-   /*
-* Set dbname here already, so it can be overridden by a dbname 
in the
-* connection string.
-*/
-   keywords[i] = "dbname";
-   values[i] = "replication";
-   i++;
-
for (conn_opt = conn_opts; conn_opt->keyword != NULL; 
conn_opt++)
{
if (conn_opt->val != NULL && conn_opt->val[0] != '\0')

Re: Do away with zero-padding assumption before WALRead()

2024-02-18 Thread Kyotaro Horiguchi

At Mon, 19 Feb 2024 11:02:39 +0530, Bharath Rupireddy 
 wrote in 
> > After all, the patch looks good to me.
> 
> Thanks. It was committed -
> https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=73f0a1326608ac3a7d390706fdeec59fe4dc42c0.

Yeah. I realied that after I had already sent the mail.. No harm done:p

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

Re: A new message seems missing a punctuation

2024-02-18 Thread Kyotaro Horiguchi

At Mon, 19 Feb 2024 10:31:33 +0530, Robert Haas  wrote 
in 
> On Mon, Feb 19, 2024 at 10:10 AM Kyotaro Horiguchi
>  wrote:
> > A recent commit (7a424ece48) added the following message:
> >
> > > could not sync slot information as remote slot precedes local slot:
> > > remote slot "%s": LSN (%X/%X), catalog xmin (%u) local slot: LSN
> > > (%X/%X), catalog xmin (%u)
> >
> > Since it is a bit overloaded but doesn't have a separator between
> > "catalog xmin (%u)" and "local slot:", it is somewhat cluttered. Don't
> > we need something, for example a semicolon there to improve
> > readability and reduce clutter?
> 
> I think maybe we should even revise the message even more. In most
> places we do not just print out a whole bunch of values, but use a
> sentence to connect them.

Mmm. Something like this?:

"could not sync slot information: local slot LSN (%X/%X) or xmin(%u)
 behind remote (%X/%X, %u)"

Or I thought the values could be moved to DETAILS: line.

By the way, the code around the message is as follows.

> /*
>  * The remote slot didn't catch up to locally reserved position.
>  *
>  * We do not drop the slot because the restart_lsn can be ahead of the
>  * current location when recreating the slot in the next cycle. It may
>  * take more time to create such a slot. Therefore, we keep this slot
>  * and attempt the synchronization in the next cycle.
>  *
>  * XXX should this be changed to elog(DEBUG1) perhaps?
>  */
> ereport(LOG,
>   errmsg("could not sync slot information as remote slot precedes 
> local slot:"
>  " remote slot \"%s\": LSN (%X/%X), 
> catalog xmin (%u) local slot: LSN (%X/%X), catalog xmin (%u)",

If we change this message to DEBUG1, a timeout feature to show a
WARNING message might be needed for DBAs to notice that something bad
is ongoing. However, it doesn't seem appropriate as a LOG message to
me. Perhaps, a LOG message should be more like:

> "LOG:  waiting for local slot to catch up to remote"
> "DETAIL:  remote slot LSN (%X/%X); "

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

A new message seems missing a punctuation

2024-02-18 Thread Kyotaro Horiguchi

A recent commit (7a424ece48) added the following message:

> could not sync slot information as remote slot precedes local slot:
> remote slot "%s": LSN (%X/%X), catalog xmin (%u) local slot: LSN
> (%X/%X), catalog xmin (%u)

Since it is a bit overloaded but doesn't have a separator between
"catalog xmin (%u)" and "local slot:", it is somewhat cluttered. Don't
we need something, for example a semicolon there to improve
readability and reduce clutter?

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

Re: Do away with zero-padding assumption before WALRead()

2024-02-18 Thread Kyotaro Horiguchi

At Mon, 19 Feb 2024 11:56:22 +0900 (JST), Kyotaro Horiguchi 
 wrote in 
> Yeah, perhaps I was overly concerned. The removed comment made me
> think that someone could add code relying on the incorrect assumption
> that the remaining bytes beyond the returned count are cleared out. On
> the flip side, SimpleXLogPageRead always reads a whole page and
> returns XLOG_BLCKSZ. However, as you know, the returned buffer doesn't
> contain random garbage bytes. Therefore, it's safe as long as the

Forgot to mention that there is a case involving non-initialized
pages, but it doesn't affect the correctness of the description you
pointed out.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

Re: Do away with zero-padding assumption before WALRead()

2024-02-18 Thread Kyotaro Horiguchi

At Fri, 16 Feb 2024 19:50:00 +0530, Bharath Rupireddy 
 wrote in 
> On Fri, Feb 16, 2024 at 7:10 AM Kyotaro Horiguchi
>  wrote:
> > 1. It's useless to copy the whole page regardless of the 'count'. It's
> >   enough to copy only up to the 'count'. The patch looks good in this
> >   regard.
> 
> Yes, it's not needed to copy the whole page. Per my understanding
> about page transfers between disk and OS page cache - upon OS page
> cache miss, the whole page (of disk block size) gets fetched from disk
> even if we just read 'count' bytes (< disk block size).

Right, but with a possibly-different block size. Anyway that behavior
doesn't affect the result of this change. (Could affect performance
hereafter if it were not the case, though..)

> > 2. Maybe we need a comment that states the page_read callback
> >   functions leave garbage bytes beyond the returned count, due to
> >   partial copying without clearing the unused portion.
> 
> Isn't the comment around page_read callback at
> https://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/include/access/xlogreader.h;h=2e9e5f43eb2de1ca9ba81afe76d21357065c61aa;hb=d57b7cc3338e9d9aa1d7c5da1b25a17c5a72dcce#l78
> enough?
> 
> "The callback shall return the number of bytes read (never more than
> XLOG_BLCKSZ), or -1 on failure."

Yeah, perhaps I was overly concerned. The removed comment made me
think that someone could add code relying on the incorrect assumption
that the remaining bytes beyond the returned count are cleared out. On
the flip side, SimpleXLogPageRead always reads a whole page and
returns XLOG_BLCKSZ. However, as you know, the returned buffer doesn't
contain random garbage bytes. Therefore, it's safe as long as the
caller doesn't access beyond the returned count. As a result, the
description you pointed out seems to be enough.

After all, the patch looks good to me.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

Re: Do away with zero-padding assumption before WALRead()

2024-02-15 Thread Kyotaro Horiguchi

At Tue, 13 Feb 2024 11:47:06 +0530, Bharath Rupireddy 
 wrote in 
> Hi,
> 
> I noticed an assumption [1] at WALRead() call sites expecting the
> flushed WAL page to be zero-padded after the flush LSN. I think this
> can't always be true as the WAL can get flushed after determining the
> flush LSN before reading it from the WAL file using WALRead(). I've
> hacked the code up a bit to check if that's true -

Good catch! The comment seems wrong also to me. The subsequent bytes
can be written simultaneously, and it's very normal that there are
unflushed bytes are in OS's page buffer. The objective of the comment
seems to be to declare that there's no need to clear out the remaining
bytes, here. I agree that it's not a problem for now. However, I think
we need two fixes here.

1. It's useless to copy the whole page regardless of the 'count'. It's
  enough to copy only up to the 'count'. The patch looks good in this
  regard.

2. Maybe we need a comment that states the page_read callback
  functions leave garbage bytes beyond the returned count, due to
  partial copying without clearing the unused portion.

> I'm wondering why the WALRead() callers are always reading XLOG_BLCKSZ
> despite knowing exactly how much to read. Is it to tell the OS to
> explicitly fetch the whole page from the disk? If yes, the OS will do
> that anyway because the page transfers from disk to OS page cache are
> always in terms of disk block sizes, no?

If I understand your question correctly, I guess that the whole-page
copy was expected to clear out the remaining bytes, as I mentioned
earlier.

> Although, there's no immediate problem with it right now, the
> assumption is going to be incorrect when reading WAL from WAL buffers
> using WALReadFromBuffers -
> https://www.postgresql.org/message-id/CALj2ACV=C1GZT9XQRm4iN1NV1T=hla_hsgwnx2y5-g+mswd...@mail.gmail.com.
>
> If we have no reason, can the WALRead() callers just read how much
> they want like walsender for physical replication? Attached a patch
> for the change.
> 
> Thoughts?

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

Re: About a recently-added message

2024-02-14 Thread Kyotaro Horiguchi

At Thu, 15 Feb 2024 09:22:23 +0530, shveta malik  wrote 
in 
> On Thu, Feb 15, 2024 at 8:26 AM Amit Kapila  wrote:
> >
> > On Wed, Feb 14, 2024 at 7:51 PM Euler Taveira  wrote:
> > >
> > > On Wed, Feb 14, 2024, at 8:45 AM, Amit Kapila wrote:
> > >
> > > Now, I am less clear about whether to quote "logical" or not in the
> > > above message. Do you have any suggestions?
> > >
> > >
> > > The possible confusion comes from the fact that the sentence contains 
> > > "must be"
> > > in the middle of a comparison expression. For pg_createsubscriber, we are 
> > > using
> > >
> > > publisher requires wal_level >= logical
> > >
> > > I suggest to use something like
> > >
> > > slot synchronization requires wal_level >= logical
> > >
> >
> > This sounds like a better idea but shouldn't we directly use this in
> > 'errmsg' instead of a separate 'errhint'? Also, if change this, then
> > we should make a similar change for other messages in the same
> > function.
> 
> +1 on changing the msg(s) suggested way. Please find the patch for the
> same. It also removes double quotes around the variable names

Thanks for the discussion.

With a translator hat on, I would be happy if I could determine
whether a word requires translation with minimal background
information. In this case, a translator needs to know which values
wal_level can take. It's relatively easy in this case, but I'm not
sure if this is always the case. Therefore, I would be slightly
happier if "logical" were double-quoted.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

Re: About a recently-added message

2024-02-13 Thread Kyotaro Horiguchi

At Wed, 14 Feb 2024 16:26:52 +0900 (JST), Kyotaro Horiguchi 
 wrote in 
> > "wal_level" must be >= logical.
..
> > wal_level must be set to "replica" or "logical" at server start.
..
> I suggest making the quoting policy consistent.

Just after this, I found another inconsistency regarding quotation.

> 'dbname' must be specified in "%s".

The use of single quotes doesn't seem to comply with our standard.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

About a recently-added message

2024-02-13 Thread Kyotaro Horiguchi

A recent commit added the following message:

> "wal_level" must be >= logical.

The use of the term "logical" here is a bit confusing, as it's unclear
whether it's meant to be a natural language word or a token. (I
believe it to be a token.)

On the contrary, we already have the following message:

> wal_level must be set to "replica" or "logical" at server start.

This has the conflicting policy about quotation of variable names and
enum values.

I suggest making the quoting policy consistent.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

Re: A comment in DropRole() contradicts the actual behavior

2024-02-08 Thread Kyotaro Horiguchi

At Thu, 8 Feb 2024 16:39:23 +0900, Michael Paquier  wrote 
in 
> On Thu, Feb 08, 2024 at 09:00:01AM +0300, Alexander Lakhin wrote:
> > Hello,
> > 
> > Please look at errors, which produced by the following script, starting
> > from 6566133c5:
> > for i in `seq 100`; do (echo "CREATE USER u; DROP USER u;"); done | psql 
> > >psql-1.log 2>&1 &
> > for i in `seq 100`; do (echo "CREATE USER u; DROP USER u;"); done | psql 
> > >psql-2.log 2>&1 &
> > wait
> > 
> > ERROR:  could not find tuple for role 16387
> > ERROR:  could not find tuple for role 16390
> > ERROR:  could not find tuple for role 16394
> > ...
> > 
> > Maybe these errors are expected, but then I'm confused by the comment in
> > DropRole():
> 
> Well, these errors should never happen, IMHO, so it seems to me that
> the comment is right and that the code lacks locking, contrary to what
> the comment tells.

The function acquires a lock, but it does not perform an existence
check until it first attempts to fetch the tuple, believing in its
presence. However, I doubt performing an additional existence check
right after acquiring the lock is worthwhile.

By the way, I saw the following error with the provided script:

> ERROR:  duplicate key value violates unique constraint 
> "pg_authid_rolname_index"
> DETAIL:  Key (rolname)=(u) already exists.
> STATEMENT:  CREATE USER u;

This seems to be another instance of a similar thinko.

I vaguely think that we should just regard the absence as a concurrent
drop and either adjust or remove the message, then fix the
comment. The situation is slightly different for the duplication
case. We shouldn't ignore it but rather need to adjust the error
message.  As long as these behaviors don't lead to inconsistencies.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

Re: InstallXLogFileSegment() vs concurrent WAL flush

2024-02-05 Thread Kyotaro Horiguchi

At Fri, 2 Feb 2024 14:42:46 +0100, Thomas Munro  wrote 
in 
> On Fri, Feb 2, 2024 at 12:56 PM Yugo NAGATA  wrote:
> > On Fri, 2 Feb 2024 11:18:18 +0100
> > Thomas Munro  wrote:
> > > One simple way to address that would be to make XLogFileInitInternal()
> > > wait for InstallXLogFileSegment() to finish.  It's a little
> >
> > Or, can we make sure the rename is durable by calling fsync before
> > returning the fd, as a patch attached here?
> 
> Right, yeah, that works too.  I'm not sure which way is better.

I'm not sure I like issuing spurious syncs unconditionally. Therefore,
I prefer Thomas' approach in that regard.  0002 would be beneficial,
considering the case of a very large max_wal_size, and the code seems
to be the minimal required. I don't think it matters that the lock
attempts occur uselessly until the first segment installation. That
being said, we could avoid it by initializing
last_known_installed_segno properly.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

Re: Change COPY ... ON_ERROR ignore to ON_ERROR ignore_row

2024-02-05 Thread Kyotaro Horiguchi

At Mon, 5 Feb 2024 17:22:56 +0900, Yugo NAGATA  wrote in 
> On Mon, 05 Feb 2024 11:28:59 +0900
> torikoshia  wrote:
> 
> > > Based on this, I've made a patch.
> > > based on COPY Synopsis: ON_ERROR 'error_action'
> > > on_error 'null', the  keyword NULL should be single quoted.
> > 
> > As you mentioned, single quotation seems a little odd..
> > 
> > I'm not sure what is the best name and syntax for this feature, but 
> > since current error_action are verbs('stop' and 'ignore'), I feel 'null' 
> > might not be appropriate.
> 
> I am not in favour of using 'null' either, so I suggested to use
> "set_to_null" or more generic syntax like  "set_to (col, val)" in my
> previous post[1], although I'm not convinced what is the best either.
> 
> [1] 
> https://www.postgresql.org/message-id/20240129172858.ccb6c77c3be95a295e2b2b44%40sraoss.co.jp

Tom sugggested using a separate option, and I agree with the
suggestion. Taking this into consideration, I imagined something like
the following, for example.  Although I'm not sure we are actually
going to do whole-tuple replacement, the action name in this example
has the suffix '-column'.

COPY (on_error 'replace-colomn', replacement 'null') ..

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

Re: More new SQL/JSON item methods

2024-01-31 Thread Kyotaro Horiguchi

Sorry for a minor correction, but..

At Thu, 01 Feb 2024 14:53:57 +0900 (JST), Kyotaro Horiguchi 
 wrote in 
> Ah.. Understood. "NaN or Infinity" cannot be used in those
> cases. Additionally, for jpiBoolean and jpiBigint, we lack the text
> representation of the value.

This "Additionally" was merely left in by mistake.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

Re: More new SQL/JSON item methods

2024-01-31 Thread Kyotaro Horiguchi

At Thu, 1 Feb 2024 09:22:22 +0530, Jeevan Chalke 
 wrote in 
> On Thu, Feb 1, 2024 at 7:24 AM Kyotaro Horiguchi 
> wrote:
> 
> > At Thu, 01 Feb 2024 10:49:57 +0900 (JST), Kyotaro Horiguchi <
> > horikyota@gmail.com> wrote in
> > > By the way, while playing with this feature, I noticed the following
> > > error message:
> > >
> > > > select jsonb_path_query('1.1' , '$.boolean()');
> > > > ERROR:  numeric argument of jsonpath item method .boolean() is out of
> > range for type boolean
> > >
> > > The error message seems a bit off to me. For example, "argument '1.1'
> > > is invalid for type [bB]oolean" seems more appropriate for this
> > > specific issue. (I'm not ceratin about our policy on the spelling of
> > > Boolean..)
> >
> > Or, following our general convention, it would be spelled as:
> >
> > 'invalid argument for type Boolean: "1.1"'
> >
> 
> jsonpath way:

Hmm. I see.

> ERROR:  argument of jsonpath item method .boolean() is invalid for type
> boolean
> 
> or, if we add input value, then
> 
> ERROR:  argument "1.1" of jsonpath item method .boolean() is invalid for
> type boolean
> 
> And this should work for all the error types, like out of range, not valid,
> invalid input, etc, etc. Also, we don't need separate error messages for
> string input as well, which currently has the following form:
> 
> "string argument of jsonpath item method .%s() is not a valid
> representation.."

Agreed.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

Re: More new SQL/JSON item methods

2024-01-31 Thread Kyotaro Horiguchi

At Thu, 1 Feb 2024 09:19:40 +0530, Jeevan Chalke 
 wrote in 
> > As Tom suggested, given that similar situations have already been
> > disregarded elsewhere, worrying about excessively long input strings
> > in this specific instance won't notably improve safety in total.
> >
> > > Also, for non-string input, we need to convert numeric to string just for
> > > the error message, which seems overkill.
> >
> > As I suggested and you seem to agree, using literally "Nan or
> > Infinity" would be sufficient.
> >
> 
> I am more concerned about .bigint() and .integer(). We can have errors when
> the numeric input is out of range, but not NaN or Infinity. At those
> places, we need to convert numeric to string to put that value into the
> error.
> Do you mean we should still put "Nan or Infinity" there?
> 
> This is the case:
>  select jsonb_path_query('12345678901', '$.integer()');
>  ERROR:  numeric argument of jsonpath item method .integer() is out of
> range for type integer

Ah.. Understood. "NaN or Infinity" cannot be used in those
cases. Additionally, for jpiBoolean and jpiBigint, we lack the text
representation of the value.

By a quick grepping, I found that the following functions call
numeric_out to convert the jbvNumeric values back into text
representation.

JsonbValueAstext, populate_scalar, iterate_jsonb_values,
executeItemOptUnrwapTarget, jsonb_put_escaped_value

The function iterate_jsonb_values(), in particular, iterates over a
values array, calling numeric_out on each iteration.

The following functions re-converts the converted numeric into another type.

jsonb_int[248]() converts the numeric value into int2 using numeric_int[248]().
jsonb_float[48]() converts it into float4 using numeric_float[48]().

Given these facts, it seems more efficient for jbvNumber to retain the
original scalar value, converting it only when necessary. If needed,
we could also add a numeric struct member as a cache for better
performance. I'm not sure we refer the values more than once, though.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

Re: More new SQL/JSON item methods

2024-01-31 Thread Kyotaro Horiguchi

At Thu, 01 Feb 2024 10:49:57 +0900 (JST), Kyotaro Horiguchi 
 wrote in 
> By the way, while playing with this feature, I noticed the following
> error message:
> 
> > select jsonb_path_query('1.1' , '$.boolean()');
> > ERROR:  numeric argument of jsonpath item method .boolean() is out of range 
> > for type boolean
> 
> The error message seems a bit off to me. For example, "argument '1.1'
> is invalid for type [bB]oolean" seems more appropriate for this
> specific issue. (I'm not ceratin about our policy on the spelling of
> Boolean..)

Or, following our general convention, it would be spelled as:

'invalid argument for type Boolean: "1.1"'

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

Re: More new SQL/JSON item methods

2024-01-31 Thread Kyotaro Horiguchi

Thank you for the fix!

At Tue, 30 Jan 2024 13:46:17 +0530, Jeevan Chalke 
 wrote in 
> On Mon, Jan 29, 2024 at 11:03 AM Tom Lane  wrote:
> 
> > Kyotaro Horiguchi  writes:
> > > Having said that, I'm a bit concerned about the case where an overly
> > > long string is given there. However, considering that we already have
> > > "invalid input syntax for type xxx: %x" messages (including for the
> > > numeric), this concern might be unnecessary.
> >
> > Yeah, we have not worried about that in the past.
> >
> > > Another concern is that the input value is already a numeric, not a
> > > string. This situation occurs when the input is NaN of +-Inf. Although
> > > numeric_out could be used, it might cause another error. Therefore,
> > > simply writing out as "argument NaN and Infinity.." would be better.
> >
> > Oh!  I'd assumed that we were discussing a string that we'd failed to
> > convert to numeric.  If the input is already numeric, then either
> > the error is unreachable or what we're really doing is rejecting
> > special values such as NaN on policy grounds.  I would ask first
> > if that policy is sane at all.  (I'd lean to "not" --- if we allow
> > it in the input JSON, why not in the output?)  If it is sane, the
> > error message needs to be far more specific.
> >
> > regards, tom lane
> >
> 
> *Consistent error message related to type:*
...
> What if we use phrases like "for type double precision" at both places,
> like:
> numeric argument of jsonpath item method .%s() is out of range for type
> double precision
> string argument of jsonpath item method .%s() is not a valid representation
> for type double precision
> 
> With this, the rest will be like:
> for type numeric
> for type bigint
> for type integer
> 
> Suggestions?

FWIW, I prefer consistently using "for type hoge".

> ---
> 
> *Showing input string in the error message:*
> 
> argument "...input string here..." of jsonpath item method .%s() is out of
> range for type numeric
> 
> If we add the input string in the error, then for some errors, it will be
> too big, for example:
> 
> -ERROR:  numeric argument of jsonpath item method .double() is out of range
> for type double precision
> +ERROR:  argument
> "10"
> of jsonpath item method .double() is out of range for type double precision

As Tom suggested, given that similar situations have already been
disregarded elsewhere, worrying about excessively long input strings
in this specific instance won't notably improve safety in total.

> Also, for non-string input, we need to convert numeric to string just for
> the error message, which seems overkill.

As I suggested and you seem to agree, using literally "Nan or
Infinity" would be sufficient.

> On another note, irrespective of these changes, is it good to show the
> given input in the error messages? Error messages are logged and may leak
> some details.
> 
> I think the existing way seems ok.

In my opinion, it is quite common to include the error-causing value
in error messages. Also, we have already many functions that impliy
the possibility for revealing input values when converting text
representation into internal format, such as with int4in. However, I
don't stick to that way.

> ---
> 
> *NaN and Infinity restrictions:*
> 
> I am not sure why NaN and Infinity are not allowed in conversion to double
> precision (.double() method). I have used the same restriction for
> .decimal() and .number(). However, as you said, we should have error
> messages more specific. I tried that in the attached patch; please have
> your views. I have the following wordings for that error message:
> "NaN or Infinity is not allowed for jsonpath item method .%s()"
> 
> Suggestions...

They seem good to *me*.

By the way, while playing with this feature, I noticed the following
error message:

> select jsonb_path_query('1.1' , '$.boolean()');
> ERROR:  numeric argument of jsonpath item method .boolean() is out of range 
> for type boolean

The error message seems a bit off to me. For example, "argument '1.1'
is invalid for type [bB]oolean" seems more appropriate for this
specific issue. (I'm not ceratin about our policy on the spelling of
Boolean..)

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

possible inter-gcc-version incompatibility regarding fat objects

2024-01-30 Thread Kyotaro Horiguchi

Hello.

I would like to share some information regarding a recent issue we
encoutered.

While compiling an extension against the RPM package
postgresql16-devel-16.1-2PGDG.rhel9.x86_64.rpm we faced a problem that
appears to be caused by the combination of LTO and constant
folding/propagation, leading to a SEGV. This crash didn't occur when
we gave --fno-lto instead of --flto=auto to the compiler.

To put this case briefly:
postgresql16-devel-16.1-2PGDG.rhel9.x86_64.rpm + gcc 11.3
  => building the extension completes but the binary may crash with SEGV
 --fno-lto prevents this crash from occurring.

The PG package is built with gcc-11.4, and the compiler in our build
environment for the extension was gcc-11.3. A quick search suggests
that the LTO bytecode format was optimized in gcc-11.4. Conversely,
when we built the extension with postgresql-16-16.0 and gc-11.4, the
build fails. This failure seems to stem LTO binary format
incompatibility.

> Error: bad register name `%'
> Error: open CFI at the end of file; missing .cfi_endproc directive
> /usr/bin/ld: error: lto-wrapper failed

To put this case briefly:
postgresql16-devel-16.0-1PGDG.rhel9.x86_64.rpm + gcc 11.4
  => build failure regarding LTO

Given these issues, it seems there might be some issues with including
fat objects in the -devel package.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

Re: Network failure may prevent promotion

2024-01-28 Thread Kyotaro Horiguchi

Thank you fixing the issue.

At Tue, 23 Jan 2024 11:43:43 +0200, Heikki Linnakangas  wrote i
n 
> There's an existing AmWalReceiverProcess() macro too. Let's use that.

Mmm. I sought an Is* function becuase "IsLogicalWorker()" is placed on
the previous line. Our convention regarding those functions (macros)
and variables seems inconsistent. However, I can't say for sure that
we should unify all of them.

> (See also
> https://www.postgresql.org/message-id/f3ecd4cb-85ee-4e54-8278-5fabfb3a4ed0%40iki.fi
> for refactoring in this area)
> 
> Here's a patch set summarizing the changes so far. They should be
> squashed, but I kept them separate for now to help with review:
> 
> 1. revert the revert of 728f86fec6.
> 2. your walrcv_shutdown_deblocking_v2-2.patch
> 3. Also replace libpqrcv_PQexec() and libpqrcv_PQgetResult() with the
> wrappers from libpq-be-fe-helpers.h

Both replacements look fine. I didn't find another instance of similar
code.

> 4. Replace IsWalReceiver() with AmWalReceiverProcess()

Just look fine.

> >> - pqsignal(SIGTERM, SignalHandlerForShutdownRequest); /* request
> >> - shutdown */
> >> + pqsignal(SIGTERM, WalRcvShutdownSignalHandler); /* request shutdown
> >> */
> >>
> >> Can't we just use die(), instead?
> > There was a comment explaining the problems associated with exiting
> > within a signal handler;
> > - * Currently, only SIGTERM is of interest.  We can't just exit(1) within
> > - * the
> > - * SIGTERM signal handler, because the signal might arrive in the middle
> > - * of
> > - * some critical operation, like while we're holding a spinlock.
> > - * Instead, the
> > And I think we should keep the considerations it suggests. The patch
> > removes the comment itself, but it does so because it implements our
> > standard process exit procedure, which incorporates points suggested
> > by the now-removed comment.
> 
> die() doesn't call exit(1). Unless DoingCommandRead is set, but it
> never is in the walreceiver. It looks just like the new
> WalRcvShutdownSignalHandler() function. Am I missing something?

Ugh.. Doesn't the name 'die()' suggest exit()?
I agree that die() can be used instad.

> Hmm, but doesn't bgworker_die() have that problem with exit(1)ing in
> the signal handler?

I noticed that but ignored for this time.

> I also wonder if we should replace SignalHandlerForShutdownRequest()
> completely with die(), in all processes? The difference is that
> SignalHandlerForShutdownRequest() uses ShutdownRequestPending, while
> die() uses ProcDiePending && InterruptPending to indicate that the
> signal was received. Or do some of the processes want to check for
> ShutdownRequestPending only at specific places, and don't want to get
> terminated at the any random CHECK_FOR_INTERRUPTS()?

At least, pg_log_backend_memory_context() causes a call to
ProcessInterrupts via "ereport(LOG_SERVER_ONLY" which can lead to an
exit due to ProcDiePending. In this regard, checkpointer clearly
requires the distinction.

Rather than merely consolidating the notification variables and
striving to annihilate CFI calls in the execution path, I
believe we need a shutdown mechanism that CFI doesn't react
to. However, as for the method to achieve this, whether we should keep
the notification variables separate as they are now, or whether it
would be better to introduce a variable that causes CFI to ignore
ProcDiePending, is a matter I think is open to discussion.

Attached patches are the rebased version of v3 (0003 is updated) and
additional 0005 that makes use of die() instead of walreceiver's
custom function.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
>From daac3aa06bd33f5e8f04a406c8a59fd4cc97 Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas 
Date: Tue, 23 Jan 2024 11:01:03 +0200
Subject: [PATCH v4 1/5] Revert "Revert "libpqwalreceiver: Convert to
 libpq-be-fe-helpers.h""

This reverts commit 21ef4d4d897563adb2f7920ad53b734950f1e0a4.
---
 .../libpqwalreceiver/libpqwalreceiver.c   | 55 +++
 1 file changed, 8 insertions(+), 47 deletions(-)

diff --git a/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c b/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
index 2439733b55..dbee2f8f0e 100644
--- a/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
+++ b/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
@@ -24,6 +24,7 @@
 #include "common/connect.h"
 #include "funcapi.h"
 #include "libpq-fe.h"
+#include "libpq/libpq-be-fe-helpers.h"
 #include "mb/pg_wchar.h"
 #include "miscadmin.h"
 #include "pgstat.h"
@@ -136,7 +137,6 @@ libpqrcv_connect(const char *conninfo, bool logical, bool must_use_password,
  c

Re: More new SQL/JSON item methods

2024-01-28 Thread Kyotaro Horiguchi

At Sun, 28 Jan 2024 22:47:02 -0500, Tom Lane  
wrote in

Kyotaro Horiguchi  writes:
> They seem to be suggesting that PostgreSQL has the types 
"decimal" and
> "number". I know of the former, but I don't think PostgreSQL has 
the
>  latter type. Perhaps the "number" was intended to refer to 
"numeric"?

Probably.  But I would write just "type numeric".  We do not 
generally
acknowledge "decimal" as a separate type, because for us it's only 
an

alias for numeric (there is not a pg_type entry for it).

Also, that leads to the thought that "numeric argument ... is out of
range for type numeric" seems either redundant or contradictory
depending on how you look at it.  So I suggest wording like

argument "...input string here..." of jsonpath item method .%s() is 
out of range for type numeric

> (And I think it is largely helpful if the given string were shown 
in

> the error message, but it would be another issue.)

Agreed, so I suggest the above.

Having said that, I'm a bit concerned about the case where an overly
long string is given there. However, considering that we already have
"invalid input syntax for type xxx: %x" messages (including for the
numeric), this concern might be unnecessary.

Another concern is that the input value is already a numeric, not a
string. This situation occurs when the input is NaN of +-Inf. Although
numeric_out could be used, it might cause another error. Therefore,
simply writing out as "argument NaN and Infinity.." would be better.

Once we resolve these issues, another question arises regarding on of
the messages. In the case of NaN of Infinity, the message will be as
the follows:

"argument NaN or Infinity of jsonpath item method .%s() is out of 
range for type numeric"

This message appears quite strange and puzzling. I suspect that the
intended message was somewhat different.

> The same commit has introduced the following set of messages:

>> %s format is not recognized: "%s"
>> date format is not recognized: "%s"
>> time format is not recognized: "%s"
>> time_tz format is not recognized: "%s"
>> timestamp format is not recognized: "%s"
>> timestamp_tz format is not recognized: "%s"

> I believe that the first line was intended to cover all the 
others:p

+1

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

Re: More new SQL/JSON item methods

2024-01-28 Thread Kyotaro Horiguchi

I have two possible issues in a recent commit.

Commit 66ea94e8e6 has introduced the following messages:
(Aplogizies in advance if the commit is not related to this thread.)

jsonpath_exec.c:1287
> if (numeric_is_nan(num) || numeric_is_inf(num))
>   RETURN_ERROR(ereport(ERROR,
>  (errcode(ERRCODE_NON_NUMERIC_SQL_JSON_ITEM),
>   errmsg("numeric argument of jsonpath item method .%s() is out 
> of range for type decimal or number",
>jspOperationName(jsp->type);

:1387
>   noerr = DirectInputFunctionCallSafe(numeric_in, numstr,
...
> if (!noerr || escontext.error_occurred)
>   RETURN_ERROR(ereport(ERROR,
>  (errcode(ERRCODE_NON_NUMERIC_SQL_JSON_ITEM),
>   errmsg("string argument of jsonpath item method .%s() is not a 
> valid representation of a decimal or number",

They seem to be suggesting that PostgreSQL has the types "decimal" and
"number". I know of the former, but I don't think PostgreSQL has the
 latter type. Perhaps the "number" was intended to refer to "numeric"?
(And I think it is largely helpful if the given string were shown in
the error message, but it would be another issue.)


The same commit has introduced the following set of messages:

> %s format is not recognized: "%s"
> date format is not recognized: "%s"
> time format is not recognized: "%s"
> time_tz format is not recognized: "%s"
> timestamp format is not recognized: "%s"
> timestamp_tz format is not recognized: "%s"

I believe that the first line was intended to cover all the others:p

They are attached to this message separately.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
diff --git a/src/backend/utils/adt/jsonpath_exec.c b/src/backend/utils/adt/jsonpath_exec.c
index 22f598cd35..c10926a8a2 100644
--- a/src/backend/utils/adt/jsonpath_exec.c
+++ b/src/backend/utils/adt/jsonpath_exec.c
@@ -1287,7 +1287,7 @@ executeItemOptUnwrapTarget(JsonPathExecContext *cxt, JsonPathItem *jsp,
 	if (numeric_is_nan(num) || numeric_is_inf(num))
 		RETURN_ERROR(ereport(ERROR,
 			 (errcode(ERRCODE_NON_NUMERIC_SQL_JSON_ITEM),
-			  errmsg("numeric argument of jsonpath item method .%s() is out of range for type decimal or number",
+			  errmsg("numeric argument of jsonpath item method .%s() is out of range for type decimal or numeric",
 	 jspOperationName(jsp->type);
 
 	if (jsp->type == jpiDecimal)
@@ -1312,14 +1312,14 @@ executeItemOptUnwrapTarget(JsonPathExecContext *cxt, JsonPathItem *jsp,
 	if (!noerr || escontext.error_occurred)
 		RETURN_ERROR(ereport(ERROR,
 			 (errcode(ERRCODE_NON_NUMERIC_SQL_JSON_ITEM),
-			  errmsg("string argument of jsonpath item method .%s() is not a valid representation of a decimal or number",
+			  errmsg("string argument of jsonpath item method .%s() is not a valid representation of a decimal or numeric",
 	 jspOperationName(jsp->type);
 
 	num = DatumGetNumeric(datum);
 	if (numeric_is_nan(num) || numeric_is_inf(num))
 		RETURN_ERROR(ereport(ERROR,
 			 (errcode(ERRCODE_NON_NUMERIC_SQL_JSON_ITEM),
-			  errmsg("string argument of jsonpath item method .%s() is not a valid representation of a decimal or number",
+			  errmsg("string argument of jsonpath item method .%s() is not a valid representation of a decimal or numeric",
 	 jspOperationName(jsp->type);
 
 	res = jperOk;
@@ -1401,7 +1401,7 @@ executeItemOptUnwrapTarget(JsonPathExecContext *cxt, JsonPathItem *jsp,
 	if (!noerr || escontext.error_occurred)
 		RETURN_ERROR(ereport(ERROR,
 			 (errcode(ERRCODE_NON_NUMERIC_SQL_JSON_ITEM),
-			  errmsg("string argument of jsonpath item method .%s() is not a valid representation of a decimal or number",
+			  errmsg("string argument of jsonpath item method .%s() is not a valid representation of a decimal or numeric",
 	 jspOperationName(jsp->type);
 
 	num = DatumGetNumeric(numdatum);
diff --git a/src/test/regress/expected/jsonb_jsonpath.out b/src/test/regress/expected/jsonb_jsonpath.out
index eea2af30c8..9d8ce48a25 100644
--- a/src/test/regress/expected/jsonb_jsonpath.out
+++ b/src/test/regress/expected/jsonb_jsonpath.out
@@ -2144,7 +2144,7 @@ select jsonb_path_query('"1.23"', '$.decimal()');
 (1 row)
 
 select jsonb_path_query('"1.23aaa"

Re: Network failure may prevent promotion

2024-01-23 Thread Kyotaro Horiguchi

Thank you for looking this!

At Tue, 23 Jan 2024 15:07:10 +0900, Fujii Masao  wrote 
in 
> Regarding the patch, here are the review comments.
> 
> +/*
> + * Is current process a wal receiver?
> + */
> +bool
> +IsWalReceiver(void)
> +{
> + return WalRcv != NULL;
> +}
> 
> This looks wrong because WalRcv can be non-NULL in processes other
> than walreceiver.

Mmm. Sorry for the silly mistake. We can use B_WAL_RECEIVER
instead. I'm not sure if the new function IsWalReceiver() is
required. The expression "MyBackendType == B_WAL_RECEIVER" is quite
descriptive. However, the function does make ProcessInterrupts() more
aligned regarding process types.

> - pqsignal(SIGTERM, SignalHandlerForShutdownRequest); /* request shutdown */
> + pqsignal(SIGTERM, WalRcvShutdownSignalHandler); /* request shutdown */
> 
> Can't we just use die(), instead?

There was a comment explaining the problems associated with exiting
within a signal handler;

- * Currently, only SIGTERM is of interest.  We can't just exit(1) within the
- * SIGTERM signal handler, because the signal might arrive in the middle of
- * some critical operation, like while we're holding a spinlock.  Instead, the

And I think we should keep the considerations it suggests. The patch
removes the comment itself, but it does so because it implements our
standard process exit procedure, which incorporates points suggested
by the now-removed comment.

--
Kyotaro Horiguchi
NTT Open Source Software Center
diff --git a/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c b/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
index 201c36cb22..db779dc6ca 100644
--- a/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
+++ b/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
@@ -749,7 +749,7 @@ libpqrcv_PQgetResult(PGconn *streamConn)
 		if (rc & WL_LATCH_SET)
 		{
 			ResetLatch(MyLatch);
-			ProcessWalRcvInterrupts();
+			CHECK_FOR_INTERRUPTS();
 		}
 
 		/* Consume whatever data is available from the socket */
@@ -1053,7 +1053,7 @@ libpqrcv_processTuples(PGresult *pgres, WalRcvExecResult *walres,
 	{
 		char	   *cstrs[MaxTupleAttributeNumber];
 
-		ProcessWalRcvInterrupts();
+		CHECK_FOR_INTERRUPTS();
 
 		/* Do the allocations in temporary context. */
 		oldcontext = MemoryContextSwitchTo(rowcontext);
diff --git a/src/backend/replication/walreceiver.c b/src/backend/replication/walreceiver.c
index 728059518e..e491f7d4c5 100644
--- a/src/backend/replication/walreceiver.c
+++ b/src/backend/replication/walreceiver.c
@@ -147,39 +147,34 @@ static void XLogWalRcvSendReply(bool force, bool requestReply);
 static void XLogWalRcvSendHSFeedback(bool immed);
 static void ProcessWalSndrMessage(XLogRecPtr walEnd, TimestampTz sendTime);
 static void WalRcvComputeNextWakeup(WalRcvWakeupReason reason, TimestampTz now);
+static void WalRcvShutdownSignalHandler(SIGNAL_ARGS);
 
-/*
- * Process any interrupts the walreceiver process may have received.
- * This should be called any time the process's latch has become set.
- *
- * Currently, only SIGTERM is of interest.  We can't just exit(1) within the
- * SIGTERM signal handler, because the signal might arrive in the middle of
- * some critical operation, like while we're holding a spinlock.  Instead, the
- * signal handler sets a flag variable as well as setting the process's latch.
- * We must check the flag (by calling ProcessWalRcvInterrupts) anytime the
- * latch has become set.  Operations that could block for a long time, such as
- * reading from a remote server, must pay attention to the latch too; see
- * libpqrcv_PQgetResult for example.
- */
 void
-ProcessWalRcvInterrupts(void)
+WalRcvShutdownSignalHandler(SIGNAL_ARGS)
 {
-	/*
-	 * Although walreceiver interrupt handling doesn't use the same scheme as
-	 * regular backends, call CHECK_FOR_INTERRUPTS() to make sure we receive
-	 * any incoming signals on Win32, and also to make sure we process any
-	 * barrier events.
-	 */
-	CHECK_FOR_INTERRUPTS();
+	int			save_errno = errno;
 
-	if (ShutdownRequestPending)
+	/* Don't joggle the elbow of proc_exit */
+	if (!proc_exit_inprogress)
 	{
-		ereport(FATAL,
-(errcode(ERRCODE_ADMIN_SHUTDOWN),
- errmsg("terminating walreceiver process due to administrator command")));
+		InterruptPending = true;
+		ProcDiePending = true;
 	}
+
+	SetLatch(MyLatch);
+
+	errno = save_errno;
+	
 }
 
+/*
+ * Is current process a wal receiver?
+ */
+bool
+IsWalReceiver(void)
+{
+	return MyBackendType == B_WAL_RECEIVER;
+}
 
 /* Main entry point for walreceiver process */
 void
@@ -277,7 +272,7 @@ WalReceiverMain(void)
 	pqsignal(SIGHUP, SignalHandlerForConfigReload); /* set flag to read config
 	 * file */
 	pqsignal(SIGINT, SIG_IGN);
-	pqsignal(SIGTERM, SignalHandlerForShutdownRequest); /* request shutdown */
+	pqsignal(SIGTERM, WalRcvShutdownSignalHandler); /* request shutdown */
 	/* SIGQUIT handler was already set up by InitP

Fix some errdetail's message format

2024-01-22 Thread Kyotaro Horiguchi

We recently made corrections to the capitalization of DETAIL messages.
For example;

-   GUC_check_errdetail("invalid list syntax in parameter %s",
+   GUC_check_errdetail("Invalid list syntax in parameter %s",

But still it is missing the period at the end.

There are several patterns to this issue, but this time, I have only
fixed the ones that are simple and obvious as follows:

a. GUC_check_errdetail("LITERAL"), errdetail("LITERAL") without a period.
b. psprintf()'ed string that is passed to errdetail_internal()

I didn't touch the following patterns:

c. errdetail_internal("%s")
d. errdetail("Whatever: %s")
e. errdetail("%s versus %s") and alikes
f. errdetail_internal("%s", pchomp(PQerrorMessage()))
g. complex message compilation


The attached patch contains the following fix:

-   GUC_check_errdetail("timestamp out of range: 
\"%s\"", str);
+   GUC_check_errdetail("Timestamp out of range: 
\"%s\".", str);

But I'm not quite confident about whether capitalizing the type name
here is correct.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
diff --git a/contrib/cube/cubescan.l b/contrib/cube/cubescan.l
index a30fbfc311..b45dc08f0c 100644
--- a/contrib/cube/cubescan.l
+++ b/contrib/cube/cubescan.l
@@ -82,7 +82,7 @@ cube_yyerror(NDBOX **result, Size scanbuflen,
 (errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
  errmsg("invalid input syntax for cube"),
  /* translator: %s is typically "syntax error" */
- errdetail("%s at end of input", message)));
+ errdetail("%s at end of input.", message)));
 	}
 	else
 	{
@@ -90,7 +90,7 @@ cube_yyerror(NDBOX **result, Size scanbuflen,
 (errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
  errmsg("invalid input syntax for cube"),
  /* translator: first %s is typically "syntax error" */
- errdetail("%s at or near \"%s\"", message, yytext)));
+ errdetail("%s at or near \"%s\".", message, yytext)));
 	}
 }
 
diff --git a/contrib/dblink/dblink.c b/contrib/dblink/dblink.c
index 19a362526d..2720e91c14 100644
--- a/contrib/dblink/dblink.c
+++ b/contrib/dblink/dblink.c
@@ -2633,7 +2633,7 @@ dblink_security_check(PGconn *conn, remoteConn *rconn, const char *connstr)
 	ereport(ERROR,
 			(errcode(ERRCODE_S_R_E_PROHIBITED_SQL_STATEMENT_ATTEMPTED),
 			 errmsg("password or GSSAPI delegated credentials required"),
-			 errdetail("Non-superusers may only connect using credentials they provide, eg: password in connection string or delegated GSSAPI credentials"),
+			 errdetail("Non-superusers may only connect using credentials they provide, eg: password in connection string or delegated GSSAPI credentials."),
 			 errhint("Ensure provided credentials match target server's authentication method.")));
 }
 
diff --git a/contrib/seg/segscan.l b/contrib/seg/segscan.l
index 4ad529eccc..f5a72c2496 100644
--- a/contrib/seg/segscan.l
+++ b/contrib/seg/segscan.l
@@ -79,7 +79,7 @@ seg_yyerror(SEG *result, struct Node *escontext, const char *message)
 (errcode(ERRCODE_SYNTAX_ERROR),
  errmsg("bad seg representation"),
  /* translator: %s is typically "syntax error" */
- errdetail("%s at end of input", message)));
+ errdetail("%s at end of input.", message)));
 	}
 	else
 	{
@@ -87,7 +87,7 @@ seg_yyerror(SEG *result, struct Node *escontext, const char *message)
 (errcode(ERRCODE_SYNTAX_ERROR),
  errmsg("bad seg representation"),
  /* translator: first %s is typically "syntax error" */
- errdetail("%s at or near \"%s\"", message, yytext)));
+ errdetail("%s at or near \"%s\".", message, yytext)));
 	}
 }
 
diff --git a/src/backend/access/transam/xlogrecovery.c b/src/backend/access/transam/xlogrecovery.c
index 1b48d7171a..2d90bbfee8 100644
--- a/src/backend/access/transam/xlogrecovery.c
+++ b/src/backend/access/transam/xlogrecovery.c
@@ -4882,7 +4882,7 @@ check_recovery_target_time(char **newval, void **extra, GucSource source)
 
 			if (tm2timestamp(tm, fsec, , ) != 0)
 			{
-GUC_check_errdetail("timestamp out of range: \"%s\"", str);
+GUC_check_errdetail("Timestamp out of range: \"%s\".", str);
 return false;
 			}
 		}
diff --git a/src/backend/commands/extension.c b/src/backend/commands/extension.c
index 226f85d0e3..564f79dab9 100644
--- a/src/backend/commands/extension.c
+++ b/src/backend/commands/extension.c
@@ -2946,7 +2946,7 @@ AlterExtensionNamespace(const char *extensionName, const char *newschema, Oid *o
 	(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 	 errmsg("extension \"%s\" does not sup

Re: Network failure may prevent promotion

2024-01-22 Thread Kyotaro Horiguchi

At Mon, 22 Jan 2024 13:29:10 -0800, Andres Freund  wrote in 
> Hi,
> 
> On 2024-01-19 12:28:05 +0900, Michael Paquier wrote:
> > On Thu, Jan 18, 2024 at 03:42:28PM +0200, Heikki Linnakangas wrote:
> > > Given that commit 728f86fec6 that introduced this issue was not strictly
> > > required, perhaps we should just revert it for v16.
> > 
> > Is there a point in keeping 728f86fec6 as well on HEAD?  That does not
> > strike me as wise to keep that in the tree for now.  If it needs to be
> > reworked, looking at this problem from scratch would be a safer
> > approach.
> 
> IDK, I think we'll introduce this type of bug over and over if we don't fix it
> properly.

Just to clarify my position, I thought that 728f86fec6 was heading the
right direction. Considering the current approach to signal handling
in walreceiver, I believed that it would be better to further
generalize in this direction rather than reverting. That's why I
proposed that patch.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

Re: In-placre persistance change of a relation

2024-01-22 Thread Kyotaro Horiguchi

At Mon, 22 Jan 2024 15:36:31 +1100, Peter Smith  wrote 
in 
> 2024-01 Commitfest.
> 
> Hi, This patch has a CF status of "Needs Review" [1], but it seems
> there was a CFbot test failure last time it was run [2]. Please have a
> look and post an updated version if necessary.

Thanks! I have added the necessary includes to the header file this
patch adds. With this change, "make headerscheck" now passes. However,
when I run "make cpluspluscheck" in my environment, it generates a
large number of errors in other areas, but I didn't find one related
to this patch.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
>From 9a2b6fbda882587c127d3e50bccf89508837d1a5 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi 
Date: Mon, 15 Jan 2024 15:57:53 +0900
Subject: [PATCH v32 1/3] Export wal_sync_method related functions

Export several functions related to wal_sync_method for use in
subsequent commits. Since PG_O_DIRECT cannot be used in those commits,
the new function XLogGetSyncBit() will mask PG_O_DIRECT.
---
 src/backend/access/transam/xlog.c | 73 +--
 src/include/access/xlog.h |  2 +
 2 files changed, 52 insertions(+), 23 deletions(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 478377c4a2..c5f51849ee 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -8403,21 +8403,29 @@ assign_wal_sync_method(int new_wal_sync_method, void *extra)
 	}
 }
 
+/*
+ * Exported version of get_sync_bit()
+ *
+ * Do not expose PG_O_DIRECT for uses outside xlog.c.
+ */
+int
+XLogGetSyncBit(void)
+{
+	return get_sync_bit(wal_sync_method) & ~PG_O_DIRECT;
+}
+
 
 /*
- * Issue appropriate kind of fsync (if any) for an XLOG output file.
+ * Issue appropriate kind of fsync (if any) according to wal_sync_method.
  *
- * 'fd' is a file descriptor for the XLOG file to be fsync'd.
- * 'segno' is for error reporting purposes.
+ * 'fd' is a file descriptor for the file to be fsync'd.
  */
-void
-issue_xlog_fsync(int fd, XLogSegNo segno, TimeLineID tli)
+const char *
+XLogFsyncFile(int fd)
 {
-	char	   *msg = NULL;
+	const char *msg = NULL;
 	instr_time	start;
 
-	Assert(tli != 0);
-
 	/*
 	 * Quick exit if fsync is disabled or write() has already synced the WAL
 	 * file.
@@ -8425,7 +8433,7 @@ issue_xlog_fsync(int fd, XLogSegNo segno, TimeLineID tli)
 	if (!enableFsync ||
 		wal_sync_method == WAL_SYNC_METHOD_OPEN ||
 		wal_sync_method == WAL_SYNC_METHOD_OPEN_DSYNC)
-		return;
+		return NULL;
 
 	/* Measure I/O timing to sync the WAL file */
 	if (track_wal_io_timing)
@@ -8460,19 +8468,6 @@ issue_xlog_fsync(int fd, XLogSegNo segno, TimeLineID tli)
 			break;
 	}
 
-	/* PANIC if failed to fsync */
-	if (msg)
-	{
-		char		xlogfname[MAXFNAMELEN];
-		int			save_errno = errno;
-
-		XLogFileName(xlogfname, tli, segno, wal_segment_size);
-		errno = save_errno;
-		ereport(PANIC,
-(errcode_for_file_access(),
- errmsg(msg, xlogfname)));
-	}
-
 	pgstat_report_wait_end();
 
 	/*
@@ -8486,7 +8481,39 @@ issue_xlog_fsync(int fd, XLogSegNo segno, TimeLineID tli)
 		INSTR_TIME_ACCUM_DIFF(PendingWalStats.wal_sync_time, end, start);
 	}
 
-	PendingWalStats.wal_sync++;
+	if (msg != NULL)
+		PendingWalStats.wal_sync++;
+
+	return msg;
+}
+
+/*
+ * Issue appropriate kind of fsync (if any) for an XLOG output file.
+ *
+ * 'fd' is a file descriptor for the XLOG file to be fsync'd.
+ * 'segno' is for error reporting purposes.
+ */
+void
+issue_xlog_fsync(int fd, XLogSegNo segno, TimeLineID tli)
+{
+	const char	*msg;
+	
+	Assert(tli != 0);
+
+	msg = XLogFsyncFile(fd);
+
+	/* PANIC if failed to fsync */
+	if (msg)
+	{
+		char		xlogfname[MAXFNAMELEN];
+		int			save_errno = errno;
+
+		XLogFileName(xlogfname, tli, segno, wal_segment_size);
+		errno = save_errno;
+		ereport(PANIC,
+(errcode_for_file_access(),
+ errmsg(msg, xlogfname)));
+	}
 }
 
 /*
diff --git a/src/include/access/xlog.h b/src/include/access/xlog.h
index 301c5fa11f..2a0d65b537 100644
--- a/src/include/access/xlog.h
+++ b/src/include/access/xlog.h
@@ -217,6 +217,8 @@ extern void xlog_redo(struct XLogReaderState *record);
 extern void xlog_desc(StringInfo buf, struct XLogReaderState *record);
 extern const char *xlog_identify(uint8 info);
 
+extern int XLogGetSyncBit(void);
+extern const char *XLogFsyncFile(int fd);
 extern void issue_xlog_fsync(int fd, XLogSegNo segno, TimeLineID tli);
 
 extern bool RecoveryInProgress(void);
-- 
2.39.3

>From c464013071dedc15b838e573ae828f150b3b60f7 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi 
Date: Thu, 31 Aug 2023 11:49:10 +0900
Subject: [PATCH v32 2/3] Introduce undo log implementation

This patch adds a simple implementation of UNDO log feature.
---
 src/backend/access/transam/Makefile|   1 +
 src/backend/access/transam/meson.build |   1 +
 src/backend/access/transam/rmgr.c  |   4 +-
 src/backend/access/transam/simpleundolog.c | 362 +

Re: Make mesage at end-of-recovery less scary.

2024-01-22 Thread Kyotaro Horiguchi

At Wed, 17 Jan 2024 14:32:00 +0900, Michael Paquier  wrote 
in 
> On Tue, Jan 16, 2024 at 02:46:02PM +0300, Aleksander Alekseev wrote:
> >> For now, let me explain the basis for this patch. The fundamental
> >> issue is that these warnings that always appear are, in practice, not
> >> a problem in almost all cases. Some of those who encounter them for
> >> the first time may feel uneasy and reach out with inquiries. On the
> >> other hand, those familiar with these warnings tend to ignore them and
> >> only pay attention to details when actual issues arise. Therefore, the
> >> intention of this patch is to label them as "no issue" unless a
> >> problem is blatantly evident, in order to prevent unnecessary concern.
> > 
> > I agree and don't mind affecting the error message per se.
> > 
> > However I see that the actual logic of how WAL is processed is being
> > changed. If we do this, at very least it requires thorough thinking. I
> > strongly suspect that the proposed code is wrong and/or not safe
> > and/or less safe than it is now for the reasons named above.
> 
> FWIW, that pretty much sums up my feeling regarding this patch,
> because an error, basically any error, would hurt back very badly.
> Sure, the error messages we generate now when reaching the end of WAL
> can sound scary, and they are (I suspect that's not really the case
> for anybody who has history doing support with PostgreSQL because a
> bunch of these messages are old enough to vote, but I can understand
> that anybody would freak out the first time they see that).
> 
> However, per the recent issues we've had in this area, like
> cd7f19da3468 but I'm more thinking about 6b18b3fe2c2f and
> bae868caf222, I am of the opinion that the header validation, the
> empty page case in XLogReaderValidatePageHeader() and the record read
> changes are risky enough that I am not convinced that the gains are
> worth the risks taken.
> 
> The error stack in the WAL reader is complicated enough that making it
> more complicated as the patch proposes does not sound like not a good
> tradeoff to me to make the reports related to the end of WAL cleaner
> for the end-user.  I agree that we should do something, but the patch
> does not seem like a good step towards this goal.  Perhaps somebody
> would be more excited about this proposal than I am, of course.

Thank you both for the comments. The criticism seems valid. The
approach to identifying the end-of-WAL state in this patch is quite
heuristic, and its validity or safety can certainly be contested. On
the other hand, if we seek perfection in this area of judgment, we may
need to have the WAL format itself more robust. In any case, since the
majority of the feedback on this patch seems to be negative, I am
going to withdraw it if no supportive opinions emerge during this
commit-fest.

The attached patch addresses the errors reported by CF-bot.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
>From 933d10fa6c7b71e4684f5ba38e85177afaa56f58 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi 
Date: Tue, 7 Mar 2023 14:55:58 +0900
Subject: [PATCH v29] Make End-Of-Recovery error less scary

When recovery in any type ends, we see a bit scary error message like
"invalid record length" that suggests something serious is
happening. Actually if recovery meets a record with length = 0, that
usually means it finished applying all available WAL records.

Make this message less scary as "reached end of WAL". Instead, raise
the error level for other kind of WAL failure to WARNING.  To make
sure that the detection is correct, this patch checks if all trailing
bytes in the same page are zeroed in that case.
---
 src/backend/access/transam/xlogreader.c   | 147 ++
 src/backend/access/transam/xlogrecovery.c |  96 ++
 src/backend/replication/walreceiver.c |   7 +-
 src/bin/pg_waldump/pg_waldump.c   |  13 +-
 src/bin/pg_waldump/t/001_basic.pl |   5 +-
 src/include/access/xlogreader.h   |   1 +
 src/test/recovery/t/035_recovery.pl   | 130 +++
 src/test/recovery/t/039_end_of_wal.pl |  47 +--
 8 files changed, 383 insertions(+), 63 deletions(-)
 create mode 100644 src/test/recovery/t/035_recovery.pl

diff --git a/src/backend/access/transam/xlogreader.c b/src/backend/access/transam/xlogreader.c
index 7190156f2f..94861969eb 100644
--- a/src/backend/access/transam/xlogreader.c
+++ b/src/backend/access/transam/xlogreader.c
@@ -48,6 +48,8 @@ static int	ReadPageInternal(XLogReaderState *state, XLogRecPtr pageptr,
 			 int reqLen);
 static void XLogReaderInvalReadState(XLogReaderState *state);
 static XLogPageReadResult XLogDecodeNextRecord(XLogReaderState *state, bool nonblocking);
+static bool ValidXLogRecordLe

Re: Make mesage at end-of-recovery less scary.

2024-01-22 Thread Kyotaro Horiguchi

At Mon, 22 Jan 2024 16:09:28 +1100, Peter Smith  wrote 
in 
> 2024-01 Commitfest.
> 
> Hi, This patch has a CF status of "Needs Review" [1], but it seems
> there were CFbot test failures last time it was run [2]. Please have a
> look and post an updated version if necessary.
> 
> ==
> [1] https://commitfest.postgresql.org/46/2490/
> [2] 
> https://cirrus-ci.com/github/postgresql-cfbot/postgresql/commitfest/46/2490

Thanks for noticing of that. Will repost a new version.
regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

Re: Network failure may prevent promotion

2024-01-18 Thread Kyotaro Horiguchi

At Sun, 31 Dec 2023 20:07:41 +0900 (JST), Kyotaro Horiguchi 
 wrote in 
> We've noticed that when walreceiver is waiting for a connection to
> complete, standby does not immediately respond to promotion
> requests. In PG14, upon receiving a promotion request, walreceiver
> terminates instantly, but in PG16, it waits for connection
> timeout. This behavior is attributed to commit 728f86fec65, where a
> part of libpqrcv_connect was simply replaced with a call to
> libpqsrc_connect_params. This behavior can be verified by simply
> dropping packets from the standby to the primary.

Apologize for the inconvenience on my part, but I need to fix this
behavior. To continue this discussion, I'm providing a repro script
here.

With the script, the standby is expected to promote immediately,
emitting the following log lines:

standby.log:
> 2024-01-18 16:25:22.245 JST [31849] LOG:  received promote request
> 2024-01-18 16:25:22.245 JST [31850] FATAL:  terminating walreceiver process 
> due to administrator command
> 2024-01-18 16:25:22.246 JST [31849] LOG:  redo is not required
> 2024-01-18 16:25:22.246 JST [31849] LOG:  selected new timeline ID: 2
> 2024-01-18 16:25:22.274 JST [31849] LOG:  archive recovery complete
> 2024-01-18 16:25:22.275 JST [31847] LOG:  checkpoint starting: force
> 2024-01-18 16:25:22.277 JST [31846] LOG:  database system is ready to accept 
> connections
> 2024-01-18 16:25:22.280 JST [31847] LOG:  checkpoint complete: wrote 3 
> buffers (0.0%); 0 WAL file(s) added, 0 removed, 0 recycled; write=0.001 s, 
> sync=0.001 s, total=0.005 s; sync files=2, longest=0.001 s, average=0.001 s; 
> distance=0 kB, estimate=0 kB; lsn=0/1548E98, redo lsn=0/1548E40
> 2024-01-18 16:25:22.356 JST [31846] LOG:  received immediate shutdown request
> 2024-01-18 16:25:22.361 JST [31846] LOG:  database system is shut down

After 728f86fec65 was introduced, promotion does not complete with the
same operation, as follows. The patch attached to the previous mail
fixes this behavior to the old behavior above.

> 2024-01-18 16:47:53.314 JST [34515] LOG:  received promote request
> 2024-01-18 16:48:03.947 JST [34512] LOG:  received immediate shutdown request
> 2024-01-18 16:48:03.952 JST [34512] LOG:  database system is shut down

The attached script requires that sudo is executable. And there's
another point to note. The script attempts to establish a replication
connection to $primary_address:$primary_port. To packet-filter can
work, it must be a remote address that is accessible when no
packet-filter setting is set up.  The firewall-cmd setting, need to be
configured to block this connection. If simply an inaccessible IP
address is set, the process will fail immediately with a "No route to
host" error before the first packet is sent out, and it will not be
blocked as intended.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
#! /bin/perl

use Cwd;

# This IP address must be a valid and accessible remote address,
# otherwise replication connection will immediately fail with 'No
# route to host', while we want to wait for replication connection to
# complete.
$primary_addr = '192.168.56.1';
$primary_port = 5432;
$fwzone = 'public';
$rootdir = cwd();
$standby_dir = "$rootdir/standby";
$standby_port = 5432;
$standby_logfile= "standby.log";
$rich_rule = "'rule family=\"ipv4\" destination address=\"$primary_addr\" port 
port=\"$primary_port\" protocol=\"tcp\" drop'";
$add_cmd = "sudo firewall-cmd --zone=$fwzone --add-rich-rule=$rich_rule";
$del_cmd = "sudo firewall-cmd --zone=$fwzone --remove-rich-rule=$rich_rule";

# Remove old entities
system('killall -9 postgres');
system("rm -rf $standby_dir $standby_logfile");

# Setup packet drop 
system($add_cmd) == 0 or die "failed to exec \"$add_cmd\": $!\n";

# Setup a standby.
system("initdb -D $standby_dir -c primary_conninfo='host=$primary_addr 
port=$primary_port'");
system("touch $standby_dir/standby.signal");

# Start it.
system("pg_ctl start -D $standby_dir -o \"-p $standby_port\" -l standby.log");
sleep 1;

# Try promoting standby, waiting for 10 seconds.
system("pg_ctl promote -t 10 -D $standby_dir");

# Stop servers
system("pg_ctl stop -m i -D $standby_dir");

# Remove packet-drop setting
system($del_cmd) == 0 or die "failed to exec \"$del_cmd\": $!\n";

Re: initdb's -c option behaves wrong way?

2024-01-17 Thread Kyotaro Horiguchi

Thank you for upicking this up.

At Wed, 17 Jan 2024 23:47:41 +0100, Daniel Gustafsson  wrote 
in 
> > On 17 Jan 2024, at 21:33, Tom Lane  wrote:
> > 
> > I wrote:
> >> However ... I don't like the patch much.  It seems to have left
> >> the code in a rather random state.  Why, for example, didn't you
> >> keep all the code that constructs the "newline" value together?
> > 
> > After thinking about it a bit more, I don't see why you didn't just
> > s/strncmp/strncasecmp/ and call it good.  The messiness seems to be
> > a result of your choice to replace the GUC's case as shown in the
> > file with the case used on the command line, which is not better
> > IMO.  We don't change our mind about the canonical spelling of a
> > GUC because somebody varied the case in a SET command.
> 
> The original patch was preserving the case of the file throwing away the case
> from the commandline.  The attached is a slimmed down version which only moves
> the assembly of newline to the different cases (replace vs.  new) keeping the
> rest of the code intact.  Keeping it in one place gets sort of messy too since
> it needs to use different values for a replacement and a new variable.  Not
> sure if there is a cleaner approach?

Just to clarify, the current code breaks out after processing the
first matching line. I haven't changed that behavior.  The reason I
moved the rewrite processing code out of the loop was I wanted to
avoid adding more lines that are executed only once into the
loop. However, it is in exchange of additional complexity to remember
the original spelling of the parameter name. Personally, I believe
separating the search and rewrite processing is better, but I'm not
particularly attached to the approach, so either way is fine with me.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)

2024-01-16 Thread Kyotaro Horiguchi

At Wed, 17 Jan 2024 14:38:54 +0900, torikoshia  
wrote in 
> Hi,
> 
> Thanks for applying!
> 
> > + errmsg_plural("%zd row were skipped due to data type
> > incompatibility",
> 
> Sorry, I just noticed it, but 'were' should be 'was' here?
> 
> >> BTW I'm thinking we should add a column to pg_stat_progress_copy that
> >> counts soft errors. I'll suggest this in another thread.
> > Please do!
> 
> I've started it here:
> 
> https://www.postgresql.org/message-id/d12fd8c99adcae2744212cb23feff...@oss.nttdata.com

Switching topics, this commit (9e2d870119) adds the following help message:


>   "COPY { %s [ ( %s [, ...] ) ] | ( %s ) }\n"
>   "TO { '%s' | PROGRAM '%s' | STDOUT }\n"
> ...
>   "SAVE_ERROR_TO '%s'\n"
> ...
>   _("location"),

On the other hand, SAVE_ERROR_TO takes 'error' or 'none', which
indicate "immediately error out" and 'just ignore the failure'
respectively, but these options hardly seem to denote a 'location',
and appear more like an 'action'. I somewhat suspect that this
parameter name intially conceived with the assupmtion that it would
take file names or similar parameters. I'm not sure if others will
agree, but I think the parameter name might not be the best
choice. For instance, considering the addition of the third value
'log', something like on_error_action (error, ignore, log) would be
more intuitively understandable. What do you think?

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

Re: Make mesage at end-of-recovery less scary.

2024-01-15 Thread Kyotaro Horiguchi

Thank you for the comments.

At Fri, 12 Jan 2024 15:03:26 +0300, Aleksander Alekseev 
 wrote in 
> ```
> +p = (char *) record;
> +pe = p + XLOG_BLCKSZ - (RecPtr & (XLOG_BLCKSZ - 1));
> +
> +while (p < pe && *p == 0)
> +p++;
> +
> +if (p == pe)
> ```
> 
> Just as a random thought: perhaps we should make this a separate
> function, as a part of src/port/. It seems to me that this code could
> benefit from using vector instructions some day, similarly to
> memcmp(), memset() etc. Surprisingly there doesn't seem to be a
> standard C function for this. Alternatively one could argue that one
> cycle doesn't make much code to reuse and that the C compiler will
> place SIMD instructions for us. However a counter-counter argument
> would be that we could use a macro or even better an inline function
> and have the same effect except getting a slightly more readable code.

Creating a function with a name like memcmp_byte() should be
straightforward, but implementing it with SIMD right away seems a bit
challenging. Similar operations are already being performed elsewhere
in the code, probably within the stats collector, where memcmp is used
with a statically allocated area that's filled with zeros. If we can
achieve a performance equivalent to memcmp with this new function,
then it definitely seems worth pursuing.

> ```
> - * This is just a convenience subroutine to avoid duplicated code in
> + * This is just a convenience subroutine to avoid duplicate code in
> ```
> 
> This change doesn't seem to be related to the patch. Personally I
> don't mind it though.

Ah, I'm sorry. That was something I mistakenly thought I had written
at the last moment and made modifications to.

> All in all I find v28 somewhat scary. It does much more than "making
> one message less scary" as it was initially intended and what bugged
> me personally, and accordingly touches many more places including
> xlogreader.c, xlogrecovery.c, etc.
> 
> Particularly I have mixed feeling about this:
> 
> ```
> +/*
> + * Consider it as end-of-WAL if all subsequent bytes of this page
> + * are zero. We don't bother checking the subsequent pages since
> + * they are not zeroed in the case of recycled segments.
> + */
> ```
> 
> If I understand correctly, if somehow several FS blocks end up being
> zeroed (due to OS bug, bit rot, restoring from a corrupted for
> whatever reason backup, hardware failures, ...) there is non-zero
> chance that PG will interpret this as a normal situation. To my
> knowledge this is not what we typically do - typically PG would report
> an error and ask a human to figure out what happened. Of course the
> possibility of such a scenario is small, but I don't think that as
> DBMS developers we can ignore it.

For now, let me explain the basis for this patch. The fundamental
issue is that these warnings that always appear are, in practice, not
a problem in almost all cases. Some of those who encounter them for
the first time may feel uneasy and reach out with inquiries. On the
other hand, those familiar with these warnings tend to ignore them and
only pay attention to details when actual issues arise. Therefore, the
intention of this patch is to label them as "no issue" unless a
problem is blatantly evident, in order to prevent unnecessary concern.

> Does anyone agree or maybe I'm making things up?

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

Re: Error "initial slot snapshot too large" in create replication slot

2024-01-15 Thread Kyotaro Horiguchi

At Fri, 12 Jan 2024 11:28:09 -0500, Robert Haas  wrote 
in 
> However, I wonder whether this whole area is in need of a bigger
> rethink. There seem to be a number of situations in which the split
> into xip and subxip arrays is not very convenient, and also some
> situations where it's quite important. Sometimes we want to record
> what's committed, and sometimes what isn't. It's all a bit messy and
> inconsistent. The necessity of limiting snapshot size is annoying,
> too. I have no real idea what can be done about all of this, but what
> strikes me is that the current system has grown up incrementally: we
> started with a data structure designed for the original use case, and
> now by gradually adding new use cases things have gotten complicated.
> If we were designing things over from scratch, maybe we'd do it
> differently and end up with something less messy. And maybe someone
> can imagine a redesign that is likewise less messy.
> 
> But on the other hand, maybe not. Perhaps we can't really do any
> better than what we have. Then the question becomes whether this case
> is important enough to justify additional code complexity. I don't
> think I've personally seen users run into this problem so I have no
> special reason to think that it's important, but if it's causing
> issues for other people then maybe it is.

Thank you for the deep insights. I have understood your points. As I
can't think of any further simple modifications on this line, I will
withdraw this CF entry. At the moment, I also lack a fundamental,
comprehensive solution, but should if I or anyone else come up with
such a solution in the future, I believe it would worth a separate
discussion.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

Re: In-placre persistance change of a relation

2024-01-14 Thread Kyotaro Horiguchi

At Tue, 9 Jan 2024 15:07:20 +0530, vignesh C  wrote in 
> CFBot shows compilation issues at [1] with:

Thanks!

The reason for those errors was that I didn't consider Meson at the
time. Additionally, the signature change of reindex_index() caused the
build failure. I fixed both issues. While addressing these issues, I
modified the simpleundolog module to honor
wal_sync_method. Previously, the sync method for undo logs was
determined independently, separate from xlog.c. However, I'm still not
satisfied with the method for handling PG_O_DIRECT.

In this version, I have added the changes to enable the use of
wal_sync_method outside of xlog.c as the first part of the patchset.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
>From 40749357f24adf89dc79db9b34f5c053288489bb Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi 
Date: Mon, 15 Jan 2024 15:57:53 +0900
Subject: [PATCH v31 1/3] Export wal_sync_method related functions

Export several functions related to wal_sync_method for use in
subsequent commits. Since PG_O_DIRECT cannot be used in those commits,
the new function XLogGetSyncBit() will mask PG_O_DIRECT.
---
 src/backend/access/transam/xlog.c | 73 +--
 src/include/access/xlog.h |  2 +
 2 files changed, 52 insertions(+), 23 deletions(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 478377c4a2..c5f51849ee 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -8403,21 +8403,29 @@ assign_wal_sync_method(int new_wal_sync_method, void *extra)
 	}
 }
 
+/*
+ * Exported version of get_sync_bit()
+ *
+ * Do not expose PG_O_DIRECT for uses outside xlog.c.
+ */
+int
+XLogGetSyncBit(void)
+{
+	return get_sync_bit(wal_sync_method) & ~PG_O_DIRECT;
+}
+
 
 /*
- * Issue appropriate kind of fsync (if any) for an XLOG output file.
+ * Issue appropriate kind of fsync (if any) according to wal_sync_method.
  *
- * 'fd' is a file descriptor for the XLOG file to be fsync'd.
- * 'segno' is for error reporting purposes.
+ * 'fd' is a file descriptor for the file to be fsync'd.
  */
-void
-issue_xlog_fsync(int fd, XLogSegNo segno, TimeLineID tli)
+const char *
+XLogFsyncFile(int fd)
 {
-	char	   *msg = NULL;
+	const char *msg = NULL;
 	instr_time	start;
 
-	Assert(tli != 0);
-
 	/*
 	 * Quick exit if fsync is disabled or write() has already synced the WAL
 	 * file.
@@ -8425,7 +8433,7 @@ issue_xlog_fsync(int fd, XLogSegNo segno, TimeLineID tli)
 	if (!enableFsync ||
 		wal_sync_method == WAL_SYNC_METHOD_OPEN ||
 		wal_sync_method == WAL_SYNC_METHOD_OPEN_DSYNC)
-		return;
+		return NULL;
 
 	/* Measure I/O timing to sync the WAL file */
 	if (track_wal_io_timing)
@@ -8460,19 +8468,6 @@ issue_xlog_fsync(int fd, XLogSegNo segno, TimeLineID tli)
 			break;
 	}
 
-	/* PANIC if failed to fsync */
-	if (msg)
-	{
-		char		xlogfname[MAXFNAMELEN];
-		int			save_errno = errno;
-
-		XLogFileName(xlogfname, tli, segno, wal_segment_size);
-		errno = save_errno;
-		ereport(PANIC,
-(errcode_for_file_access(),
- errmsg(msg, xlogfname)));
-	}
-
 	pgstat_report_wait_end();
 
 	/*
@@ -8486,7 +8481,39 @@ issue_xlog_fsync(int fd, XLogSegNo segno, TimeLineID tli)
 		INSTR_TIME_ACCUM_DIFF(PendingWalStats.wal_sync_time, end, start);
 	}
 
-	PendingWalStats.wal_sync++;
+	if (msg != NULL)
+		PendingWalStats.wal_sync++;
+
+	return msg;
+}
+
+/*
+ * Issue appropriate kind of fsync (if any) for an XLOG output file.
+ *
+ * 'fd' is a file descriptor for the XLOG file to be fsync'd.
+ * 'segno' is for error reporting purposes.
+ */
+void
+issue_xlog_fsync(int fd, XLogSegNo segno, TimeLineID tli)
+{
+	const char	*msg;
+	
+	Assert(tli != 0);
+
+	msg = XLogFsyncFile(fd);
+
+	/* PANIC if failed to fsync */
+	if (msg)
+	{
+		char		xlogfname[MAXFNAMELEN];
+		int			save_errno = errno;
+
+		XLogFileName(xlogfname, tli, segno, wal_segment_size);
+		errno = save_errno;
+		ereport(PANIC,
+(errcode_for_file_access(),
+ errmsg(msg, xlogfname)));
+	}
 }
 
 /*
diff --git a/src/include/access/xlog.h b/src/include/access/xlog.h
index 301c5fa11f..2a0d65b537 100644
--- a/src/include/access/xlog.h
+++ b/src/include/access/xlog.h
@@ -217,6 +217,8 @@ extern void xlog_redo(struct XLogReaderState *record);
 extern void xlog_desc(StringInfo buf, struct XLogReaderState *record);
 extern const char *xlog_identify(uint8 info);
 
+extern int XLogGetSyncBit(void);
+extern const char *XLogFsyncFile(int fd);
 extern void issue_xlog_fsync(int fd, XLogSegNo segno, TimeLineID tli);
 
 extern bool RecoveryInProgress(void);
-- 
2.39.3

>From 5c120b94c407b971485ab52133399305e5e81a88 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi 
Date: Thu, 31 Aug 2023 11:49:10 +0900
Subject: [PATCH v31 2/3] Introduce undo log implementation

This patch adds a simple implementation of UNDO log feature.
---
 src/backend/access/transam/Makefile|   1 +
 src/backend/access/transam/meson.build |   1 +
 src/backend/access/transam/rmgr.c

Re: initdb --no-locale=C doesn't work as specified when the environment is not C

2024-01-11 Thread Kyotaro Horiguchi

At Wed, 10 Jan 2024 18:16:03 -0500, Tom Lane  wrote in 
> Kyotaro Horiguchi  writes:
> > It seems somewhat intentional that only lc_messages references the
> > environment at boot time. On the other hand, previously, in the
> > absence of a specified locale, initdb would embed the environmental
> > value in the configuration file, as it seems to be documented. Given
> > that initdb is always used for cluster creation, it's unlikey that
> > systems depend on this boot-time default for their operation.
> 
> Yeah, after further reflection there doesn't seem to be a lot of value
> in leaving these entries commented-out, even in the cases where that's
> technically correct.  Let's just go back to the old behavior of always
> uncommenting them; that stood for years without complaints.  So I
> committed your latest patch as-is.

I'm glad you understand. Thank you for commiting.

> > By the way, the lines around lc_* in the sample file seem to have
> > somewhat inconsistent indentations. Wouldnt' it be preferable to fix
> > this? (The attached doesn't that.)
> 
> They look all right if you assume the tab width is 8, which seems to
> be what is used elsewhere in the file.  I think there's been some
> prior discussion about whether to ban use of tabs at all in these
> sample files, so as to reduce confusion about how wide the tabs are.
> But I'm not touching that question today.

Ah, I see, I understood.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

Re: Error "initial slot snapshot too large" in create replication slot

2024-01-11 Thread Kyotaro Horiguchi

Thank you for the comments. This will help move the discussion
forward.

At Fri, 5 Jan 2024 15:17:11 -0500, Robert Haas  wrote in 
> On Thu, Mar 23, 2023 at 11:02 PM Kyotaro Horiguchi
>  wrote:
> > [ new patch ]
> 
> Well, I guess nobody is too excited about fixing this, because it's
> been another 10 months with no discussion. Andres doesn't even seem to
> think this is as much a bug as it is a limitation, for all that it's
> filed in the CF application under bug fixes. I kind of wonder if we
> should just close this entry in the CF, but I'll hold off on that for
> now.

Perhaps you are correct. Ultimately, this issue is largely
theoretical, and I don't believe anyone would be inconvenienced by
imposing this contraint.

>   * note: all ids in xip[] satisfy xmin <= xip[i] < xmax
>   */
> 
> I have to say that I don't like this at all. It's bad enough that we
> already use the xip/subxip arrays in two different ways depending on
> the situation. Increasing that to three different ways seems painful.
> How is anyone supposed to keep track of how the array is being used at
> which point in the code?

I understand. So, summirizing the current state briefly, I believe it
as follows:

a. snapbuild lacks a means to differentiate between top and sub xids
  during snapshot building.

b. Abusing takenDuringRecovery could lead to potential issues, so it
  has been rejected.

c. Dynamic sizing of xip is likely to have a significant impact on
  performance (as mentioned in the comments of GetSnapshotData).

d. (new!) Adding a third storing method is not favored.

As a method to satisfy these prerequisites, I think one direction
could be to split takenDuringRecovery into flags indicating the
storage method and creation timing. I present this as a last-ditch
effort. It's a rough proposal, and further validation should be
necessary. If this direction is also not acceptable, I'll proceed with
removing the CF entry.

> If we are going to do that, I suspect it needs comment updates in more
> places than what the patch does currently. For instance:
> 
> + if (newxcnt < xiplen)
> + newxip[newxcnt++] = xid;
> + else
> + newsubxip[newsubxcnt++] = xid;
> 
> Just imagine coming across this code in 5 or 10 years and finding that
> it had no comment explaining anything. Yikes!

^^;

> Aside from the details of the patch, and perhaps more seriously, I'm
> not really clear that we have consensus on an approach. A few
> different proposals seem to have been floated, and it doesn't seem
> like anybody hates anybody else's idea completely, but it doesn't
> really seem like everyone agrees on what to do, either.

I don't fully agree with that.It's not so much that I dislike other
proposals, but rather that we haven't been able to find a definitive
solution that stands out.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
>From cd310711c1565b686416e8bf5a3a5105002e35c3 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi 
Date: Fri, 12 Jan 2024 11:36:41 +0900
Subject: [PATCH v9] Lift initial snapshot limit for logical replication

The replication command CREATE_REPLICATION_SLOT often fails with
"snapshot too large" error when numerous subtransactions are
active. This issue stems from SnapBuildInitialSnapshot attempting to
generate a snapshot that includes all active XIDs, including subxids,
stored within the xip array. This patch mitigates the constraint by
utilizing the same storing method as takenDuringRecovery.
---
 src/backend/replication/logical/snapbuild.c | 22 +
 src/backend/storage/ipc/procarray.c |  1 +
 src/backend/utils/time/snapmgr.c| 15 +-
 src/include/utils/snapshot.h|  3 ++-
 4 files changed, 27 insertions(+), 14 deletions(-)

diff --git a/src/backend/replication/logical/snapbuild.c b/src/backend/replication/logical/snapbuild.c
index a0b7947d2f..528bdbb662 100644
--- a/src/backend/replication/logical/snapbuild.c
+++ b/src/backend/replication/logical/snapbuild.c
@@ -388,6 +388,7 @@ SnapBuildFreeSnapshot(Snapshot snap)
 	Assert(snap->curcid == FirstCommandId);
 	Assert(!snap->suboverflowed);
 	Assert(!snap->takenDuringRecovery);
+	Assert(!snap->mixed);
 	Assert(snap->regd_count == 0);
 
 	/* slightly more likely, so it's checked even without c-asserts */
@@ -464,6 +465,7 @@ SnapBuildSnapDecRefcount(Snapshot snap)
 	Assert(snap->curcid == FirstCommandId);
 	Assert(!snap->suboverflowed);
 	Assert(!snap->takenDuringRecovery);
+	Assert(!snap->mixed);
 
 	Assert(snap->regd_count == 0);
 
@@ -550,6 +552,7 @@ SnapBuildBuildSnapshot(SnapBuild *builder)
 
 	snapshot->suboverflowed = false;
 	snapshot->takenDuringRecovery = false;
+	snapshot->mixed = false;
 	snapshot->copied = false;
 	snapshot->curcid = FirstCommandId;
 	snapshot->active_count = 0;
@@ -572,8 +575,8 @@ SnapBuildInitia

Re: pg_ctl start may return 0 even if the postmaster has been already started on Windows

2024-01-11 Thread Kyotaro Horiguchi

Thanks for restarting this thread.

At Tue, 9 Jan 2024 09:40:23 +0900, Michael Paquier  wrote 
in 
> On Fri, Jan 05, 2024 at 02:58:55PM -0500, Robert Haas wrote:
> > I'm not a Windows expert, but my guess is that 0001 is a very good
> > idea. I hope someone who is a Windows expert will comment on that.
> 
> I am +1 on 0001.  It is just something we've never anticipated when
> these wrappers around cmd in pg_ctl were written.

Thanks for committing it.

> > 0002 seems problematic to me. One potential issue is that it would
> > break if someone renamed postgres.exe to something else -- although
> > that's probably not really a serious problem.
> 
> We do a find_other_exec_or_die() on "postgres" with what could be a
> custom execution path.  So we're actually sure that the binary will be
> there in the start path, no?  I don't like much the hardcoded
> dependency to .exe here.

The patch doesn't care of the path for postgres.exe. If you are referring to 
the code you cited below, it's for another reason. I'll describe that there.

> > A bigger issue is that
> > it seems like it would break if someone used pg_ctl to start several
> > instances in different data directories on the same machine. If I'm
> > understanding correctly, that currently mostly works, and this would
> > break it.
> 
> Not having the guarantee that a single shell_pid is associated to a
> single postgres.exe would be a problem.  Now the patch includes this
> code:
> + if (ppe.th32ParentProcessID == shell_pid &&
> + strcmp("postgres.exe", ppe.szExeFile) == 0)
> + {
> + if (pm_pid != ppe.th32ProcessID && pm_pid != 0)
> + multiple_children = true;
> + pm_pid = ppe.th32ProcessID;
> + }
> 
> Which is basically giving this guarantee?  multiple_children should
> never happen once the autorun part is removed.  Is that right?

The patch indeed ensures the relationship between the parent
pg_ctl.exe and postgres.exe.  However, for some reason, in my Windows
11 environment with the /D option specified, I observed that another
cmd.exe is spawned as the second child process of the parent
cmd.exe. This is why there is a need to verify the executable file
name. I have no idea how the second cmd.exe is being spawned.

> +* The launcher shell might start other cmd.exe instances or programs
> +* besides postgres.exe. Veryfying the program file name is essential.
> 
> With the autorun part of cmd.exe removed, what's still relevant here?

Yes, if the strcmp() is commented out, multiple_children sometimes
becomes true..

> s/Veryfying/Verifying/.

Oops!

> Perhaps 0002 should make more efforts in documenting things like
> th32ProcessID and th32ParentProcessID.

Is it correct to understand that you are requesting changes as follows?

--- a/src/bin/pg_ctl/pg_ctl.c
+++ b/src/bin/pg_ctl/pg_ctl.c
@@ -1995,11 +1995,14 @@ pgwin32_find_postmaster_pid(pid_t shell_pid)
 *
 * Check for duplicate processes to ensure reliability.
 *
-* The launcher shell might start other cmd.exe instances or programs
-* besides postgres.exe. Verifying the program file name is essential.
-*
-* The launcher shell process isn't checked in this function.  It will 
be
-* checked by the caller.
+* The ppe entry to be examined is identified by th32ParentProcessID, 
which
+* should correspond to the cmd.exe process that executes the 
postgres.exe
+* binary. Additionally, th32ProcessID in the same entry should be the 
PID
+* of the launched postgres.exe. However, even though we have launched 
the
+* parent cmd.exe with the /D option specified, it is sometimes observed
+* that another cmd.exe is launched for unknown reasons. Therefore, it 
is
+* crucial to verify the program file name to avoid returning the wrong
+    * PID.
 */


regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center


 
>From f9f2486f18502357fb76f2f1da00e19db40b2bc9 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi 
Date: Tue, 24 Oct 2023 14:46:50 +0900
Subject: [PATCH v6 1/2] Improve pg_ctl postmaster process check on Windows

Currently pg_ctl on Windows does not verify that it actually executed
a postmaster process due to lack of process ID knowledge. This can
lead to false positives in cases where another pg_ctl instance starts
a different server simultaneously.
This patch adds the capability to identify the process ID of the
launched postmaster on Windows, similar to other OS versions, ensuring
more reliable detection of concurrent server startups.
---
 src/bin/pg_ctl/pg_ctl.c | 105 
 1 file changed, 96 insertions(+), 9

Re: Make mesage at end-of-recovery less scary.

2024-01-10 Thread Kyotaro Horiguchi

At Fri, 5 Jan 2024 16:02:24 +0530, vignesh C  wrote in 
> On Wed, 22 Nov 2023 at 13:01, Kyotaro Horiguchi  
> wrote:
> >
> > Anyway, this requires rebsaing, and done.
> 
> Few tests are failing at [1], kindly post an updated patch:

Thanks!

The errors occurred in a part of the tests for end-of-WAL detection
added in the master branch. These failures were primarily due to
changes in the message contents introduced by this patch. During the
revision, I discovered an issue with the handling of empty pages that
appear in the middle of reading continuation records. In the previous
version, such empty pages were mistakenly identified as indicating a
clean end-of-WAL (that is a LOG). However, they should actually be
handled as a WARNING, since the record curently being read is broken
at the empty pages. The following changes have been made in this
version:

1. Adjusting the test to align with the error message changes
  introduced by this patch.

2. Adding tests for the newly added messages.

3. Correcting the handling of empty pages encountered during the
  reading of continuation records. (XLogReaderValidatePageHeader)

4. Revising code comments.

5. Changing the term "log segment" to "WAL
  segment". (XLogReaderValidatePageHeader)

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
>From 47cc39a212d7fd6857f30c35c76bcdd0d26bbc3f Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi 
Date: Tue, 7 Mar 2023 14:55:58 +0900
Subject: [PATCH v28] Make End-Of-Recovery error less scary

When recovery in any type ends, we see a bit scary error message like
"invalid record length" that suggests something serious is
happening. Actually if recovery meets a record with length = 0, that
usually means it finished applying all available WAL records.

Make this message less scary as "reached end of WAL". Instead, raise
the error level for other kind of WAL failure to WARNING.  To make
sure that the detection is correct, this patch checks if all trailing
bytes in the same page are zeroed in that case.
---
 src/backend/access/transam/xlogreader.c   | 147 ++
 src/backend/access/transam/xlogrecovery.c |  96 ++
 src/backend/replication/walreceiver.c |   7 +-
 src/bin/pg_waldump/pg_waldump.c   |  13 +-
 src/include/access/xlogreader.h   |   1 +
 src/test/recovery/t/035_recovery.pl   | 130 +++
 src/test/recovery/t/039_end_of_wal.pl |  47 +--
 7 files changed, 380 insertions(+), 61 deletions(-)
 create mode 100644 src/test/recovery/t/035_recovery.pl

diff --git a/src/backend/access/transam/xlogreader.c b/src/backend/access/transam/xlogreader.c
index 7190156f2f..94861969eb 100644
--- a/src/backend/access/transam/xlogreader.c
+++ b/src/backend/access/transam/xlogreader.c
@@ -48,6 +48,8 @@ static int	ReadPageInternal(XLogReaderState *state, XLogRecPtr pageptr,
 			 int reqLen);
 static void XLogReaderInvalReadState(XLogReaderState *state);
 static XLogPageReadResult XLogDecodeNextRecord(XLogReaderState *state, bool nonblocking);
+static bool ValidXLogRecordLength(XLogReaderState *state, XLogRecPtr RecPtr,
+  XLogRecord *record);
 static bool ValidXLogRecordHeader(XLogReaderState *state, XLogRecPtr RecPtr,
   XLogRecPtr PrevRecPtr, XLogRecord *record, bool randAccess);
 static bool ValidXLogRecord(XLogReaderState *state, XLogRecord *record,
@@ -149,6 +151,7 @@ XLogReaderAllocate(int wal_segment_size, const char *waldir,
 		pfree(state);
 		return NULL;
 	}
+	state->EndOfWAL = false;
 	state->errormsg_buf[0] = '\0';
 
 	/*
@@ -553,6 +556,7 @@ XLogDecodeNextRecord(XLogReaderState *state, bool nonblocking)
 	/* reset error state */
 	state->errormsg_buf[0] = '\0';
 	decoded = NULL;
+	state->EndOfWAL = false;
 
 	state->abortedRecPtr = InvalidXLogRecPtr;
 	state->missingContrecPtr = InvalidXLogRecPtr;
@@ -636,16 +640,12 @@ restart:
 	Assert(pageHeaderSize <= readOff);
 
 	/*
-	 * Read the record length.
+	 * Verify the record header.
 	 *
 	 * NB: Even though we use an XLogRecord pointer here, the whole record
-	 * header might not fit on this page. xl_tot_len is the first field of the
-	 * struct, so it must be on this page (the records are MAXALIGNed), but we
-	 * cannot access any other fields until we've verified that we got the
-	 * whole header.
+	 * header might not fit on this page.
 	 */
 	record = (XLogRecord *) (state->readBuf + RecPtr % XLOG_BLCKSZ);
-	total_len = record->xl_tot_len;
 
 	/*
 	 * If the whole record header is on this page, validate it immediately.
@@ -664,19 +664,19 @@ restart:
 	}
 	else
 	{
-		/* There may be no next page if it's too small. */
-		if (total_len < SizeOfXLogRecord)
-		{
-			report_invalid_record(state,
-  "invalid record length at %X/%X: expected at least %u, got %u",
-  LSN_FORMAT_ARGS(RecPtr),
-  (uint32) SizeOfXLogRecord, total_len);
+		/*
+		 * xl_tot_len is

Network failure may prevent promotion

2023-12-31 Thread Kyotaro Horiguchi

(Apology for resubmitting due to poor subject of the previous mail)
---
Hello.

We've noticed that when walreceiver is waiting for a connection to
complete, standby does not immediately respond to promotion
requests. In PG14, upon receiving a promotion request, walreceiver
terminates instantly, but in PG16, it waits for connection
timeout. This behavior is attributed to commit 728f86fec65, where a
part of libpqrcv_connect was simply replaced with a call to
libpqsrc_connect_params. This behavior can be verified by simply
dropping packets from the standby to the primary.

By a simple thought, in walreceiver, libpqsrv_connect_internal could
just call ProcessWalRcvInterrupts() instead of CHECK_FOR_INTERRUPTS(),
but this approach is quite ugly. Since ProcessWalRcvInterrupts()
originally calls CHECK_FOR_INTERRUPTS() and there are no standalone
calls to CHECK_FOR_INTERRUPTS() within walreceiver, I think it might
be better to use ProcDiePending instead of ShutdownRequestPending.  I
added a subset function of die() as the SIGTERM handler in walsender
in a crude patch attached.

What do you think about the issue, and the approach?

If there are no issues or objections with this method, I will continue
to refine this patch. For now, I plan to register it for the upcoming
commitfest.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

libpqsrv_connect_params should call ProcessWalRcvInterrupts

2023-12-31 Thread Kyotaro Horiguchi

Hello.

We've noticed that when walreceiver is waiting for a connection to
complete, standby does not immediately respond to promotion
requests. In PG14, upon receiving a promotion request, walreceiver
terminates instantly, but in PG16, it waits for connection
timeout. This behavior is attributed to commit 728f86fec65, where a
part of libpqrcv_connect was simply replaced with a call to
libpqsrc_connect_params. This behavior can be verified by simply
dropping packets from the standby to the primary.

By a simple thought, in walreceiver, libpqsrv_connect_internal could
just call ProcessWalRcvInterrupts() instead of CHECK_FOR_INTERRUPTS(),
but this approach is quite ugly. Since ProcessWalRcvInterrupts()
originally calls CHECK_FOR_INTERRUPTS() and there are no standalone
calls to CHECK_FOR_INTERRUPTS() within walreceiver, I think it might
be better to use ProcDiePending instead of ShutdownRequestPending.  I
added a subset function of die() as the SIGTERM handler in walsender
in a crude patch attached.

What do you think about the issue, and the approach?

If there are no issues or objections with this method, I will continue
to refine this patch. For now, I plan to register it for the upcoming
commitfest.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
diff --git a/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c b/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
index 693b3669ba..e503799bd8 100644
--- a/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
+++ b/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
@@ -738,7 +738,7 @@ libpqrcv_PQgetResult(PGconn *streamConn)
 		if (rc & WL_LATCH_SET)
 		{
 			ResetLatch(MyLatch);
-			ProcessWalRcvInterrupts();
+			CHECK_FOR_INTERRUPTS();
 		}
 
 		/* Consume whatever data is available from the socket */
@@ -1042,7 +1042,7 @@ libpqrcv_processTuples(PGresult *pgres, WalRcvExecResult *walres,
 	{
 		char	   *cstrs[MaxTupleAttributeNumber];
 
-		ProcessWalRcvInterrupts();
+		CHECK_FOR_INTERRUPTS();
 
 		/* Do the allocations in temporary context. */
 		oldcontext = MemoryContextSwitchTo(rowcontext);
diff --git a/src/backend/replication/walreceiver.c b/src/backend/replication/walreceiver.c
index 26ded928a7..c53a8e6c89 100644
--- a/src/backend/replication/walreceiver.c
+++ b/src/backend/replication/walreceiver.c
@@ -147,39 +147,34 @@ static void XLogWalRcvSendReply(bool force, bool requestReply);
 static void XLogWalRcvSendHSFeedback(bool immed);
 static void ProcessWalSndrMessage(XLogRecPtr walEnd, TimestampTz sendTime);
 static void WalRcvComputeNextWakeup(WalRcvWakeupReason reason, TimestampTz now);
+static void WalRcvShutdownSignalHandler(SIGNAL_ARGS);
 
-/*
- * Process any interrupts the walreceiver process may have received.
- * This should be called any time the process's latch has become set.
- *
- * Currently, only SIGTERM is of interest.  We can't just exit(1) within the
- * SIGTERM signal handler, because the signal might arrive in the middle of
- * some critical operation, like while we're holding a spinlock.  Instead, the
- * signal handler sets a flag variable as well as setting the process's latch.
- * We must check the flag (by calling ProcessWalRcvInterrupts) anytime the
- * latch has become set.  Operations that could block for a long time, such as
- * reading from a remote server, must pay attention to the latch too; see
- * libpqrcv_PQgetResult for example.
- */
 void
-ProcessWalRcvInterrupts(void)
+WalRcvShutdownSignalHandler(SIGNAL_ARGS)
 {
-	/*
-	 * Although walreceiver interrupt handling doesn't use the same scheme as
-	 * regular backends, call CHECK_FOR_INTERRUPTS() to make sure we receive
-	 * any incoming signals on Win32, and also to make sure we process any
-	 * barrier events.
-	 */
-	CHECK_FOR_INTERRUPTS();
+	int			save_errno = errno;
 
-	if (ShutdownRequestPending)
+	/* Don't joggle the elbow of proc_exit */
+	if (!proc_exit_inprogress)
 	{
-		ereport(FATAL,
-(errcode(ERRCODE_ADMIN_SHUTDOWN),
- errmsg("terminating walreceiver process due to administrator command")));
+		InterruptPending = true;
+		ProcDiePending = true;
 	}
+
+	SetLatch(MyLatch);
+
+	errno = save_errno;
+	
 }
 
+/*
+ * Is current process a wal receiver?
+ */
+bool
+IsWalReceiver(void)
+{
+	return WalRcv != NULL;
+}
 
 /* Main entry point for walreceiver process */
 void
@@ -277,7 +272,7 @@ WalReceiverMain(void)
 	pqsignal(SIGHUP, SignalHandlerForConfigReload); /* set flag to read config
 	 * file */
 	pqsignal(SIGINT, SIG_IGN);
-	pqsignal(SIGTERM, SignalHandlerForShutdownRequest); /* request shutdown */
+	pqsignal(SIGTERM, WalRcvShutdownSignalHandler); /* request shutdown */
 	/* SIGQUIT handler was already set up by InitPostmasterChild */
 	pqsignal(SIGALRM, SIG_IGN);
 	pqsignal(SIGPIPE, SIG_IGN);
@@ -456,7 +451,7 @@ WalReceiverMain(void)
 			 errmsg("cannot continue WAL streaming, recovery has already ended")));
 
 /* Process any requests or s

Re: A typo in a messsage?

2023-12-31 Thread Kyotaro Horiguchi

At Wed, 27 Dec 2023 13:34:50 +0700, John Naylor  wrote 
in 
> > Shouldn't the "is" following "LSN" be "in"?
> 
> Pushed.

At Wed, 27 Dec 2023 13:35:24 +0700, John Naylor  wrote 
in 
>> I think "crc" should be in all uppercase in general and a brief
>> grep'ing told me that it is almost always or consistently used in
>> uppercase in our tree.
> I pushed this also.

Thanks for pushing them!

regads.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

Re: pg_basebackup has an accidentaly separated help message

2023-12-31 Thread Kyotaro Horiguchi

At Tue, 26 Dec 2023 19:04:53 +0900, Michael Paquier  wrote 
in 
> On Mon, Dec 25, 2023 at 05:07:28PM +0900, Kyotaro Horiguchi wrote:
> > Yes. So, it turns out that they're found after they have been
> > committed.
> 
> No problem.  I've just applied what you had.  I hope this makes your
> life a bit easier ;)

Thanks for committing this!

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

Re: pg_basebackup has an accidentaly separated help message

2023-12-25 Thread Kyotaro Horiguchi

At Mon, 25 Dec 2023 15:42:41 +0900, Michael Paquier  wrote 
in 
> On Mon, Dec 25, 2023 at 02:39:16PM +0900, Kyotaro Horiguchi wrote:
> > The attached patch contains both of the above fixes.
> 
> Good catches, let's fix them.  You have noticed that while translating
> these new messages, I guess?

Yes. So, it turns out that they're found after they have been
committed.

Because handling a large volume of translations all at once is
daunting, I am maintaining translations locally to avoid that.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

Should "CRC" be in uppercase?

2023-12-24 Thread Kyotaro Horiguchi

A new function check_control_file() in pg_combinebackup.c has the
following message.

>   pg_fatal("%s: crc is incorrect", controlpath);

I think "crc" should be in all uppercase in general and a brief
grep'ing told me that it is almost always or consistently used in
uppercase in our tree.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

diff --git a/src/bin/pg_combinebackup/pg_combinebackup.c b/src/bin/pg_combinebackup/pg_combinebackup.c
index b6ae6f2aef..049c60cbf8 100644
--- a/src/bin/pg_combinebackup/pg_combinebackup.c
+++ b/src/bin/pg_combinebackup/pg_combinebackup.c
@@ -538,7 +538,7 @@ check_control_files(int n_backups, char **backup_dirs)
 
 		/* Control file contents not meaningful if CRC is bad. */
 		if (!crc_ok)
-			pg_fatal("%s: crc is incorrect", controlpath);
+			pg_fatal("%s: CRC is incorrect", controlpath);
 
 		/* Can't interpret control file if not current version. */
 		if (control_file->pg_control_version != PG_CONTROL_VERSION)

Re: pg_basebackup has an accidentaly separated help message

2023-12-24 Thread Kyotaro Horiguchi

> > printf(_("  -i, --incremental=OLDMANIFEST\n"));
> > printf(_(" take incremental backup\n"));
> 
> I'd suggest merging these lines as follows (and the attached patch).
> 
> > +   printf(_("  -i, --incremental=OLDMANIFEST\n"
> > +" take incremental backup\n"));

Sorry, but I found another instance of this.

>   printf(_("  -T, --tablespace-mapping=OLDDIR=NEWDIR\n"));
>   printf(_("relocate tablespace in OLDDIR to 
> NEWDIR\n"));

The attached patch contains both of the above fixes.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
diff --git a/src/bin/pg_basebackup/pg_basebackup.c b/src/bin/pg_basebackup/pg_basebackup.c
index 5795b91261..28d2ee435b 100644
--- a/src/bin/pg_basebackup/pg_basebackup.c
+++ b/src/bin/pg_basebackup/pg_basebackup.c
@@ -396,8 +396,8 @@ usage(void)
 	printf(_("\nOptions controlling the output:\n"));
 	printf(_("  -D, --pgdata=DIRECTORY receive base backup into directory\n"));
 	printf(_("  -F, --format=p|t   output format (plain (default), tar)\n"));
-	printf(_("  -i, --incremental=OLDMANIFEST\n"));
-	printf(_(" take incremental backup\n"));
+	printf(_("  -i, --incremental=OLDMANIFEST\n"
+			 " take incremental backup\n"));
 	printf(_("  -r, --max-rate=RATEmaximum transfer rate to transfer data directory\n"
 			 " (in kB/s, or use suffix \"k\" or \"M\")\n"));
 	printf(_("  -R, --write-recovery-conf\n"
diff --git a/src/bin/pg_combinebackup/pg_combinebackup.c b/src/bin/pg_combinebackup/pg_combinebackup.c
index b6ae6f2aef..49e97fcca8 100644
--- a/src/bin/pg_combinebackup/pg_combinebackup.c
+++ b/src/bin/pg_combinebackup/pg_combinebackup.c
@@ -669,8 +669,8 @@ help(const char *progname)
 	printf(_("  -n, --dry-run don't actually do anything\n"));
 	printf(_("  -N, --no-sync do not wait for changes to be written safely to disk\n"));
 	printf(_("  -o, --output  output directory\n"));
-	printf(_("  -T, --tablespace-mapping=OLDDIR=NEWDIR\n"));
-	printf(_("relocate tablespace in OLDDIR to NEWDIR\n"));
+	printf(_("  -T, --tablespace-mapping=OLDDIR=NEWDIR\n"
+			 "relocate tablespace in OLDDIR to NEWDIR\n"));
 	printf(_("  --manifest-checksums=SHA{224,256,384,512}|CRC32C|NONE\n"
 			 "use algorithm for manifest checksums\n"));
 	printf(_("  --no-manifest suppress generation of backup manifest\n"));

pg_basebackup has an accidentaly separated help message

2023-12-24 Thread Kyotaro Horiguchi

Hello.

pg_basebackup.c: got the following message lines:

>   printf(_("  -i, --incremental=OLDMANIFEST\n"));
>   printf(_(" take incremental backup\n"));

I'd suggest merging these lines as follows (and the attached patch).

> + printf(_("  -i, --incremental=OLDMANIFEST\n"
> +  "     take incremental backup\n"));

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
diff --git a/src/bin/pg_basebackup/pg_basebackup.c b/src/bin/pg_basebackup/pg_basebackup.c
index 5795b91261..28d2ee435b 100644
--- a/src/bin/pg_basebackup/pg_basebackup.c
+++ b/src/bin/pg_basebackup/pg_basebackup.c
@@ -396,8 +396,8 @@ usage(void)
 	printf(_("\nOptions controlling the output:\n"));
 	printf(_("  -D, --pgdata=DIRECTORY receive base backup into directory\n"));
 	printf(_("  -F, --format=p|t   output format (plain (default), tar)\n"));
-	printf(_("  -i, --incremental=OLDMANIFEST\n"));
-	printf(_(" take incremental backup\n"));
+	printf(_("  -i, --incremental=OLDMANIFEST\n"
+			 " take incremental backup\n"));
 	printf(_("  -r, --max-rate=RATEmaximum transfer rate to transfer data directory\n"
 			 " (in kB/s, or use suffix \"k\" or \"M\")\n"));
 	printf(_("  -R, --write-recovery-conf\n"

A typo in a messsage?

2023-12-21 Thread Kyotaro Horiguchi

I found the following message introduced by a recent commit.

> errdetail("The first unsummarized LSN is this range is %X/%X.",
 
Shouldn't the "is" following "LSN" be "in"?


diff --git a/src/backend/backup/basebackup_incremental.c 
b/src/backend/backup/basebackup_incremental.c
index 42bbe564e2..22b861ce52 100644
--- a/src/backend/backup/basebackup_incremental.c
+++ b/src/backend/backup/basebackup_incremental.c
@@ -575,7 +575,7 @@ PrepareForIncrementalBackup(IncrementalBackupInfo *ib,
tle->tli,

LSN_FORMAT_ARGS(tli_start_lsn),

LSN_FORMAT_ARGS(tli_end_lsn)),
-errdetail("The first 
unsummarized LSN is this range is %X/%X.",
+errdetail("The first 
unsummarized LSN in this range is %X/%X.",
   
LSN_FORMAT_ARGS(tli_missing_lsn;
}
 

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

Re: gai_strerror() is not thread-safe on Windows

2023-12-06 Thread Kyotaro Horiguchi

At Thu, 7 Dec 2023 09:43:37 +1300, Thomas Munro  wrote 
in 
> On Tue, Dec 5, 2023 at 3:43 PM Kyotaro Horiguchi
>  wrote:
> > Windows' gai_strerror outputs messages that correspond to the language
> > environment. Similarly, I think that the messages that the messages
> > returned by our version should be translatable.
> 
> Hmm, that is a good point.  Wow, POSIX has given us a terrible
> interface here, in terms of resource management.  Let's see what glibc
> does:
>
> https://github.com/lattera/glibc/blob/master/sysdeps/posix/gai_strerror.c
> https://github.com/lattera/glibc/blob/master/sysdeps/posix/gai_strerror-strs.h

It is quite a sight for sore eyes...

> It doesn't look like it knows about locales at all.  And a test
> program seems to confirm:
..
> setlocale(LC_MESSAGES, "ja_JP.UTF-8");
> printf("%s\n", gai_strerror(EAI_MEMORY));
> 
> That prints:
> 
> Memory allocation failure
> 
> FreeBSD tries harder, and prints:
> 
> メモリ割り当て失敗
> 
> We can see that it has a thread-local variable that holds a copy of
> that localised string until the next call to gai_strerror() in the
> same thread:
> 
> https://github.com/freebsd/freebsd-src/blob/main/lib/libc/net/gai_strerror.c
> https://github.com/freebsd/freebsd-src/blob/main/lib/libc/nls/ja_JP.UTF-8.msg
> 
> FreeBSD's message catalogues would provide a read-made source of
> translations, bu... hmm, if glibc doesn't bother and the POSIX
> interface is unhelpful and Windows' own implementation is so willfully
> unusable, I don't really feel inclined to build a whole thread-local
> cache thing on our side just to support this mess.

I agree, I wouldn't want to do it either.

> So I think we should just hard-code the error messages in English and
> move on.  However, English is my language so perhaps I should abstain
> and leave it to others to decide how important that is.

I also think that would be a good way.

> > These messages may add extra line-end periods to the parent (or
> > cotaining) messages when appended. This looks as follows.
> >
> > (auth.c:517 : errdetail_log() : sub (detail) message)
> > > Could not translate client host name "hoge" to IP address: An address 
> > > incompatible with the requested protocol was used..
> >
> > (hba.c:1562 : errmsg() : main message)
> > > invalid IP address "192.0.2.1": This is usually a temporary error during 
> > > hostname resolution and means that the local server did not receive a 
> > > response from an authoritative server.
> >
> > When I first saw the first version, I thought it would be better to
> > use Windows' own messages, just like you did. However, considering the
> > content of the message above, wouldn't it be better to adhere to
> > Linux-style messages overall?
> 
> Yeah, I agree that either the glibc or the FreeBSD messages would be
> better than those now that I've seen them.  They are short and sweet.
> 
> > A slightly subtler point is that the second example seems to have a
> > misalignment between the descriptions before and after the colon, but
> > do you think it's not something to be concerned about to this extent?
> 
> I didn't understand what you meant here.

If it was just a temporary error that couldn't be resolved, it doesn't
mean that the IP address is invalid. If such a cause is possible, then
probabyly an error message saying "failed to resolve" would be more
appropriate. However, I wrote it meaning that there is no need to go
to great length to ensure consistency with this message.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

Re: gai_strerror() is not thread-safe on Windows

2023-12-04 Thread Kyotaro Horiguchi

At Tue, 5 Dec 2023 08:26:54 +1300, Thomas Munro  wrote 
in 
> On second thoughts, I guess it would make more sense to use the exact
> messages Windows' own implementation would return instead of whatever
> we had in the past (probably cribbed from some other OS or just made
> up?).  I asked CI to spit those out[1].  Updated patch attached.  Will
> add to CF.
> 
> [1] https://cirrus-ci.com/task/5816802207334400?logs=main#L15

Windows' gai_strerror outputs messages that correspond to the language
environment. Similarly, I think that the messages that the messages
returned by our version should be translatable.

These messages may add extra line-end periods to the parent (or
cotaining) messages when appended. This looks as follows.

(auth.c:517 : errdetail_log() : sub (detail) message)
> Could not translate client host name "hoge" to IP address: An address 
> incompatible with the requested protocol was used..

(hba.c:1562 : errmsg() : main message)
> invalid IP address "192.0.2.1": This is usually a temporary error during 
> hostname resolution and means that the local server did not receive a 
> response from an authoritative server.

When I first saw the first version, I thought it would be better to
use Windows' own messages, just like you did. However, considering the
content of the message above, wouldn't it be better to adhere to
Linux-style messages overall?

A slightly subtler point is that the second example seems to have a
misalignment between the descriptions before and after the colon, but
do you think it's not something to be concerned about to this extent?

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

Re: GUC names in messages

2023-11-29 Thread Kyotaro Horiguchi

At Thu, 30 Nov 2023 14:59:21 +0900, Michael Paquier  wrote 
in 
> > Patch 0003 now uses a "%s%s%s" format specifier with GUC_FORMAT macro
> > in guc.c, as recently suggested by Michael [1].
> 
> I cannot think about a better idea as these strings need to be
> translated so they need three %s.


In this patch, the quotation marks cannot be changed from double
quotes.

After a brief review of the use of quotation marks in various
languages, it's observed that French uses guillemets (« »), German
uses lower qutation marks („ “), Spanish uses angular quotation marks
(« ») or alternatively, lower quotetaion marks. Japanese commonly uses
corner brackets (「」), but can also adopt double or single quotation
marks in certain contexts. I took a look at the backend's fr.po file
for a trial, and it indeed seems that guillemets are being used.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

pg_dump/nls.mk is missing a file

2023-11-29 Thread Kyotaro Horiguchi

Hello.

Upon reviewing my translation, I discovered that filter.c was not
included in the nls.mk of pg_dump. Additional it appears that two '.h'
files have been included for a long time, but they seem unnecessary as
their removal does not affect the results. The attached patch is
intended to correct these issues.

For Meson, on the other hand, I believe there is nothing in particular
that needs to be done.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
diff --git a/src/bin/pg_dump/nls.mk b/src/bin/pg_dump/nls.mk
index cd91737f48..ca7ce74275 100644
--- a/src/bin/pg_dump/nls.mk
+++ b/src/bin/pg_dump/nls.mk
@@ -15,13 +15,12 @@ GETTEXT_FILES= $(FRONTEND_COMMON_GETTEXT_FILES) \
compress_zstd.c \
pg_dump.c \
common.c \
+   filter.c \
pg_dump_sort.c \
pg_restore.c \
pg_dumpall.c \
parallel.c \
-   parallel.h \
pg_backup_utils.c \
-   pg_backup_utils.h \
../../common/compression.c \
../../common/exec.c \
../../common/fe_memutils.c \

Re: about help message for new pg_dump's --filter option

2023-11-29 Thread Kyotaro Horiguchi

At Thu, 30 Nov 2023 10:20:40 +0900 (JST), Kyotaro Horiguchi 
 wrote in 
> Hello.
> 
> Recently, a new --filter option was added to pg_dump. I might be
> wrong, but the syntax of the help message for this feels off. Is the
> word 'on' not necessary after 'based'?
> 
> >  --filter=FILENAMEinclude or exclude objects and data from dump
> >   based expressions in FILENAME

Hmm. A similar message is spelled as "based on expression". Thus, the
attached correct this message in this line.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index 57c6836b88..eea320b731 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -1119,7 +1119,7 @@ help(const char *progname)
 			 "   including child and partition tables\n"));
 	printf(_("  --extra-float-digits=NUM override default setting for extra_float_digits\n"));
 	printf(_("  --filter=FILENAMEinclude or exclude objects and data from dump\n"
-			 "   based expressions in FILENAME\n"));
+			 "   based on expressions in FILENAME\n"));
 	printf(_("  --if-exists  use IF EXISTS when dropping objects\n"));
 	printf(_("  --include-foreign-data=PATTERN\n"
 			 "   include data of foreign tables on foreign\n"

Extra periods in pg_dump messages

2023-11-29 Thread Kyotaro Horiguchi

Sorry for the sequential mails.

In the bleeding-edge version of pg_dump, when a conditionspecifying an
index, for example, is described in an object filter file, the
following message is output. However, there is a period at the end of
the line. Shouldn't this be removed?

$ pg_dump --filter=/tmp/hoge.filter
pg_dump: error: invalid format in filter read from "/tmp/hoge.filter" on line 
1: include filter for "index" is not allowed.

The attached patch includes modifications related to the calls to
pg_log_filter_error().

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index 57c6836b88..ce50566c3a 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -18812,7 +18812,7 @@ read_dump_filters(const char *filename, DumpOptions *dopt)
 case FILTER_OBJECT_TYPE_TABLE_DATA:
 case FILTER_OBJECT_TYPE_TABLE_DATA_AND_CHILDREN:
 case FILTER_OBJECT_TYPE_TRIGGER:
-	pg_log_filter_error(, _("%s filter for \"%s\" is not allowed."),
+	pg_log_filter_error(, _("%s filter for \"%s\" is not allowed"),
 		"include",
 		filter_object_type_name(objtype));
 	exit_nicely(1);
@@ -18851,7 +18851,7 @@ read_dump_filters(const char *filename, DumpOptions *dopt)
 case FILTER_OBJECT_TYPE_TRIGGER:
 case FILTER_OBJECT_TYPE_EXTENSION:
 case FILTER_OBJECT_TYPE_FOREIGN_DATA:
-	pg_log_filter_error(, _("%s filter for \"%s\" is not allowed."),
+	pg_log_filter_error(, _("%s filter for \"%s\" is not allowed"),
 		"exclude",
 		filter_object_type_name(objtype));
 	exit_nicely(1);
diff --git a/src/bin/pg_dump/pg_dumpall.c b/src/bin/pg_dump/pg_dumpall.c
index 1b974cf7e8..92389353a4 100644
--- a/src/bin/pg_dump/pg_dumpall.c
+++ b/src/bin/pg_dump/pg_dumpall.c
@@ -1969,7 +1969,7 @@ read_dumpall_filters(const char *filename, SimpleStringList *pattern)
 	{
 		if (comtype == FILTER_COMMAND_TYPE_INCLUDE)
 		{
-			pg_log_filter_error(, _("%s filter for \"%s\" is not allowed."),
+			pg_log_filter_error(, _("%s filter for \"%s\" is not allowed"),
 "include",
 filter_object_type_name(objtype));
 			exit_nicely(1);
@@ -1989,7 +1989,7 @@ read_dumpall_filters(const char *filename, SimpleStringList *pattern)
 			case FILTER_OBJECT_TYPE_SCHEMA:
 			case FILTER_OBJECT_TYPE_TABLE:
 			case FILTER_OBJECT_TYPE_TABLE_AND_CHILDREN:
-pg_log_filter_error(, _("unsupported filter object."));
+pg_log_filter_error(, _("unsupported filter object"));
 exit_nicely(1);
 break;
 
diff --git a/src/bin/pg_dump/pg_restore.c b/src/bin/pg_dump/pg_restore.c
index 1459e02263..c3beacdec1 100644
--- a/src/bin/pg_dump/pg_restore.c
+++ b/src/bin/pg_dump/pg_restore.c
@@ -535,7 +535,7 @@ read_restore_filters(const char *filename, RestoreOptions *opts)
 case FILTER_OBJECT_TYPE_DATABASE:
 case FILTER_OBJECT_TYPE_EXTENSION:
 case FILTER_OBJECT_TYPE_FOREIGN_DATA:
-	pg_log_filter_error(, _("%s filter for \"%s\" is not allowed."),
+	pg_log_filter_error(, _("%s filter for \"%s\" is not allowed"),
 		"include",
 		filter_object_type_name(objtype));
 	exit_nicely(1);
@@ -581,7 +581,7 @@ read_restore_filters(const char *filename, RestoreOptions *opts)
 case FILTER_OBJECT_TYPE_TABLE:
 case FILTER_OBJECT_TYPE_TABLE_AND_CHILDREN:
 case FILTER_OBJECT_TYPE_TRIGGER:
-	pg_log_filter_error(, _("%s filter for \"%s\" is not allowed."),
+	pg_log_filter_error(, _("%s filter for \"%s\" is not allowed"),
 		"exclude",
 		filter_object_type_name(objtype));
 	exit_nicely(1);

about help message for new pg_dump's --filter option

2023-11-29 Thread Kyotaro Horiguchi

Hello.

Recently, a new --filter option was added to pg_dump. I might be
wrong, but the syntax of the help message for this feels off. Is the
word 'on' not necessary after 'based'?

>  --filter=FILENAMEinclude or exclude objects and data from dump
>   based expressions in FILENAME

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

Re: initdb --no-locale=C doesn't work as specified when the environment is not C

2023-11-26 Thread Kyotaro Horiguchi

At Wed, 22 Nov 2023 11:04:01 -0500, Tom Lane  wrote in 
> Kyotaro Horiguchi  writes:
> > Commit 3e51b278db leaves lc_* conf lines as commented-out when
> > their value is "C". This leads to the following behavior.
> 
> Hmm ... I see a contributing factor here: this bit in
> postgresql.conf.sample is a lie:
> 
> #lc_messages = 'C'# locale for system error message
>   # strings
> 
> A look in guc_tables.c shows that the actual default is '' (empty
> string), which means "use the environment", and that matches how the
> variable is documented in config.sgml.  Somebody --- quite possibly me
> --- was misled by the contents of postgresql.conf.sample into thinking
> that the lc_xxx GUCs all default to C, when that's only true for the
> others.

It seems somewhat intentional that only lc_messages references the
environment at boot time. On the other hand, previously, in the
absence of a specified locale, initdb would embed the environmental
value in the configuration file, as it seems to be documented. Given
that initdb is always used for cluster creation, it's unlikey that
systems depend on this boot-time default for their operation.

> I think that a more correct fix for this would treat lc_messages
> differently from the other lc_xxx GUCs.  Maybe just eliminate the
> hack about not substituting "C" for that one?

For example, the --no-locale option for initdb is supposed to set all
categories to 'C'. That approach would lead to the postgres
referencing the runtime environment for all categories except
lc_messages, which I believe contradicts the documentation. In my
biew, if lc_messages is exempted from that hack, then all other
categories should be similarly excluded as in the second approach
among the attached in the previous mail.

> In any case, we need to fix this mistake in postgresql.conf.sample.

If you are not particularly concerned about the presence of quotation
marks, I think it would be fine to go with the second approach and
make the necessary modification to the configuration file accordingly.

With the attached patch, initdb --no-locale generates the following
lines in the configuration file.

> lc_messages = C   # locale for system error 
> message
>   # strings
> lc_monetary = C   # locale for monetary formatting
> lc_numeric = C# locale for number formatting
> lc_time = C   # locale for time formatting

By the way, the lines around lc_* in the sample file seem to have
somewhat inconsistent indentations. Wouldnt' it be preferable to fix
this? (The attached doesn't that.)

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
diff --git a/src/backend/utils/misc/postgresql.conf.sample 
b/src/backend/utils/misc/postgresql.conf.sample
index e48c066a5b..133dd3da7d 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -726,7 +726,7 @@
# encoding

 # These settings are initialized by initdb, but they can be changed.
-#lc_messages = 'C' # locale for system error message
+#lc_messages = ''  # locale for system error message
# strings
 #lc_monetary = 'C' # locale for monetary formatting
 #lc_numeric = 'C'  # locale for number formatting
diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c
index 0c6f5ceb0a..646e8f29f6 100644
--- a/src/bin/initdb/initdb.c
+++ b/src/bin/initdb/initdb.c
@@ -1226,25 +1226,17 @@ setup_config(void)
conflines = replace_guc_value(conflines, "shared_buffers",
  repltok, 
false);

-   /*
-* Hack: don't replace the LC_XXX GUCs when their value is 'C', because
-* replace_guc_value will decide not to quote that, which looks strange.
-*/
-   if (strcmp(lc_messages, "C") != 0)
-   conflines = replace_guc_value(conflines, "lc_messages",
- 
lc_messages, false);
+   conflines = replace_guc_value(conflines, "lc_messages",
+ lc_messages, 
false);

-   if (strcmp(lc_monetary, "C") != 0)
-   conflines = replace_guc_value(conflines, "lc_monetary",
- 
lc_monetary, false);
+   conflines = replace_guc_value(conflines, "lc_monetary",
+

Re: Make mesage at end-of-recovery less scary.

2023-11-21 Thread Kyotaro Horiguchi

Anyway, this requires rebsaing, and done.

Thanks for John (Naylor) for pointing this out.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
>From e56f1f24523e3e562a4db166dfeaadc79fd7b27a Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi 
Date: Tue, 7 Mar 2023 14:55:58 +0900
Subject: [PATCH v27] Make End-Of-Recovery error less scary

When recovery in any type ends, we see a bit scary error message like
"invalid record length" that suggests something serious is
happening. Actually if recovery meets a record with length = 0, that
usually means it finished applying all available WAL records.

Make this message less scary as "reached end of WAL". Instead, raise
the error level for other kind of WAL failure to WARNING.  To make
sure that the detection is correct, this patch checks if all trailing
bytes in the same page are zeroed in that case.
---
 src/backend/access/transam/xlogreader.c   | 134 ++
 src/backend/access/transam/xlogrecovery.c |  94 +++
 src/backend/replication/walreceiver.c |   7 +-
 src/bin/pg_waldump/pg_waldump.c   |  13 ++-
 src/include/access/xlogreader.h   |   1 +
 src/test/recovery/t/035_recovery.pl   | 130 +
 6 files changed, 327 insertions(+), 52 deletions(-)
 create mode 100644 src/test/recovery/t/035_recovery.pl

diff --git a/src/backend/access/transam/xlogreader.c b/src/backend/access/transam/xlogreader.c
index e0baa86bd3..ce65f99c60 100644
--- a/src/backend/access/transam/xlogreader.c
+++ b/src/backend/access/transam/xlogreader.c
@@ -48,6 +48,8 @@ static int	ReadPageInternal(XLogReaderState *state, XLogRecPtr pageptr,
 			 int reqLen);
 static void XLogReaderInvalReadState(XLogReaderState *state);
 static XLogPageReadResult XLogDecodeNextRecord(XLogReaderState *state, bool nonblocking);
+static bool ValidXLogRecordLength(XLogReaderState *state, XLogRecPtr RecPtr,
+  XLogRecord *record);
 static bool ValidXLogRecordHeader(XLogReaderState *state, XLogRecPtr RecPtr,
   XLogRecPtr PrevRecPtr, XLogRecord *record, bool randAccess);
 static bool ValidXLogRecord(XLogReaderState *state, XLogRecord *record,
@@ -149,6 +151,7 @@ XLogReaderAllocate(int wal_segment_size, const char *waldir,
 		pfree(state);
 		return NULL;
 	}
+	state->EndOfWAL = false;
 	state->errormsg_buf[0] = '\0';
 
 	/*
@@ -525,6 +528,7 @@ XLogDecodeNextRecord(XLogReaderState *state, bool nonblocking)
 	/* reset error state */
 	state->errormsg_buf[0] = '\0';
 	decoded = NULL;
+	state->EndOfWAL = false;
 
 	state->abortedRecPtr = InvalidXLogRecPtr;
 	state->missingContrecPtr = InvalidXLogRecPtr;
@@ -608,16 +612,12 @@ restart:
 	Assert(pageHeaderSize <= readOff);
 
 	/*
-	 * Read the record length.
+	 * Verify the record header.
 	 *
 	 * NB: Even though we use an XLogRecord pointer here, the whole record
-	 * header might not fit on this page. xl_tot_len is the first field of the
-	 * struct, so it must be on this page (the records are MAXALIGNed), but we
-	 * cannot access any other fields until we've verified that we got the
-	 * whole header.
+	 * header might not fit on this page.
 	 */
 	record = (XLogRecord *) (state->readBuf + RecPtr % XLOG_BLCKSZ);
-	total_len = record->xl_tot_len;
 
 	/*
 	 * If the whole record header is on this page, validate it immediately.
@@ -636,19 +636,19 @@ restart:
 	}
 	else
 	{
-		/* There may be no next page if it's too small. */
-		if (total_len < SizeOfXLogRecord)
-		{
-			report_invalid_record(state,
-  "invalid record length at %X/%X: expected at least %u, got %u",
-  LSN_FORMAT_ARGS(RecPtr),
-  (uint32) SizeOfXLogRecord, total_len);
+		/*
+		 * xl_tot_len is the first field of the struct, so it must be on this
+		 * page (the records are MAXALIGNed), but we cannot access any other
+		 * fields until we've verified that we got the whole header.
+		 */
+		if (!ValidXLogRecordLength(state, RecPtr, record))
 			goto err;
-		}
-		/* We'll validate the header once we have the next page. */
+
 		gotheader = false;
 	}
 
+	total_len = record->xl_tot_len;
+
 	/*
 	 * Try to find space to decode this record, if we can do so without
 	 * calling palloc.  If we can't, we'll try again below after we've
@@ -1091,25 +1091,81 @@ XLogReaderInvalReadState(XLogReaderState *state)
 	state->readLen = 0;
 }
 
+/*
+ * Validate record length of an XLOG record header.
+ *
+ * This is substantially a part of ValidXLogRecordHeader.  But XLogReadRecord
+ * needs this separate from the function in case of a partial record header.
+ *
+ * Returns true if the xl_tot_len header field has a seemingly valid value,
+ * which means the caller can proceed reading to the following part of the
+ * record.
+ */
+static bool
+ValidXLogRecordLength(XLogReaderState *state, XLogRecPtr RecPtr,
+	  XLogRecord *record)
+{
+	if (record->xl_tot_len == 0)
+	{
+		char	   *p;
+		char	   *pe;
+
+		/*
+		 * We are almost

initdb --no-locale=C doesn't work as specified when the environment is not C

2023-11-21 Thread Kyotaro Horiguchi

Commit 3e51b278db leaves lc_* conf lines as commented-out when
their value is "C". This leads to the following behavior.

$ echo LANG
ja_JP.UTF8
$ initdb --no-locale hoge
$ grep lc_ hoge/postgresql.conf
#lc_messages = 'C'  # locale for system error message
#lc_monetary = 'C'  # locale for monetary formatting
#lc_numeric = 'C'   # locale for number formatting
#lc_time = 'C'  # locale for time formatting

In this scenario, the postmaster ends up emitting log massages in
Japanese, which contradicts the documentation.

https://www.postgresql.org/docs/devel/app-initdb.html

> --locale=locale 
>   Sets the default locale for the database cluster. If this option is
>   not specified, the locale is inherited from the environment that
>   initdb runs in. Locale support is described in Section 24.1.
> 
..
> --lc-messages=locale
>   Like --locale, but only sets the locale in the specified category.

Here's a somewhat amusing case:

$ echo LANG
ja_JP.UTF8
$ initdb --lc_messages=C
$ grep lc_ hoge/postgresql.conf 
#lc_messages = 'C'  # locale for system error message
lc_monetary = 'ja_JP.UTF8'  # locale for monetary formatting
lc_numeric = 'ja_JP.UTF8'   # locale for number formatting
lc_time = 'ja_JP.UTF8'  # locale for time formatting

Hmm. it seems that initdb replaces the values of all categories
*except the specified one*. This behavior seems incorrect to
me. initdb should replace the value when explicitly specified in the
command line. If you use -c lc_messages=C, it does perform the
expected behavior to some extent, but I believe this is a separate
matter.

I have doubts about not replacing these lines for purely cosmetic
reasons. In this mail, I've attached three possible solutions for the
original issue: the first one enforces replacement only when specified
on the command line, the second one simply always performs
replacement, and the last one addresses the concern about the absence
of quotes around "C" by allowing explicit specification. (FWIW, I
prefer the last one.)

What do you think about these?

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c
index 0c6f5ceb0a..56a8c5b60e 100644
--- a/src/bin/initdb/initdb.c
+++ b/src/bin/initdb/initdb.c
@@ -144,6 +144,10 @@ static char *lc_monetary = NULL;
 static char *lc_numeric = NULL;
 static char *lc_time = NULL;
 static char *lc_messages = NULL;
+static bool lc_monetary_specified = false;
+static bool lc_numeric_specified = false;
+static bool lc_time_specified = false;
+static bool lc_messages_specified = false;
 static char locale_provider = COLLPROVIDER_LIBC;
 static char *icu_locale = NULL;
 static char *icu_rules = NULL;
@@ -1230,19 +1234,19 @@ setup_config(void)
 * Hack: don't replace the LC_XXX GUCs when their value is 'C', because
 * replace_guc_value will decide not to quote that, which looks strange.
 */
-   if (strcmp(lc_messages, "C") != 0)
+   if (strcmp(lc_messages, "C") != 0 || lc_messages_specified)
conflines = replace_guc_value(conflines, "lc_messages",
  
lc_messages, false);
 
-   if (strcmp(lc_monetary, "C") != 0)
+   if (strcmp(lc_monetary, "C") != 0 || lc_monetary_specified)
conflines = replace_guc_value(conflines, "lc_monetary",
  
lc_monetary, false);
 
-   if (strcmp(lc_numeric, "C") != 0)
+   if (strcmp(lc_numeric, "C") != 0 || lc_numeric_specified)
conflines = replace_guc_value(conflines, "lc_numeric",
  
lc_numeric, false);
 
-   if (strcmp(lc_time, "C") != 0)
+   if (strcmp(lc_time, "C") != 0 || lc_time_specified)
conflines = replace_guc_value(conflines, "lc_time",
  
lc_time, false);
 
@@ -2374,6 +2378,15 @@ setlocales(void)
icu_locale = locale;
}
 
+   /*
+* At this point, null means that the value is not specifed in the 
command
+* line.
+*/
+   if (lc_numeric != NULL) lc_numeric_specified = true;
+   if (lc_time != NULL) lc_time_specified = true;
+   if (lc_monetary != NULL) lc_monetary_specified = true;
+   if (lc_messages != NULL) lc_messages_specified = true;
+
/*
 * canonicalize locale names, and obtain any missing values from our
 * current environment
diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c
index 0c6f5ceb0a..646e8f2

Re: A recent message added to pg_upgade

2023-11-08 Thread Kyotaro Horiguchi

At Thu, 9 Nov 2023 12:00:59 +0530, Amit Kapila  wrote 
in 
> I have also proposed that as one of the alternatives but didn't get
> many votes. And, I think if the user is passing a special value of
> max_slot_wal_keep_size during the upgrade, it has to be a special
> case, and rejecting it upfront doesn't seem unreasonable to me.

Oops. Sorry, and understood.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

Re: A recent message added to pg_upgade

2023-11-08 Thread Kyotaro Horiguchi

At Thu, 9 Nov 2023 09:53:07 +0530, Amit Kapila  wrote 
in 
> Michael, Horiguchi-San, and others, do you have any thoughts on what
> is the best way to proceed?

As I previously mentioned, I believe that if rejection is to be the
course of action, it would be best to proceed with it sooner rather
than later. On the other hand, I am concerned about the need for users
to perform extra steps depending on the source cluster
conrfiguration. Therefore, another possible approach could be to
simply ignore the given settings in the assignment hook rather than
rejecting by the check hook, and forcibuly apply -1.

What do you think about this third approach?

I haven't checked this with pg_upgrade, but a standalone postmaster
would emit the following messages.

> postgres -b -c max_slot_wal_keep_size=-1
> LOG:  "max_slot_wal_keep_size" is foced to set to -1 during binary upgrade 
> mode.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
diff --git a/src/backend/access/transam/xlog.c 
b/src/backend/access/transam/xlog.c
index b541be8eec..319d4f5a81 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -2063,6 +2063,31 @@ check_wal_segment_size(int *newval, void **extra, 
GucSource source)
return true;
 }
 
+/*
+ * GUC check_hook for max_slot_wal_keep_size
+ *
+ * If WALs needed by logical replication slots are deleted, these slots become
+ * inoperable. During a binary upgrade, pg_upgrade sets this variable to -1 via
+ * the command line in an attempt to prevent such deletions, but users have
+ * ways to override it. To ensure the successful completion of the upgrade,
+ * it's essential to keep this variable unaltered.  See
+ * InvalidatePossiblyObsoleteSlot() and start_postmaster() in pg_upgrade for
+ * more details.
+ */
+bool
+check_max_slot_wal_keep_size(int *newval, void **extra, GucSource source)
+{
+   if (IsBinaryUpgrade && *newval != -1)
+   {
+   ereport(LOG,
+   errmsg("\"%s\" is foced to set to -1 during 
binary upgrade mode.",
+  "max_slot_wal_keep_size"));
+   *newval = -1;
+   }
+
+   return true;
+}
+
 /*
  * At a checkpoint, how many WAL segments to recycle as preallocated future
  * XLOG segments? Returns the highest segment that should be preallocated.
diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index 99823df3c7..5c3d2b1082 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -1424,18 +1424,12 @@ 
InvalidatePossiblyObsoleteSlot(ReplicationSlotInvalidationCause cause,
SpinLockRelease(>mutex);
 
/*
-* The logical replication slots shouldn't be invalidated as
-* max_slot_wal_keep_size GUC is set to -1 during the upgrade.
-*
-* The following is just a sanity check.
+* check_max_slot_wal_keep_size() ensures 
max_slot_wal_keep_size is set
+* to -1, so, slot invalidation for logical slots shouldn't 
happen
+* during an upgrade. At present, only logical slots really 
require
+* this.
 */
-   if (*invalidated && SlotIsLogical(s) && IsBinaryUpgrade)
-   {
-   ereport(ERROR,
-   
errcode(ERRCODE_INVALID_PARAMETER_VALUE),
-   errmsg("replication slots must not be 
invalidated during the upgrade"),
-   errhint("\"max_slot_wal_keep_size\" 
must be set to -1 during the upgrade"));
-   }
+   Assert (!(*invalidated && SlotIsLogical(s) && IsBinaryUpgrade));
 
if (active_pid != 0)
{
diff --git a/src/backend/utils/misc/guc_tables.c 
b/src/backend/utils/misc/guc_tables.c
index 7605eff9b9..818ffdb2ae 100644
--- a/src/backend/utils/misc/guc_tables.c
+++ b/src/backend/utils/misc/guc_tables.c
@@ -2845,7 +2845,7 @@ struct config_int ConfigureNamesInt[] =
},
_slot_wal_keep_size_mb,
-1, -1, MAX_KILOBYTES,
-   NULL, NULL, NULL
+   check_max_slot_wal_keep_size, NULL, NULL
},
 
{
diff --git a/src/include/utils/guc_hooks.h b/src/include/utils/guc_hooks.h
index 2a191830a8..3d74483f44 100644
--- a/src/include/utils/guc_hooks.h
+++ b/src/include/utils/guc_hooks.h
@@ -84,6 +84,8 @@ extern bool check_maintenance_io_concurrency(int *newval, 
void **extra,
 extern void assign_maintenance_io_concurrency(int newval, void *extra);
 extern bool check_max_connections(int *newval, void **extra, GucSource source);
 extern bool check_max_wal_senders(int *newval, void **extra, GucSource source);
+extern bool check_max_slot_wal_

Re: GUC names in messages

2023-11-08 Thread Kyotaro Horiguchi

At Thu, 9 Nov 2023 12:55:44 +1100, Peter Smith  wrote in 
> The most common pattern there is "You might need to increase %s.".
..
> Here is a patch to modify those other similar variations so they share
> that common wording.
> 
> PSA.

I'm uncertain whether the phrases "Consider doing something" and "You
might need to do something" are precisely interchangeable. However,
(for me..)  it appears that either phrase could be applied for all
messages that the patch touches.

In short, I'm fine with the patch.


By the way, I was left scratching my head after seeing the following
message.

>ereport(PANIC,
>(errcode(ERRCODE_CONFIGURATION_LIMIT_EXCEEDED),
> -   errmsg("could not find free replication state, increase 
> max_replication_slots")));

Being told to increase max_replication_slots in a PANIC
message feels somewhat off to me. Just looking at the message, it
seems unconvincing to increase "slots" because there is a lack of
"state". So, I poked around in the code and found the following
comment:

> ReplicationOriginShmemSize(void)
> {
> ...
> /*
>  * XXX: max_replication_slots is arguably the wrong thing to use, as here
>  * we keep the replay state of *remote* transactions. But for now it seems
>  * sufficient to reuse it, rather than introduce a separate GUC.
>  */

I haven't read the related code, but if my understanding based on this
comment is correct, wouldn't it mean that a lack of storage space for
the state at the location outputting the message indicates a bug in
the program, not a user configuration error? In other words, isn't
this message something that at least shouldn't be a user-facing
message, and might it be more appropriate to use an Assert instead?

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

Re: Add new option 'all' to pg_stat_reset_shared()

2023-11-07 Thread Kyotaro Horiguchi

At Mon, 6 Nov 2023 14:00:13 +0530, Bharath Rupireddy 
 wrote in 
> On Mon, Nov 6, 2023 at 11:39 AM torikoshia  wrote:
> > Since each stats, except wal_prefetch was reset acquiring LWLock,
> > attached PoC patch makes the call atomic by using these LWlocks.
> >
> > If this is the right direction, I'll try to make wal_prefetch also take
> > LWLock.

I must say, I have reservations about this approach. The main concern
is the duplication of reset code, which has been efficiently
encapsulated for individual targets, into this location. This practice
degrades the maintainability and clarity of the code.

> Well, that's a total of ~17 LWLocks this new function takes to make
> the stats reset atomic. I'm not sure if this atomicity is worth the
> effort which can easily be misused - what if someone runs something
> like SELECT pg_stat_reset_shared() FROM generate_series(1,
> 10n); to cause heavy lock acquisition and release cycles?
...
> IMV, atomicity is not something that applies for the stats reset
> operation because stats are approximate numbers by nature after all.
> If the pg_stat_reset_shared() resets stats for only a bunch of stats
> types and fails, it's the basic application programming style that
> when a query fails it's the application that needs to have a retry
> mechanism. FWIW, the atomicity doesn't apply today if someone wants to
> reset stats in a loop for all stats types.

The infrequent use of this feature, coupled with the fact that there
is no inherent need for these counters to be reset simultaneoulsy,
leads me to think that there is little point in cnetralizing the
locks.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

Re: Intermittent failure with t/003_logical_slots.pl test on windows

2023-11-06 Thread Kyotaro Horiguchi

At Mon, 6 Nov 2023 19:42:21 +0530, Nisha Moond  wrote 
in 
> > Appending '2>&1 test:
> > The command still results in NULL and ends up failing as no data is
> > returned. Which means even no error message is returned. The error log

Thanks for confirmation. So, at least the child process was launced
successfully in the cmd.exe's view.

Upon a quick check on my end with Windows' _popen, I have obseved the
following:

- Once a child process is started, it seems to go undetected as an
  error by _popen or subsequent fgets calls if the process ends
  abnormally, with a non-zero exit status or even with a SEGV.

- After the child process has flushed data to stdout, it is possible
  to read from the pipe even if the child process crashes or ends
  thereafter.

- Even if fgets is called before the program starts, it will correctly
  block until the program outputs something. Specifically, when I used
  popen("sleep 5 & target.exe") and immediately performed fgets on the
  pipe, I was able to read the output of target.exe as the first line.

Therefore, based on the information available, it is conceivable that
the child process was killed by something right after it started, or
the program terminated on its own without any error messages.

By the way, in the case of aforementioned SEGV, Application Errors
corresponding to it were identifiable in the Event
Viewer. Additionally, regarding the exit statuses, they can be
captured by using a wrapper batch file (.bat) that records
%ERRORLEVEL% after running the target program. This may yield
insights, aothough its effectiveness is not guaranteed.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

Re: "box" type description

2023-11-02 Thread Kyotaro Horiguchi

At Wed, 1 Nov 2023 11:36:01 -0400, Bruce Momjian  wrote in 
> On Wed, Mar 31, 2021 at 01:43:47PM +0200, Christoph Berg wrote:
> > Re: Kyotaro Horiguchi
> > I like that because it points to the "point" syntax so users can
> > figure out how to spell a box.
> 
> I liked Horiguchi-san's patch from 2021, but once I started looking
> further, I found a number of improvements that can be made in the \dTS
> output beyond Horiguchi-san's changes:
> 
> *  boolean outputs 't'/'f', not 'true'/'false'
> *  Added "format" ... for output
> *  tid output format was at the start, not the end
> *  I didn't add space between point x,y because the output has no space
> *  I spelled out "point" instead of "pt"
> *  "line" has two very different input formats so I listed both
>(more different than others like boolean)
> *  I didn't see the need to say "datatype" for LSN and UUID
> *  I improved the txid_snapshot description
> *  There was no description for table_am_handler
> 
> Patch attached.

Thank you for continuing this. The additional changes looks
fine.

Upon reviewing the table again in this line, further potential
improvements and issues have been found. For example:

character, varchar: don't follow the rule.
- 'char(length)' blank-padded string, fixed storage length
+ blank-padded string, fixed storage length, format 'char(length)'

interval: doesn't follow the rule.
- @  , time interval
+ time interval, format '[@]  '
(I think '@' is not necessary here..)

pg_snapshot:

  The description given is just "snapshot", which seems overly simplistic.

txid_snapshot:

  The description reads "transaction snapshot". Is this really
  accurate, especially in contrast with pg_snapshot?

pg_brin_bloom_summary, pg_brin_minmax_multi_summary, pg_mcv_list and many:

I'm uncertain whether these types, which lack an input syntax (but
have an output format), qualify as pseudo-types.  Nevertheless, I
believe it would be beneficial to describe that those types differ
from ordinary types.

Should we consider refining these descriptions in the table?

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

Re: Intermittent failure with t/003_logical_slots.pl test on windows

2023-11-02 Thread Kyotaro Horiguchi

At Tue, 31 Oct 2023 18:11:48 +0530, vignesh C  wrote in 
> Few others are also facing this problem with similar code like in:
> https://stackoverflow.com/questions/15882799/fgets-returning-error-for-file-returned-by-popen

I'm inclined to believe that the pipe won't enter the EOF state until
the target command terminates (then the top-level cmd.exe). The target
command likely terminated prematurely due to an error before priting
any output.

If we append "2>&1" to the command line, we can capture the error
message through the returned pipe if any. Such error messages will
cause the subsequent code to fail with an error such as "unexpected
string: 'the output'". I'm not sure, but if this is permissive, the
returned error messages could potentially provide insight into the
underlying issue, paving the way for a potential solution.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

Re: A recent message added to pg_upgade

2023-11-01 Thread Kyotaro Horiguchi

Thanks you for the comments!

At Wed, 1 Nov 2023 18:08:19 +1100, Peter Smith  wrote in 
> Hi, here are some minor review comments for the v3 patch.
> 
> ==
> src/backend/access/transam/xlog.c

> I asked ChatGPT to suggest alternative wording for that comment, and
> it came up with something that I felt was a slight improvement.
> 
> SUGGESTION
> ...
> If WALs needed by logical replication slots are deleted, these slots
> become inoperable. During a binary upgrade, pg_upgrade sets this
> variable to -1 via the command line in an attempt to prevent such
> deletions, but users have ways to override it. To ensure the
> successful completion of the upgrade, it's essential to keep this
> variable unaltered.
> ...
> 
> ~~~

ChatGPT seems to tend to generate sentences in a slightly different
from our usual writing. While I tried to retain the original phrasing
in the patch, I don't mind using the suggested version.  Used as is.

> 2.

> + GUC_check_errdetail("\"max_slot_wal_keep_size\" must be set to -1
> during binary upgrade mode.");

> Some of the other GUC_check_errdetail()'s do not include the GUC name
> in the translatable message text. Isn't that a preferred style?

> SUGGESTION
> GUC_check_errdetail("\"%s\" must be set to -1 during binary upgrade mode.",
>   "max_slot_wal_keep_size");

I believe that that style was adopted to minimize translatable
messages by consolidting identical ones that only differ in variable
names.  I see both versions in the tree. I didn't find necessity to
adopt this approach for this specific message, especially since I'm
skeptical about adding new messages that end with "must be set to -1
during binary upgrade mode".  (pg_upgrade sets synchronous_commit,
fsync and full_page_writes to "off".)

However, some unique messages are in this style, so I'm fine with
using that style.  Revised accordingly.

> ==
> src/backend/replication/slot.c
> 
> 3. InvalidatePossiblyObsoleteSlot

> + Assert (!*invalidated || !SlotIsLogical(s) || !IsBinaryUpgrade);
> 
> IMO new Assert became trickier to understand than the original condition. 
> YMMV.
> 
> SUGGESTION
> Assert(!(*invalidated && SlotIsLogical(s) && IsBinaryUpgrade));

Yeah, I also liked that style and considered using it, but I didn't
feel it was too hard to read in this particular case, so I ended up
using the current way.  Just like with the point of other comments,
I'm not particularly attached to this style. Thus if someone find it
difficult to read, I have no issue with changing it. Revised as
suggested.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
diff --git a/src/backend/access/transam/xlog.c 
b/src/backend/access/transam/xlog.c
index b541be8eec..46833f6ecd 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -2063,6 +2063,29 @@ check_wal_segment_size(int *newval, void **extra, 
GucSource source)
return true;
 }
 
+/*
+ * GUC check_hook for max_slot_wal_keep_size
+ *
+ * If WALs needed by logical replication slots are deleted, these slots become
+ * inoperable. During a binary upgrade, pg_upgrade sets this variable to -1 via
+ * the command line in an attempt to prevent such deletions, but users have
+ * ways to override it. To ensure the successful completion of the upgrade,
+ * it's essential to keep this variable unaltered.  See
+ * InvalidatePossiblyObsoleteSlot() and start_postmaster() in pg_upgrade for
+ * more details.
+ */
+bool
+check_max_slot_wal_keep_size(int *newval, void **extra, GucSource source)
+{
+   if (IsBinaryUpgrade && *newval != -1)
+   {
+   GUC_check_errdetail("\"%s\" must be set to -1 during binary 
upgrade mode.",
+   "max_slot_wal_keep_size");
+   return false;
+   }
+   return true;
+}
+
 /*
  * At a checkpoint, how many WAL segments to recycle as preallocated future
  * XLOG segments? Returns the highest segment that should be preallocated.
diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index 99823df3c7..5c3d2b1082 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -1424,18 +1424,12 @@ 
InvalidatePossiblyObsoleteSlot(ReplicationSlotInvalidationCause cause,
SpinLockRelease(>mutex);
 
/*
-* The logical replication slots shouldn't be invalidated as
-* max_slot_wal_keep_size GUC is set to -1 during the upgrade.
-*
-* The following is just a sanity check.
+* check_max_slot_wal_keep_size() ensures 
max_slot_wal_keep_size is set
+* to -1, so, slot invalidation for logical slots shouldn't 
happen
+* d

Re: Requiring recovery.signal or standby.signal when recovering with a backup_label

2023-11-01 Thread Kyotaro Horiguchi

At Wed, 1 Nov 2023 08:39:17 +0900, Michael Paquier  wrote 
in 
> See in StartupXLOG(), around the comment "complain if we did not roll
> forward far enough to reach".  This complains if archive recovery has
> been requested *or* if we retrieved a backup end LSN from the
> backup_label.

Please note that backupStartPoint is not reset even when reaching the
backup end point during crash recovery. If backup_label enforces
archive recovery, I think this point won't be an issue as you
mentioned. For the record, my earlier proposal aimed to detect
reaching the end point even during crash recovery.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

Wrong function name in pgstatfuncs.c

2023-11-01 Thread Kyotaro Horiguchi

Since pgstatfuncs.c was refactored, the comments for synthesized
function names are significant to find the function body.

I happened to find a misspelling among the function name
comments. "pg_stat_get_mods_since_analyze" should be
"pg_stat_get_mod_since_analyze".

Upon checking the file using a rudimentary script, I found no other
similar mistakes in the same file.

(FWIW, I also feel that these macros might be going a bit too far by
synthesizing even the function names.)

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 28ee97968b..1fb8b31863 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -78,7 +78,7 @@ PG_STAT_GET_RELENTRY_INT64(ins_since_vacuum)
 /* pg_stat_get_live_tuples */
 PG_STAT_GET_RELENTRY_INT64(live_tuples)
 
-/* pg_stat_get_mods_since_analyze */
+/* pg_stat_get_mod_since_analyze */
 PG_STAT_GET_RELENTRY_INT64(mod_since_analyze)
 
 /* pg_stat_get_numscans */

Re: A recent message added to pg_upgade

2023-10-31 Thread Kyotaro Horiguchi

At Mon, 30 Oct 2023 14:55:01 +0530, Bharath Rupireddy 
 wrote in 
> > I get it. I agree to go with just the assert because the GUC
> > check_hook kinda tightens the screws against setting
> > max_slot_wal_keep_size to a value other than -1 during the binary
> > upgrade,

Thanks for being on the same page.

>  A few comments on your inhibit_m_s_w_k_s_during_upgrade_2.txt:
> 
> 1.
> + * All WAL files on the publisher node must be retained during an upgrade to
> + * maintain connections from downstreams.  While pg_upgrade explicitly sets
> 
> It's not just the publisher, anyone using logical slots. Also, no
> downstream please. If you want, you can repurpose the comment that's
> added by 29d0a77f.
> 
> /*
>  * Use max_slot_wal_keep_size as -1 to prevent the WAL removal by the
>  * checkpointer process.  If WALs required by logical replication slots
>  * are removed, the slots are unusable.  This setting prevents the
>  * invalidation of slots during the upgrade. We set this option when
>  * cluster is PG17 or later because logical replication slots can only be
>  * migrated since then. Besides, max_slot_wal_keep_size is added in PG13.
>  */

It is helpful. Thanks!

> 2.
>  At present, only logical slots really require
> + * this.
> 
> Can we remove the above comment as the code with SlotIsLogical(s)
> explains it all?

max_slot_wal_keep_size affects both logical and physical
slots. Therefore, if we are interested only one of the two types of
slots, it's important to clarify the rationale.  Regardless of any
potential exntension to physical slots, I believe it's essential to
clarify the rationale. I couldn't determine from that extensive thread
whether there's a possible extension to physical slots. Could you
inform me if such an extension can happen and, if not, provide the
reason?


> 3.
> +GUC_check_errdetail("\"max_slot_wal_keep_size\" must be set
> to -1 during the upgrade.");
> +return false;
> 
> How about we be explicit like the following which helps users reason
> about this restriction instead of them looking at the comments/docs?
> 
> GUC_check_errcode(ERRCODE_INVALID_PARAMETER_VALUE);
> GUC_check_errmsg(""\"max_slot_wal_keep_size\" must be set
> to -1 when in binary upgrade mode");
> GUC_check_errdetail("A value of -1 prevents the removal of
> WAL required for logical slots upgrade.");
> return false;



I don't quite see the reason to provide such a detailed explanation
just for this error. Additionally, since this check is performed
regardless of the presence or absense of logical slots, I think the
errdetail message might potentially confuse those whosee it. Adding
"binary" looks fine as is and done in the attached.

> 4. I think a test case to hit the error in the check hook in
> 003_logical_slots.pl will help a lot here - not only covers the code
> but also helps demonstrate how one can reach the error.

Yeah, of course. I was planning to add tests once the direction of the
discussion became clear. I will add them in the next version.

> 5. I think the check_hook is better defined in xlog.c the place where
> it's actually being declared and in action. IMV, there's no reason for
> it to be in slot.c as it doesn't deal with any slot related
> variables/structs. This also avoids an unnecessary "utils/guc_hooks.h"
> inclusion in slot.c.
> +bool
> +check_max_slot_wal_keep_size(int *newval, void **extra, GucSource source)
> +{

Sounds reasonable. Moved. I simply moved it to xlog.c, but the
function comment was thoroughly written only for this moved function,
making it somewhat stand out..

> 5. A possible problem with this check_hook approach is that it doesn't
> let anyone setting max_slot_wal_keep_size to a value other than -1
> during pg_ugprade even if someone doesn't have logical slots or
> doesn't want to upgrade logical slots in which case the WAL file
> growth during pg_upgrade may be huge (transiently) unless the
> pg_resetwal step of pg_upgrade removes it at the end.

While I doubt anyone wishes to set the variable to a specific value
during upgrade, think there are individuals who might be reluctant to
edit the config file due to unclear reasons.  While we could consider
an alternative - checking for logical slots during binary upgrade-
it's debatable if the effort is justified. (I haven't verified its
feasibility, however.)


regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
diff --git a/src/backend/access/transam/xlog.c 
b/src/backend/access/transam/xlog.c
index b541be8eec..d83b55da68 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -2063,6 +2063,27 @@ check_wal_segment_

Re: Add new option 'all' to pg_stat_reset_shared()

2023-10-30 Thread Kyotaro Horiguchi

At Mon, 30 Oct 2023 14:15:53 +0530, Bharath Rupireddy 
 wrote in 
> > > I imagine there are cases where people want to initialize all of them
> > > at the same time in addition to initializing one at a time.
> >
> > FWIW, I fairly often wanted it, but forgot about that:p
> 
> Isn't calling pg_stat_reset_shared() for all stats types helping you
> out? Is there any problem with it? Can you be more specific about the
> use-case?

Oh. Sorry, it's my bad. pg_stat_reset_shared() is sufficient.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

Re: Add new option 'all' to pg_stat_reset_shared()

2023-10-30 Thread Kyotaro Horiguchi

At Mon, 30 Oct 2023 16:35:19 +0900, torikoshia  
wrote in 
> Hi,
> 
> After 96f052613f3, we have below 6 types of parameter for
> pg_stat_reset_shared().
> 
>   "archiver", "bgwriter", "checkpointer", "io", "recovery_prefetch",
>   "wal"
> 
> How about adding a new option 'all' to delete all targets above?
> 
> I imagine there are cases where people want to initialize all of them
> at the same time in addition to initializing one at a time.

FWIW, I fairly often wanted it, but forgot about that:p

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

Re: A recent message added to pg_upgade

2023-10-30 Thread Kyotaro Horiguchi

At Mon, 30 Oct 2023 12:38:47 +0530, Bharath Rupireddy 
 wrote in 
> Will the check_hook approach work correctly? I haven't checked that by
> myself, but I see InitializeGUCOptions() getting called before
> IsBinaryUpgrade is set to true and the passed-in config options ('c')
> are parsed.

I'm not sure about the wanted behavior exactly, but the fast you
pointed doesn't matter because the check is required after parsing the
command line options. On the other hand, I'm not sure about the
behavior that a setting in postgresql.conf is rejected.

> If the check_hook approach works correctly, I think we must add a test
> hitting the error in check_max_slot_wal_keep_size for the
> IsBinaryUpgrade case. And, I agree with Amit to have a detailed
> messaging with GUC_check_errmsg/GUC_check_errdetail. Also, IMV,
> leaving the error message in InvalidatePossiblyObsoleteSlot() there
> (if required with a better wording as discussed initially in this
> thread) does no harm. Actually, it acts as another safety net given
> that max_slot_wal_keep_size GUC is reloadable via SIGHUP.

The error message, which is deemed impossible, adds an additional
message translation. In another thread, we are discussing the
reduction of translatable messages. Therefore, I suggest using elog()
for the condition at the very least. Whether it should be elog() or
Assert() remains open for discussion, as I don't have a firm stance on
it.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

1 2 3 4 5 6 7 8 9 10 >

1 - 100 of 3131 matches

Mail list logo