Re: Three commit tips

2023-11-02 Thread Ashutosh Bapat
On Thu, Nov 2, 2023 at 8:52 PM Bruce Momjian  wrote:
>
> Third, I have come up with the following shell script to test for proper
> pgindentation, which I run automatically before commit:
>
> # https://www.postgresql.org/message-id/CAGECzQQL-Dbb%2BYkid9Dhq-491MawHvi6hR_NGkhiDE%2B5zRZ6vQ%40mail.gmail.com
> src/tools/pgindent/pgindent $(git diff --name-only --diff-filter=ACMR) > /tmp/$$
>
> if [ \( "$(wc -l < /tmp/$$)" -eq 1 -a "$(expr match "$(cat /tmp/$$)" "No files to process\>")" -eq 0 \) -o \
>      "$(wc -l < /tmp/$$)" -gt 1 ]
> then
> 	echo "pgindent failure in master branch, exiting." 1>&2
> 	cat /tmp/$$
> 	exit 1
> fi
>

Looks useful. Git supports a pre-push hook:
https://git-scm.com/book/en/v2/Customizing-Git-Git-Hooks; a sample
script showing how to access the commits being pushed and the other
arguments is at
https://github.com/git/git/blob/87c86dd14abe8db7d00b0df5661ef8cf147a72a3/templates/hooks--pre-push.sample.
I have not used it, but it seems your script could be used to
implement such a hook.
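
For example, a minimal .git/hooks/pre-push might look like this
(untested sketch: it assumes the rewritten Perl pgindent with its
--silent-diff option, and skips edge cases such as deleted refs):

#!/bin/sh
# For each ref being pushed, indent-check the .c/.h files it touches.
while read local_ref local_sha remote_ref remote_sha
do
	files=$(git diff --name-only "$remote_sha..$local_sha" -- '*.c' '*.h')
	[ -z "$files" ] && continue
	if ! src/tools/pgindent/pgindent --silent-diff $files
	then
		echo "pgindent failure, aborting push." 1>&2
		exit 1
	fi
done
exit 0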

-- 
Best Wishes,
Ashutosh Bapat




Re: A recent message added to pg_upgade

2023-11-02 Thread Peter Smith
On Fri, Nov 3, 2023 at 1:11 PM Amit Kapila  wrote:
>
> On Thu, Nov 2, 2023 at 2:36 PM Amit Kapila  wrote:
> >
> > On Thu, Nov 2, 2023 at 11:32 AM Michael Paquier  wrote:
> > >
> > > On Thu, Nov 02, 2023 at 02:32:07PM +1100, Peter Smith wrote:
> > > > On Thu, Nov 2, 2023 at 2:25 PM Peter Smith  
> > > > wrote:
> > > >> Checking this patch yesterday prompted me to create a new thread
> > > >> questioning the inconsistencies of the "GUC names in messages". In
> > > >> that thread, Tom Lane replied and gave some background information [1]
> > > >> about the GUC name embedding versus substitution. In hindsight, I
> > > >> think your original message was fine as-is, but there seem to be
> > > >> examples of every kind of style, so whatever you do would have some
> > > >> precedent.
> > > >>
> > > >> The patch v4 LGTM.
> > > >
> > > > To clarify, all the current code LGTM, but the patch is still missing
> > > > a guc_hook test case, right?
> > >
> > > -   NULL, NULL, NULL
> > > +   check_max_slot_wal_keep_size, NULL, NULL
> > >
> > > FWIW, I am +-0 with what you are proposing here.  I don't quite get
> > > why one may want to enforce this specific GUC at upgrade.
> > >
> >
> > I also can't think of a good reason to do so, but OTOH, I can't imagine
> > all possible scenarios. As this setting is invalid or can cause
> > problems, people seem to favor preventing it. Alvaro also voted in
> > favor of preventing it, so we are considering proceeding with it
> > unless more people think otherwise.
> >
>
> Now that Michael has also committed another similar change in commit
> 7021d3b176, it is better to be consistent in both cases. So, either we

I agree. Both patches set a special GUC value on the command line, and
neither wants the user to somehow override it. Since the requirements
are the same, I felt the implementations (regardless of whether they
use a GUC hook or something else) should also be done the same way.
Yesterday I posted a review comment on the other thread [1] (#2c)
trying to make the same point about consistency.
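
For anyone skimming, the kind of check hook being discussed would look
roughly like this (my sketch, not the committed code; GUC_check_errcode
and GUC_check_errmsg are the standard guc.h reporting helpers):

static bool
check_max_slot_wal_keep_size(int *newval, void **extra, GucSource source)
{
	/* reject any value other than -1 while pg_upgrade is running */
	if (IsBinaryUpgrade && *newval != -1)
	{
		GUC_check_errcode(ERRCODE_INVALID_PARAMETER_VALUE);
		GUC_check_errmsg("\"max_slot_wal_keep_size\" must be set to -1 during binary upgrade mode");
		return false;
	}
	return true;
}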

==
[1] 
https://www.postgresql.org/message-id/CAHut%2BPsCzt%3DO3_xkyrskaZ3SMxaXoN4L5Z5CgvaGPNx3mXXxOQ%40mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia




Re: Moving forward with TDE [PATCH v3]

2023-11-02 Thread Andres Freund
Hi,

On 2023-10-31 16:23:17 -0500, David Christensen wrote:
> The patches are as follows:
>
> 0001 - doc updates
> 0002 - Basic key management and cipher support
> 0003 - Backend-related changes to support heap encryption
> 0004 - modifications to bin tools and programs to manage key rotation and
> add other knowledge
> 0005 - Encrypted/authenticated WAL
>
> These are very broad strokes at this point and should be split up a bit
> more to make things more granular and easier to review, but I wanted to get
> this update out.

Yes, particularly 0003 really needs to be split - as is, it's not
easily reviewable.



> From 327e86d52be1df8de9c3a324cb06b85ba5db9604 Mon Sep 17 00:00:00 2001
> From: David Christensen 
> Date: Fri, 29 Sep 2023 15:16:00 -0400
> Subject: [PATCH v3 5/5] Add encrypted/authenticated WAL
>
> When using an encrypted cluster, we need to ensure that the WAL is also
> encrypted. While we could go with a page-based approach, we instead use a
> per-record approach, using GCM for the encryption method and storing the
> AuthTag in the xl_crc field.
>
> We change the xl_crc field to instead be a union struct, with a compile-time
> adjustable size to allow us to customize the number of bytes allocated to the
> GCM authtag.  This allows us to easily adjust the number of bytes needed to
> support our authentication.  (Testing has included up to 12 bytes, but it is
> left at 4 for now to keep the size of the WAL records the same.)

Ugh, that'll be quite a bit of overhead in some workloads... You can't really
use such a build for non-encrypted workloads, making this a not very
deployable path...


> @@ -905,20 +905,28 @@ XLogInsertRecord(XLogRecData *rdata,
>  	{
>  		/*
>  		 * Now that xl_prev has been filled in, calculate CRC of the record
> -		 * header.
> +		 * header.  If we are using encrypted WAL, this CRC is overwritten by
> +		 * the authentication tag, so just zero
>  		 */
> -		rdata_crc = rechdr->xl_crc;
> -		COMP_CRC32C(rdata_crc, rechdr, offsetof(XLogRecord, xl_crc));
> -		FIN_CRC32C(rdata_crc);
> -		rechdr->xl_crc = rdata_crc;
> +		if (!encrypt_wal)
> +		{
> +			rdata_crc = rechdr->xl_integrity.crc;
> +			COMP_CRC32C(rdata_crc, rechdr, offsetof(XLogRecord, xl_integrity.crc));
> +			FIN_CRC32C(rdata_crc);
> +			rechdr->xl_integrity.crc = rdata_crc;
> +		}
> +		else
> +			memset(&rechdr->xl_integrity, 0, sizeof(rechdr->xl_integrity));

Why aren't you encrypting most of the data here?  Just as for CRC computation,
encrypting a large record in XLOG_BLCKSZ


>   * XLogRecordDataHeaderLong structs all begin with a single 'id' byte. It's
>   * used to distinguish between block references, and the main data structs.
>   */
> +
> +#define XL_AUTHTAG_SIZE 4
> +#define XL_HEADER_PAD 2
> +
>  typedef struct XLogRecord
>  {
>  	uint32		xl_tot_len;		/* total len of entire record */
> @@ -45,14 +49,16 @@ typedef struct XLogRecord
>  	XLogRecPtr	xl_prev;		/* ptr to previous record in log */
>  	uint8		xl_info;		/* flag bits, see below */
>  	RmgrId		xl_rmid;		/* resource manager for this record */
> -	/* 2 bytes of padding here, initialize to zero */
> -	pg_crc32c	xl_crc;			/* CRC for this record */
> -
> +	uint8		xl_pad[XL_HEADER_PAD];	/* required alignment padding */

What does "required" mean here? And why is this defined as a separate
#define? And why do we need the explicit field here at all? The union
will still provide sufficient alignment for a uint32.

It also doesn't seem right to remove the comment about needing to zero out the
space.
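
For readers without the full patch: reconstructed from the hunks above,
the union presumably looks something like this (the authtag member name
is my guess):

typedef union
{
	pg_crc32c	crc;			/* CRC, when WAL is not encrypted */
	uint8		authtag[XL_AUTHTAG_SIZE];	/* GCM auth tag, when it is */
} XLogIntegrity;

	/* ...and in XLogRecord: */
	XLogIntegrity	xl_integrity;	/* CRC or auth tag for this record */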


> From 7557acf60f52da4a86fd9f902bab4804c028dd4b Mon Sep 17 00:00:00 2001
> From: David Christensen 
> Date: Tue, 31 Oct 2023 15:24:02 -0400
> Subject: [PATCH v3 3/5] Backend-related changes




> --- a/src/backend/access/heap/rewriteheap.c
> +++ b/src/backend/access/heap/rewriteheap.c
> @@ -323,6 +323,11 @@ end_heap_rewrite(RewriteState state)
>   state->rs_buffer,
>   true);
>
> +	PageEncryptInplace(state->rs_buffer, MAIN_FORKNUM,
> +					   RelationIsPermanent(state->rs_new_rel),
> +					   state->rs_blockno,
> +					   RelationGetSmgr(state->rs_new_rel)->smgr_rlocator.locator.relNumber);
>   PageSetChecksumInplace(state->rs_buffer, state->rs_blockno);
>
>   smgrextend(RelationGetSmgr(state->rs_new_rel), MAIN_FORKNUM,

I don't think it's ok to have to make such changes in a 

Re: Tab completion regression test failed on illumos

2023-11-02 Thread Japin Li


On Fri, 03 Nov 2023 at 10:03, Thomas Munro  wrote:
> On Fri, Nov 3, 2023 at 2:22 PM Thomas Munro  wrote:
>> On Fri, Nov 3, 2023 at 3:42 AM Japin Li  wrote:
>> > I think this might be a bug coming from the Illumos pseudo-tty. I can
>> > reproduce this by using a pseudo-tty on Illumos.
>>
>> I don't know but my guess is that this has to do with termios defaults
>> being different.  From a quick look at 'man termios', perhaps TABDLY
>> is set to expand tabs to spaces?  Can you fix it by tweaking the flags
>> in src/common/sprompt.c?
>
> Argh, sorry that's completely the wrong end.  I suppose that sort of
> thing would have to happen in IPC::Run.  I wonder what would happen if
> IPC::Run called  ->set_raw() on the IO::Pty object it constructs, or
> failing that, if IO::Stty can be used to mess with the relevant
> settings.

Thanks for your explanation; termios.c_oflag on Illumos enables TABDLY
by default.

--
Regards,
Japin Li
ChengDu WenWu Information Technology Co., Ltd.




Re: A recent message added to pg_upgade

2023-11-02 Thread Amit Kapila
On Thu, Nov 2, 2023 at 2:36 PM Amit Kapila  wrote:
>
> On Thu, Nov 2, 2023 at 11:32 AM Michael Paquier  wrote:
> >
> > On Thu, Nov 02, 2023 at 02:32:07PM +1100, Peter Smith wrote:
> > > On Thu, Nov 2, 2023 at 2:25 PM Peter Smith  wrote:
> > >> Checking this patch yesterday prompted me to create a new thread
> > >> questioning the inconsistencies of the "GUC names in messages". In
> > >> that thread, Tom Lane replied and gave some background information [1]
> > >> about the GUC name embedding versus substitution. In hindsight, I
> > >> think your original message was fine as-is, but there seem to be
> > >> examples of every kind of style, so whatever you do would have some
> > >> precedent.
> > >>
> > >> The patch v4 LGTM.
> > >
> > > To clarify, all the current code LGTM, but the patch is still missing
> > > a guc_hook test case, right?
> >
> > -   NULL, NULL, NULL
> > +   check_max_slot_wal_keep_size, NULL, NULL
> >
> > FWIW, I am +-0 with what you are proposing here.  I don't quite get
> > why one may want to enforce this specific GUC at upgrade.
> >
>
> I also can't think of a good reason to do so, but OTOH, I can't imagine
> all possible scenarios. As this setting is invalid or can cause
> problems, people seem to favor preventing it. Alvaro also voted in
> favor of preventing it, so we are considering proceeding with it
> unless more people think otherwise.
>

Now that Michael has also committed another similar change in commit
7021d3b176, it is better to be consistent in both cases. So, either we
should have check hooks for both parameters, or follow another route
where we always forcibly override these parameters in pg_upgrade
(meaning user-provided values for them would be ignored) and document
that. Yet another option is to simply document the current behavior.
In the future, if users complain about this or have use cases for
these parameters during an upgrade, we can adapt the behavior
accordingly.

Thoughts?

-- 
With Regards,
Amit Kapila.




Re: Tab completion regression test failed on illumos

2023-11-02 Thread Thomas Munro
On Fri, Nov 3, 2023 at 2:22 PM Thomas Munro  wrote:
> On Fri, Nov 3, 2023 at 3:42 AM Japin Li  wrote:
> > I think this might be a bug coming from the Illumos pseudo-tty. I can
> > reproduce this by using a pseudo-tty on Illumos.
>
> I don't know but my guess is that this has to do with termios defaults
> being different.  From a quick look at 'man termios', perhaps TABDLY
> is set to expand tabs to spaces?  Can you fix it by tweaking the flags
> in src/common/sprompt.c?

Argh, sorry that's completely the wrong end.  I suppose that sort of
thing would have to happen in IPC::Run.  I wonder what would happen if
IPC::Run called  ->set_raw() on the IO::Pty object it constructs, or
failing that, if IO::Stty can be used to mess with the relevant
settings.




Re: Possible typo in nodeAgg.c

2023-11-02 Thread David Rowley
On Fri, 3 Nov 2023 at 13:49, Bruce Momjian  wrote:
>
> On Fri, Oct 16, 2020 at 09:03:52AM +, Hou, Zhijie wrote:
> >   /*
> >* Don't set the limit below 3/4 of hash_mem. In that case, we are at 
> > the
> >* minimum number of partitions, so we aren't going to dramatically 
> > exceed
> > -  * work mem anyway.
> > +  * hash_mem anyway.
>
> Can someone comment on this?  Is the text change correct?

"work mem" is incorrect.  I'd prefer it if we didn't talk about
hash_mem as if it were a thing.  It's work_mem * hash_mem_multiplier.
Because of the underscore, using "hash_mem" to mean this makes it look
like we're talking about a variable by that name. Maybe it would be
better to refer to the variable name that's used to store the result
of get_hash_memory_limit(), i.e. hash_mem_limit. "the limit" should
likely use "*mem_limit" instead as there are multiple limits
mentioned.
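
For context, get_hash_memory_limit() is roughly the following (sketch
from memory, not verbatim):

size_t
get_hash_memory_limit(void)
{
	double		mem_limit;

	/* work_mem is in kilobytes */
	mem_limit = (double) work_mem * hash_mem_multiplier * 1024.0;

	/* clamp so the result fits in size_t */
	mem_limit = Min(mem_limit, (double) SIZE_MAX);

	return (size_t) mem_limit;
}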

It would also be better if this comment explained what's special about
4 * partition_mem. It seems to have nothing to do with the 3/4
mentioned in the comment.

David




Re: Tab completion regression test failed on illumos

2023-11-02 Thread Thomas Munro
On Fri, Nov 3, 2023 at 3:42 AM Japin Li  wrote:
> On Thu, 02 Nov 2023 at 22:23, Tom Lane  wrote:
> > Japin Li  writes:
> >> It seems the 'SEL\t' is converted to 'SEL     ' which is "SEL" with 5
> >> spaces.
> >
> > That would be plausible if readline were disabled, or otherwise
> > not functioning.
> >
>
> I think this might be a bug coming from the Illumos pseudo-tty. I can
> reproduce this by using a pseudo-tty on Illumos.

I don't know but my guess is that this has to do with termios defaults
being different.  From a quick look at 'man termios', perhaps TABDLY
is set to expand tabs to spaces?  Can you fix it by tweaking the flags
in src/common/sprompt.c?  Somewhere near the line that disables ECHO,
perhaps you can figure out how to disable that in c_oflag?  This is
all the ancient forgotten magic that allows all Unixes to drive 70
year old electric typewriters, inserting suitable pauses and
converting various things as it goes.
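
In termios terms, turning that off would look something like this
(untested sketch; TABDLY/TAB0 are XSI and not defined on every
platform, hence the #ifdef):

#include <termios.h>

	struct termios t;

	if (tcgetattr(fd, &t) == 0)
	{
#ifdef TABDLY
		t.c_oflag &= ~TABDLY;	/* clear the tab-processing bits... */
		t.c_oflag |= TAB0;		/* ...so TABs are output unmodified */
#endif
		(void) tcsetattr(fd, TCSANOW, &t);
	}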




Re: Three commit tips

2023-11-02 Thread Bruce Momjian
On Thu, Nov  2, 2023 at 11:07:19AM -0500, Nathan Bossart wrote:
> On Thu, Nov 02, 2023 at 11:22:38AM -0400, Bruce Momjian wrote:
> > First, I have been hesitant to ascribe others as patch authors if I
> > heavily modified a doc patch because I didn't want them blamed for any
> > mistakes I made.  However, I also want to give them credit, so I decided
> > I would annotate commits with "partial", e.g.:
> > 
> > Author:  Andy Jackson (partial)
> 
> I tend to use "Co-authored-by" for this purpose.

Very good idea.  I will use that instead.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Only you can decide what is important to you.




Re: Possible typo in nodeAgg.c

2023-11-02 Thread Bruce Momjian
On Fri, Oct 16, 2020 at 09:03:52AM +, Hou, Zhijie wrote:
> Hi
> 
> In /src/backend/executor/nodeAgg.c
> 
> I found the following comment still uses work mem.
> Since hash_mem has been introduced, is it more accurate to use hash_mem here?
> 
> @@ -1827,7 +1827,7 @@ hash_agg_set_limits(double hashentrysize, double 
> input_groups, int used_bits,
>   /*
>* Don't set the limit below 3/4 of hash_mem. In that case, we are at 
> the
>* minimum number of partitions, so we aren't going to dramatically 
> exceed
> -  * work mem anyway.
> +  * hash_mem anyway.

Can someone comment on this?  Is the text change correct?

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Only you can decide what is important to you.




Re: Why is DEFAULT_FDW_TUPLE_COST so insanely low?

2023-11-02 Thread David Rowley
On Fri, 3 Nov 2023 at 01:02, Richard Guo  wrote:
> It seems that the test is still not stable on 32-bit machines even after
> 4b14e18714.  I see the following plan diff on cfbot [1].

I recreated that locally this time.  Seems there's still flexibility
to push or not push down the sort and the costs of each are close
enough that it differs between 32 and 64-bit.

The fix I just pushed removes the flexibility for doing a local sort
by turning off enable_sort.

Thanks for the report.

David




Re: Pre-proposal: unicode normalized text

2023-11-02 Thread Nico Williams
On Wed, Oct 04, 2023 at 01:15:03PM -0700, Jeff Davis wrote:
> > The fact that there are multiple types of normalization and multiple
> > notions of equality doesn't make this easier.

And then there's text that isn't normalized to any of them.

> NFC is really the only one that makes sense.

Yes.

Most input modes produce NFC, though there may be scripts (like Hangul)
where input modes might produce NFD, so I wouldn't say NFC is universal.

Unfortunately HFS+ uses NFD so NFD can leak into places naturally enough
through OS X.

> I believe that having a kind of text data type where it's stored in NFC
> and compared with memcmp() would be a good place for many users to be --
> probably most users. It's got all the performance and stability
> benefits of memcmp(), with slightly richer semantics. It's less likely
> that someone malicious can confuse the database by using different
> representations of the same character.
> 
> The problem is that it's not universally better for everyone: there are
> certainly users who would prefer that the codepoints they send to the
> database are preserved exactly, and also users who would like to be
> able to use unassigned code points.

The alternative is form insensitivity, where you compare strings as
equal even if they aren't memcmp()-equal, as long as they are equal
when normalized.  This can be made fast, though not as fast as memcmp().
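
For instance, in SQL (assuming a UTF-8 database; normalize() and IS
NORMALIZED exist since v13):

-- U&'\00E9' is precomposed é; U&'\0065\0301' is e plus combining acute
SELECT U&'\00E9' = U&'\0065\0301' AS binary_eq,                  -- f
       normalize(U&'\00E9', NFC) =
         normalize(U&'\0065\0301', NFC) AS form_insensitive_eq,  -- t
       U&'\0065\0301' IS NFC NORMALIZED AS already_nfc;          -- f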

The problem with form insensitivity is that you might have to implement
it in numerous places.  In ZFS there are only a few, but in a database
every index type, for example, will need to hook in form insensitivity.
If so, that complexity would be a good argument to just normalize.

Nico
-- 




Re: Pre-proposal: unicode normalized text

2023-11-02 Thread Nico Williams
On Tue, Oct 17, 2023 at 05:07:40PM +0200, Daniel Verite wrote:
> > * Add a per-database option to enforce only storing assigned unicode
> > code points.
> 
> There's a problem in the fact that the set of assigned code points is
> expanding with every Unicode release, which happens about every year.
> 
> If we had this option in Postgres 11 released in 2018 it would use
> Unicode 11, and in 2023 this feature would reject thousands of code
> points that have been assigned since then.

Yes, and that's desirable if PG were to normalize text as Jeff proposes,
since then PG wouldn't know how to normalize text containing codepoints
assigned after that.  At that point to use those codepoints you'd have
to upgrade PG -- not too unreasonable.

Nico
-- 




Re: Pre-proposal: unicode normalized text

2023-11-02 Thread Nico Williams
On Wed, Oct 04, 2023 at 01:16:22PM -0400, Robert Haas wrote:
> There's a very popular commercial database where, or so I have been
> led to believe, any byte sequence at all is accepted when you try to
> put values into the database. [...]

In other circles we call this "just-use-8".

ZFS, for example, has an option to require that filenames be valid
UTF-8 or not, and if not it will accept any garbage (other than ASCII
NUL and /, for obvious reasons).

For filesystems the situation is a bit dire because:

 - strings at the system call boundary have never been tagged with a
   codeset (in the beginning there was only ASCII)
 - there has never been a standard codeset to use at the system call
   boundary, 
 - there have been multiple codesets in use for decades

so filesystems have to be prepared to be tolerant of garbage, at least
until only Unicode is left (UTF-16 on Windows filesystems, UTF-8 for
most others).

This is another reason that ZFS has form-insensitive/form-preserving
behavior: if you want to use non-UTF-8 filenames then names or
substrings thereof that look like valid UTF-8 won't accidentally be
broken by normalization.

If PG never tagged strings with codesets on the wire then PG has the
same problem, especially since there's multiple implementations of the
PG wire protocol.

So I can see why a "popular database" might want to take this approach.

For the longer run though, either move to supporting only UTF-8, or
allow multiple text types each with a codeset specified in its type.

> At any rate, if we were to go in the direction of rejecting code
> points that aren't yet assigned, or aren't yet known to the collation
> library, that's another way for data loading to fail. Which feels like
> very defensible behavior, but not what everyone wants, or is used to.

Yes.  See points about ZFS.  I do think ZFS struck a good balance.

PG could take the ZFS approach and add functions for use in CHECK
constraints that enforce valid UTF-8, valid Unicode (no use of
unassigned codepoints, no use of private use codepoints not configured
into the database), etc.
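
Something like this, say (sketch; unicode_assigned() is the function
from commit a02b37fc mentioned elsewhere in this digest, the rest is
hypothetical):

CREATE TABLE docs (
    body text CHECK (body IS NFC NORMALIZED AND unicode_assigned(body))
);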

Coming back to the "just-use-8" thing, a database could have a text type
where the codeset is not specified, one or more text types where the
codeset is specified, manual or automatic codeset conversions, and
whatever enforcement functions make sense.  Provided that the type
information is not lost at the edges.

> > Whether we ever get to a core data type -- and more importantly,
> > whether anyone uses it -- I'm not sure.
> 
> Same here.

A TEXTutf8 type (whatever name you want to give it) could be useful as a
way to a) opt into heavier enforcement w/o having to write CHECK
constraints, b) documentation of intent, all provided that the type is
not lost on the wire nor in memory.

Support for other codesets is less important.

Nico
-- 




Re: Pre-proposal: unicode normalized text

2023-11-02 Thread Nico Williams
On Fri, Oct 06, 2023 at 02:37:06PM -0400, Robert Haas wrote:
> > Sure, because TEXT in PG doesn't have codeset+encoding as part of it --
> > it's whatever the database's encoding is.  Collation can and should be a
> > property of a column, since for Unicode it wouldn't be reasonable to
> > make that part of the type.  But codeset+encoding should really be a
> > property of the type if PG were to support more than one.  IMO.
> 
> No, what I mean is, you can't just be like "oh, the varlena will be
> different in memory than on disk" as if that were no big deal.

It would have to be the same in memory as on disk, indeed, but you might
need new types in C as well for that.

> I agree that, as an alternative to encoding being a column property,
> it could instead be completely a type property, meaning that if you
> want to store, say, LATIN1 text in your UTF-8 database, you first
> create a latin1text data type and then use it, rather than, as in the
> model I proposed, creating a text column and then applying a setting
> like ENCODING latin1 to it. I think that there might be some problems

Yes, that was the idea.

> with that model, but it could also have some benefits. [...]

Mainly, I think, whether you want PG to do automatic codeset conversions
(ugly and problematic) or not, e.g. when using text functions.

Automatic codeset conversions are problematic because a) it can be lossy
(so what to do when it is?) and b) automatic type conversions can be
surprising.
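
E.g., lossiness in today's PG (assuming a UTF-8 database):

SELECT convert_to('résumé', 'LATIN1');     -- fine, representable
SELECT convert_to('snowman ☃', 'LATIN1');
-- ERROR:  character with byte sequence 0xe2 0x98 0x83 in encoding
--         "UTF8" has no equivalent in encoding "LATIN1"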

Ultimately the client would have to do its own codeset conversions, if
it wants them, or treat text in codesets other than its local one as
blobs and leave it for a higher app layer to deal with.

I wouldn't want to propose automatic codeset conversions.  If you'd want
that then you might as well declare it has to all be UTF-8 and say no to
any other codesets.

> But, even if we were all convinced that this kind of feature was good
> to add, I think it would almost certainly be wrong to invent new
> varlena features along the way.

Yes.

Nico
-- 




Re: Remove distprep

2023-11-02 Thread Andres Freund
On 2023-11-01 16:39:24 -0400, Peter Eisentraut wrote:
> > OTOH, it seems somewhat unlikely that maintainer-clean is utilized much in
> > extensions. I see it in things like postgis, but that has its own configure
> > etc, even though it also invokes pgxs.
> 
> I thought about this.  I don't think this is something that any extension
> would use.  If they care about the distinction between distclean and
> maintainer-clean, are they also doing their own distprep and dist?  Seems
> unlikely.  I mean, if some extension is actually affected, I'm happy to
> accommodate, but we can deal with that when we learn about it.  Moreover, if
> we are moving forward in this direction, we would presumably also like the
> extensions to get rid of their distprep step.
> 
> So I think we are ready to move ahead with this patch.  There have been some
> light complaints earlier in this thread that people wanted to keep some way
> to clean only some of the files.  But there hasn't been any concrete
> follow-up on that, as far as I can see, so I don't know what to do about
> that.

+1, let's do this. We can add a dedicated target for more specific
cases later if we decide we want that.




Re: Detoasting optionally to make Explain-Analyze less misleading

2023-11-02 Thread Tomas Vondra



On 11/2/23 22:33, Matthias van de Meent wrote:
> On Thu, 2 Nov 2023 at 22:25, Tomas Vondra  
> wrote:
>>
>>
>>
>> On 11/2/23 21:02, Matthias van de Meent wrote:
>>> On Thu, 2 Nov 2023 at 20:32, Tomas Vondra  
>>> wrote:
>>>> On 11/2/23 20:09, stepan rutz wrote:
>>>>> db1=# explain (analyze, serialize) select * from test;
>>>>> QUERY PLAN
>>>>> ---
>>>>>  Seq Scan on test  (cost=0.00..22.00 rows=1200 width=40) (actual
>>>>> time=0.023..0.027 rows=1 loops=1)
>>>>>  Planning Time: 0.077 ms
>>>>>  Execution Time: 303.281 ms
>>>>>  Serialized Bytes: 7953 Bytes. Mode Text. Bandwidth 248.068 MB/sec
>>>> [...]
>>>> BTW if you really want to print amount of memory, maybe print it in
>>>> kilobytes, like every other place in explain.c?
>>>
>>> Isn't node width in bytes, or is it an opaque value not to be
>>> interpreted by users? I've never really investigated that part of
>>> Postgres' explain output...
>>>
>>
>> Right, "width=" is always in bytes. But fields like amount of sorted
>> data is in kB, and this seems closer to that.
>>
>>>> Also, explain generally
>>>> prints stuff in "key: value" style (in text format).
>>>
>>> That'd be key: metrickey=metricvalue for expanded values like those in
>>> plan nodes and the buffer usage, no?
>>>
>>
>> Possibly. But the proposed output does neither. Also, it starts with
>> "Serialized Bytes" but then prints info about bandwidth.
>>
>>
>>>>>  Serialized Bytes: 7953 Bytes. Mode Text. Bandwidth 248.068 MB/sec
>>>
>>> I was thinking more along the lines of something like this:
>>>
>>> [...]
>>> Execution Time: xxx ms
>>> Serialization: time=yyy.yyy (in ms) size=yyy (in KiB, or B) mode=text
>>> (or binary)
>>> This is significantly different from your output, as it doesn't hide
>>> the measured time behind a lossy calculation of bandwidth, but gives
>>> the measured data to the user; allowing them to derive their own
>>> precise bandwidth if they're so inclined.
>>>
>>
>> Might work. I'm still not convinced we need to include the mode, or that
>> the size is that interesting/useful, though.
> 
> I'd say size is interesting for systems where network bandwidth is
> constrained, but CPU isn't. We currently only show estimated widths &
> accurate number of tuples returned, but that's not an accurate
> explanation of why your 30-row 3GB resultset took 1h to transmit on a
> 10mbit line - that is only explained by the bandwidth of your
> connection and the size of the dataset. As we can measure the size of
> the returned serialized dataset here, I think it's in the interest of
> any debugability to also present it to the user. Sadly, we don't have
> good measures of bandwidth without sending that data across, so that's
> the only metric that we can't show here, but total query data size is
> definitely something that I'd be interested in here.

Yeah, I agree with that.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Pre-proposal: unicode normalized text

2023-11-02 Thread Thomas Munro
bowerbird and hammerkop didn't like commit a02b37fc.  They're still
using the old third build system that is not tested by CI.  It's due
for removal in the 17 cycle, IIUC, but in the meantime I guess the new
codegen script needs to be invoked by something under src/tools/msvc?

  varlena.obj : error LNK2019: unresolved external symbol
unicode_category referenced in function unicode_assigned
[H:\\prog\\bf\\root\\HEAD\\pgsql.build\\postgres.vcxproj]




Re: Detoasting optionally to make Explain-Analyze less misleading

2023-11-02 Thread Matthias van de Meent
On Thu, 2 Nov 2023 at 22:25, Tomas Vondra  wrote:
>
>
>
> On 11/2/23 21:02, Matthias van de Meent wrote:
> > On Thu, 2 Nov 2023 at 20:32, Tomas Vondra  
> > wrote:
> >> On 11/2/23 20:09, stepan rutz wrote:
> >>> db1=# explain (analyze, serialize) select * from test;
> >>> QUERY PLAN
> >>> ---
> >>>  Seq Scan on test  (cost=0.00..22.00 rows=1200 width=40) (actual
> >>> time=0.023..0.027 rows=1 loops=1)
> >>>  Planning Time: 0.077 ms
> >>>  Execution Time: 303.281 ms
> >>>  Serialized Bytes: 7953 Bytes. Mode Text. Bandwidth 248.068 MB/sec
> >> [...]
> >> BTW if you really want to print amount of memory, maybe print it in
> >> kilobytes, like every other place in explain.c?
> >
> > Isn't node width in bytes, or is it an opaque value not to be
> > interpreted by users? I've never really investigated that part of
> > Postgres' explain output...
> >
>
> Right, "width=" is always in bytes. But fields like amount of sorted
> data is in kB, and this seems closer to that.
>
> >> Also, explain generally
> >> prints stuff in "key: value" style (in text format).
> >
> > That'd be key: metrickey=metricvalue for expanded values like those in
> > plan nodes and the buffer usage, no?
> >
>
> Possibly. But the proposed output does neither. Also, it starts with
> "Serialized Bytes" but then prints info about bandwidth.
>
>
> >>>  Serialized Bytes: 7953 Bytes. Mode Text. Bandwidth 248.068 MB/sec
> >
> > I was thinking more along the lines of something like this:
> >
> > [...]
> > Execution Time: xxx ms
> > Serialization: time=yyy.yyy (in ms) size=yyy (in KiB, or B) mode=text
> > (or binary)
> > This is significantly different from your output, as it doesn't hide
> > the measured time behind a lossy calculation of bandwidth, but gives
> > the measured data to the user; allowing them to derive their own
> > precise bandwidth if they're so inclined.
> >
>
> Might work. I'm still not convinced we need to include the mode, or that
> the size is that interesting/useful, though.

I'd say size is interesting for systems where network bandwidth is
constrained, but CPU isn't. We currently only show estimated widths &
accurate number of tuples returned, but that's not an accurate
explanation of why your 30-row 3GB resultset took 1h to transmit on a
10mbit line - that is only explained by the bandwidth of your
connection and the size of the dataset. As we can measure the size of
the returned serialized dataset here, I think it's in the interest of
any debugability to also present it to the user. Sadly, we don't have
good measures of bandwidth without sending that data across, so that's
the only metric that we can't show here, but total query data size is
definitely something that I'd be interested in here.

Kind regards,

Matthias van de Meent
Neon (https://neon.tech)




Re: Detoasting optionally to make Explain-Analyze less misleading

2023-11-02 Thread Tomas Vondra



On 11/2/23 21:02, Matthias van de Meent wrote:
> On Thu, 2 Nov 2023 at 20:32, Tomas Vondra  
> wrote:
>> On 11/2/23 20:09, stepan rutz wrote:
>>> db1=# explain (analyze, serialize) select * from test;
>>> QUERY PLAN
>>> ---
>>>  Seq Scan on test  (cost=0.00..22.00 rows=1200 width=40) (actual
>>> time=0.023..0.027 rows=1 loops=1)
>>>  Planning Time: 0.077 ms
>>>  Execution Time: 303.281 ms
>>>  Serialized Bytes: 7953 Bytes. Mode Text. Bandwidth 248.068 MB/sec
>> [...]
>> BTW if you really want to print amount of memory, maybe print it in
>> kilobytes, like every other place in explain.c?
> 
> Isn't node width in bytes, or is it an opaque value not to be
> interpreted by users? I've never really investigated that part of
> Postgres' explain output...
> 

Right, "width=" is always in bytes. But fields like amount of sorted
data is in kB, and this seems closer to that.

>> Also, explain generally
>> prints stuff in "key: value" style (in text format).
> 
> That'd be key: metrickey=metricvalue for expanded values like those in
> plan nodes and the buffer usage, no?
> 

Possibly. But the proposed output does neither. Also, it starts with
"Serialized Bytes" but then prints info about bandwidth.


>>>  Serialized Bytes: 7953 Bytes. Mode Text. Bandwidth 248.068 MB/sec
> 
> I was thinking more along the lines of something like this:
> 
> [...]
> Execution Time: xxx ms
> Serialization: time=yyy.yyy (in ms) size=yyy (in KiB, or B) mode=text
> (or binary)
> This is significantly different from your output, as it doesn't hide
> the measured time behind a lossy calculation of bandwidth, but gives
> the measured data to the user; allowing them to derive their own
> precise bandwidth if they're so inclined.
> 

Might work. I'm still not convinced we need to include the mode, or that
the size is that interesting/useful, though.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Moving forward with TDE [PATCH v3]

2023-11-02 Thread Matthias van de Meent
On Tue, 31 Oct 2023 at 22:23, David Christensen
 wrote:
>
> Greetings,
>
> I am including an updated version of this patch series; it has been rebased 
> onto 6ec62b7799 and reworked somewhat.
>
> The patches are as follows:
>
> 0001 - doc updates
> 0002 - Basic key management and cipher support
> 0003 - Backend-related changes to support heap encryption

I'm quite surprised at the significant number of changes being made
outside the core storage manager files. I thought that changing out
mdsmgr with an encrypted smgr (that could wrap mdsmgr if so desired)
would be the most obvious change to implement cluster-wide encryption
with the least code touched, as relations don't need to know whether
the files they're writing are encrypted, right? Is there a reason to
not implement this at the smgr level that I overlooked in the
documentation of these patches?
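
To illustrate, the write side of an encrypting smgr wrapping md.c could
be as thin as this (hand-waved sketch: encrypt_page() and the hookup
into the smgr function table are omitted, signatures from memory):

static void
enc_write(SMgrRelation reln, ForkNumber forknum, BlockNumber blocknum,
		  const void *buffer, bool skipFsync)
{
	char		encbuf[BLCKSZ];

	/* encrypt the page, keyed by relfilenumber/block, then defer to md.c */
	encrypt_page(buffer, encbuf,
				 reln->smgr_rlocator.locator.relNumber, blocknum);
	mdwrite(reln, forknum, blocknum, encbuf, skipFsync);
}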

Kind regards,

Matthias van de Meent
Neon (https://neon.tech)




Re: Add new option 'all' to pg_stat_reset_shared()

2023-11-02 Thread Matthias van de Meent
On Thu, 2 Nov 2023 at 20:26, Bharath Rupireddy
 wrote:
>
> On Wed, Nov 1, 2023 at 4:24 AM Michael Paquier  wrote:
> >
> > On Tue, Oct 31, 2023 at 04:26:18PM +0900, torikoshia wrote:
> > > Yes, calling pg_stat_reset_shared() for all stats types can do what I 
> > > wanted
> > > to do.
> > > But calling it with 6 different parameters seems tiresome and I thought it
> > > would be convenient to have a parameter to delete all cluster-wide
> > > statistics at once.
> > >
> > > I may be wrong, but I imagine that it's more common to want to delete all 
> > > of
> > > the statistics for an entire cluster rather than just a portion of it.
> >
> > If more flexibility is wanted in this function, could it be an option
> > to consider a flavor like pg_stat_reset_shared(text[]), where it is
> > possible to specify a list of shared stats types to reset?  Perhaps
> > there are no real use cases for it, just wanted to mention it anyway
> > regarding the fact that it could have benefits to refactor this code
> > to use a bitwise operator for its internals with bit flags for each
> > type.
>
> I don't see a strong reason to introduce yet-another API when someone
> can just call things in a loop. I could recollect a recent analogy - a
> proposal to have a way to define multiple custom wait events with a
> single function call instead of callers defining in a loop didn't draw
> much interest.

Knowing that your metrics have a shared starting point can be quite
valuable, as it allows you to do some math that would otherwise be
much less accurate when working with stats over a short amount of
time. I've not used these stats systems much myself, but skew between
metrics caused by different reset points can be difficult to detect
and debug, so I think an atomic call to reset all these stats could be
worth implementing.
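
(The loop itself is easy enough, e.g.:

SELECT pg_stat_reset_shared(t)
  FROM unnest(ARRAY['archiver', 'bgwriter', 'checkpointer',
                    'io', 'wal', 'recovery_prefetch']) AS t;

assuming those are the six targets mentioned upthread -- but each call
resets at a slightly different instant, which is exactly the skew
problem described above.)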

Kind regards,

Matthias van de Meent
Neon (https://neon.tech)




Re: "box" type description

2023-11-02 Thread Bruce Momjian
On Thu, Nov  2, 2023 at 05:28:20PM +0900, Kyotaro Horiguchi wrote:
> Thank you for continuing this. The additional changes look fine.
> 
> Upon reviewing the table again in this line, further potential
> improvements and issues have been found. For example:
> 
> character, varchar: don't follow the rule.
> - 'char(length)' blank-padded string, fixed storage length
> + blank-padded string, fixed storage length, format 'char(length)'

So, char() and varchar() are _definition_ synonyms for character and
character varying, so I put the way you define them at the _front_ of
the text.  The "format" is the _output_ format, and I put that at the
end for other types.  I put numeric() at the front too since its
definition is complex.  (I now see numeric should be "precision,
scale", so I fixed that.)

> interval: doesn't follow the rule.
> - @ <number> <units>, time interval
> + time interval, format '[@] <number> <units>'
> (I think '@' is not necessary here..)

Agreed, '@' is optional so I removed it, and I added "...".

> pg_snapshot:
> 
>   The description given is just "snapshot", which seems overly simplistic.
> 
> txid_snapshot:
> 
>   The description reads "transaction snapshot". Is this really
>   accurate, especially in contrast with pg_snapshot?

Uh, the docs have for txid_snapshot:

	user-level transaction ID snapshot (deprecated; see
	pg_snapshot)

Do we want to add "deprecated" to the output?

> pg_brin_bloom_summary, pg_brin_minmax_multi_summary, pg_mcv_list and many:
> 
> I'm uncertain whether these types, which lack an input syntax (but
> have an output format), qualify as pseudo-types.  Nevertheless, I
> believe it would be beneficial to describe that those types differ
> from ordinary types.

Good point, now labeled as pseudo-types.
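
To eyeball the results, something like this works:

	SELECT typname, obj_description(oid, 'pg_type') AS description
	FROM pg_type
	WHERE typname IN ('box', 'lseg', 'point', 'pg_mcv_list');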

Updated patch attached.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Only you can decide what is important to you.
diff --git a/doc/src/sgml/datatype.sgml b/doc/src/sgml/datatype.sgml
index 5a6cfbd94d..55340b00ad 100644
--- a/doc/src/sgml/datatype.sgml
+++ b/doc/src/sgml/datatype.sgml
@@ -3366,13 +3366,13 @@ SELECT person.name, holidays.num_weeks FROM person, holidays
 lseg
 32 bytes
 Finite line segment
-((x1,y1),(x2,y2))
+[(x1,y1),(x2,y2)]


 box
 32 bytes
 Rectangular box
-((x1,y1),(x2,y2))
+(x1,y1),(x2,y2)


 path
diff --git a/src/include/catalog/pg_type.dat b/src/include/catalog/pg_type.dat
index 92bcaf2c73..f6110a850d 100644
--- a/src/include/catalog/pg_type.dat
+++ b/src/include/catalog/pg_type.dat
@@ -32,7 +32,7 @@
 # OIDS 1 - 99
 
 { oid => '16', array_type_oid => '1000',
-  descr => 'boolean, \'true\'/\'false\'',
+  descr => 'boolean, format \'t\'/\'f\'',
   typname => 'bool', typlen => '1', typbyval => 't', typcategory => 'B',
   typispreferred => 't', typinput => 'boolin', typoutput => 'boolout',
   typreceive => 'boolrecv', typsend => 'boolsend', typalign => 'c' },
@@ -90,7 +90,7 @@
   typispreferred => 't', typinput => 'oidin', typoutput => 'oidout',
   typreceive => 'oidrecv', typsend => 'oidsend', typalign => 'i' },
 { oid => '27', array_type_oid => '1010',
-  descr => '(block, offset), physical location of tuple',
+  descr => 'tuple physical location, format \'(block,offset)\'',
   typname => 'tid', typlen => '6', typbyval => 'f', typcategory => 'U',
   typinput => 'tidin', typoutput => 'tidout', typreceive => 'tidrecv',
   typsend => 'tidsend', typalign => 's' },
@@ -179,34 +179,34 @@
 # OIDS 600 - 699
 
 { oid => '600', array_type_oid => '1017',
-  descr => 'geometric point \'(x, y)\'',
+  descr => 'geometric point, format \'(x,y)\'',
   typname => 'point', typlen => '16', typbyval => 'f', typcategory => 'G',
   typsubscript => 'raw_array_subscript_handler', typelem => 'float8',
   typinput => 'point_in', typoutput => 'point_out', typreceive => 'point_recv',
   typsend => 'point_send', typalign => 'd' },
 { oid => '601', array_type_oid => '1018',
-  descr => 'geometric line segment \'(pt1,pt2)\'',
+  descr => 'geometric line segment, format \'[point1,point2]\'',
   typname => 'lseg', typlen => '32', typbyval => 'f', typcategory => 'G',
   typsubscript => 'raw_array_subscript_handler', typelem => 'point',
   typinput => 'lseg_in', typoutput => 'lseg_out', typreceive => 'lseg_recv',
   typsend => 'lseg_send', typalign => 'd' },
 { oid => '602', array_type_oid => '1019',
-  descr => 'geometric path \'(pt1,...)\'',
+  descr => 'geometric path, format \'(point1,...)\'',
   typname => 'path', typlen => '-1', typbyval => 'f', typcategory => 'G',
   typinput => 'path_in', typoutput => 'path_out', typreceive => 'path_recv',
   typsend => 'path_send', typalign => 'd', typstorage => 'x' },
 { oid => '603', array_type_oid => '1020',
-  descr => 'geometric box \'(lower left,upper right)\'',
+  descr => 'geometric box, format \'lower left point,upper right point\'',
   typname => 'box', 

Re: Detoasting optionally to make Explain-Analyze less misleading

2023-11-02 Thread Matthias van de Meent
On Thu, 2 Nov 2023 at 20:32, Tomas Vondra  wrote:
> On 11/2/23 20:09, stepan rutz wrote:
> > db1=# explain (analyze, serialize) select * from test;
> > QUERY PLAN
> > ---
> >  Seq Scan on test  (cost=0.00..22.00 rows=1200 width=40) (actual
> > time=0.023..0.027 rows=1 loops=1)
> >  Planning Time: 0.077 ms
> >  Execution Time: 303.281 ms
> >  Serialized Bytes: 7953 Bytes. Mode Text. Bandwidth 248.068 MB/sec
> [...]
> BTW if you really want to print amount of memory, maybe print it in
> kilobytes, like every other place in explain.c?

Isn't node width in bytes, or is it an opaque value not to be
interpreted by users? I've never really investigated that part of
Postgres' explain output...

> Also, explain generally
> prints stuff in "key: value" style (in text format).

That'd be key: metrickey=metricvalue for expanded values like those in
plan nodes and the buffer usage, no?

> >  Serialized Bytes: 7953 Bytes. Mode Text. Bandwidth 248.068 MB/sec

I was thinking more along the lines of something like this:

[...]
Execution Time: xxx ms
Serialization: time=yyy.yyy (in ms) size=yyy (in KiB, or B) mode=text
(or binary)

This is significantly different from your output, as it doesn't hide
the measured time behind a lossy calculation of bandwidth, but gives
the measured data to the user; allowing them to derive their own
precise bandwidth if they're so inclined.

Kind regards,

Matthias van de Meent
Neon (https://neon.tech)




Re: Detoasting optionally to make Explain-Analyze less misleading

2023-11-02 Thread stepan rutz

Hi Tomas,

indeed, by doing less, the code also becomes trivial and
ExplainPropertyInteger can be used as a one-liner.

My intention was to get the realistic payload bytes of the
wire protocol counted by the serialize option. I am also adding the
protocol bits and the length of the data generated by the
serialization output functions, so it should (hopefully) be the real
numbers.

Attached is a new revision of the patch that prints kB (floored by
integer division by 1024). Maybe that is also misleading and bytes
would be nicer (though harder to read).

The output is now as follows:

db1=# explain (analyze, serialize) select * from test;
    QUERY PLAN
---
 Seq Scan on test  (cost=0.00..22.00 rows=1200 width=40) (actual
time=0.014..0.017 rows=1 loops=1)
 Planning Time: 0.245 ms
 Execution Time: 292.983 ms
 Serialized Bytes: 77039 kB
(4 rows)

Definitely a lot nicer, and the formatting is controlled by
ExplainPropertyInteger.

The main motivation was to get a correct feeling for the execution
time. Counting the bytes gives an impression of what would go over the
wire. Only the big numbers matter here, of course.

Regards, Stepan



On 02.11.23 20:32, Tomas Vondra wrote:


On 11/2/23 20:09, stepan rutz wrote:

Hi Tomas,

you are right of course. Thanks!

I have attached a new version of the patch that supports the suggested
syntax. The previous patch was inconsistent in style indeed.

explain (analyze, serialize)

and

explain (analyze, serialize binary)

That doesn't make too much of a difference for most scenarios, I am
certain. However, the serialize option itself does, mostly because it
performs the detoasting, and that was a trap for me in the past with
just plain analyze.


E.g. this scenario really is not too far-fetched in a world where
people have large JSONB values.


db1=# create table test(id bigint, val text);

db1=# insert into test(val) select string_agg(s::text, ',') from (select
generate_series(1, 10_000_000) as s) as a1;

now we have a cell that has roughly 80MB in it. A large detoasting will
happen in real life, but not in plain explain(analyze).

and then...

db1=# explain (analyze) select * from test;
     QUERY PLAN
---
  Seq Scan on test  (cost=0.00..22.00 rows=1200 width=40) (actual
time=0.018..0.020 rows=1 loops=1)
  Planning Time: 0.085 ms
  Execution Time: 0.044 ms
(3 rows)

db1=# explain (analyze, serialize) select * from test;
     QUERY PLAN
---
  Seq Scan on test  (cost=0.00..22.00 rows=1200 width=40) (actual
time=0.023..0.027 rows=1 loops=1)
  Planning Time: 0.077 ms
  Execution Time: 303.281 ms
  Serialized Bytes: 7953 Bytes. Mode Text. Bandwidth 248.068 MB/sec
(4 rows)

db1=#

So the explain(analyze) does not process the ~80 MB in 0.044ms in any
way of course.

Honestly, I see absolutely no point in printing this. I have little idea
what to do with the "bytes". We have to transfer this over network, but
surely there's other data not included in this sum, right?

But the bandwidth seems pretty bogus/useless, as it's calculated from
execution time, which includes everything else, not just serialization.
So what does it say? It certainly does not include the network transfer.

IMO we should either print nothing or just the bytes. I don't see the
point in printing the mode, which is determined by the command.


Actually, I could print the serialized bytes using 1. grouping
separators (e.g. 78_888_953) or 2. the way pg_size_pretty does it.

If doing it the pg_size_pretty way, I am uncertain whether it would be
ok to call the actual pg_size_pretty function via its (certainly
frozen) OID of 3166 and do OidFunctionCall1(3166, ...) to invoke it.
Otherwise I'd say it would be nice if the code from that function were
made available as a utility function for all C code.  Any suggestions
on this topic?


I'm rather skeptical about this proposal, mostly because it makes it
harder to process the explain output in scripts etc.

But more importantly, it's a completely separate matter from what this
patch does, so if you want to pursue this, I suggest you start a
separate thread. If you want to introduce separators, surely this is not
the only place that should do it (e.g. why not to do that for "rows" or
"cost" estimates)?

BTW if you really want to print amount of memory, maybe print it in
kilobytes, like every other place in explain.c? Also, explain generally
prints stuff in "key: value" style (in text format).


regards

diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index f1d71bc54e..e5aec7ca58 100644
--- 

Re: Detoasting optionally to make Explain-Analyze less misleading

2023-11-02 Thread Tomas Vondra



On 11/2/23 20:09, stepan rutz wrote:
> Hi Tomas,
> 
> you are right of course. Thanks!
> 
> I have attached a new version of the patch that supports the suggested
> syntax. The previous patch was inconsistent in style indeed.
> 
> explain (analyze, serialize)
> 
> and
> 
> explain (analyze, serialize binary)
> 
> That doesn't make too much of a difference for most scenarios, I am
> certain. However, the serialize option itself does, mostly because it
> performs the detoasting, and that was a trap for me in the past with
> just plain analyze.
> 
> 
> E.g. this scenario really is not too far-fetched in a world where
> people have large JSONB values.
> 
> 
> db1=# create table test(id bigint, val text);
> 
> db1=# insert into test(val) select string_agg(s::text, ',') from (select
> generate_series(1, 10_000_000) as s) as a1;
> 
> now we have a cell that has roughly 80MB in it. A large detoasting will
> happen in real life, but not in plain explain(analyze).
> 
> and then...
> 
> db1=# explain (analyze) select * from test;
>     QUERY PLAN
> ---
>  Seq Scan on test  (cost=0.00..22.00 rows=1200 width=40) (actual
> time=0.018..0.020 rows=1 loops=1)
>  Planning Time: 0.085 ms
>  Execution Time: 0.044 ms
> (3 rows)
> 
> db1=# explain (analyze, serialize) select * from test;
>     QUERY PLAN
> ---
>  Seq Scan on test  (cost=0.00..22.00 rows=1200 width=40) (actual
> time=0.023..0.027 rows=1 loops=1)
>  Planning Time: 0.077 ms
>  Execution Time: 303.281 ms
>  Serialized Bytes: 7953 Bytes. Mode Text. Bandwidth 248.068 MB/sec
> (4 rows)
> 
> db1=#
> 
> So the explain(analyze) does not process the ~80 MB in 0.044ms in any
> way of course.

Honestly, I see absolutely no point in printing this. I have little idea
what to do with the "bytes". We have to transfer this over network, but
surely there's other data not included in this sum, right?

But the bandwidth seems pretty bogus/useless, as it's calculated from
execution time, which includes everything else, not just serialization.
So what does it say? It certainly does not include the network transfer.

IMO we should either print nothing or just the bytes. I don't see the
point in printing the mode, which is determined by the command.

> 
> Actually, I could print the serialized bytes using 1. grouping
> separators (e.g. 78_888_953) or 2. the way pg_size_pretty does it.
> 
> If doing it the pg_size_pretty way, I am uncertain whether it would be
> ok to call the actual pg_size_pretty function via its (certainly
> frozen) OID of 3166 and do OidFunctionCall1(3166, ...) to invoke it.
> Otherwise I'd say it would be nice if the code from that function were
> made available as a utility function for all C code.  Any suggestions
> on this topic?
> 

I'm rather skeptical about this proposal, mostly because it makes it
harder to process the explain output in scripts etc.

But more importantly, it's a completely separate matter from what this
patch does, so if you want to pursue this, I suggest you start a
separate thread. If you want to introduce separators, surely this is not
the only place that should do it (e.g. why not to do that for "rows" or
"cost" estimates)?

BTW if you really want to print amount of memory, maybe print it in
kilobytes, like every other place in explain.c? Also, explain generally
prints stuff in "key: value" style (in text format).


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Add new option 'all' to pg_stat_reset_shared()

2023-11-02 Thread Bharath Rupireddy
On Wed, Nov 1, 2023 at 4:24 AM Michael Paquier  wrote:
>
> On Tue, Oct 31, 2023 at 04:26:18PM +0900, torikoshia wrote:
> > Yes, calling pg_stat_reset_shared() for all stats types can do what I wanted
> > to do.
> > But calling it with 6 different parameters seems tiresome and I thought it
> > would be convenient to have a parameter to delete all cluster-wide
> > statistics at once.
> >
> > I may be wrong, but I imagine that it's more common to want to delete all of
> > the statistics for an entire cluster rather than just a portion of it.
>
> If more flexibility is wanted in this function, could it be an option
> to consider a flavor like pg_stat_reset_shared(text[]), where it is
> possible to specify a list of shared stats types to reset?  Perhaps
> there are no real use cases for it, just wanted to mention it anyway
> regarding the fact that it could have benefits to refactor this code
> to use a bitwise operator for its internals with bit flags for each
> type.

I don't see a strong reason to introduce yet-another API when someone
can just call things in a loop. I could recollect a recent analogy - a
proposal to have a way to define multiple custom wait events with a
single function call instead of callers defining in a loop didn't draw
much interest.

-- 
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: Detoasting optionally to make Explain-Analyze less misleading

2023-11-02 Thread stepan rutz

Hi Tomas,

you are right of course. Thanks!

I have attached a new version of the patch that supports the suggested
syntax. The previous patch was inconsistent in style indeed.

explain (analyze, serialize)

and

explain (analyze, serialize binary)

That doesn't make too much of a difference for most scenarios, I am
certain. However, the serialize option itself does, mostly because it
performs the detoasting, and that was a trap for me in the past with
just plain analyze.


E.g. this scenario really is not too far-fetched in a world where
people have large JSONB values.


db1=# create table test(id bigint, val text);

db1=# insert into test(val) select string_agg(s::text, ',') from (select
generate_series(1, 10_000_000) as s) as a1;

now we have a cell that has roughly 80MB in it. A large detoasting will
happen in real life, but not in plain explain(analyze).

and then...

db1=# explain (analyze) select * from test;
    QUERY PLAN
---
 Seq Scan on test  (cost=0.00..22.00 rows=1200 width=40) (actual
time=0.018..0.020 rows=1 loops=1)
 Planning Time: 0.085 ms
 Execution Time: 0.044 ms
(3 rows)

db1=# explain (analyze, serialize) select * from test;
    QUERY PLAN
---
 Seq Scan on test  (cost=0.00..22.00 rows=1200 width=40) (actual
time=0.023..0.027 rows=1 loops=1)
 Planning Time: 0.077 ms
 Execution Time: 303.281 ms
 Serialized Bytes: 7953 Bytes. Mode Text. Bandwidth 248.068 MB/sec
(4 rows)

db1=#

So the explain(analyze) does not process the ~80 MB in 0.044ms in any
way of course.

Actually, I could print the serialized bytes using 1. grouping
separators (e.g. 78_888_953) or 2. the way pg_size_pretty does it.

If doing it the pg_size_pretty way, I am uncertain whether it would be
ok to call the actual pg_size_pretty function via its (certainly
frozen) OID of 3166 and do OidFunctionCall1(3166, ...) to invoke it.
Otherwise I'd say it would be nice if the code from that function were
made available as a utility function for all C code.  Any suggestions
on this topic?
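
For what it's worth, calling it by OID from C would be something like
this (untested; F_PG_SIZE_PRETTY_INT8 is the fmgroids.h macro for the
bigint variant -- OID 3166 is the numeric one, which would need a
numeric Datum instead):

#include "fmgr.h"
#include "utils/builtins.h"
#include "utils/fmgroids.h"

static char *
pretty_bytes(int64 nbytes)
{
	Datum		result;

	result = OidFunctionCall1(F_PG_SIZE_PRETTY_INT8,
							  Int64GetDatum(nbytes));
	return text_to_cstring(DatumGetTextPP(result));
}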

Regards,

/Stepan


On 02.11.23 18:49, Tomas Vondra wrote:

Hi,

On 9/15/23 22:09, stepan rutz wrote:

Hi,

please see a revised version of yesterday's mail. The patch attached now
provides the following:

EXPLAIN(ANALYZE,SERIALIZE)

and

EXPLAIN(ANALYZE,SERIALIZEBINARY)


I haven't looked at the patch in detail yet, but this option name looks
a bit strange/inconsistent. Either it should be SERIALIZE_BINARY (to
match other multi-word options), or maybe there should be just SERIALIZE
with a parameter to determine text/binary (like FORMAT, for example).

So we'd do either

 EXPLAIN (SERIALIZE)
 EXPLAIN (SERIALIZE TEXT)

to get serialization to text (which I guess 99% of people will do), or

 EXPLAIN (SERIALIZE BINARY)

to get binary.


regards

diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index f1d71bc54e..bb75ac0b73 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -26,6 +26,7 @@
 #include "nodes/nodeFuncs.h"
 #include "parser/analyze.h"
 #include "parser/parsetree.h"
+#include "mb/pg_wchar.h"
 #include "rewrite/rewriteHandler.h"
 #include "storage/bufmgr.h"
 #include "tcop/tcopprot.h"
@@ -33,6 +34,7 @@
 #include "utils/guc_tables.h"
 #include "utils/json.h"
 #include "utils/lsyscache.h"
+#include "utils/numeric.h"
 #include "utils/rel.h"
 #include "utils/ruleutils.h"
 #include "utils/snapmgr.h"
@@ -47,7 +49,6 @@ ExplainOneQuery_hook_type ExplainOneQuery_hook = NULL;
 /* Hook for plugins to get control in explain_get_index_name() */
 explain_get_index_name_hook_type explain_get_index_name_hook = NULL;
 
-
 /* OR-able flags for ExplainXMLTag() */
 #define X_OPENING 0
 #define X_CLOSING 1
@@ -154,7 +155,8 @@ static void ExplainJSONLineEnding(ExplainState *es);
 static void ExplainYAMLLineStarting(ExplainState *es);
 static void escape_yaml(StringInfo buf, const char *str);
 
-
+static DestReceiver *CreateSerializeDestReceiver(int16 format);
+static uint64 GetNumberOfSerializedBytes(DestReceiver *dest);
 
 /*
  * ExplainQuery -
@@ -192,6 +194,23 @@ ExplainQuery(ParseState *pstate, ExplainStmt *stmt,
 			es->settings = defGetBoolean(opt);
 		else if (strcmp(opt->defname, "generic_plan") == 0)
 			es->generic = defGetBoolean(opt);
+		else if (strcmp(opt->defname, "serialize") == 0)
+		{
+			es->serialize = true;
+			/* optionally check for BINARY as argument */
+			if (opt->arg)
+			{
+char *p = defGetString(opt);
+if (strcmp(p, "binary") == 0)
+	es->serialize = es->serializebinary = true;
+else
+	ereport(ERROR,
+		(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+		 errmsg("unrecognized value for EXPLAIN option \"%s\": \"%s\"",
+opt->defname, p),
+		 parser_errposition(pstate, 

Re: Including Doxyfile and Meson script for docs into main source tree

2023-11-02 Thread John Morris
>On the patch itself I'm somewhat unconvinced that it is a good idea or
>long-term maintainable to actually have a kinda random copy of the
>configuration file(!) of an external software (doxygen in this)

Rather than copying the Doxyfile, we could just publish the non-default values. 
As a mostly non-Doxygen user, I found it useful to see the config file in its 
entirety, but I agree it is a bit much.

Would only showing modified values make more sense?



From: Stefan Kaltenbrunner 
Date: Thursday, November 2, 2023 at 2:51 AM
To: Bohdan Mart 
Cc: pgsql-hack...@postgresql.org , 'Bruce 
Momjian' , postg...@coyotebush.net , 
John Morris 
Subject: Re: Including Doxyfile and Meson script for docs into main source tree
Hi Bohdan!

I will see about making the current doxygen configuration available as a
download again in the next few days. Not sure whether there is an
additional actionable item yet as far as the project from John Morris
goes wrt. the postgresql.org sysadmin team.

The patch proposed is an enhancement of the build process in core
postgresql and I think we would look into utilizing that also for the
"official" build once applied.

I don't think we would want to locally maintain a C-based filter and a
custom build procedure otherwise.

On the patch itself, I'm somewhat unconvinced that it is a good idea or
long-term maintainable to actually have a kinda random copy of the
configuration file(!) of an external software (doxygen in this case) in
our code that might need continuous updating and maintenance to keep up
with new upstream releases.
As it stands, the patch right now is against 1.9.4, with 1.9.1 running on
our installation and 1.9.8 being the latest release...


Stefan


On 30.10.23 14:57, Bohdan Mart wrote:
> Hello, Stefan!
>
> What do you think about this? See quoted message from John Morris.
>
> P.S. Also we would like you to provide currently used Doxyfile, as
> Postgres doxigen's home page no longer have link to download said file.
>
> Regards,
>  Bohdan.
>
> On 14.10.23 21:54, John Morris wrote:
>> Thank you for trying the patch out and commenting on it.
>>
>> I'm starting to think of it as a project. Here's a quick project
>> statement.
>>
>> The purpose is to generate improved  Doxygen output while making
>> maximal use of how Postgres currently does program comments.
>>
>> Thinking in terms of specific steps, and adding to what you have
>> mentioned:
>>
>>   * Configure Doxyfile so it produces output similar to previous output.
>>   * Only build Doxygen output if requested
>>   * Don't compile the filter or configure the Doxyfile  if they aren't
>> needed
>>   * Include contrib in the sources to document
>>   * Provide options for other (non-html) output. (Which ones?)
>>   * Update Postgres documentation to include instructions for creating
>> Doxygen output.
>>   * Mention it in developer guidelines and provide sample code showing a
>> "desired" commenting style.
>>
>> Does that list seem complete? I don't want people to think we're
>> imposing new standards or legislating new commenting styles.  It's
>> more a matter of describing what we currently do, maybe with some
>> minor suggestions for improving.
>>
>>   * John Morris
>>


Re: Simplify xlogreader.c with XLogRec* macros

2023-11-02 Thread Bharath Rupireddy
On Tue, Oct 31, 2023 at 4:12 PM 邱宇航  wrote:
>
> >
> > I thought these can also be rewrite to:
> >
> > if (!XLogRecHasBlockRef(record, block_id))
>
> Oops, I missed that. New version is attached.

+1. Indeed a reasonable change. The attached v2 patch LGTM.
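
For reference, the macro encapsulates the two-part check that callers
would otherwise open-code against the decoded record. In xlogreader.h it
is defined along these lines (a sketch from memory, not a verbatim quote
of the header):

/* Does the decoded record carry a reference to block "block_id"? */
#define XLogRecHasBlockRef(decoder, block_id)			\
	(((decoder)->record->max_block_id >= (block_id)) &&	\
	 ((decoder)->record->blocks[block_id].in_use))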

-- 
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: Fix search_path for all maintenance commands

2023-11-02 Thread Jeff Davis
On Tue, 2023-10-31 at 13:16 -0400, Isaac Morland wrote:

> Perhaps the search_path for running a maintenance command should be
> the search_path set for the table owner (ALTER ROLE … SET search_path
> …)?

That's an interesting idea; I hadn't considered that, or at least not
very deeply. I feel like it came up before but I can't remember what
(if anything) was wrong with it.

If we expanded this idea a bit, I could imagine it applying to SECURITY
DEFINER functions as well, and that would make writing SECURITY DEFINER
functions a lot safer.

I was earlier pushing for search_path to be tied to the function (in my
"CREATE FUNCTION ... SEARCH" proposal) on the grounds that the author
(usually) doesn't want the behavior to depend on the caller's
search_path. That proposal didn't go very well because it required
extra DDL.

But if we tie the search_path to the user-switching behavior rather
than the function, that still protects the function author from many
sorts of search_path attacks, because it's either running as the
function author with the function author's search_path; or running as
the invoking user with their search_path. And there aren't a lot of
cases where a function author would want it to run with their
privileges and the caller's search_path.

[ It would still leave open the problem of a SECURITY INVOKER function
in an index expression returning inconsistent results due to a changing
search_path, which would compromise the index structure and any
constraints using that index. But that problem is more bounded, at
least. ]

>  After that, change search_path on function invocation as usual
> rather than having special rules for what happens when a function is
> invoked during a maintenance command.

I don't follow what you mean here.

Regards,
Jeff Davis





Re: Atomic ops for unlogged LSN

2023-11-02 Thread Bharath Rupireddy
On Thu, Nov 2, 2023 at 9:10 AM Nathan Bossart  wrote:
>
> On Wed, Nov 01, 2023 at 09:15:20PM +, John Morris wrote:
> > This is a rebased version . Even though I labelled it “v3”, there should be 
> > no changes from “v2”.
>
> Thanks.  I think this is almost ready, but I have to harp on the
> pg_atomic_read_u64() business once more.  The relevant comment in atomics.h
> has this note:
>
>  * The read is guaranteed to return a value as it has been written by this or
>  * another process at some point in the past. There's however no cache
>  * coherency interaction guaranteeing the value hasn't since been written to
>  * again.
>
> However unlikely, this seems to suggest that CreateCheckPoint() could see
> an old value with your patch.  Out of an abundance of caution, I'd
> recommend changing this to pg_atomic_compare_exchange_u64() like
> pg_atomic_read_u64_impl() does in generic.h.

+1. pg_atomic_read_u64 provides no barrier semantics whereas
pg_atomic_compare_exchange_u64 does. Without the barrier, it might
happen that the value is read while the other backend is changing it.
I think something like below providing full barrier semantics looks
fine to me:

XLogRecPtr ulsn;

pg_atomic_compare_exchange_u64(&XLogCtl->unloggedLSN, &ulsn, 0);
ControlFile->unloggedLSN = ulsn;

> @@ -4635,7 +4629,6 @@ XLOGShmemInit(void)
>
> SpinLockInit(&XLogCtl->Insert.insertpos_lck);
> SpinLockInit(&XLogCtl->info_lck);
> -   SpinLockInit(&XLogCtl->ulsn_lck);
>  }
>
> Shouldn't we do the pg_atomic_init_u64() here?  We can still set the
> initial value in StartupXLOG(), but it might be safer to initialize the
> variable where we are initializing the other shared memory stuff.

I think no one accesses the unloggedLSN in between
CreateSharedMemoryAndSemaphores -> XLOGShmemInit and StartupXLOG.
However, I see nothing wrong in doing
pg_atomic_init_u64(&XLogCtl->unloggedLSN, InvalidXLogRecPtr); in
XLOGShmemInit.
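
For illustration, a minimal sketch of that arrangement, mirroring the
neighboring spinlock setup (the StartupXLOG() line assumes the value
still comes from the checkpoint record, as today):

/* In XLOGShmemInit(), next to the existing spinlock initialization: */
SpinLockInit(&XLogCtl->Insert.insertpos_lck);
SpinLockInit(&XLogCtl->info_lck);
pg_atomic_init_u64(&XLogCtl->unloggedLSN, InvalidXLogRecPtr);

/* ... and later, in StartupXLOG(), once the real value is known: */
pg_atomic_write_u64(&XLogCtl->unloggedLSN, checkPoint.unloggedLSN);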

> Since this isn't a tremendously performance-sensitive area, IMHO we should
> code defensively to eliminate any doubts about correctness and to make it
> easier to reason about.

Right.

-- 
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: Postgres picks suboptimal index after building of an extended statistics

2023-11-02 Thread Tomas Vondra
On 9/25/23 06:30, Andrey Lepikhov wrote:
> ...
> I can't stop thinking about this issue. It is bizarre when Postgres
> chooses a non-unique index if a unique index gives us proof of minimum
> scan.

That's true, but no one has implemented this heuristic. So the "proof of
minimum scan" is merely hypothetical - the optimizer is unaware of it.

> I don't see a reason to teach the clauselist_selectivity() routine to
> estimate UNIQUE indexes. We add some cycles, but it will work with btree
> indexes only.

I'm not sure I understand what this is meant to say. Can you elaborate?
We only allow UNIQUE for btree indexes anyway, so what exactly is the
problem here?

> Maybe to change compare_path_costs_fuzzily() and add some heuristic, for
> example:
> "If selectivity of both paths gives us no more than 1 row, prefer to use
> a unique index or an index with least selectivity."
> 

I don't understand how this would work. What do you mean by "selectivity
of a path"? AFAICS the problem here is that we estimate a path to return
more rows (while we know there can only be 1, thanks to UNIQUE index).

So how would you know the path does not give us more than 1 row? Surely
you're not proposing compare_path_costs_fuzzily() to do something
expensive like analyzing the paths, or something.

Also, how would it work in upper levels? If you just change which path
we keep, but leave the inaccurate row estimate in place, that may fix
that level, but it's certainly going to confuse later planning steps.

IMHO the problem here is that we produce a wrong estimate, so we had better
fix that, instead of adding band-aids to other places.

This happens because functional dependencies are a very simple type of
statistics - they have some expectations about the input data and also
about the queries executed on it. For example, they assume the data is
reasonably homogeneous, so that we can calculate a global "degree".

But the data in the example directly violates that - it has 1000 rows
that are very random (so we'd find no dependencies), and 1000 rows with
perfect dependencies. Hence we end with degree=0.5, which approximates
the dependencies to all data. Not great, true, but that's the price for
simplicity of this statistics kind.

So the simplest solution is to disable dependencies on such data sets.
It's a bit inconvenient/unfortunate that we build dependencies by default,
and people may not be aware of these assumptions.

Perhaps we could/should make dependency_degree() more pessimistic when
we find some "violations" of the rule (we intentionally are not strict
about it, because data are imperfect). I don't have a good idea how to
change the formulas, but I'm open to the idea in principle.

The other thing we could do is checking for unique indexes in
clauselist_selectivity, and if we find a match we can just skip the
extended statistics altogether. Not sure how expensive this would be,
but for typical cases (with modest number of indexes) perhaps OK. It
wouldn't work without a unique index, but I don't have a better idea.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Detoasting optionally to make Explain-Analyze less misleading

2023-11-02 Thread Tomas Vondra
Hi,

On 9/15/23 22:09, stepan rutz wrote:
> Hi,
> 
> please see a revised version of yesterday's mail. The patch attached now
> provides the following:
> 
> EXPLAIN(ANALYZE,SERIALIZE)
> 
> and
> 
> EXPLAIN(ANALYZE,SERIALIZEBINARY)
> 

I haven't looked at the patch in detail yet, but this option name looks
a bit strange/inconsistent. Either it should be SERIALIZE_BINARY (to
match other multi-word options), or maybe there should be just SERIALIZE
with a parameter to determine text/binary (like FORMAT, for example).

So we'd do either

EXPLAIN (SERIALIZE)
EXPLAIN (SERIALIZE TEXT)

to get serialization to text (which I guess 99% of people will do), or

EXPLAIN (SERIALIZE BINARY)

to get binary.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Tab completion for CREATE TABLE ... AS

2023-11-02 Thread Gilles Darold

Hi,


Looks like the tab completion for CREATE TABLE ... AS is not proposed.


gilles=# CREATE TABLE test
( OF    PARTITION OF

 The attached patch fixes that and also proposes further completion
after the AS keyword.



gilles=# CREATE TABLE test
( AS    OF    PARTITION OF
gilles=# CREATE TABLE test AS
SELECT  WITH

Adding the patch to current commitfest.


Best regards,

--
Gilles Darold
http://www.darold.net/
diff --git a/src/bin/psql/tab-complete.c b/src/bin/psql/tab-complete.c
index 93742fc6ac..d793a94d6a 100644
--- a/src/bin/psql/tab-complete.c
+++ b/src/bin/psql/tab-complete.c
@@ -3231,11 +3231,15 @@ psql_completion(const char *text, int start, int end)
 	/* Complete CREATE TABLE  with '(', OF or PARTITION OF */
 	else if (TailMatches("CREATE", "TABLE", MatchAny) ||
 			 TailMatches("CREATE", "TEMP|TEMPORARY|UNLOGGED", "TABLE", MatchAny))
-		COMPLETE_WITH("(", "OF", "PARTITION OF");
+		COMPLETE_WITH("(", "OF", "PARTITION OF", "AS");
 	/* Complete CREATE TABLE  OF with list of composite types */
 	else if (TailMatches("CREATE", "TABLE", MatchAny, "OF") ||
 			 TailMatches("CREATE", "TEMP|TEMPORARY|UNLOGGED", "TABLE", MatchAny, "OF"))
 		COMPLETE_WITH_SCHEMA_QUERY(Query_for_list_of_composite_datatypes);
+	/* Complete CREATE TABLE  AS with list of keywords */
+	else if (TailMatches("CREATE", "TABLE", MatchAny, "AS") ||
+			 TailMatches("CREATE", "TEMP|TEMPORARY|UNLOGGED", "TABLE", MatchAny, "AS"))
+		COMPLETE_WITH("SELECT", "WITH");
 	/* Complete CREATE TABLE name (...) with supported options */
 	else if (TailMatches("CREATE", "TABLE", MatchAny, "(*)") ||
 			 TailMatches("CREATE", "UNLOGGED", "TABLE", MatchAny, "(*)"))


Re: Improve WALRead() to suck data directly from WAL buffers when possible

2023-11-02 Thread Bharath Rupireddy
On Sat, Oct 28, 2023 at 2:22 AM Jeff Davis  wrote:
>
> I think I see what you are saying: WALRead() is at a lower level than
> the XLogReaderRoutine callbacks, because it's used by the .page_read
> callbacks.
>
> That makes sense, but my first interpretation was that WALRead() is
> above the XLogReaderRoutine callbacks because it calls .segment_open
> and .segment_close. To me that sounds like a layering violation, but it
> exists today without your patch.

Right. WALRead() is a common function used by most if not all
page_read callbacks. Typically, the page_read callback code has two
parts: first determine the target/start LSN, and second read the WAL
(via WALRead(), for instance).

> I suppose the question is: should reading from the WAL buffers an
> intentional thing that the caller does explicitly by specific callers?
> Or is it an optimization that should be hidden from the caller?
>
> I tend toward the former, at least for now.

Yes, it's an optimization that must be hidden from the caller.

> I suspect that when we do
> some more interesting things, like replicating unflushed data, we will
> want reading from buffers to be a separate step, not combined with
> WALRead(). After things in this area settle a bit then we might want to
> refactor and combine them again.

As said previously, the new XLogReadFromBuffers() function is generic
and extensible in the way that anyone with a target/start LSN
(corresponding to flushed or written-but-not-yet-flushed WAL) and TLI
can call it to read from WAL buffers. It's just that the patch
currently uses it where it makes sense i.e. in WALRead(). But, it's
usable in, say, a page_read callback reading unflushed WAL from WAL
buffers.

> > If someone wants to read unflushed WAL, the typical way to implement
> > it is to write a new page_read callback
> > read_local_unflushed_xlog_page/logical_read_unflushed_xlog_page or
> > similar without WALRead() but just the new function
> > XLogReadFromBuffers to read from WAL buffers and return.
>
> Then why is it being called from WALRead() at all?

The patch focuses on reading flushed WAL from WAL buffers if
available, not the unflushed WAL at all; that's why it's in WALRead()
before reading from the WAL file using pg_pread().

I'm trying to make the point that XLogReadFromBuffers() enables one
to read unflushed WAL from the WAL buffers (if really wanted, for
future features like replicating from WAL buffers as a new opt-in
feature to improve replication performance).

> > I prefer to keep the partial hit handling as-is just
> > in case:
>
> So a "partial hit" is essentially a narrow race condition where one
> page is read from buffers, and it's valid; and by the time it gets to
> the next page, it has already been evicted (along with the previously
> read page)?
>
> In other words, I think you are describing a case where
> eviction is happening faster than the memcpy()s in a loop, which is
> certainly possible due to scheduling or whatever, but doesn't seem like
> the common case.
>
> The case I'd expect for a partial read is when the startptr points to
> an evicted page, but some later pages in the requested range are still
> present in the buffers.
>
> I'm not really sure whether either of these cases matters, but if we
> implement one and not the other, there should be some explanation.

At any given point of time, WAL buffer pages are maintained as a
circularly sorted array in ascending order, from
OldestInitializedPage to InitializedUpTo (new pages are inserted at
that end). Also, the current patch holds WALBufMappingLock while
reading the buffer pages, meaning no one can replace the buffer pages
until the reading is finished. Therefore, the two partial-hit cases
described above can't happen: when reading multiple pages, if the
first page is found in the buffer pages, the other pages must exist
too, because the WAL buffer page array is circular and sorted.

Here's an illustration with WAL buffers circular array (idx, LSN) of
size 10 elements with contents as {(0, 160), (1, 170), (2, 180), (3,
90), (4, 100), (5, 110), (6, 120), (7, 130), (8, 140), (9, 150)} and
current InitializedUpTo pointing to page at LSN 180, idx 2.
- Read 6 pages starting from LSN 80. Nothing is read from WAL buffers,
as the page at LSN 80 doesn't exist, even though the other 5 pages
starting from LSN 90 do.
- Read 6 pages starting from LSN 90. All the pages exist and are read
from WAL buffers.
- Read 6 pages starting from LSN 150. Note that WAL is currently
flushed only up to the page at LSN 180, and callers won't ask to read
unflushed WAL. If a caller asks for an unflushed WAL read,
intentionally or unintentionally, XLogReadFromBuffers() reads only 4
pages, from LSN 150 to LSN 180, and leaves the remaining 2 pages for
the caller to deal with. This is the partial hit that can happen.
Therefore, retaining the partial-hit code in WALRead() as-is in the
current patch is needed IMV.
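
To make the presence test concrete, here is an illustrative sketch of
the per-page check such code can do while holding WALBufMappingLock
(names follow xlog.c conventions; the helper itself is made up, not the
patch's actual code):

/*
 * Illustrative only: does the WAL buffer still hold the page containing
 * "ptr"?  xlblocks[idx] stores the end+1 LSN of the page cached in slot
 * idx, so comparing it with the expected end pointer detects whether the
 * slot has been recycled for a newer page.  The caller must hold
 * WALBufMappingLock so that the answer cannot change under us.
 */
static bool
WALPageIsInBuffers(XLogRecPtr ptr)
{
	int			idx = XLogRecPtrToBufIdx(ptr);
	XLogRecPtr	expectedEndPtr;

	/* LSN of the first byte past the page that contains ptr */
	expectedEndPtr = ptr - (ptr % XLOG_BLCKSZ) + XLOG_BLCKSZ;

	return XLogCtl->xlblocks[idx] == expectedEndPtr;
}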

> > Yes, I proposed that idea in 

Re: [HACKERS] Allow INSTEAD OF DELETE triggers to modify the tuple for RETURNING

2023-11-02 Thread Marko Tiikkaja
On Thu, Nov 2, 2023 at 12:24 PM Shlok Kyal  wrote:
> I went through the CFbot and found that the docs build run is giving
> some errors (link: https://cirrus-ci.com/task/5712582359646208):
>
> Just wanted to make sure you are aware of the issue.

I am now.  Thanks! :-)  Will try to keep an eye on the builds in the future.

Attached v4 of the patch which should fix the issue.


.m


instead_of_delete_returning_v4.patch
Description: Binary data


Re: Three commit tips

2023-11-02 Thread Nathan Bossart
On Thu, Nov 02, 2023 at 11:22:38AM -0400, Bruce Momjian wrote:
> First, I have been hesitant to ascribe others as patch authors if I
> heavily modified a doc patch because I didn't want them blamed for any
> mistakes I made.  However, I also want to give them credit, so I decided
> I would annotate commits with "partial", e.g.:
> 
>   Author:  Andy Jackson (partial)

I tend to use "Co-authored-by" for this purpose.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: Don't pass NULL pointer to strcmp().

2023-11-02 Thread Tom Lane
Looking closer, I realized that my proposed change in RestoreGUCState
is unnecessary, because guc_free() is already permissive about being
passed a NULL.  That leaves us with one live bug in
get_explain_guc_options, two probably-unreachable hazards in
check_GUC_init and write_one_nondefault_variable, and two API changes
in GetConfigOption and GetConfigOptionResetString.  I'm dubious that
back-patching the API changes would be a good idea, so I applied
that to HEAD only.  The rest I backpatched as far as relevant.

Thanks for the report!

regards, tom lane




Three commit tips

2023-11-02 Thread Bruce Momjian
I have three suggestions on committing that I thought would be helpful
to the hacker audience.

First, I have been hesitant to ascribe others as patch authors if I
heavily modified a doc patch because I didn't want them blamed for any
mistakes I made.  However, I also want to give them credit, so I decided
I would annotate commits with "partial", e.g.:

Author:  Andy Jackson (partial)

Second, I have found the git diff option --word-diff=color to be very
helpful and I have started using it, especially for doc patches where
the old text is in red and the new text is in green.  Posting patches in
that format is probably not helpful though.

Third, I have come up with the following shell script to test for proper
pgindentation, which I run automatically before commit:

# 
https://www.postgresql.org/message-id/CAGECzQQL-Dbb%2BYkid9Dhq-491MawHvi6hR_NGkhiDE%2B5zRZ6vQ%40mail.gmail.com
src/tools/pgindent/pgindent $(git diff --name-only --diff-filter=ACMR) > /tmp/$$

if [ \( "$(wc -l < /tmp/$$)" -eq 1 -a "$(expr match "$(cat /tmp/$$)" "No files to process\>")" -eq 0 \) -o \
     "$(wc -l < /tmp/$$)" -gt 1 ]
then
	echo "pgindent failure in master branch, exiting." 1>&2
	cat /tmp/$$
	exit 1
fi

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Only you can decide what is important to you.




Re: Tab completion regression test failed on illumos

2023-11-02 Thread Japin Li


On Thu, 02 Nov 2023 at 22:23, Tom Lane  wrote:
> Japin Li  writes:
>> It seems the 'SEL\t' is converted to 'SEL ' which is "SEL" with 5 spaces.
>
> That would be plausible if readline were disabled, or otherwise
> not functioning.
>

I think this might be a bug that comes from the Illumos pseudo-tty. I can
reproduce
this by using pseudo-tty on Illumos.

Here is a simple test case:

$ cat pseudo-tty.c
#define _XOPEN_SOURCE 600

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>

#define DEV_PTMX	"/dev/ptmx"

int
main(void)
{
int ptm_fd;
pid_t pid;
char *pts_name;

ptm_fd = open(DEV_PTMX, O_RDWR);
grantpt(ptm_fd);
unlockpt(ptm_fd);
pts_name = ptsname(ptm_fd);

pid = fork();
if (pid == -1) {
fprintf(stderr, "could not fork a new process: %m\n");
close(ptm_fd);
return -1;
} else if (pid == 0) {
int pts_fd;

close(ptm_fd);
pts_fd = open(pts_name, O_RDWR);
write(pts_fd, "SEL\tH", 5);
close(pts_fd);
} else {
int status;
char buffer[512] = { 0 };
ssize_t bytes;

bytes = read(ptm_fd, buffer, sizeof(buffer));
printf("%zd: '%s'\n", bytes, buffer);
waitpid(pid, &status, 0);
close(ptm_fd);
}

return 0;
}

On Illumos
$ gcc -o pseudo-tty pseudo-tty.c
$ ./pseudo-tty
9: 'SEL H'

On Ubuntu
$ gcc -o pseudo-tty pseudo-tty.c
$ ./pseudo-tty
5: 'SEL H'

--
Regrads,
Japin Li
ChengDu WenWu Information Technology Co., Ltd.




Re: CRC32C Parallel Computation Optimization on ARM

2023-11-02 Thread Nathan Bossart
On Thu, Nov 02, 2023 at 06:17:20AM +, Xiang Gao wrote:
> After reading the discussion, I understand that in order to avoid performance
> regression in some instances, we need to try our best to avoid runtime checks.
> I don't know if I understand it correctly.

The idea is that we don't want to start forcing runtime checks on builds
where we aren't already doing runtime checks.  IOW if the compiler can use
the ARMv8 CRC instructions with the default compiler flags, we should only
use vmull_p64() if it can also be used with the default compiler flags.  I
suspect this limitation sounds worse than it actually is in practice.  The
vast majority of the buildfarm uses runtime checks, and at least some of
the platforms that don't, such as the Apple M-series machines, seem to
include +crypto by default.

Of course, if a compiler picks up +crc but not +crypto in its defaults, we
could lose the vmull_p64() optimization on that platform.  But as noted in
the other thread, we can revisit if these kinds of hypothetical situations
become reality.

> Could you please give me some suggestions about how to refine this patch?

Of course.  I think we'll ultimately want to independently check for the
availability of the new instruction like we do for the other sets of
intrinsics:

PGAC_ARMV8_VMULL_INTRINSICS([])
if test x"$pgac_armv8_vmull_intrinsics" != x"yes"; then
PGAC_ARMV8_VMULL_INTRINSICS([-march=armv8-a+crypto])
fi

My current thinking is that we'll want to add USE_ARMV8_VMULL and
USE_ARMV8_VMULL_WITH_RUNTIME_CHECK and use those to decide exactly what to
compile.  I'll admit I haven't fully thought through every detail yet, but
I'm cautiously optimistic that we can avoid too much complexity in the
autoconf/meson scripts.
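
To visualize where that could land, the dispatch might mirror the
existing layout of src/include/port/pg_crc32c.h, roughly as below (a
sketch only: the USE_ARMV8_VMULL* symbols and the
pg_comp_crc32c_armv8_vmull() name are hypothetical, not existing code):

/* Sketch: USE_ARMV8_VMULL* and pg_comp_crc32c_armv8_vmull() are
 * hypothetical names used for illustration. */
#if defined(USE_ARMV8_VMULL)
/* vmull_p64() compiles with default flags: always use the faster variant */
#define COMP_CRC32C(crc, data, len) \
	((crc) = pg_comp_crc32c_armv8_vmull((crc), (data), (len)))
#elif defined(USE_ARMV8_VMULL_WITH_RUNTIME_CHECK)
/* needs -march=armv8-a+crypto: go through the runtime-chosen pointer */
#define COMP_CRC32C(crc, data, len) \
	((crc) = pg_comp_crc32c((crc), (data), (len)))
extern pg_crc32c (*pg_comp_crc32c) (pg_crc32c crc, const void *data, size_t len);
#endif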

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: Tab completion regression test failed on illumos

2023-11-02 Thread Tom Lane
Japin Li  writes:
> It seems the 'SEL\t' is converted to 'SEL ' which is "SEL" with 5 spaces.

That would be plausible if readline were disabled, or otherwise
not functioning.

regards, tom lane




Re: BUG #18167: cannot create partitioned tables when default_tablespace is set

2023-11-02 Thread Marius RAICU

Hello Alvaro, all,

I have done some research regarding this bug.

Basically, we forbid the creation of partitioned tables and indexes if a 
default_tablespace is specified in postgresql.conf.


In tablespace.c, the comment says:

"Don't allow specifying that when creating a partitioned table, however, 
since the result is confusing."


I did not see why the result is confusing.

I just disabled the checks in tablespace.c, tablecmds.c and indexcmds.c 
and now it works.


I modified the expected result in tests, and the tests are passing too.

See the attached patch.

Regards,

Marius Raicu

On 10/25/23 11:58, tender wang wrote:



Alvaro Herrera  于2023年10月25日周三 17:41写道:

On 2023-Oct-24, PG Bug reporting form wrote:

> marius@[local]:5434/postgres=# show default_tablespace;
>  default_tablespace
> --------------------
>  tblspc1
> (1 row)
>
> marius@[local]:5434/postgres=# create tablespace tblspc1 location
> '/home/marius/pgcode/tblspc1';
> CREATE TABLESPACE
> marius@[local]:5434/postgres=# create database test tablespace
tblspc1;
> CREATE DATABASE
> marius@[local]:5434/postgres=# \c test
> You are now connected to database "test" as user "marius".
> marius@[local]:5434/test=# create table toto(id numeric)
partition by
> list(id);
> ERROR:  cannot specify default tablespace for partitioned relations

Oh, so the problem here is that *both* default_tablespace and the
database's tablespace are set, and then a partitioned table creation
fails when it doesn't specify any tablespace?  That indeed sounds
like a
bug.  I'll have a look, thanks.  I'm surprised it took so long for
this
to be reported.

Oh, interesting issue!
I found another two cases:
First: default_tablespace not set and create part rel failed
postgres=# create tablespace tbsp3 location '/tender/pgsql/tbsp3';
CREATE TABLESPACE
postgres=# create database test3 tablespace tbsp3;
CREATE DATABASE
postgres=# \c test3
You are now connected to database "test3" as user "gpadmin".
test3=# show default_tablespace ;
 default_tablespace
--------------------
 
(1 row)

test3=# create table part1(a int) partition by list(a) tablespace tbsp3;
ERROR:  cannot specify default tablespace for partitioned relations

Second: default_tablespace and database's tablespace both set, but 
part rel created

test3=# set default_tablespace = tbsp2;
SET
test3=# create table part1(a int) partition by list(a);
CREATE TABLE

I'm not sure if the above two cases are a bug. If the document could 
provide detailed explanations, that would be great.


-- 
Álvaro Herrera               48°01'N 7°57'E  —

https://www.EnterpriseDB.com/ 
"Someone said that it is at least an order of magnitude more work
to do
production software than a prototype. I think he is wrong by at least
an order of magnitude."                              (Brian Kernighan)



diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index c160d8a301..afba303e13 100644
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -758,10 +758,12 @@ DefineIndex(Oid tableId,
 	if (stmt->tableSpace)
 	{
 		tablespaceId = get_tablespace_oid(stmt->tableSpace, false);
+		/*
 		if (partitioned && tablespaceId == MyDatabaseTableSpace)
 			ereport(ERROR,
 	(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 	 errmsg("cannot specify default tablespace for partitioned relations")));
+		*/
 	}
 	else
 	{
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 416a98e7ce..1a50c9c479 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -790,11 +790,12 @@ DefineRelation(CreateStmt *stmt, char relkind, Oid ownerId,
 	if (stmt->tablespacename)
 	{
 		tablespaceId = get_tablespace_oid(stmt->tablespacename, false);
-
+   /*
 		if (partitioned && tablespaceId == MyDatabaseTableSpace)
 			ereport(ERROR,
 	(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 	 errmsg("cannot specify default tablespace for partitioned relations")));
+		*/
 	}
 	else if (stmt->partbound)
 	{
diff --git a/src/backend/commands/tablespace.c b/src/backend/commands/tablespace.c
index 13b0dee146..f73c6a1286 100644
--- a/src/backend/commands/tablespace.c
+++ b/src/backend/commands/tablespace.c
@@ -1133,9 +1133,7 @@ check_default_tablespace(char **newval, void **extra, GucSource source)
  * GetDefaultTablespace -- get the OID of the current default tablespace
  *
  * Temporary objects have different default tablespaces, hence the
- * relpersistence parameter must be specified.  Also, for partitioned tables,
- * we disallow specifying the database default, so that needs to be specified
- * too.
+ * relpersistence parameter must be specified.
  *
  * May return InvalidOid to indicate "use the database's default tablespace".
  *
@@ -1172,16 +1170,15 @@ GetDefaultTablespace(char relpersistence, bool 

Re: Extract numeric filed in JSONB more effectively

2023-11-02 Thread John Naylor
On Wed, Nov 1, 2023 at 9:18 AM Chapman Flack  wrote:
> So, it would not have been my choice to assign RfC status before getting to a 
> resolution on that.

It's up to the reviewer (here Chapman), not the author, to decide
whether to set it to RfC. I've set the status to "needs review".




Re: Document efficient self-joins / UPDATE LIMIT techniques.

2023-11-02 Thread Laurenz Albe
On Tue, 2023-10-31 at 14:12 -0400, Corey Huinker wrote:
> 
> 
> > About the SELECT example:
> > -
> > 
> > That example belongs to UPDATE, I'd say, because that is the main
> > operation.
> 
> I'm iffy on that suggestion. A big part of putting it in SELECT was the fact
> that it shows usage of SKIP LOCKED and FOR UPDATE.

I can accept that.

> 
> > About the UPDATE example:
> > -
> > 
> > I think that could go, because it is pretty similar to the previous
> > one.  You even use ctid in both examples.
> 
> It is similar, but the idea here is to aid in discovery. A user might miss the
> technique for update if it's only documented in delete, and even if they did 
> see
> it there, they might not realize that it works for both UPDATE and DELETE.
> We could make reference links from one to the other, but that seems like extra
> work for the reader.

I am talking about the similarity between the SELECT and the UPDATE example.
I don't agree with bloating the documentation with redundant examples just
to save a user a click.

I like the idea of a link. Perhaps:

  If you need to perform a large UPDATE in batches to avoid excessive bloat,
  deadlocks, or to reduce the load on the server, look at the example in
  the DELETE reference page.

Other observations:

  @@ -234,6 +234,35 @@ DELETE FROM films
  In some cases the join style is easier to write or faster to
  execute than the sub-select style.
 
  +  
  +   In situations where a single operation would consume too many resources,
  +   either causing the operation to fail or negatively impacting other 
workloads,
  +   it may be desirable to break up a large DELETE into
  +   multiple separate commands. While doing this will actually increase the
  +   total amount of work performed, it can break the work into chunks that 
have
  +   a more acceptable impact on other workloads.  The
  +   SQL standard does
  +   not define a LIMIT clause for DELETE
  +   operations, but it is possible to get the equivalent functionality through 
the
  +   USING clause to a
  +   Common Table Expression which 
identifies
  +   a subset of rows to be deleted, locks those rows, and returns their system
  +   column ctid values:

I don't think that reducing the load on the server is such a great use case
that we should recommend it as "best practice" in the documentation (because,
as your patch now mentions, it doesn't reduce the overall load).

I also don't think we need a verbal description of what the following query 
does.

How about something like:

"If you have to delete lots of rows, it can make sense to perform the operation
 in several smaller batches to reduce the risk of deadlocks.  The
 SQL standard does
 not define a LIMIT clause for DELETE,
 but it is possible to achieve a similar effect with a self-join on
 the system column ctid:"

  +
  +WITH delete_batch AS (
  +  SELECT l.ctid
  +  FROM user_logs AS l
  +  WHERE l.status = 'archived'
  +  ORDER BY l.creation_date
  +  LIMIT 1
  +  FOR UPDATE
  +)
  +DELETE FROM user_logs AS ul
  +USING delete_batch AS del
  +WHERE ul.ctid = del.ctid;
  +
  +  This allows for flexible search criteria within the CTE and an efficient 
self-join.
  +  

The last sentence is redundant, I'd say.

But you could add:

"An added benefit is that by using an ORDER BY clause in
 the subquery, you can determine the order in which the rows will be locked
 and deleted, which will prevent deadlocks with other statements that lock
 the rows in the same order."

But if you do that, you had better use "ORDER BY id" or something else that
looks more like a unique column.

--- a/doc/src/sgml/ref/select.sgml
+++ b/doc/src/sgml/ref/select.sgml
@@ -1679,6 +1679,30 @@ SELECT * FROM (SELECT * FROM mytable FOR UPDATE) ss 
WHERE col1 = 5;
 condition is not textually within the sub-query.


+   
+In cases where a DML operation involving many rows

I think we should avoid using DML.  Beginners might not know it, and it is
not an index term.  My suggestion is "data modification statement/operation".

+must be performed, and that table experiences numerous other simultaneous
+DML operations, a FOR UPDATE clause
+used in conjunction with SKIP LOCKED can be useful for
+performing partial DML operations:
+
+
+WITH mods AS (
+SELECT ctid FROM mytable
+WHERE status = 'active' AND retries > 10
+ORDER BY id FOR UPDATE SKIP LOCKED
+)
+UPDATE mytable SET status = 'failed'
+FROM mods WHERE mytable.ctid = mods.ctid;
+
+
+This allows the DML operation to be performed in parts, 
avoiding locking,
+until such time as the set of rows that remain to be modified is small 
enough

"until such time as" does not sound English to me.  "Until the number of rows 
that remain"
would be better, in my opinion.

+that the locking will not affect overall performance, at which point the 
same

"that the locking" --> "that locking them"

+statement can be issued without the SKIP LOCKED clause 
to 

Re: Explicitly skip TAP tests under Meson if disabled

2023-11-02 Thread Peter Eisentraut

On 30.10.23 10:12, Tom Lane wrote:

+1 for counting such tests as "skipped" in the summary.  -1 for
emitting a message per skipped test.  If I'm intentionally not
running those tests, that would be very annoying noise, and
potentially would obscure messages I actually need to see.


In my usage, those messages only show up in the logs, not during a 
normal test run.  This is similar to other skip messages, like "skipped 
on Windows" or "skipped because LDAP not enabled" etc.






Re: Statistics Import and Export

2023-11-02 Thread Tomas Vondra
On 11/2/23 06:01, Corey Huinker wrote:
> 
> 
> Maybe I just don't understand, but I'm pretty sure ANALYZE does not
> derive index stats from column stats. It actually builds them from the
> row sample.
> 
> 
> That is correct, my error.
>  
> 
> 
> > * now support extended statistics except for MCV, which is currently
> > serialized as a difficult-to-decompose bytea field.
> 
> Doesn't pg_mcv_list_items() already do all the heavy work?
> 
> 
> Thanks! I'll look into that.
> 
> The comment below in mcv.c made me think there was no easy way to get
> output.
> 
> /*
>  * pg_mcv_list_out      - output routine for type pg_mcv_list.
>  *
>  * MCV lists are serialized into a bytea value, so we simply call byteaout()
>  * to serialize the value into text. But it'd be nice to serialize that into
>  * a meaningful representation (e.g. for inspection by people).
>  *
>  * XXX This should probably return something meaningful, similar to what
>  * pg_dependencies_out does. Not sure how to deal with the deduplicated
>  * values, though - do we want to expand that or not?
>  */
> 

Yeah, that was the simplest output function possible; it didn't seem
worth it to implement something more advanced. pg_mcv_list_items() is
more convenient for most needs, but it's quite far from the on-disk
representation.

That's actually a good question - how close should the exported data
be to the on-disk format? I'd say we should keep it abstract, not tied
to the details of the on-disk format (which might easily change between
versions).

I'm a bit confused about the JSON schema used in pg_statistic_export
view, though. It simply serializes stakinds, stavalues, stanumbers into
arrays ... which works, but why not to use the JSON nesting? I mean,
there could be a nested document for histogram, MCV, ... with just the
correct fields.

  {
...
histogram : { stavalues: [...] },
mcv : { stavalues: [...], stanumbers: [...] },
...
  }

and so on. Also, what does TRIVIAL stand for?

regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Incorrect file reference in comment in procarray.c

2023-11-02 Thread Daniel Gustafsson
> On 2 Nov 2023, at 13:40, Etsuro Fujita  wrote:

> Attached is a small patch for that: s/heapam_visibility.c/snapmgr.c/.

No objections to the patch, the change is correct.  However, with git grep and
ctags and other ways of discovery I wonder if we're not better off avoiding
such references to filenames which are prone to going stale (and do from time
to time).

--
Daniel Gustafsson





Incorrect file reference in comment in procarray.c

2023-11-02 Thread Etsuro Fujita
Hi,

While working on something else, I noticed $SUBJECT: commit b7eda3e0e
moved XidInMVCCSnapshot() from tqual.c into snapmgr.c, but follow-up
commit c91560def updated this reference incorrectly:

@@ -1498,7 +1498,7 @@ GetMaxSnapshotSubxidCount(void)
  * information may not be available.  If we find any overflowed subxid arrays,
  * we have to mark the snapshot's subxid data as overflowed, and extra work
  * *may* need to be done to determine what's running (see XidInMVCCSnapshot()
- * in tqual.c).
+ * in heapam_visibility.c).

Attached is a small patch for that: s/heapam_visibility.c/snapmgr.c/.

Best regards,
Etsuro Fujita


fix-file-reference-in-comment.patch
Description: Binary data


Re: Force the old transactions logs cleanup even if checkpoint is skipped

2023-11-02 Thread Shlok Kyal
Hi,

I went through the CFbot and saw that some tests are failing for it
(link: https://cirrus-ci.com/task/4631357628874752):

test: postgresql:recovery / recovery/019_replslot_limit

# test failed
--- stderr ---
# poll_query_until timed out executing this query:
# SELECT '0/15000D8' <= replay_lsn AND state = 'streaming'
#  FROM pg_catalog.pg_stat_replication
#  WHERE application_name IN ('standby_1', 'walreceiver')
# expecting this output:
# t
# last actual query output:
#
# with stderr:
# Tests were run but no plan was declared and done_testing() was not seen.
# Looks like your test exited with 29 just after 7.

I tried to test it locally, and this test is timing out on my local
machine as well.

Thanks
Shlok Kumar Kyal




Re: Why is DEFAULT_FDW_TUPLE_COST so insanely low?

2023-11-02 Thread Richard Guo
On Thu, Nov 2, 2023 at 3:19 PM David Rowley  wrote:

> I'm not yet seeing any failures in the buildfarm, so don't really want
> to push a fix for this one if there are going to be a few more
> unstable ones to fix.  I may just hold off a while to see.


It seems that the test is still not stable on 32-bit machines even after
4b14e18714.  I see the following plan diff on cfbot [1].

--- /tmp/cirrus-ci-build/contrib/postgres_fdw/expected/postgres_fdw.out
2023-11-02 11:35:12.016196978 +
+++
/tmp/cirrus-ci-build/build-32/testrun/postgres_fdw/regress/results/postgres_fdw.out
2023-11-02 11:42:09.092242808 +
@@ -4022,24 +4022,21 @@
 -- subquery using stable function (can't be sent to remote)
 PREPARE st2(int) AS SELECT * FROM ft1 t1 WHERE t1.c1 < $2 AND t1.c3 IN
(SELECT c3 FROM ft2 t2 WHERE c1 > $1 AND date(c4) = '1970-01-17'::date)
ORDER BY c1;
 EXPLAIN (VERBOSE, COSTS OFF) EXECUTE st2(10, 20);
-QUERY PLAN
---
- Sort
+QUERY PLAN
+--
+ Nested Loop Semi Join
Output: t1.c1, t1.c2, t1.c3, t1.c4, t1.c5, t1.c6, t1.c7, t1.c8
-   Sort Key: t1.c1
-   ->  Nested Loop Semi Join
+   Join Filter: (t2.c3 = t1.c3)
+   ->  Foreign Scan on public.ft1 t1
  Output: t1.c1, t1.c2, t1.c3, t1.c4, t1.c5, t1.c6, t1.c7, t1.c8
- Join Filter: (t2.c3 = t1.c3)
- ->  Foreign Scan on public.ft1 t1
-   Output: t1.c1, t1.c2, t1.c3, t1.c4, t1.c5, t1.c6, t1.c7,
t1.c8
-   Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM
"S 1"."T 1" WHERE (("C 1" < 20))
- ->  Materialize
+ Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S
1"."T 1" WHERE (("C 1" < 20)) ORDER BY "C 1" ASC NULLS LAST
+   ->  Materialize
+ Output: t2.c3
+ ->  Foreign Scan on public.ft2 t2
Output: t2.c3
-   ->  Foreign Scan on public.ft2 t2
- Output: t2.c3
- Filter: (date(t2.c4) = '01-17-1970'::date)
- Remote SQL: SELECT c3, c4 FROM "S 1"."T 1" WHERE (("C
1" > 10))
-(15 rows)
+   Filter: (date(t2.c4) = '01-17-1970'::date)
+   Remote SQL: SELECT c3, c4 FROM "S 1"."T 1" WHERE (("C 1" >
10))
+(12 rows)

[1]
https://api.cirrus-ci.com/v1/artifact/task/5727898984775680/testrun/build-32/testrun/postgres_fdw/regress/regression.diffs

Thanks
Richard


Commitfest 2023-11 has started

2023-11-02 Thread John Naylor
Any new patches will now need to be submitted to the January
commitfest. (Normally we'd give advance notice of that change, but
we're a bit behind this time.)

I have started going through the patch entries looking for out-of-date
statuses and filling in missing author fields. Currently we're at:
Needs review: 210. Waiting on Author: 42. Ready for Committer: 29.
Committed: 55. Withdrawn: 10. Returned with Feedback: 1. Total: 347.

Next week I will also look at patches new to this CF and try to
identify any that could be either committed or returned quickly.

Now is a good time to check if your patch still applies and passes
tests. http://cfbot.cputube.org/ shows 81 patches needing rebase. I
don't know what is usual, but that seems like a pretty large number,
so I won't spam the list requesting rebase on each thread. Instead I
plan to do so on a case-by-case basis where I think a patch has some
momentum and needs a little push.

If you have submitted a patch this cycle and have not yet reviewed a
patch, we encourage you to sign up to do so. If you are actively
reviewing, we are grateful! We perennially have plenty of code, but a
shortage of good review.

--
John Naylor




Re: pg_upgrade and logical replication

2023-11-02 Thread Amit Kapila
On Thu, Nov 2, 2023 at 3:41 PM vignesh C  wrote:
>
> I have slightly modified it now and also made it consistent with the
> replication slot upgrade, but I was not sure if we need to add
> anything more. Let me know if anything else needs to be added. I will
> add it.
>

I think it is important for users to know how to upgrade their
multi-node setup. Say, a two-node setup where replication is working
both ways (aka each node has both publications and subscriptions);
similarly, how does one upgrade if there are more than two nodes involved?

One more thing I was thinking about this patch: unlike the
publication's slot information, we can't use the origin's remote_lsn
to ensure that all the WAL has been received and applied before
allowing the upgrade. I can't think of any problem at the moment due
to this, but it is still a point worth giving some thought.

-- 
With Regards,
Amit Kapila.




Re: Support run-time partition pruning for hash join

2023-11-02 Thread Richard Guo
On Tue, Aug 29, 2023 at 6:41 PM Richard Guo  wrote:

> So it seems that the new costing logic is quite crude and tends to be
> very conservative, but it can help avoid the large overhead in the worst
> cases.  I think this might be a good start to push this patch forward.
>
> Any thoughts or comments?
>

I rebased this patch over the latest master.  Nothing changed except
that I revised the newly added test case to make it more stable.

However, the cfbot indicates that there are test cases that fail on
FreeBSD [1] (no failure on other platforms).  So I set up a FreeBSD-13
locally but just cannot reproduce the failure.  I must be doing
something wrong.  Can anyone give me some hints or suggestions?

FYI. The failure looks like:

 explain (costs off)
   select p2.a, p1.c from permtest_parent p1 inner join permtest_parent p2
   on p1.a = p2.a and left(p1.c, 3) ~ 'a1$';
- QUERY PLAN
-
- Hash Join
-   Hash Cond: (p2.a = p1.a)
-   ->  Seq Scan on permtest_grandchild p2
-   ->  Hash
- ->  Seq Scan on permtest_grandchild p1
-   Filter: ("left"(c, 3) ~ 'a1$'::text)
-(6 rows)
-
+ERROR:  unrecognized node type: 1130127496

[1]
https://api.cirrus-ci.com/v1/artifact/task/5334808075698176/testrun/build/testrun/regress/regress/regression.diffs

Thanks
Richard


v3-0001-Support-run-time-partition-pruning-for-hash-join.patch
Description: Binary data


Re: [HACKERS] Allow INSTEAD OF DELETE triggers to modify the tuple for RETURNING

2023-11-02 Thread Shlok Kyal
Hi,

On Thu, 19 Oct 2023 at 16:35, Marko Tiikkaja  wrote:
>
> Hi,
>
> Thank you for the feedback.
>
> Apparently it took me six years, but I've attached the latest version
> of the patch which I believe addresses all issues.  I'll add it to the
> open commitfest.
>
>
> .m

I went through the CFbot and found that the docs build run is giving
some errors (link: https://cirrus-ci.com/task/5712582359646208):
[07:37:19.769] trigger.sgml:266: parser error : Opening and ending tag
mismatch: command line 266 and COMMAND
[07:37:19.769] DELETE operations, INSTEAD
OF
[07:37:19.769] ^
[07:37:19.769] trigger.sgml:408: parser error : Opening and ending tag
mismatch: para line 266 and sect1
[07:37:19.769] 
[07:37:19.769] ^
[07:37:19.769] trigger.sgml:1034: parser error : Opening and ending
tag mismatch: sect1 line 266 and chapter
[07:37:19.769] 
[07:37:19.769] ^
[07:37:19.769] trigger.sgml:1035: parser error : chunk is not well balanced
[07:37:19.769]
[07:37:19.769] ^
[07:37:19.769] postgres.sgml:222: parser error : Failure to process
entity trigger
[07:37:19.769] 
[07:37:19.769] ^
[07:37:19.769] postgres.sgml:222: parser error : Entity 'trigger' not defined
[07:37:19.769] 
[07:37:19.769] ^
[07:37:19.956] make[2]: *** [Makefile:73: postgres-full.xml] Error 1
[07:37:19.957] make[1]: *** [Makefile:8: all] Error 2
[07:37:19.957] make: *** [Makefile:16: all] Error 2
[07:37:19.957]
[07:37:19.957] real 0m0.529s
[07:37:19.957] user 0m0.493s
[07:37:19.957] sys 0m0.053s
[07:37:19.957]
[07:37:19.957] Exit status: 2

Just wanted to make sure you are aware of the issue.

Thanks
Shlok Kumar Kyal




Re: pg_upgrade and logical replication

2023-11-02 Thread vignesh C
On Wed, 1 Nov 2023 at 10:13, Michael Paquier  wrote:
>
> On Mon, Oct 30, 2023 at 03:05:09PM +0530, vignesh C wrote:
> > The patch was not applying because of recent commits. Here is a
> > rebased version of the patches.
>
> + * We don't want the launcher to run while upgrading because it may start
> + * apply workers which could start receiving changes from the publisher
> + * before the physical files are put in place, causing corruption on the
> + * new cluster upgrading to, so setting max_logical_replication_workers=0
> + * to disable launcher.
>   */
>  if (GET_MAJOR_VERSION(cluster->major_version) >= 1700)
> -appendPQExpBufferStr(, " -c max_slot_wal_keep_size=-1");
> -appendPQExpBufferStr(&pgoptions, " -c max_slot_wal_keep_size=-1");
> +appendPQExpBufferStr(&pgoptions, " -c max_slot_wal_keep_size=-1 -c 
> max_logical_replication_workers=0");
> At least that's consistent with the other side of the coin with
> publications.  So 0001 looks basically OK seen from here.
>
> The indentation of 0002 seems off in a few places.

I fixed wherever possible for documentation and also ran pgindent and
pgperltidy.

> +
> + Verify that all the subscription tables in the old subscriber are in
> + r (ready) state. Setup the
> +  subscriber
> + configurations in the new subscriber.
> [...]
> +
> + There is a prerequisites that all the subscription tables should be in
> + r (ready) state for
> + pg_upgrade to be able to upgrade the
> + subscriber. If this is not met an error will be reported.
> +
>
> This part is repeated.

Removed the duplicate contents.

> Globally, this documentation addition does not
> seem really helpful for the end-user as it describes the checks that
> are done during the upgrade.  Shouldn't this part of the docs,
> similarly to the publication part, focus on providing a check list of
> actions to take to achieve a clean upgrade, with a list of commands
> and configurations required?  The good part is that information about
> what's copied is provided (pg_subscription_rel and the origin status),
> still this could be improved.

I have slightly modified it now and also made it consistent with the
replication slot upgrade, but I was not sure if we need to add
anything more. Let me know if anything else needs to be added. I will
add it.

> +
> + Enable the subscriptions by executing
> + ALTER SUBSCRIPTION ... 
> ENABLE.
> +
>
> This is something users can act on, but how does this operation help
> with the upgrade?  Should this happen for all the descriptions
> subscriptions?  Or you mean that this is something that needs to be
> run after the upgrade?

The subscriptions will be upgraded in disabled mode. Users must enable
the subscriptions after the upgrade is completed. I have mentioned the
same to avoid confusion.

> +
> + Create all the new tables that were created in the publication and
> + refresh the publication by executing
> + ALTER SUBSCRIPTION ... 
> REFRESH PUBLICATION.
> +
>
> What does "new tables" refer to in this case?  Are you referring to
> the case where new relations have been added on a publication node
> after an upgrade and need to be copied?  Does one need to DISABLE the
> subscriptions on the subscriber node before running the upgrade, or is
> a REFRESH enough?  The test only uses a REFRESH, so the docs and the
> code don't entirely agree with each other.

Yes, "new tables" refers to the new tables created in the publisher
when the upgrade is in progress. No need to disable the subscription
before upgrade, during upgrade the subscriptions will be copied in
disabled mode, they should be enabled after the upgrade. Mentioned all
these accordingly.

> +  
> +   For upgradation of the subscriptions, all the subscription tables should 
> be
> +   in r (ready) state, or else the
> +   pg_upgrade run will error.
> +  
>
> "Upgradation"?

I have removed this content since we have added this in the
prerequisite section now.

> +# Set tables to 'i' state
> +$old_sub->safe_psql(
> +   'postgres',
> +   "UPDATE pg_subscription_rel
> +   SET srsubstate = 'i' WHERE srsubstate = 'r'");
>
> I am not sure that doing catalog manipulation in the TAP test itself
> is a good idea, because this can finish by being unpredictable in the
> long-term for the test maintenance.  I think that this portion of the
> test should just be removed.  poll_query_until() or wait queries
> making sure that all the relations are in the state we want them to be
> before the beginning of the upgrade is enough in terms of test
> coverage, IMO.

Changed the scenario by using primary key failure.

> +$result = $new_sub->safe_psql('postgres',
> +   "SELECT remote_lsn FROM pg_replication_origin_status");
>
> This assumes one row, but perhaps this had better do a match based on
> external_id and/or local_id?

Modified

The attached v11 version patch has the changes for the same.

Regards,
Vignesh
From 768cba6d701913660b7e930d68e8b5497aae499f Mon Sep 17 

[PATCH] Add additional extended protocol commands to psql: \parse and \bindx

2023-11-02 Thread Anthonin Bonnefoy
Hi all!

Currently, only unnamed prepared statements are supported by psql via the
\bind command, and it's not possible to create or use named prepared
statements through the extended protocol.

This patch introduces 2 additional commands: \parse and \bindx.
\parse allows us to issue a Parse message to create a named prepared
statement through the extended protocol.
\bindx allows us to bind and execute a named prepared statement through
the extended protocol.

The main goal is to provide more ways to test the extended protocol in
regression tests, similarly to what \bind does.

Regards,
Anthonin


0001-psql-Add-support-for-prepared-stmt-with-extended-pro.patch
Description: Binary data


Re: Including Doxyfile and Meson script for docs into main source tree

2023-11-02 Thread Stefan Kaltenbrunner

Hi Bohdan!

I will see about making the current doxygen configuration available as a 
download again in the next few days. Not sure whether there is an 
additional actionable item yet as far as the project from John Morris 
goes wrt. the postgresql.org sysadmin team.


The patch proposed is an enhancement of the build process in core 
postgresql and I think we would look into utilizing that also for the 
"official" build once applied.


I don't think we would want to locally maintain a C-based filter and a 
custom build procedure otherwise.


On the patch itself, I'm somewhat unconvinced that it is a good idea or 
long-term maintainable to actually have a kinda random copy of the 
configuration file(!) of an external software (doxygen in this case) in 
our code that might need continuous updating and maintenance to keep up 
with new upstream releases.
As it stands, the patch right now is against 1.9.4, with 1.9.1 running on 
our installation and 1.9.8 being the latest release...



Stefan


On 30.10.23 14:57, Bohdan Mart wrote:

Hello, Stefan!

What do you think about this? See quoted message from John Morris.

P.S. Also we would like you to provide currently used Doxyfile, as 
Postgres doxigen's home page no longer have link to download said file.


Regards,
 Bohdan.

On 14.10.23 21:54, John Morris wrote:

Thank you for trying the patch out and commenting on it.

I'm starting to think of it as a project. Here's a quick project 
statement.


The purpose is to generate improved  Doxygen output while making 
maximal use of how Postgres currently does program comments.


Thinking in terms of specific steps, and adding to what you have 
mentioned:


  * Configure Doxyfile so it produces output similar to previous output.
  * Only build Doxygen output if requested
  * Don't compile the filter or configure the Doxyfile  if they aren't
    needed
  * Include contrib in the sources to document
  * Provide options for other (non-html) output. (Which ones?)
  * Update Postgres documentation to include instructions for creating
    Doxygen output.
  * Mention it in developer guidelines and provide sample code showing a
    "desired" commenting style.

Does that list seem complete? I don't want people to think we're 
imposing new standards or legislating new commenting styles.  It's 
more a matter of describing what we currently do, maybe with some 
minor suggestions for improving.


  * John Morris







Re: Making aggregate deserialization (and WAL receive) functions slightly faster

2023-11-02 Thread Amit Kapila
On Tue, Oct 31, 2023 at 2:25 AM David Rowley  wrote:
>
> On Mon, 30 Oct 2023 at 23:48, Amit Kapila  wrote:
> >
> > On Fri, Oct 27, 2023 at 3:23 AM David Rowley  wrote:
> > > * parallel.c in HandleParallelMessages():
> > > * applyparallelworker.c in HandleParallelApplyMessages():
> >
> > Both the above calls are used to handle ERROR/NOTICE messages from
> > parallel workers as you have also noticed. The comment atop
> > initReadOnlyStringInfo() clearly states that it is used in the
> > performance-critical path. So, is it worth changing these places? In
> > the future, this may pose the risk of this API being used
> > inconsistently.
>
> I'm ok to leave those ones out.
>

The other two look good to me.
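
For reference, the pattern in question is roughly the following (a
hedged sketch assuming the initReadOnlyStringInfo() API described above;
the variable names are illustrative):

/*
 * Wrap an existing buffer in a StringInfo without copying it, instead
 * of initStringInfo() followed by appendBinaryStringInfo().  The buffer
 * must stay valid and unmodified while the StringInfo is in use.
 */
StringInfoData msg;

initReadOnlyStringInfo(&msg, buf, buflen);
/* ... now parse msg with the usual pq_getmsg* routines ... */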

-- 
With Regards,
Amit Kapila.




Re: Commitfest manager November 2023

2023-11-02 Thread John Naylor
On Thu, Nov 2, 2023 at 4:35 PM Daniel Gustafsson  wrote:
>
> > On 1 Nov 2023, at 07:33, John Naylor  wrote:
> >
> > I didn't see any recent mentions in the archives, so I'll volunteer to
> > be CF manager for 2023-11.
>
> You probably need some extra admin privileges on your account for accessing
> the CFM functionality; in the meantime I've switched the 202311 CF to
> InProgress and marked 202401 as Open.

Thanks for taking care of that!

(Per the wiki, I requested admin privs on pgsql-www, and am still awaiting
that.)




Re: Commitfest manager November 2023

2023-11-02 Thread Daniel Gustafsson
> On 1 Nov 2023, at 07:33, John Naylor  wrote:
> 
> I didn't see any recent mentions in the archives, so I'll volunteer to
> be CF manager for 2023-11.

You probably need some extra admin privileges on your account for accessing
the CFM functionality; in the meantime I've switched the 202311 CF to
InProgress and marked 202401 as Open.

Thanks for volunteering!

--
Daniel Gustafsson





Re: A recent message added to pg_upgade

2023-11-02 Thread Amit Kapila
On Thu, Nov 2, 2023 at 11:32 AM Michael Paquier  wrote:
>
> On Thu, Nov 02, 2023 at 02:32:07PM +1100, Peter Smith wrote:
> > On Thu, Nov 2, 2023 at 2:25 PM Peter Smith  wrote:
> >> Checking this patch yesterday prompted me to create a new thread
> >> questioning the inconsistencies of the "GUC names in messages". In
> >> that thread, Tom Lane replied and gave some background information [1]
> >> about the GUC name embedding versus substitution. In hindsight, I
> >> think your original message was fine as-is, but there seem to be
> >> examples of every kind of style, so whatever you do would have some
> >> precedent.
> >>
> >> The patch v4 LGTM.
> >
> > To clarify, all the current code LGTM, but the patch is still missing
> > a guc_hook test case, right?
>
> -   NULL, NULL, NULL
> +   check_max_slot_wal_keep_size, NULL, NULL
>
> FWIW, I am +-0 with what you are proposing here.  I don't quite get
> why one may want to enforce this specific GUC at upgrade.
>

I also can't think of a good reason to do so but OTOH, I can't imagine
all possible scenarios. As this setting is invalid or can cause
problems, it seems people favor preventing it. Alvaro also voted in
favor of preventing it, so we are considering proceeding with it
unless more people think otherwise.
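
For illustration, the check hook being discussed could be shaped roughly
like this (a sketch only; the actual patch may differ in details and
message wording):

/*
 * GUC check hook: reject changing max_slot_wal_keep_size away from -1
 * while in binary-upgrade mode, so that pg_upgrade's logical slot
 * handling cannot lose required WAL.
 */
bool
check_max_slot_wal_keep_size(int *newval, void **extra, GucSource source)
{
	if (IsBinaryUpgrade && *newval != -1)
	{
		GUC_check_errcode(ERRCODE_INVALID_PARAMETER_VALUE);
		GUC_check_errdetail("\"max_slot_wal_keep_size\" must be set to -1 during binary upgrade mode.");
		return false;
	}
	return true;
}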

-- 
With Regards,
Amit Kapila.




Re: psql not responding to SIGINT upon db reconnection

2023-11-02 Thread Shlok Kyal
Hi,

> That sounds like a much better solution. Attached you will find a v4
> that implements your suggestion. Please let me know if there is
> something that I missed. I can confirm that the patch works.
>
> $ ./build/src/bin/psql/psql -h pg.neon.tech
> NOTICE:  Welcome to Neon!
> Authenticate by visiting:
> https://console.neon.tech/psql_session/xxx
>
>
> NOTICE:  Connecting to database.
> psql (17devel, server 15.3)
> SSL connection (protocol: TLSv1.3, cipher: TLS_AES_256_GCM_SHA384, 
> compression: off)
> Type "help" for help.
>
> tristan957=> \c
> NOTICE:  Welcome to Neon!
> Authenticate by visiting:
> https://console.neon.tech/psql_session/yyy
>
>
> ^Cconnection to server at "pg.neon.tech" (3.18.6.96), port 5432 
> failed:
> Previous connection kept
> tristan957=>

I went through the CFbot results and found that some tests are timing out.
Links:
https://cirrus-ci.com/task/6735437444677632
https://cirrus-ci.com/task/4536414189125632
https://cirrus-ci.com/task/5046587584413696
https://cirrus-ci.com/task/6172487491256320
https://cirrus-ci.com/task/5609537537835008

Some tests which are timing out are as follows:
[00:48:49.243] Summary of Failures:
[00:48:49.243]
[00:48:49.243] 4/270 postgresql:regress / regress/regress TIMEOUT 1000.01s
[00:48:49.243] 6/270 postgresql:pg_upgrade / pg_upgrade/002_pg_upgrade
TIMEOUT 1000.02s
[00:48:49.243] 34/270 postgresql:recovery /
recovery/027_stream_regress TIMEOUT 1000.02s
[00:48:49.243] 48/270 postgresql:plpgsql / plpgsql/regress TIMEOUT 1000.02s
[00:48:49.243] 49/270 postgresql:plperl / plperl/regress TIMEOUT 1000.01s
[00:48:49.243] 61/270 postgresql:dblink / dblink/regress TIMEOUT 1000.03s
[00:48:49.243] 89/270 postgresql:postgres_fdw / postgres_fdw/regress
TIMEOUT 1000.01s
[00:48:49.243] 93/270 postgresql:test_decoding / test_decoding/regress
TIMEOUT 1000.02s
[00:48:49.243] 110/270 postgresql:test_extensions /
test_extensions/regress TIMEOUT 1000.03s
[00:48:49.243] 123/270 postgresql:unsafe_tests / unsafe_tests/regress
TIMEOUT 1000.02s
[00:48:49.243] 152/270 postgresql:pg_dump / pg_dump/010_dump_connstr
TIMEOUT 1000.03s

Just want to make sure you are aware of the issue.

Thanks
Shlok Kumar Kyal




Re: Don't pass NULL pointer to strcmp().

2023-11-02 Thread Xing Guo
Hi,

It seems that Tom's patch no longer applies to the current master branch.
I have regenerated the patch for others to play with.

On 11/2/23, Nathan Bossart  wrote:
> On Wed, Nov 01, 2023 at 10:39:04PM -0400, Tom Lane wrote:
>> Nathan Bossart  writes:
>>> What if we disallowed NULL string GUCs in v17?
>>
>> Well, we'd need to devise some other solution for hacks like the
>> one used by timezone_abbreviations (see comment in
>> check_timezone_abbreviations).  I think it's not worth the trouble,
>> especially seeing that 95% of guc.c is already set up for this.
>> The bugs are mostly in newer code like get_explain_guc_options,
>> and I think that's directly traceable to the lack of any comments
>> or docs about this behavior.
>
> Eh, yeah, it's probably not worth it if we find ourselves trading one set
> of hacks for another.
>
> --
> Nathan Bossart
> Amazon Web Services: https://aws.amazon.com
>


-- 
Best Regards,
Xing
From d620d3a2b44cef63024baca4ccdf55bf7724737f Mon Sep 17 00:00:00 2001
From: Xing Guo 
Date: Thu, 2 Nov 2023 16:42:00 +0800
Subject: [PATCH v3] Consider treating NULL as a valid boot_val for string
 GUCs.

---
 src/backend/utils/misc/guc.c   | 28 
 src/include/utils/guc_tables.h | 10 ++
 2 files changed, 30 insertions(+), 8 deletions(-)

diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 39d3775e80..28056af19c 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -1473,7 +1473,9 @@ check_GUC_init(struct config_generic *gconf)
 			{
 struct config_string *conf = (struct config_string *) gconf;
 
-if (*conf->variable != NULL && strcmp(*conf->variable, conf->boot_val) != 0)
+if (*conf->variable != NULL &&
+	(conf->boot_val == NULL ||
+	 strcmp(*conf->variable, conf->boot_val) != 0))
 {
 	elog(LOG, "GUC (PGC_STRING) %s, boot_val=%s, C-var=%s",
 		 conf->gen.name, conf->boot_val ? conf->boot_val : "", *conf->variable);
@@ -4213,8 +4215,7 @@ SetConfigOption(const char *name, const char *value,
 /*
  * Fetch the current value of the option `name', as a string.
  *
- * If the option doesn't exist, return NULL if missing_ok is true (NOTE that
- * this cannot be distinguished from a string variable with a NULL value!),
+ * If the option doesn't exist, return NULL if missing_ok is true,
  * otherwise throw an ereport and don't return.
  *
  * If restrict_privileged is true, we also enforce that only superusers and
@@ -4257,7 +4258,8 @@ GetConfigOption(const char *name, bool missing_ok, bool restrict_privileged)
 			return buffer;
 
 		case PGC_STRING:
-			return *((struct config_string *) record)->variable;
+			return *((struct config_string *) record)->variable ?
+*((struct config_string *) record)->variable : "";
 
 		case PGC_ENUM:
 			return config_enum_lookup_by_value((struct config_enum *) record,
@@ -4304,7 +4306,8 @@ GetConfigOptionResetString(const char *name)
 			return buffer;
 
 		case PGC_STRING:
-			return ((struct config_string *) record)->reset_val;
+			return ((struct config_string *) record)->reset_val ?
+((struct config_string *) record)->reset_val : "";
 
 		case PGC_ENUM:
 			return config_enum_lookup_by_value((struct config_enum *) record,
@@ -5255,7 +5258,14 @@ get_explain_guc_options(int *num)
 {
 	struct config_string *lconf = (struct config_string *) conf;
 
-	modified = (strcmp(lconf->boot_val, *(lconf->variable)) != 0);
+	if (lconf->boot_val == NULL &&
+		*lconf->variable == NULL)
+		modified = false;
+	else if (lconf->boot_val == NULL ||
+		*lconf->variable == NULL)
+		modified = true;
+	else
+		modified = (strcmp(lconf->boot_val, *(lconf->variable)) != 0);
 }
 break;
 
@@ -5482,7 +5492,8 @@ write_one_nondefault_variable(FILE *fp, struct config_generic *gconf)
 			{
 struct config_string *conf = (struct config_string *) gconf;
 
-fprintf(fp, "%s", *conf->variable);
+if (*conf->variable)
+	fprintf(fp, "%s", *conf->variable);
 			}
 			break;
 
@@ -6142,7 +6153,8 @@ RestoreGUCState(void *gucstate)
 {
 	struct config_string *conf = (struct config_string *) gconf;
 
-	guc_free(*conf->variable);
+	if (*conf->variable)
+		guc_free(*conf->variable);
 	if (conf->reset_val && conf->reset_val != *conf->variable)
 		guc_free(conf->reset_val);
 	if (conf->reset_extra && conf->reset_extra != gconf->extra)
diff --git a/src/include/utils/guc_tables.h b/src/include/utils/guc_tables.h
index 1ec9575570..0c38255961 100644
--- a/src/include/utils/guc_tables.h
+++ b/src/include/utils/guc_tables.h
@@ -240,6 +240,16 @@ struct config_real
 	void	   *reset_extra;
 };
 
+/*
+ * A note about string GUCs: the boot_val is allowed to be NULL, which leads
+ * to the reset_val and the actual variable value (*variable) also being NULL.
+ * However, there is no way to set a NULL value subsequently using
+ * set_config_option or any other GUC API.  Also, GUC APIs such as SHOW will

Re: pg_upgrade and logical replication

2023-11-02 Thread Amit Kapila
On Wed, Nov 1, 2023 at 8:33 AM Michael Paquier  wrote:
>
> On Fri, Oct 27, 2023 at 05:05:39PM +0530, Amit Kapila wrote:
> > I was analyzing this part and it seems it could be tricky to upgrade
> > in FINISHEDCOPY state. Because the system would expect that subscriber
> > would know the old slotname from oldcluster which it can drop at
> > SYNCDONE state. Now, as sync_slot_name is generated based on subid,
> > relid which could be different in the new cluster, the generated
> > slotname would be different after the upgrade. OTOH, if the relstate
> > is INIT, then I think the sync could be performed even after the
> > upgrade.
>
> TBH, I am really wondering if there is any need to go down to being
> able to handle anything else than READY for the relation states in
> pg_subscription_rel.  One reason is that it makes it much easier to
> think about how to handle these in parallel of a node with
> publications that also need to go through an upgrade, because as READY
> relations they don't require any tracking.  IMO, this makes it simpler
> to think about cases where a node holds both subscriptions and
> publications.
>

But that imposes needless restrictions on users. For example, there
appears to be no harm in upgrading even when the relation is in the
SUBREL_STATE_INIT state. Users should be able to continue replication
after the upgrade.

> FWIW, my take is that it feels natural to do the upgrades of
> subscriptions first, creating a similarity with the case of minor
> updates with physical replication setups.
>
> > Shouldn't we at least ensure that replication origins do exist in the
> > old cluster corresponding to each of the subscriptions? Otherwise,
> > later the query to get remote_lsn for origin in getSubscriptions()
> > would fail.
>
> You mean in the shape of a pre-upgrade check making sure that
> pg_replication_origin_status has entries for all the subscriptions we
> expect to see during the upgrade?
>

Yes.
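
For illustration, such a pre-upgrade check could be shaped roughly like
the query below (a sketch only: it assumes subscription origins are
named 'pg_' || subscription OID, and the real check would likely be
implemented in pg_upgrade's C code):

-- report subscriptions whose replication origin is missing
SELECT s.subname
FROM pg_catalog.pg_subscription s
LEFT JOIN pg_catalog.pg_replication_origin_status ros
       ON ros.external_id = 'pg_' || s.oid
WHERE ros.external_id IS NULL;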

-- 
With Regards,
Amit Kapila.




Re: "box" type description

2023-11-02 Thread Kyotaro Horiguchi
At Wed, 1 Nov 2023 11:36:01 -0400, Bruce Momjian  wrote in 
> On Wed, Mar 31, 2021 at 01:43:47PM +0200, Christoph Berg wrote:
> > Re: Kyotaro Horiguchi
> > I like that because it points to the "point" syntax so users can
> > figure out how to spell a box.
> 
> I liked Horiguchi-san's patch from 2021, but once I started looking
> further, I found a number of improvements that can be made in the \dTS
> output beyond Horiguchi-san's changes:
> 
> *  boolean outputs 't'/'f', not 'true'/'false'
> *  Added "format" ... for output
> *  tid output format was at the start, not the end
> *  I didn't add space between point x,y because the output has no space
> *  I spelled out "point" instead of "pt"
> *  "line" has two very different input formats so I listed both
>(more different than others like boolean)
> *  I didn't see the need to say "datatype" for LSN and UUID
> *  I improved the txid_snapshot description
> *  There was no description for table_am_handler
> 
> Patch attached.

Thank you for continuing this. The additional changes look
fine.

Upon reviewing the table again along these lines, I found further
potential improvements and issues. For example:

character, varchar: don't follow the rule.
- 'char(length)' blank-padded string, fixed storage length
+ blank-padded string, fixed storage length, format 'char(length)'

interval: doesn't follow the rule.
- @  , time interval
+ time interval, format '[@]  '
(I think '@' is not necessary here..)

pg_snapshot:

  The description given is just "snapshot", which seems overly simplistic.

txid_snapshot:

  The description reads "transaction snapshot". Is this really
  accurate, especially in contrast with pg_snapshot?

pg_brin_bloom_summary, pg_brin_minmax_multi_summary, pg_mcv_list and others:

I'm uncertain whether these types, which lack an input syntax (but
have an output format), qualify as pseudo-types.  Nevertheless, I
believe it would be beneficial to note that those types differ
from ordinary types.


Should we consider refining these descriptions in the table?

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Why is DEFAULT_FDW_TUPLE_COST so insanely low?

2023-11-02 Thread Michael Paquier
On Thu, Nov 02, 2023 at 08:19:35PM +1300, David Rowley wrote:
> No tests were introduced.  Is this the only existing one that's
> unstable as far as you saw?

That seems to be the only one.

> I'm not yet seeing any failures in the buildfarm, so don't really want
> to push a fix for this one if there are going to be a few more
> unstable ones to fix.  I may just hold off a while to see.

The CF bot also shows that this is not really stable, affecting
the tests of pending patches:
https://cirrus-ci.com/task/6685074121293824
https://cirrus-ci.com/task/4739402799251456
https://cirrus-ci.com/task/5209803589419008
--
Michael


signature.asc
Description: PGP signature


Re: Why is DEFAULT_FDW_TUPLE_COST so insanely low?

2023-11-02 Thread David Rowley
On Thu, 2 Nov 2023 at 18:39, Michael Paquier  wrote:
> The CI has been telling me that the plans of the tests introduced by
> this patch are not that stable when building with 32b.  See:
> diff -U3 /tmp/cirrus-ci-build/contrib/postgres_fdw/expected/postgres_fdw.out 
> /tmp/cirrus-ci-build/build-32/testrun/postgres_fdw/regress/results/postgres_fdw.out
> --- /tmp/cirrus-ci-build/contrib/postgres_fdw/expected/postgres_fdw.out 
> 2023-11-02 05:25:47.290268511 +
> +++ 
> /tmp/cirrus-ci-build/build-32/testrun/postgres_fdw/regress/results/postgres_fdw.out
>  2023-11-02 05:30:45.242316423 +
> @@ -4026,13 +4026,13 @@
>   Sort
> Output: t1.c1, t1.c2, t1.c3, t1.c4, t1.c5, t1.c6, t1.c7, t1.c8
> Sort Key: t1.c1
> -   ->  Nested Loop Semi Join
> +   ->  Hash Semi Join
>   Output: t1.c1, t1.c2, t1.c3, t1.c4, t1.c5, t1.c6, t1.c7, t1.c8
> - Join Filter: (t2.c3 = t1.c3)
> + Hash Cond: (t1.c3 = t2.c3)
>   ->  Foreign Scan on public.ft1 t1
> Output: t1.c1, t1.c2, t1.c3, t1.c4, t1.c5, t1.c6, t1.c7, t1.c8
> Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 
> 1"."T 1" WHERE (("C 1" < 20))
> - ->  Materialize
> + ->  Hash
> Output: t2.c3
> ->  Foreign Scan on public.ft2 t2
>   Output: t2.c3

No tests were introduced.  Is this the only existing one that's
unstable as far as you saw?

I'm not yet seeing any failures in the buildfarm, so I don't really want
to push a fix for this one if there are going to be a few more
unstable ones to fix.  I may just hold off a while and see.

Thanks for letting me know about this.

David




Re: Tab completion regression test failed on illumos

2023-11-02 Thread Japin Li


On Thu, 02 Nov 2023 at 13:42, Japin Li  wrote:
> On Thu, 02 Nov 2023 at 13:01, Noah Misch  wrote:
>> On Wed, Nov 01, 2023 at 03:19:39PM +0800, Japin Li wrote:
>>> I tried to run the regression tests on illumos, and 010_tab_completion
>>> failed because of a timeout.
>>
>>> Any suggestions?  Thanks in advance!
>>
>> This test failed for me, in a different way, when I briefly installed IO::Pty
>> on a Solaris buildfarm member:
>> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=wrasse=2023-01-03%2022%3A39%3A26
>>
> Thanks for confirming this!
>
> I tried to install IO::Pty using cpan; however, I cannot reproduce the
> same error.
>
>> The IPC::Run pty tests also fail on Solaris.  If I were debugging this, I'd
>> start by fixing IO::Pty and IPC::Run to pass their own test suites on Solaris
>> or illumos.  Then I'd see if problems continue for this postgresql test.
>
> So, it might be a bug in Perl.

After enabling debug output for IPC::Run, I found the following logs:

IPC::Run 0001 0123456789012-4 [#2(415745)]: writing to fd 11 (kid's stdin)
IPC::Run 0001 0123456789012-4 [#2(415745)]: write( 11, 'SEL ' ) = 4 
<- Here write 4 bytes.
IPC::Run 0001 0123456789012-4 [#2(415745)]: fds for select: ---r--r
IPC::Run 0001 0123456789012-4 [#2(415745)]: timeout=0
IPC::Run 0001 0123456789012-4 [#2(415745)]: selected  ---r
IPC::Run 0001 0123456789012-4 [#2(415745)]: filtering data from fd 11 (kid's 
stdout)
IPC::Run 0001 0123456789012-4 [#2(415745)]: reading from fd 11 (kid's stdout)
IPC::Run 0001 0123456789012-4 [#2(415745)]: read( 11 ) = 8 chars 'SEL ' 
<- But read 8 bytes.

It seems 'SEL\t' is converted to 'SEL     ', i.e. "SEL" followed by five
spaces, which matches the 8 characters read above.

-- 
Regrads,
Japin Li
ChengDu WenWu Information Technology Co., Ltd.




make pg_ctl more friendly

2023-11-02 Thread Crisp Lee
Hi hackers:

I took a base backup using pg_basebackup -R. After that, I created a restore
point named 'test' on the primary, and set recovery_target_name to 'test' and
recovery_target_action to 'shutdown' in the standby's configuration. I then
got a startup-failure message from 'pg_ctl start -D $standby_datadir'. I
think this is not really a failure, and the message makes users nervous,
especially newbies.

My thought is to generate a recovery.done file if the postmaster receives
exit code 3 from the startup process. When the postmaster exits, pg_ctl can
then give users a friendlier message.


0001-Make-pg_ctl-more-friendly-for-users.patch
Description: Binary data


Re: Is this a problem in GenericXLogFinish()?

2023-11-02 Thread Amit Kapila
On Wed, Nov 1, 2023 at 12:54 PM Michael Paquier  wrote:
>
> On Mon, Oct 30, 2023 at 03:54:39PM +0530, Amit Kapila wrote:
> > On Sat, Oct 28, 2023 at 4:30 PM Michael Paquier  wrote:
> >> On Sat, Oct 28, 2023 at 03:45:13PM +0530, Amit Kapila wrote:
> >> > Yes, we need it to exclude any concurrent in-progress scans that could
> >> > return incorrect tuples during bucket squeeze operation.
> >>
> >> Thanks.  So I assume that we should just set REGBUF_NO_CHANGE when the
> >> primary and write buffers are the same and there are no tuples to
> >> move.  Say with something like the attached?
> >
> > What if the primary and write buffer are not the same but ntups is
> > zero? Won't we hit the assertion again in that case?
>
> The code tells that it should be able to handle such a case,
>

It should be possible when we encounter some page in the middle of the
chain that has all-dead items, and the write buffer points to some page
after the primary bucket page and before the read buffer page.

> so this
> would mean to set REGBUF_NO_CHANGE only when we have no tuples to
> move:
> -XLogRegisterBuffer(1, wbuf, REGBUF_STANDARD);
> +/*
> + * A cleanup lock is still required on the write buffer even
> + * if there are no tuples to move, so it needs to be registered
> + * in this case.
> + */
> +wbuf_flags = REGBUF_STANDARD;
> +if (xlrec.ntups == 0)
> +wbuf_flags |= REGBUF_NO_CHANGE;
> +XLogRegisterBuffer(1, wbuf, wbuf_flags);
>

Why register wbuf at all if there are no tuples to add and it is not
the same as bucketbuf? Also, I think this won't be correct if prevbuf
and wbuf are the same and we also have no tuples to add to wbuf. I
have attached a naive and crude way to achieve it. This needs more
work, both in finding a better way to change the code and in ensuring
it won't break any existing case. I have only run the existing tests
so far; such a fix certainly requires more testing.
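
To sketch the idea (this is not the attached patch; the names follow
the snippet quoted above, and the wbuf == prevbuf case in particular
needs careful thought since that page is modified even with zero
tuples):

uint8		wbuf_flags = REGBUF_STANDARD;

if (xlrec.ntups == 0 && wbuf != prevbuf)
	wbuf_flags |= REGBUF_NO_CHANGE;	/* held only for its cleanup lock */
if (xlrec.ntups > 0 || wbuf == bucketbuf || wbuf == prevbuf)
	XLogRegisterBuffer(1, wbuf, wbuf_flags);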

-- 
With Regards,
Amit Kapila.


fix_hash_squeeze_wal_1.patch
Description: Binary data


Re: Intermittent failure with t/003_logical_slots.pl test on windows

2023-11-02 Thread Kyotaro Horiguchi
At Tue, 31 Oct 2023 18:11:48 +0530, vignesh C  wrote in 
> A few others are also facing this problem with similar code, as in:
> https://stackoverflow.com/questions/15882799/fgets-returning-error-for-file-returned-by-popen

I'm inclined to believe that the pipe won't enter the EOF state until
the target command terminates (and then the top-level cmd.exe). The
target command likely terminated prematurely due to an error, before
printing any output.

If we append "2>&1" to the command line, we can capture any error
message through the returned pipe. Such error messages will cause the
subsequent code to fail with an error such as "unexpected string: 'the
output'". I'm not sure, but if this is acceptable, the returned error
messages could provide insight into the underlying issue, paving the
way for a potential solution.
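
A minimal sketch of that suggestion, assuming 'cmd' holds the command
line currently handed to popen() (the buffer size is illustrative):

#include <stdio.h>

char		cmd_with_stderr[1024];
FILE	   *pipe_fp;

/* redirect the child's stderr into the same pipe as its stdout */
snprintf(cmd_with_stderr, sizeof(cmd_with_stderr), "%s 2>&1", cmd);
pipe_fp = popen(cmd_with_stderr, "r");	/* returns NULL on failure */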

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




RE: CRC32C Parallel Computation Optimization on ARM

2023-11-02 Thread Xiang Gao
On Tue, 31 Oct 2023 15:48:21PM -0500, Nathan Bossart wrote:
>> Thanks.  I went ahead and split this prerequisite part out to a separate
>> thread [0] since it's sort-of unrelated to your proposal here.  It's not
>> really a prerequisite, but I do think it will simplify things a bit.

>Per the other thread [0], we should try to avoid the runtime check when
>possible, as it seems to produce a small regression.  This means that if
>the ARMv8 CRC instructions are found with the default compiler flags, we
>can only use vmull_p64() if it can also be used with the default flags.
>Otherwise, we can just do the runtime check.

>[0] https://postgr.es/m/2620794.1698783160%40sss.pgh.pa.us

After reading the discussion, I understand that, in order to avoid
performance regressions in some instances, we need to try our best to
avoid runtime checks. I don't know if I understand it correctly.
If so, we need to decide whether to use the ARM CRC32C and VMULL
instructions directly or behind a runtime check. There will be many
scenarios here, and the code will become more complicated.
Could you please give me some suggestions about how to refine this patch?
Thanks very much!
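
For reference, the existing dispatch in src/include/port/pg_crc32c.h is
roughly along these lines (a simplified sketch); the open question above
is where a vmull_p64() variant fits into this scheme:

#if defined(USE_ARMV8_CRC32C)
/* CRC instructions usable with the default compiler flags: call directly */
#define COMP_CRC32C(crc, data, len) \
	((crc) = pg_comp_crc32c_armv8((crc), (data), (len)))
#elif defined(USE_ARMV8_CRC32C_WITH_RUNTIME_CHECK)
/* resolved once at startup through a function pointer */
#define COMP_CRC32C(crc, data, len) \
	((crc) = pg_comp_crc32c((crc), (data), (len)))
#endif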






Re: speed up a logical replica setup

2023-11-02 Thread Ashutosh Bapat
On Wed, Nov 1, 2023 at 7:10 PM Ashutosh Bapat
 wrote:
>
> I noticed some differences between this and a similar utility
> https://github.com/2ndQuadrant/pglogical/blob/REL2_x_STABLE/pglogical_create_subscriber.c.
> I will be reviewing these differences next to see if we are missing
> anything here.

Some more missing things to discuss

Handling signals - The utility cleans up leftover objects on exit.
But the default signal handlers will make the utility exit without a
proper cleanup [1]. The signal handlers could clean up the objects
themselves, or at least report the objects that need to be cleaned up.
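
A minimal sketch of the signal-handling idea, with hypothetical function
names (note that a real implementation must restrict the handler to
async-signal-safe work, e.g. by only setting a flag or reporting what
needs manual cleanup):

#include <signal.h>
#include <stdlib.h>

static void
cleanup_objects_atexit(void)
{
	/* drop any replication slots, subscriptions, etc. created so far */
}

static void
handle_fatal_signal(int signo)
{
	/* atexit() handlers do not run when a signal's default action kills us */
	cleanup_objects_atexit();
	_exit(1);
}

int
main(int argc, char **argv)
{
	atexit(cleanup_objects_atexit);
	signal(SIGINT, handle_fatal_signal);
	signal(SIGTERM, handle_fatal_signal);
	/* ... the utility's actual work ... */
	return 0;
}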

Idempotent behaviour - Given that the utility will be used when very
large amounts of data are involved, redoing everything after a network
glitch or a temporary failure should be avoided. This is especially
true when users start from a base backup. Again, I don't think we
should try to be idempotent in v1, but the current design shouldn't
stop us from doing so later. I didn't find anything in my review that
would prevent it, but it is something to keep in mind.

That finishes my first round of review. I will wait for your updated
patches before the next round.

[1] NOTEs section in man atexit().


--
Best Wishes,
Ashutosh Bapat




Re: A recent message added to pg_upgade

2023-11-02 Thread Michael Paquier
On Thu, Nov 02, 2023 at 02:32:07PM +1100, Peter Smith wrote:
> On Thu, Nov 2, 2023 at 2:25 PM Peter Smith  wrote:
>> Checking this patch yesterday prompted me to create a new thread
>> questioning the inconsistencies of the "GUC names in messages". In
>> that thread, Tom Lane replied and gave some background information [1]
>> about the GUC name embedding versus substitution. In hindsight, I
>> think your original message was fine as-is, but there seem to be
>> examples of every kind of style, so whatever you do would have some
>> precedent.
>>
>> The patch v4 LGTM.
> 
> To clarify, all the current code LGTM, but the patch is still missing
> a guc_hook test case, right?

-   NULL, NULL, NULL
+   check_max_slot_wal_keep_size, NULL, NULL

FWIW, I am +-0 with what you are proposing here.  I don't quite get
why one may want to enforce this specific GUC at upgrade.  Anyway, if
they do, I'd be curious to hear why this is required and how this patch
would prevent them from doing so.  Actually, this could be a good reason
for making the logical slot handling during pg_upgrade an option
rather than a mandatory thing.
--
Michael


signature.asc
Description: PGP signature