Re: [PATCH] Accept IP addresses in server certificate SANs

2021-12-16 Thread Kyotaro Horiguchi
Sorry for the silly mistake.

At Fri, 17 Dec 2021 15:40:10 +0900 (JST), Kyotaro Horiguchi 
 wrote in 
> > NSS departs slightly from the spec and will additionally try to match
> > an IP address against the CN, but only if there are no iPAddresses in
> > the SAN. It roughly matches the logic for DNS names.
> 
> OpenSSL seems different. X509_check_host() tries SAN then CN iff SAN
> doesn't exist.  X509_check_ip() tries SAN and completely ignores
> iPAdress and CN.

OpenSSL seems different. X509_check_host() tries SAN then CN iff SAN
doesn't exist.  X509_check_ip() tries iPAddress and completely ignores
CN.
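
For reference, a minimal sketch of the dispatch a client could do between the
two checks, matching the behavior described above (illustrative only; the
helper name is made up and this is not libpq's actual code):

#include <stdbool.h>
#include <string.h>
#include <arpa/inet.h>
#include <openssl/x509v3.h>

/* Illustrative helper: match "host" against the server certificate. */
static bool
check_peer_name(X509 *cert, const char *host)
{
	unsigned char addrbuf[16];

	/* If the host string parses as an IP address, only iPAddress SANs match. */
	if (inet_pton(AF_INET, host, addrbuf) == 1 ||
		inet_pton(AF_INET6, host, addrbuf) == 1)
		return X509_check_ip_asc(cert, host, 0) == 1;

	/* Otherwise match dNSName SANs, with CN used only when no dNSName exists. */
	return X509_check_host(cert, host, strlen(host), 0, NULL) == 1;
}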

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Allow escape in application_name

2021-12-16 Thread Kyotaro Horiguchi
At Fri, 17 Dec 2021 14:50:37 +0900, Fujii Masao  
wrote in 
> > FWIW, I don't understand why we care of the case where the function
> > itself changes its mind to set values[i] to null
> 
> Whether we add this check or not, the behavior is the same, i.e., that
> setting value is not used. But by adding the check we can avoid
> unnecessary call of process_pgfdw_appname() when the value is
> NULL. OTOH, of course we can also remove the check. So I'm ok to
> remove that if you're thinking it's more consistent to remove that.

That depends. It seems to me defGetString is assumed to return a valid
pointer, since (AFAIS) none of the callers check it against NULL. On
the other hand, the string's length may be zero. Actually,
check_conn_params(), called just after, makes the same assumption (only
for "password", though).  PQconnectdbParams, on the other hand, treats
a NULL value as not-set.

So the assumption about NULL values differs from place to place, and at
least postgres_fdw doesn't use NULL to mean "does not exist".

Thus, rewriting the code we're focusing on like the following would
make sense to me.

>   if (strcmp(keywords[i], "application_name") == 0)
>   {
>       values[i] = process_pgfdw_appname(values[i]);
>
>       /*
>        * Break if we have a non-empty string. If we end up failing with
>        * all candidates, fallback_application_name would work.
>        */
>       if (values[i][0] != '\0')
>           break;
>   }


> Probably now it's good chance to revisit this issue. As I commented at
> [1], at least for me it's intuitive to use empty string rather than
> "[unknown]" when appname or username, etc was NULL or '\0'. To
> implement this behavior, I argued to remove the check itself. But
> there are several options:

Thanks for revisiting.

> #1. use "[unknown]"
> #2. add the check but not use "[unknown]"
> #3. don't add the check (i.e., what the current patch does)
> 
> For now, I'm ok to choose #2 or #3.

As I said before, given that we don't show "unknown" or something like
that as the fallback, I'm fine with not having a NULL check, since it
would bump into a SEGV immediately anyway.  In short, I'm fine with #3 here.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: preserving db/ts/relfilenode OIDs across pg_upgrade (was Re: storing an explicit nonce)

2021-12-16 Thread Shruthi Gowda
On Mon, Dec 13, 2021 at 8:43 PM Shruthi Gowda  wrote:
>
> On Mon, Dec 6, 2021 at 11:25 PM Robert Haas  wrote:
> >
> > On Sun, Dec 5, 2021 at 11:44 PM Sadhuprasad Patro  wrote:
> > > 3.
> > > @@ -504,11 +525,15 @@ createdb(ParseState *pstate, const CreatedbStmt *stmt)
> > >   */
> > >   pg_database_rel = table_open(DatabaseRelationId, RowExclusiveLock);
> > >
> > > - do
> > > + /* Select an OID for the new database if is not explicitly configured. */
> > > + if (!OidIsValid(dboid))
> > >   {
> > > - dboid = GetNewOidWithIndex(pg_database_rel, DatabaseOidIndexId,
> > > -                            Anum_pg_database_oid);
> > > - } while (check_db_file_conflict(dboid));
> > >
> > > I think we need to do 'check_db_file_conflict' for the USER given OID
> > > also.. right? It may already be present.
> >
> > Hopefully, if that happens, we straight up fail later on.
>
> That's right. If a database with the user-specified OID exists, the
> createdb fails with a "duplicate key value" error.
> If just a data directory with the user-specified OID exists,
> MakePGDirectory() fails to create the directory, and the cleanup
> callback createdb_failure_callback() removes the directory even though
> it was not created by the createdb() function.
> The subsequent create database call with the same OID will then succeed.
> Should we handle the case where a data directory exists but the
> corresponding DB with that OID does not? I presume this situation
> doesn't arise unless the user manually creates directories in the data
> path. Any thoughts?

I have updated the DBOID preserve patch to handle this case and
generated the latest patch on top of your v7-001-preserve-relfilenode
patch.

Thanks & Regards
Shruthi KC
EnterpriseDB: http://www.enterprisedb.com


v7-0002-Preserve-database-OIDs-in-pg_upgrade.patch
Description: Binary data


Re: Add id's to various elements in protocol.sgml

2021-12-16 Thread Peter Eisentraut

On 15.12.21 16:59, Brar Piening wrote:

On Dec 15, 2021 at 15:49, Alvaro Herrera wrote:

On 2021-Dec-15, Brar Piening wrote:

Since I can't argue towards some general utility for the xreflabels
and don't have any other solid argument in favor of adding more, I
will remove them from my current patch but leave the existing ones
intact.

Yeah, I think not adding them until we have a use for them might be
wisest.

A new version of the patch that doesn't add xreflabels is attached.


Now this patch adds a bunch of ids, but you can't use them to link to, 
because as soon as you do, you will get complaints about a missing 
xreflabel.  So what is the remaining purpose?





Re: Column Filtering in Logical Replication

2021-12-16 Thread Peter Eisentraut

On 17.12.21 05:47, Amit Kapila wrote:

I think in the above sentence, you mean to say "dropped from the
publication". So, IIUC, you are proposing that if one drops a column
that was part of the column list of a relation in a publication, an
error will be raised. Also, if the user specifies CASCADE in Alter
Table ... Drop Column, then we drop the relation from publication. Is
that right? BTW, this is somewhat on the lines of what row_filter
patch is also doing where if the user drops the column that was part
of row_filter for a relation in publication, we give an error and if
the user tries to drop the column with CASCADE then the relation is
removed from the publication.


That looks correct.  Consider how triggers behave: Dropping a column 
that a trigger uses (either in UPDATE OF or a WHEN condition) errors 
with RESTRICT and drops the trigger with CASCADE.
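
For illustration, a quick sketch of that trigger behavior (object names here
are made up):

CREATE TABLE tab (a int, b int);
CREATE FUNCTION tab_trg_fn() RETURNS trigger LANGUAGE plpgsql
    AS $$ BEGIN RETURN NEW; END $$;
CREATE TRIGGER tab_trg BEFORE UPDATE OF b ON tab
    FOR EACH ROW EXECUTE FUNCTION tab_trg_fn();

ALTER TABLE tab DROP COLUMN b;           -- fails: trigger tab_trg depends on column b
ALTER TABLE tab DROP COLUMN b CASCADE;   -- also drops trigger tab_trg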






Re: Removing more vacuumlazy.c special cases, relfrozenxid optimizations

2021-12-16 Thread Masahiko Sawada
On Thu, Dec 16, 2021 at 5:27 AM Peter Geoghegan  wrote:
>
> On Fri, Dec 10, 2021 at 1:48 PM Peter Geoghegan  wrote:
> > * I'm still working on the optimization that we discussed on this
> > thread: the optimization that allows the final relfrozenxid (that we
> > set in pg_class) to be determined dynamically, based on the actual
> > XIDs we observed in the table (we don't just naively use FreezeLimit).
>
> Attached is v4 of the patch series, which now includes this
> optimization, broken out into its own patch. In addition, it includes
> a prototype of opportunistic freezing.
>
> My emphasis here has been on making non-aggressive VACUUMs *always*
> advance relfrozenxid, outside of certain obvious edge cases. And so
> with all the patches applied, up to and including the opportunistic
> freezing patch, every autovacuum of every table manages to advance
> relfrozenxid during benchmarking -- usually to a fairly recent value.
> I've focussed on making aggressive VACUUMs (especially anti-wraparound
> autovacuums) a rare occurrence, for truly exceptional cases (e.g.,
> user keeps canceling autovacuums, maybe due to automated script that
> performs DDL). That has taken priority over other goals, for now.

Great!

I've looked at 0001 patch and here are some comments:

@@ -535,8 +540,16 @@ heap_vacuum_rel(Relation rel, VacuumParams *params,
                                               xidFullScanLimit);
    aggressive |= MultiXactIdPrecedesOrEquals(rel->rd_rel->relminmxid,
                                              mxactFullScanLimit);
+   skipwithvm = true;
    if (params->options & VACOPT_DISABLE_PAGE_SKIPPING)
+   {
+       /*
+        * Force aggressive mode, and disable skipping blocks using the
+        * visibility map (even those set all-frozen)
+        */
        aggressive = true;
+       skipwithvm = false;
+   }

    vacrel = (LVRelState *) palloc0(sizeof(LVRelState));

@@ -544,6 +557,7 @@ heap_vacuum_rel(Relation rel, VacuumParams *params,
    vacrel->rel = rel;
    vac_open_indexes(vacrel->rel, RowExclusiveLock, &vacrel->nindexes,
                     &vacrel->indrels);
+   vacrel->aggressive = aggressive;
    vacrel->failsafe_active = false;
    vacrel->consider_bypass_optimization = true;

How about adding skipwithvm to LVRelState too?

---
			/*
-			 * The current block is potentially skippable; if we've seen a
-			 * long enough run of skippable blocks to justify skipping it, and
-			 * we're not forced to check it, then go ahead and skip.
-			 * Otherwise, the page must be at least all-visible if not
-			 * all-frozen, so we can set all_visible_according_to_vm = true.
+			 * The current page can be skipped if we've seen a long enough run
+			 * of skippable blocks to justify skipping it -- provided it's not
+			 * the last page in the relation (according to rel_pages/nblocks).
+			 *
+			 * We always scan the table's last page to determine whether it
+			 * has tuples or not, even if it would otherwise be skipped
+			 * (unless we're skipping every single page in the relation). This
+			 * avoids having lazy_truncate_heap() take access-exclusive lock
+			 * on the table to attempt a truncation that just fails
+			 * immediately because there are tuples on the last page.
 			 */
-			if (skipping_blocks && !FORCE_CHECK_PAGE())
+			if (skipping_blocks && blkno < nblocks - 1)

Why do we always need to scan the last page even if heap truncation is
disabled (or in the failsafe mode)?

Regards,

-- 
Masahiko Sawada
EDB:  https://www.enterprisedb.com/




Re: row filtering for logical replication

2021-12-16 Thread Greg Nancarrow
On Fri, Dec 17, 2021 at 9:41 AM Peter Smith  wrote:
>
> PSA the v47* patch set.
>

I found that even though there are now separately-maintained WHERE clauses
per pubaction, there still seem to be problems when applying the old/new
row rules for UPDATE.
A simple example of this was previously discussed in [1].
The example is repeated below:

 Publication
create table tbl1 (a int primary key, b int);
create publication A for table tbl1 where (b<2) with(publish='insert');
create publication B for table tbl1 where (a>1) with(publish='update');

 Subscription
create table tbl1 (a int primary key, b int);
create subscription sub connection 'dbname=postgres host=localhost
port=1' publication A,B;

 Publication
insert into tbl1 values (1,1);
update tbl1 set a = 2;

So using the v47 patch-set, I still find that the UPDATE above results in
publication of an INSERT of (2,1), rather than an UPDATE of (1,1) to (2,1).
This is according to the 2nd UPDATE rule below, from patch 0003.

+ * old-row (no match)   new-row (no match)  -> (drop change)
+ * old-row (no match)   new row (match)     -> INSERT
+ * old-row (match)      new-row (no match)  -> DELETE
+ * old-row (match)      new row (match)     -> UPDATE
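
Read as code, those rules amount to something like this (a rough sketch with
made-up names, not the patch's actual implementation):

#include <stdbool.h>

typedef enum
{
	ACTION_DROP_CHANGE,
	ACTION_PUBLISH_INSERT,
	ACTION_PUBLISH_DELETE,
	ACTION_PUBLISH_UPDATE
} UpdateFilterAction;

/* Decide what to publish for an UPDATE, given filter results for both rows. */
static UpdateFilterAction
decide_update_action(bool old_matches, bool new_matches)
{
	if (!old_matches && !new_matches)
		return ACTION_DROP_CHANGE;
	if (!old_matches && new_matches)
		return ACTION_PUBLISH_INSERT;
	if (old_matches && !new_matches)
		return ACTION_PUBLISH_DELETE;
	return ACTION_PUBLISH_UPDATE;
}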

This is because the old row (1,1) doesn't match the UPDATE filter "(a>1)",
but the new row (2,1) does.
This functionality doesn't seem right to me. I don't think it can be
assumed that (1,1) was never published (and thus requires an INSERT rather
than UPDATE) based on these checks, because in this example, (1,1) was
previously published via a different operation - INSERT (and using a
different filter too).
I think the fundamental problem here is that these UPDATE rules assume that
the old (current) row was previously UPDATEd (and published, or not
published, according to the filter applicable to UPDATE), but this is not
necessarily the case.
Or am I missing something?


[1]
https://postgr.es/m/CAJcOf-dz0srExG0NPPgXh5X8eL2uxk7C=czogtbf8cnqoru...@mail.gmail.com


Regards,
Greg Nancarrow
Fujitsu Australia


Re: Emit a warning if the extension's GUC is set incorrectly

2021-12-16 Thread Peter Eisentraut

On 17.12.21 03:25, Shinya Kato wrote:

For now, I've attached the patch that fixed the compilation error.


I think it would be good if you could split the uncontroversial new 
EmitErrorsOnPlaceholders() calls into a separate patch.  And please add 
an explanation what exactly the rest of the patch changes.





Re: Allow DELETE to use ORDER BY and LIMIT/OFFSET

2021-12-16 Thread Greg Stark
On Thu, 16 Dec 2021 at 22:18, Tom Lane  wrote:
>
> * If the sort order is underspecified, or you omit ORDER BY
> entirely, then it's not clear which rows will be operated on.
> The LIMIT might stop after just some of the rows in a peer
> group, and you can't predict which ones.

Meh, that never seemed very compelling to me. I think that's on the
user and there are *lots* of cases where the user can easily know
enough extra context to know that's what she wants. In particular I've
often wanted to delete one of two identical records and it would have
been really easy to just do a DELETE LIMIT 1. I know there are ways to
do it but it's always seemed like unnecessary hassle when there was a
natural way to express it.

> * UPDATE/DELETE necessarily involve the equivalent of SELECT
> FOR UPDATE, which may cause the rows to be ordered more
> surprisingly than you expected, ie the sort happens *before*
> rows are replaced by their latest versions, which might have
> different sort keys.

This... is a real issue. Or is it? Just to be clear I think what
you're referring to is a case like:

Session 1: INSERT INTO t VALUES (1),(2);

Session 2: BEGIN; UPDATE t SET c=0 WHERE c=2;

Session 1: DELETE FROM t ORDER BY c ASC LIMIT 1;

Session 2: COMMIT;

Which row was deleted? In this case it was the row where c=1. Even
though the UPDATE reported success (i.e. 1 row updated), so presumably
it happened "before" the delete. But the delete saw the ordering from
before it was blocked, i.e. the ordering with c=1, c=2, so apparently
it happened "before" the update.

There are plenty of other ways to see the same surprising
serialization failure. If instead of a DELETE you did an UPDATE there
are any number of functions or other features that have been added
which can expose the order in which the updates happened.

The way to solve those cases would be to use serializable isolation
(or even just repeatable read in this case). Wouldn't that also solve
the DELETE serialization failure too?

-- 
greg




Re: [PATCH] Accept IP addresses in server certificate SANs

2021-12-16 Thread Kyotaro Horiguchi
At Thu, 16 Dec 2021 18:44:54 +, Jacob Champion  wrote 
in 
> On Thu, 2021-12-16 at 14:54 +0900, Kyotaro Horiguchi wrote:
> > It seems like saying that we must search for iPAddress and mustn't use
> > CN nor dNSName if the client connected using IP address. Otherwise, if
> > the host name is a domain name, we use only dNSName if present, and
> > use CN otherwise.  That behavior seems agreeing to what you wrote as
> > NSS's behavior.
> 
> NSS departs slightly from the spec and will additionally try to match
> an IP address against the CN, but only if there are no iPAddresses in
> the SAN. It roughly matches the logic for DNS names.

OpenSSL seems different. X509_check_host() tries SAN then CN iff SAN
doesn't exist.  X509_check_ip() tries SAN and completely ignores
iPAdress and CN.

> Here's the description of the NSS behavior and some of the reasoning
> behind it, quoted from a developer on Bugzilla [1]:
> 
> > Elsewhere in RFC 2818, it says 
> > 
> >If a subjectAltName extension of type dNSName is present, that MUST
> >be used as the identity. Otherwise, the (most specific) Common Name
> >field in the Subject field of the certificate MUST be used. 
> > 
> > Notice that this section is not conditioned upon the URI being a hostname
> > and not an IP address.  So this statement conflicts with the one cited 
> > above.  
> > 
> > I implemented this policy:
> > 
> > if the URI contains a host name
> > if the subject alt name is present and has one or more DNS names
> > use the DNS names in that extension as the server identity
> > else
> > use the subject common name as the server identity
> > else if the URI contains an IP address
> > if the subject alt name is present and has one or more IP addresses
> > use the IP addresses in that extension as the server identity
> > else
> > compare the URI IP address string with the subject common name.
(Wow. That article is 20 years old.)

*I* am fine with it.

> It sounds like both you and Andrew might be comfortable with that same
> behavior? I think it looks like a sane solution, so I'll implement that
> and we can see what it looks like. (My work on this will be paused over
> the end-of-year holidays.)

> > I'm not sure about ipv4 comptible addresses.  However, I think we can
> > identify ipv4 compatible address easily.
> 
> Yeah, it would probably not be a difficult feature to add later.

I agree.

> > +	 * pg_inet_net_pton() will accept CIDR masks, which we don't want to
> > +	 * match, so skip the comparison if the host string contains a slash.
> > +	 */
> > +	if (!strchr(host, '/')
> > +		&& pg_inet_net_pton(PGSQL_AF_INET6, host, addr, -1) == 128)
> > 
> > If a cidr is given, pg_inet_net_pton returns a number less than 128 so
> > we don't need to check '/' explicitly?  (I'm not sure '/128' is
> > sensible but doesn't harm..)
> 
> Personally I think that, if someone wants your libpq to connect to a
> server with a hostname of "some:ipv6::address/128", then they are
> trying to pull something (evading a poorly coded blocklist, perhaps?)
> and we should not allow that to match an IP. Thoughts?

If the client could connect to the network-address, it could be said
that we can assume that address is the name:p Just kidding.

As the name suggests, the function reads a network address, and its
only user is network_in().  I think we should provide pg_inet_pton()
instead of abusing pg_inet_net_pton().  The inet_net_pton_*() functions
can be modified to reject the /cidr part without any regression, so we
could have pg_inet_pton() with a small amount of change.

- inet_net_pton_ipv4(const char *src, u_char *dst)
+ inet_net_pton_ipv4_internal(const char *src, u_char *dst, bool netaddr)

+ inet_net_pton_ipv4(const char *src, u_char *dst)
 (calls inet_net_pton_ipv4_internal(src, dst, true))
+ inet_pton_ipv4(const char *src, u_char *dst)
 (calls inet_net_pton_ipv4_internal(src, dst, false))
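
In C that split could look roughly like this (a sketch only; the internal
parser body is omitted and the exact flag handling is an assumption):

#include <stdbool.h>
#include <string.h>

typedef unsigned char u_char;

/* Worker: the existing parser, with a flag saying whether /bits is allowed. */
static int
inet_net_pton_ipv4_internal(const char *src, u_char *dst, bool netaddr)
{
	if (!netaddr && strchr(src, '/') != NULL)
		return -1;				/* plain-address form must not carry a mask */
	/* ... the existing dotted-quad (and optional /bits) parsing goes here ... */
	(void) dst;
	return -1;
}

/* Network-address form: today's behavior, /bits accepted. */
int
inet_net_pton_ipv4(const char *src, u_char *dst)
{
	return inet_net_pton_ipv4_internal(src, dst, true);
}

/* Plain-address form: rejects any /bits suffix. */
int
inet_pton_ipv4(const char *src, u_char *dst)
{
	return inet_net_pton_ipv4_internal(src, dst, false);
}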

> Thanks for the review!
> --Jacob
> 
> [1] https://bugzilla.mozilla.org/show_bug.cgi?id=103752

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Allow DELETE to use ORDER BY and LIMIT/OFFSET

2021-12-16 Thread Yugo NAGATA
On Thu, 16 Dec 2021 20:56:59 -0700
"David G. Johnston"  wrote:

> On Thursday, December 16, 2021, Yugo NAGATA  wrote:
> 
> >
> > Also, here seem to be some use cases. For example,
> > - when you want to delete the specified number of rows from a table
> >   that doesn't have a primary key and contains tuple duplicated.
> 
> 
> Not our problem…use the tools correctly; there is always the hack
> work-around for the few who didn’t.
> 
> 
> > - when you want to delete the bottom 10 items with bad scores
> >   (without using rank() window function).
> 
> 
> This one doesn’t make sense to me.
> 
> - when you want to delete only some of rows because it takes time
> >   to delete all of them.
> >
> >
> This seems potentially compelling though I’d be more concerned about the
> memory aspects than simply taking a long amount of time.  If this is a
> problem then maybe discuss it without having a solution-in-hand?  But given
> the intense I/O cost that would happen spreading this out over time seems
> acceptable and it should be an infrequent thing to do.  Expecting users to
> plan and execute some custom code for their specific need seems reasonable.
> 
> So even if Tom’s technical concerns aren’t enough, working on this based
> upon these use cases doesn’t seem of high enough benefit.

Thank you for your comments.
Ok. I agree that there are not so strong use cases.

Regards,
Yugo Nagata


-- 
Yugo NAGATA 




Re: Allow escape in application_name

2021-12-16 Thread Fujii Masao




On 2021/12/17 12:06, Kyotaro Horiguchi wrote:

At Fri, 17 Dec 2021 02:42:25 +, "kuroda.hay...@fujitsu.com" 
 wrote in

Dear Fujii-san,

Sorry I forgot replying your messages.


if (strcmp(keywords[i], "application_name") == 0 &&
values[i] != NULL && *(values[i]) != '\0')


I'm not sure but do we have a case that values[i] becomes NULL
even if keywords[i] is "application_name"?


No for now, I guess. But isn't it safer to check that, too? I also could not 
find strong
reason why that check should be dropped. But you'd like to drop that?


No, I agreed the new checking. I'm just afraid of my code missing.


FWIW, I don't understand why we care of the case where the function
itself changes its mind to set values[i] to null


Whether we add this check or not, the behavior is the same, i.e., that setting 
value is not used. But by adding the check we can avoid unnecessary call of 
process_pgfdw_appname() when the value is NULL. OTOH, of course we can also 
remove the check. So I'm ok to remove that if you're thinking it's more 
consistent to remove that.



while we ignore the
possibility that some module at remote is modified so that some global
variables to be NULL.  I don't mind whether we care for NULLs or not
but I think we should be consistent at least in a single patch.


Probably you're mentioning that we got rid of something like the following code from 
process_pgfdw_appname(). In this case whether we add the check or not can affect the 
behavior (i.e., whether the escape sequence is replaced with "[unknown]" or not). So ISTM 
that the situations are similar but not the same.

+   if (appname == NULL || *appname == '\0')
+   appname = "[unknown]";

Probably now it's good chance to revisit this issue. As I commented at [1], at least for 
me it's intuitive to use empty string rather than "[unknown]" when appname or 
username, etc was NULL or '\0'. To implement this behavior, I argued to remove the check 
itself. But there are several options:

#1. use "[unknown]"
#2. add the check but not use "[unknown]"
#3. don't add the check (i.e., what the current patch does)

For now, I'm ok to choose #2 or #3.

[1]
https://postgr.es/m/0dbe50c3-c528-74b1-c577-035a4a68f...@oss.nttdata.com

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: Allow DELETE to use ORDER BY and LIMIT/OFFSET

2021-12-16 Thread Yugo NAGATA
On Thu, 16 Dec 2021 22:17:58 -0500
Tom Lane  wrote:

> Yugo NAGATA  writes:
> > We cannot use ORDER BY or LIMIT/OFFSET in the current
> > DELETE statement syntax, so all the row matching the
> > WHERE condition are deleted. However, the tuple retrieving
> > process of DELETE is basically same as SELECT statement,
> > so I think that we can also allow DELETE to use ORDER BY
> > and LIMIT/OFFSET.
> 
> Indeed, this is technically possible, but we've rejected the idea
> before and I'm not aware of any reason to change our minds.
> The problem is that a partial DELETE is not very deterministic
> about which rows are deleted, and that does not seem like a
> great property for a data-updating command.  (The same applies
> to UPDATE, which is why we don't allow these options in that
> command either.)  The core issues are:
> 
> * If the sort order is underspecified, or you omit ORDER BY
> entirely, then it's not clear which rows will be operated on.
> The LIMIT might stop after just some of the rows in a peer
> group, and you can't predict which ones.
> 
> * UPDATE/DELETE necessarily involve the equivalent of SELECT
> FOR UPDATE, which may cause the rows to be ordered more
> surprisingly than you expected, ie the sort happens *before*
> rows are replaced by their latest versions, which might have
> different sort keys.
> 
> We live with this amount of indeterminism in SELECT, but that
> doesn't make it a brilliant idea to allow it in UPDATE/DELETE.

Thank you for explaining it!
I'm glad to understand why this idea is not good and has been rejected.

Regards,
Yugo Nagata

-- 
Yugo NAGATA 




Re: parallel vacuum comments

2021-12-16 Thread Masahiko Sawada
On Thu, Dec 16, 2021 at 4:27 PM houzj.f...@fujitsu.com
 wrote:
>
> On Wed, Dec 15, 2021 4:03 PM Masahiko Sawada  wrote:
> > On Tue, Dec 14, 2021 at 12:03 PM Amit Kapila  
> > wrote:
> > > There is still pending
> > > work related to moving parallel vacuum code to a separate file and a
> > > few other pending comments that are still under discussion. We can
> > > take care of those in subsequent patches. Do, let me know if you or
> > > others think differently?
> >
> > I'm on the same page.
> >
> > I've attached an updated patch. The patch incorporated several changes from
> > the last version:
> >
> > * Rename parallel_vacuum_begin() to parallel_vacuum_init()
> > * Unify the terminology; use "index bulk-deletion" and "index cleanup"
> > instead of "index vacuum" and "index cleanup".
> > * Fix the comment of parallel_vacuum_init() pointed out by Andres
> > * Fix a typo that is left in commit 22bd3cbe0c (pointed out by Hou)
> >
> > Please review it.
>
> Thanks for updating the patch.
> Here are a few comments:

Thank you for the comments!

>
> 1)
> +   case PARALLEL_INDVAC_STATUS_NEED_CLEANUP:
> +   errcontext("while cleanup index \"%s\" of relation 
> \"%s.%s\"",
>
> I noticed the current code uses the error msg "While cleaning up index xxx", which
> seems a little different from the patch's. Maybe we can use the previous one?

Right. Fixed.

>
> 2)
> static inline Size max_items_to_alloc_size(int max_items);
>
> This old function declaration can be deleted.

Removed.

>
> 3)
> --- a/src/tools/pgindent/typedefs.list
> +++ b/src/tools/pgindent/typedefs.list
>
> I think we need to remove LVShared, LVSharedIndStats, LVDeadItems and
> LVParallelState from typedefs.list and add PVShared and PVIndStats to the 
> file.

Fixed.

These comments are incorporated into the patch I just submitted[1].

Regards,

[1] 
https://www.postgresql.org/message-id/CAD21AoB66GqEjHttbRd0_fy9hnBPJp8kBCWnMq87mG6V_BODSA%40mail.gmail.com

--
Masahiko Sawada
EDB:  https://www.enterprisedb.com/




Re: [PROPOSAL] new diagnostic items for the dynamic sql

2021-12-16 Thread Dinesh Chemuduru
On Fri, 3 Dec 2021 at 22:04, Zhihong Yu  wrote:

>
>
> On Fri, Dec 3, 2021 at 3:15 AM Dinesh Chemuduru 
> wrote:
>
>> Hi Michael,
>>
>> Attaching the latest patch here(It's the recent patch), and looking for
>> more suggestions/inputs from the team.
>>
>> On Fri, 3 Dec 2021 at 13:09, Michael Paquier  wrote:
>>
>>> On Wed, Nov 10, 2021 at 01:58:58PM +0530, Dinesh Chemuduru wrote:
>>> > The proposed statements are much clear, but will wait for other’s
>>> > suggestion, and will fix it accordingly.
>>>
>>> This update was three weeks ago, and no new version has been
>>> provided, so I am marking this as returned with feedback in the CF
>>> app.  If you can work more on this proposal and send an updated patch,
>>> please feel free to resubmit.
>>> --
>>> Michael
>>>
>> Hi,
>
> +int
> +set_errquery(const char *query)
>
> Agreed,

The other error-log-related functions also don't use void as the
return type; they use int.
So I tried to follow the same behaviour.

See the other error-log-related functions in src/backend/utils/error/elog.c


> Since the return value is ignored, the return type can be void.
>
> Cheers
>


Re: parallel vacuum comments

2021-12-16 Thread Masahiko Sawada
On Thu, Dec 16, 2021 at 10:38 PM Amit Kapila  wrote:
>
> On Thu, Dec 16, 2021 at 6:13 PM Masahiko Sawada  wrote:
> >
> > On Thu, Dec 16, 2021 at 1:56 PM Amit Kapila  wrote:
> > >
> > > On Wed, Dec 15, 2021 at 1:33 PM Masahiko Sawada  
> > > wrote:
> > > >
> > > > I've attached an updated patch. The patch incorporated several changes
> > > > from the last version:
> > > >
> > > > * Rename parallel_vacuum_begin() to parallel_vacuum_init()
> > > > * Unify the terminology; use "index bulk-deletion" and "index cleanup"
> > > > instead of "index vacuum" and "index cleanup".
> > > >
> > >
> > > I am not sure it is a good idea to do this as part of the main patch
> > > as the intention of that is to just refactor parallel vacuum code. I
> > > suggest doing this as a separate patch.
> >
> > Okay.
> >
> > >  Also, can we move the common
> > > code to be shared between vacuumparallel.c and vacuumlazy.c as a
> > > separate patch?
> >
> > You mean vac_tid_reaped() and vac_cmp_itemptr() etc.? If so, do both
> > vacuumparallel.c and vacuumlazy.c have the same functions?
> >
>
> Why that would be required? I think both can call the common exposed
> function like the one you have in your patch bulkdel_one_index or if
> we directly move lazy_vacuum_one_index as part of common code. Similar
> for cleanup function.

Understood.

>
> > >
> > > Few other comments and questions:
> > > 
> > > 1. /* Outsource everything to parallel variant */
> > > - parallel_vacuum_process_all_indexes(vacrel, true);
> > > + LVSavedErrInfo saved_err_info;
> > > +
> > > + /*
> > > + * Outsource everything to parallel variant. Since parallel vacuum will
> > > + * set the error context on an error we temporarily disable setting our
> > > + * error context.
> > > + */
> > > + update_vacuum_error_info(vacrel, &saved_err_info,
> > > + VACUUM_ERRCB_PHASE_UNKNOWN,
> > > + InvalidBlockNumber, InvalidOffsetNumber);
> > > +
> > > + parallel_vacuum_bulkdel_all_indexes(vacrel->pvs, vacrel->old_live_tuples);
> > > +
> > > + /* Revert to the previous phase information for error traceback */
> > > + restore_vacuum_error_info(vacrel, &saved_err_info);
> > >
> > > Is this change because you want a separate error callback for parallel
> > > vacuum? If so, I suggest we can discuss this as a separate patch from
> > > the refactoring patch.
> >
> > Because it seems natural to me that the leader and worker use the same
> > error callback.
> >
> > Okay, I'll remove that change in the next version patch.
> >
> > > 2. Is introducing bulkdel_one_index/cleanup_one_index related to new
> > > error context, or "Unify the terminology" task? Is there any other
> > > reason for the same?
> >
> > Because otherwise both vacuumlazy.c and vacuumparallel.c will have the
> > same functions.
> >
> > >
> > > 3. Why did you introduce
> > > parallel_vacuum_bulkdel_all_indexes()/parallel_vacuum_cleanup_all_indexes()?
> > > Is it because of your task "Unify the terminology"?
> >
> > This is because parallel bulk-deletion and cleanup require different
> > numbers of inputs (num_table_tuples etc.) and the caller
> > (vacuumlazy.c) cannot set them directly to ParallelVacuumState.
> >
>
> oh, yeah, the other possibility could be to have a common structure
> that can be used for both cases. I am not sure if that is better than
> what you have.

Yes, I left them as they are in an updated patch for now. But we can
change them if others think it’s not a good idea.

>
> > >
> > > 4.
> > > @@ -3086,7 +2592,6 @@ lazy_cleanup_one_index(Relation indrel,
> > > IndexBulkDeleteResult *istat,
> > >   ivinfo.report_progress = false;
> > >   ivinfo.estimated_count = estimated_count;
> > >   ivinfo.message_level = elevel;
> > > -
> > >   ivinfo.num_heap_tuples = reltuples;
> > >
> > > This seems like an unrelated change.
> >
> > Yes, but I think it's an unnecessary break so we can change it
> > together. Should it be done in a separate patch?
> >
>
> Isn't this just spurious line removal which shouldn't be part of any patch?

Okay.

I've attached updated patches. The first patch just moves common
function for index bulk-deletion and cleanup to vacuum.c. And the
second patch moves parallel vacuum code to vacuumparallel.c. The
comments I got so far are incorporated into these patches. Please
review them.

Regards,

--
Masahiko Sawada
EDB:  https://www.enterprisedb.com/


v9-0001-Move-index-vacuum-routines-to-vacuum.c.patch
Description: Binary data


v9-0002-Move-parallel-vacuum-code-to-vacuumparallel.c.patch
Description: Binary data


Re: Failed transaction statistics to measure the logical replication progress

2021-12-16 Thread Kyotaro Horiguchi
At Thu, 16 Dec 2021 11:36:46 +, "osumi.takami...@fujitsu.com" 
 wrote in 
> On Thursday, December 16, 2021 4:00 PM I wrote:
> > Thank you, everyone for confirming the direction.
> > I'll follow the consensus of the community and fix the patch, including 
> > other
> > comments.
> > I'll treat only the stats for apply workers.
> Hi, created a new version v17 according to the recent discussion
> with changes to address other review comments.
> 
> Kindly have a look at it.

It sends stats packets at every commit-like operation on apply
workers.  The current pgstat is smart enough to refrain from sending
stats packets at too high a frequency.  We already suffer from frequent
stats packets, so apply workers need to behave the same way.

That is, the new stats numbers should first be accumulated locally, and
the accumulated numbers then sent to the stats collector by
pgstat_report_stat().
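
In rough code terms (all names here are hypothetical, not the patch's actual
functions):

#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical per-worker counters, accumulated since the last flush. */
typedef struct ApplyXactStats
{
	int64_t		commit_count;
	int64_t		abort_count;
} ApplyXactStats;

static ApplyXactStats pending_stats;

/* Called at each commit/abort the apply worker finishes: just count locally. */
static void
count_applied_xact(bool committed)
{
	if (committed)
		pending_stats.commit_count++;
	else
		pending_stats.abort_count++;
}

/* Called from the periodic report path (pgstat_report_stat() timing). */
static void
flush_applied_xact_stats(void)
{
	if (pending_stats.commit_count == 0 && pending_stats.abort_count == 0)
		return;					/* nothing accumulated, nothing to send */

	/* send one message covering everything accumulated so far (hypothetical) */
	/* pgstat_send_apply_xact_stats(&pending_stats); */

	memset(&pending_stats, 0, sizeof(pending_stats));
}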

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Emit a warning if the extension's GUC is set incorrectly

2021-12-16 Thread Fujii Masao




On 2021/12/17 11:25, Shinya Kato wrote:

For now, I've attached the patch that fixed the compilation error.


Thanks for updating the patch! I confirmed that the compilation completed 
successfully with this new patch. But the regression test failed. You can see 
that Patch Tester [1] also reported the regression test failure for your 
patch.

[1]
http://commitfest.cputube.org/

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: Column Filtering in Logical Replication

2021-12-16 Thread Amit Kapila
On Wed, Dec 15, 2021 at 1:05 AM Alvaro Herrera  wrote:
>
> On 2021-Dec-14, Tomas Vondra wrote:
>
> > Yeah. I think it's not clear if this should behave more like an index or a
> > view. When an indexed column gets dropped we simply drop the index. But if
> > you drop a column referenced by a view, we fail with an error. I think we
> > should handle this more like a view, because publications are externally
> > visible objects too (while indexes are pretty much just an implementation
> > detail).
>
> I agree -- I think it's more like a view than like an index.  (The
> original proposal was that if you dropped a column that was part of the
> column list of a relation in a publication, the entire relation is
> dropped from the view,
>

I think in the above sentence, you mean to say "dropped from the
publication". So, IIUC, you are proposing that if one drops a column
that was part of the column list of a relation in a publication, an
error will be raised. Also, if the user specifies CASCADE in Alter
Table ... Drop Column, then we drop the relation from publication. Is
that right? BTW, this is somewhat on the lines of what row_filter
patch is also doing where if the user drops the column that was part
of row_filter for a relation in publication, we give an error and if
the user tries to drop the column with CASCADE then the relation is
removed from the publication.

-- 
With Regards,
Amit Kapila.




Re: WIP: WAL prefetch (another approach)

2021-12-16 Thread Tom Lane
Greg Stark  writes:
> But if you're interested and can explain the tests to run I can try to
> get the tests running on this machine:

I'm not sure that machine is close enough to prove much, but by all
means give it a go if you wish.  My test setup was explained in [1]:

>> To recap, the test lashup is:
>> * 2003 PowerMac G4 (1.25GHz PPC 7455, 7200 rpm spinning-rust drive)
>> * Standard debug build (--enable-debug --enable-cassert)
>> * Out-of-the-box configuration, except add wal_consistency_checking = all
>> and configure a wal-streaming standby on the same machine
>> * Repeatedly run "make installcheck-parallel", but skip the tablespace
>> test to avoid issues with the standby trying to use the same directory
>> * Delay long enough after each installcheck-parallel to let the 
>> standby catch up (the run proper is ~24 min, plus 2 min for catchup)

Remember also that the code in question is not in HEAD; you'd
need to apply Munro's patches, or check out some commit from
around 2021-04-22.

regards, tom lane

[1] https://www.postgresql.org/message-id/3502526.1619925367%40sss.pgh.pa.us




Re: Add sub-transaction overflow status in pg_stat_activity

2021-12-16 Thread Justin Pryzby
On Fri, Dec 17, 2021 at 09:00:04AM +0530, Dilip Kumar wrote:
> On Tue, Dec 14, 2021 at 3:57 AM Bossart, Nathan  wrote:
> >
> > On 12/13/21, 6:30 AM, "Dilip Kumar"  wrote:
> > > On Tue, Dec 7, 2021 at 11:11 AM Justin Pryzby  
> > > wrote:
> > >> Since I think this field is usually not interesting to most users of
> > >> pg_stat_activity, maybe this should instead be implemented as a function 
> > >> like
> > >> pg_backend_get_subxact_status(pid).
> > >>
> > >> People who want to could use it like:
> > >> SELECT * FROM pg_stat_activity psa, pg_backend_get_subxact_status(pid) 
> > >> sub;
> > >
> > > I have provided two function, one for subtransaction counts and other
> > > whether subtransaction cache is overflowed or not, we can use like
> > > this,  if we think this is better way to do it then we can also add
> > > another function for the lastOverflowedXid
> >
> > The general approach looks good to me.  I think we could have just one
> > function for all three values, though.
> 
> If we create just one function then the output type will be a tuple,
> and then we might have to add another view on top of that.  Is there
> any better way to do that?

I don't think you'd need to add a view on top of it.

Compare:

postgres=# SELECT 1, pg_config() LIMIT 1;
 ?column? |         pg_config          
----------+----------------------------
        1 | (BINDIR,/usr/pgsql-14/bin)

postgres=# SELECT 1, c FROM pg_config() c LIMIT 1;
 ?column? |             c              
----------+----------------------------
        1 | (BINDIR,/usr/pgsql-14/bin)

postgres=# SELECT 1, c.* FROM pg_config() c LIMIT 1;
 ?column? |  name  |      setting      
----------+--------+-------------------
        1 | BINDIR | /usr/pgsql-14/bin
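
For example, a single function with OUT columns could be used like this (the
function name and column names are hypothetical, per the proposal upthread):

-- No extra view needed: just call the function laterally per backend.
SELECT psa.pid, sub.subxact_count, sub.subxact_overflowed, sub.last_overflowed_xid
FROM pg_stat_activity AS psa,
     pg_stat_get_backend_subxact_status(psa.pid) AS sub;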




Re: Allow DELETE to use ORDER BY and LIMIT/OFFSET

2021-12-16 Thread David G. Johnston
On Thursday, December 16, 2021, Yugo NAGATA  wrote:

>
> Also, here seem to be some use cases. For example,
> - when you want to delete the specified number of rows from a table
>   that doesn't have a primary key and contains tuple duplicated.


Not our problem…use the tools correctly; there is always the hack
work-around for the few who didn’t.


> - when you want to delete the bottom 10 items with bad scores
>   (without using rank() window function).


This one doesn’t make sense to me.

- when you want to delete only some of rows because it takes time
>   to delete all of them.
>
>
This seems potentially compelling though I’d be more concerned about the
memory aspects than simply taking a long amount of time.  If this is a
problem then maybe discuss it without having a solution-in-hand?  But given
the intense I/O cost that would happen spreading this out over time seems
acceptable and it should be an infrequent thing to do.  Expecting users to
plan and execute some custom code for their specific need seems reasonable.

So even if Tom’s technical concerns aren’t enough, working on this based
upon these use cases doesn’t seem of high enough benefit.

David J.


Re: Add sub-transaction overflow status in pg_stat_activity

2021-12-16 Thread Dilip Kumar
On Tue, Dec 14, 2021 at 6:23 PM Ashutosh Sharma  wrote:
>
> Hi,
>
> I have looked into the v2 patch and here are my comments:
>
> +   PG_RETURN_INT32(local_beentry->subxact_overflowed);
> +}
>
> Should this be PG_RETURN_BOOL instead of PG_RETURN_INT32??
>
> --
>
> +{ oid => '6107', descr => 'statistics: cached subtransaction count of 
> backend',
> +  proname => 'pg_stat_get_backend_subxact_count', provolatile => 's', 
> proparallel => 'r',
> +  prorettype => 'int4', proargtypes => 'int4',
> +  prosrc => 'pg_stat_get_backend_subxact_count' },
> +{ oid => '6108', descr => 'statistics: subtransaction cache of backend 
> overflowed',
> +  proname => 'pg_stat_get_backend_subxact_overflow', provolatile => 's', 
> proparallel => 'r',
> +  prorettype => 'bool', proargtypes => 'int4',
> +  prosrc => 'pg_stat_get_backend_subxact_overflow' },
>
> The description says that the two new functions show the statistics for 
> "cached subtransaction count of backend" and "subtransaction cache of backend 
> overflowed". But, when these functions are called it shows these stats for 
> the non-backend processes like checkpointer, walwriter etc as well. Should 
> that happen?
>
> --
>
> typedef struct LocalPgBackendStatus
>  * not.
>  */
> TransactionId backend_xmin;
> +
> +   /*
> +* Number of cached subtransactions in the current session.
> +*/
> +   int subxact_count;
> +
> +   /*
> +* The number of subtransactions in the current session exceeded the 
> cached
> +* subtransaction limit.
> +*/
> +   bool subxact_overflowed;
>
> All the variables inside this LocalPgBackendStatus structure are prefixed 
> with "backend" word. Can we do the same for the newly added variables as well?
>
> --
>
> + * Get the xid and xmin, nsubxid and overflow status of the backend. The
>
> Should this be put as - "xid, xmin, nsubxid and overflow" instead of "xid and 
> xmin, nsubxid and overflow"?

Thanks, Ashutosh, I will work on your comments and post an updated
version by next week.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: Add sub-transaction overflow status in pg_stat_activity

2021-12-16 Thread Dilip Kumar
On Tue, Dec 14, 2021 at 3:57 AM Bossart, Nathan  wrote:
>
> On 12/13/21, 6:30 AM, "Dilip Kumar"  wrote:
> > On Tue, Dec 7, 2021 at 11:11 AM Justin Pryzby  wrote:
> >> Since I think this field is usually not interesting to most users of
> >> pg_stat_activity, maybe this should instead be implemented as a function 
> >> like
> >> pg_backend_get_subxact_status(pid).
> >>
> >> People who want to could use it like:
> >> SELECT * FROM pg_stat_activity psa, pg_backend_get_subxact_status(pid) sub;
> >
> > I have provided two function, one for subtransaction counts and other
> > whether subtransaction cache is overflowed or not, we can use like
> > this,  if we think this is better way to do it then we can also add
> > another function for the lastOverflowedXid
>
> The general approach looks good to me.  I think we could have just one
> function for all three values, though.

If we create just one function then the output type will be a tuple,
and then we might have to add another view on top of that.  Is there
any better way to do that?

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: Allow DELETE to use ORDER BY and LIMIT/OFFSET

2021-12-16 Thread Tom Lane
Yugo NAGATA  writes:
> We cannot use ORDER BY or LIMIT/OFFSET in the current
> DELETE statement syntax, so all the row matching the
> WHERE condition are deleted. However, the tuple retrieving
> process of DELETE is basically same as SELECT statement,
> so I think that we can also allow DELETE to use ORDER BY
> and LIMIT/OFFSET.

Indeed, this is technically possible, but we've rejected the idea
before and I'm not aware of any reason to change our minds.
The problem is that a partial DELETE is not very deterministic
about which rows are deleted, and that does not seem like a
great property for a data-updating command.  (The same applies
to UPDATE, which is why we don't allow these options in that
command either.)  The core issues are:

* If the sort order is underspecified, or you omit ORDER BY
entirely, then it's not clear which rows will be operated on.
The LIMIT might stop after just some of the rows in a peer
group, and you can't predict which ones.

* UPDATE/DELETE necessarily involve the equivalent of SELECT
FOR UPDATE, which may cause the rows to be ordered more
surprisingly than you expected, ie the sort happens *before*
rows are replaced by their latest versions, which might have
different sort keys.

We live with this amount of indeterminism in SELECT, but that
doesn't make it a brilliant idea to allow it in UPDATE/DELETE.

regards, tom lane




Re: WIP: WAL prefetch (another approach)

2021-12-16 Thread Greg Stark
The actual hardware of this machine is a Mac Mini Core 2 Duo. I'm not
really clear how the emulation is done and whether it makes a
reasonable test environment or not.

Hardware Overview:

  Model Name: Mac mini
  Model Identifier: Macmini2,1
  Processor Name: Intel Core 2 Duo
  Processor Speed: 2 GHz
  Number Of Processors: 1
  Total Number Of Cores: 2
  L2 Cache: 4 MB
  Memory: 2 GB
  Bus Speed: 667 MHz
  Boot ROM Version: MM21.009A.B00




Re: Allow escape in application_name

2021-12-16 Thread Kyotaro Horiguchi
At Fri, 17 Dec 2021 02:42:25 +, "kuroda.hay...@fujitsu.com" 
 wrote in 
> Dear Fujii-san,
> 
> Sorry I forgot replying your messages.
> 
> > >>  if (strcmp(keywords[i], "application_name") == 0 &&
> > >>  values[i] != NULL && *(values[i]) != '\0')
> > >
> > > I'm not sure but do we have a case that values[i] becomes NULL
> > > even if keywords[i] is "application_name"?
> > 
> > No for now, I guess. But isn't it safer to check that, too? I also could 
> > not find strong
> > reason why that check should be dropped. But you'd like to drop that?
> 
> No, I agreed the new checking. I'm just afraid of my code missing.

FWIW, I don't understand why we care of the case where the function
itself changes its mind to set values[i] to null while we ignore the
possibility that some module at remote is modified so that some global
variables to be NULL.  I don't mind whether we care for NULLs or not
but I think we should be consistent at least in a single patch.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




RE: Allow escape in application_name

2021-12-16 Thread kuroda.hay...@fujitsu.com
Dear Fujii-san,

Sorry I forgot replying your messages.

> >>if (strcmp(keywords[i], "application_name") == 0 &&
> >>values[i] != NULL && *(values[i]) != '\0')
> >
> > I'm not sure but do we have a case that values[i] becomes NULL
> > even if keywords[i] is "application_name"?
> 
> No for now, I guess. But isn't it safer to check that, too? I also could not 
> find strong
> reason why that check should be dropped. But you'd like to drop that?

No, I agreed the new checking. I'm just afraid of my code missing.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED



Re: WIP: WAL prefetch (another approach)

2021-12-16 Thread Greg Stark
On Fri, 26 Nov 2021 at 21:47, Tom Lane  wrote:
>
> Yeah ... on the one hand, that machine has shown signs of
> hard-to-reproduce flakiness, so it's easy to write off the failures
> I saw as hardware issues.  On the other hand, the flakiness I've
> seen has otherwise manifested as kernel crashes, which is nothing
> like the consistent test failures I was seeing with the patch.

Hm. I asked around and found a machine I can use that can run PPC
binaries, but it's actually, well, confusing. I think this is an x86
machine running Leopard which uses JIT to transparently run PPC
binaries. I'm not sure this is really a good test.

But if you're interested and can explain the tests to run I can try to
get the tests running on this machine:

IBUILD:~ gsstark$ uname -a
Darwin IBUILD.MIT.EDU 9.8.0 Darwin Kernel Version 9.8.0: Wed Jul 15
16:55:01 PDT 2009; root:xnu-1228.15.4~1/RELEASE_I386 i386

IBUILD:~ gsstark$ sw_vers
ProductName: Mac OS X
ProductVersion: 10.5.8
BuildVersion: 9L31a




Re: Emit a warning if the extension's GUC is set incorrectly

2021-12-16 Thread Shinya Kato

On 2021-12-17 01:55, Fujii Masao wrote:

On 2021/12/16 16:31, Shinya Kato wrote:

Thank you for the review and sorry for the late reply.

On 2021-11-16 19:25, Bharath Rupireddy wrote:

> I observed an odd behaviour:
> 1) I set postgres_fdw.XXX = 'I_messed_up_conf_file' in postgresql.conf
> 2) With EmitWarningsOnPlaceholders("postgres_fdw"); in postgres_fdw
> contrib module, I created the extension, have seen the following
> warning:
> 2021-11-15 06:02:31.198 UTC [2018111] WARNING:  unrecognized
> configuration parameter "postgres_fdw.XXX"
> 3) I further did, "alter system set
> postgres_fdw.XXX='I_further_messed_up_conf_file';" and also "select
> pg_reload_conf();", it silently accepts.
>
> postgres=# create extension postgres_fdw ;
> WARNING:  unrecognized configuration parameter "postgres_fdw.XXX"
> CREATE EXTENSION
> postgres=# alter system set
> postgres_fdw.XXX='I_further_messed_up_conf_file';
> ALTER SYSTEM
> postgres=# select pg_reload_conf();
>  pg_reload_conf
> 
>  t
> (1 row)


I have made changes to achieve the above.


IMO this behavior change is not good, because, for example, it seems to
break at least the following case. I think that these are valid
steps, but with the patch, the server fails to start up at step #2
because pg_trgm's custom parameters are treated as invalid ones.

1. Add the following two pg_trgm parameters to postgresql.conf
- pg_trgm.similarity_threshold
- pg_trgm.strict_word_similarity_threshold

2. Start the server

3. Run "CREATE EXTENSION pg_trgm" and then use pg_trgm


It can happen because the placeholder "pg_trgm" is not yet 
registered.

I too think this behavior is not good.
I need to consider another implementation, or to allow undefined GUCs 
to be set.




When I compiled PostgreSQL with the patch, I got the following
compilation failure.

guc.c:5453:4: error: implicit declaration of function
'EmitErrorsOnPlaceholders' is invalid in C99
[-Werror,-Wimplicit-function-declaration]
EmitErrorsOnPlaceholders(placeholder);
^


-   ereport(WARNING,
+   ereport(ERROR,


Thanks! I had missed that correction, so I fixed it.


I'm still not sure why you were thinking that ERROR is more proper 
here.


Since I get an ERROR when I set a wrong normal GUC, I thought I 
should also get an ERROR when I set a wrong extension GUC.
However, there are likely to be harmful effects, and I may need to think 
again.



For now, I've attached the patch that fixed the compilation error.

--
Regards,

--
Shinya Kato
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

diff --git a/contrib/auth_delay/auth_delay.c b/contrib/auth_delay/auth_delay.c
index 5820ac328d..d11dd1e416 100644
--- a/contrib/auth_delay/auth_delay.c
+++ b/contrib/auth_delay/auth_delay.c
@@ -67,6 +67,9 @@ _PG_init(void)
 			NULL,
 			NULL,
 			NULL);
+
+	EmitWarningsOnPlaceholders("auth_delay");
+
 	/* Install Hooks */
 	original_client_auth_hook = ClientAuthentication_hook;
 	ClientAuthentication_hook = auth_delay_checks;
diff --git a/contrib/pg_trgm/trgm_op.c b/contrib/pg_trgm/trgm_op.c
index fb38135f7a..0407c7dd64 100644
--- a/contrib/pg_trgm/trgm_op.c
+++ b/contrib/pg_trgm/trgm_op.c
@@ -100,6 +100,8 @@ _PG_init(void)
 			 NULL,
 			 NULL,
 			 NULL);
+
+	EmitWarningsOnPlaceholders("pg_trgm");
 }
 
 /*
diff --git a/contrib/postgres_fdw/option.c b/contrib/postgres_fdw/option.c
index 48c7417e6e..36555398ec 100644
--- a/contrib/postgres_fdw/option.c
+++ b/contrib/postgres_fdw/option.c
@@ -469,4 +469,6 @@ _PG_init(void)
 			   NULL,
 			   NULL,
 			   NULL);
+
+	EmitWarningsOnPlaceholders("postgres_fdw");
 }
diff --git a/contrib/sepgsql/hooks.c b/contrib/sepgsql/hooks.c
index 19a3ffb7ff..44a09c35a7 100644
--- a/contrib/sepgsql/hooks.c
+++ b/contrib/sepgsql/hooks.c
@@ -455,6 +455,8 @@ _PG_init(void)
 			 NULL,
 			 NULL);
 
+	EmitWarningsOnPlaceholders("sepgsql");
+
 	/* Initialize userspace access vector cache */
 	sepgsql_avc_init();
 
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index e91d5a3cfd..c7c0179c5c 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -5431,11 +5431,19 @@ valid_custom_variable_name(const char *name)
 {
 	bool		saw_sep = false;
 	bool		name_start = true;
+	int 		placeholder_length = 0;
 
 	for (const char *p = name; *p; p++)
 	{
+		placeholder_length++;
 		if (*p == GUC_QUALIFIER_SEPARATOR)
 		{
+			/* Check invalid placeholder and custom parameter */
+			char *placeholder = malloc(sizeof(char) * placeholder_length);
+			strncpy(placeholder, name, placeholder_length - 1);
+			placeholder[placeholder_length - 1] = '\0';
+			EmitWarningsOnPlaceholders(placeholder);
+
 			if (name_start)
 				return false;	/* empty name component */
 			saw_sep = true;
@@ -9322,6 +9330,13 @@ DefineCustomEnumVariable(const char *name,
 	

RE: Allow escape in application_name

2021-12-16 Thread kuroda.hay...@fujitsu.com
Dear Fujii-san,

> Thanks for the review! At first I pushed 0001 patch.

I found your commit. Thanks!

> BTW, 0002 patch adds the regression test that checks
> pg_stat_activity.application_name. But three months before, we added the 
> similar
> test when introducing postgres_fdw.application_name GUC and
> reverted/removed it because it's not stable [1]. So we should review carefully
> whether the test 0002 patch adds may have the same issue or not. As far as I 
> read
> the patch, ISTM that the patch has no same issue. But could you double check
> that?

I agree we will not face the problem.
When postgres_fdw_disconnect_all() is performed, we just send a character 'X' to
the remote backends (in sendTerminateConn() and lower functions) and return
without blocking.
After receiving the 'X' message, the remote backends call proc_exit() and the
processes die. The previous test failure happened because the SELECT statement
was performed before the backends had died completely.
The current test searches pg_stat_activity with a condition strict enough that
residual rows cannot match, so it is not affected by them.
Hence I think this test is stable. What do you think?

Best Regards,
Hayato Kuroda
FUJITSU LIMITED



Re: Allow DELETE to use ORDER BY and LIMIT/OFFSET

2021-12-16 Thread Yugo NAGATA
On Fri, 17 Dec 2021 09:47:18 +0900
Yugo NAGATA  wrote:

> Hello hackers,
> 
> We cannot use ORDER BY or LIMIT/OFFSET in the current
> DELETE statement syntax, so all the row matching the
> WHERE condition are deleted. However, the tuple retrieving
> process of DELETE is basically same as SELECT statement,
> so I think that we can also allow DELETE to use ORDER BY
> and LIMIT/OFFSET.
> 
> Attached is the concept patch. This enables the following
> operations:

After post this, I noticed that there are several similar
proposals in past:

https://www.postgresql.org/message-id/flat/AANLkTi%3D6fBZh9yZT7f7kKh%2BzmQngAyHgZWBPM3eiEMj1%40mail.gmail.com
https://www.postgresql.org/message-id/flat/1393112801.59251.YahooMailNeo%40web163006.mail.bf1.yahoo.com
https://www.postgresql.org/message-id/flat/CADB9FDf-Vh6RnKAMZ4Rrg_YP9p3THdPbji8qe4qkxRuiOwm%3Dmg%40mail.gmail.com
https://www.postgresql.org/message-id/flat/CALAY4q9fcrscybax7fg_uojFwjw_Wg0UMuSrf-FvN68SeSAPAA%40mail.gmail.com

Anyway, I'll review these threads before progressing it.

> 
> 
> postgres=# select * from t order by i;
>  i  
> ----
>   1
>   2
>   2
>   2
>   2
>   5
>  10
>  20
>  33
>  35
>  53
> (11 rows)
> 
> postgres=# delete from t where i = 2 limit 2;
> DELETE 2
> postgres=# select * from t order by i;
>  i  
> ----
>   1
>   2
>   2
>   5
>  10
>  20
>  33
>  35
>  53
> (9 rows)
> 
> postgres=# delete from t order by i offset 3 limit 3;
> DELETE 3
> postgres=# select * from t order by i;
>  i  
> ----
>   1
>   2
>   2
>  33
>  35
>  53
> (6 rows)
> 
> 
> Although we can do the similar operations using ctid and a subquery
> such as
> 
>  DELETE FROM t WHERE ctid IN (SELECT ctid FROM t WHERE ... ORDER BY ... LIMIT 
> ...),
> 
> it is more user friendly and intuitive to allow it in the DELETE syntax
> because ctid is a system column and most users may not be familiar with it.
> 
> Although this is not allowed in the SQL standard, it is supported
> in MySQL[1]. DB2 also supports it although the syntax is somewhat
> strange.[2]
> 
> Also, here seem to be some use cases. For example, 
> - when you want to delete the specified number of rows from a table
>   that doesn't have a primary key and contains tuple duplicated.
> - when you want to delete the bottom 10 items with bad scores
>   (without using rank() window function).
> - when you want to delete only some of rows because it takes time
>   to delete all of them.
> 
> [1] https://dev.mysql.com/doc/refman/8.0/en/delete.html
> [2] 
> https://www.dba-db2.com/2015/04/delete-first-1000-rows-in-a-db2-table-using-fetch-first.html
> 
> How do you think it?
> 
> -- 
> Yugo NAGATA 


-- 
Yugo NAGATA 




Re: Apple's ranlib warns about protocol_openssl.c

2021-12-16 Thread Thomas Munro
On Fri, Dec 17, 2021 at 9:38 AM Tom Lane  wrote:
> Daniel Gustafsson  writes:
> >> On 16 Dec 2021, at 19:22, Tom Lane  wrote:
> >> Having said that, I'm not seeing any such warning when I build
> >> with openssl 1.1.1k on my own Mac, so I'm a bit confused why
> >> Thomas sees it.
>
> > Maybe it's dependant on macOS/XCode release?  I see the warning on my 
> > Catalina
> > laptop.
>
> Could be.  I tried it on Monterey, but not anything older.
> (longfin is still on Big Sur, because I've been lazy about
> updating it.)

Hmm.  Happened[1] with Andres's CI scripts, which (at least on the
version I used here, may not be his latest) runs on macOS Monterey and
installs openssl from brew which is apparently 3.0.0.  Wild guess:
some versions of openssl define functions, and some define macros, and
here we're looking for the macros?

[1] https://cirrus-ci.com/task/6100205941555200




Re: Post-CVE Wishlist

2021-12-16 Thread Jacob Champion
On Fri, 2021-12-10 at 15:43 +0200, Heikki Linnakangas wrote:
> ProcessStartupPacket() currently reads the first 4 bytes coming from the 
> client to decide what kind of a connection it is, and I believe a TLS 
> ClientHello message always begins with the same sequence of bytes, so it 
> would be easy to check for.
> 
> You could use recv(.., MSG_PEEK | MSG_WAITALL) flags to leave the bytes 
> in the OS buffer. Not sure how portable that is, though. Alternatively, 
> you could stash them e.g. in a global variable and modify 
> secure_raw_read() to return those bytes first.
> 
> Overall, doesn't seem very hard to me.

After further thought... Seems like sharing a port between implicit and
explicit TLS will still allow a MITM to put bytes on the wire to try to
attack the client-to-server communication, because they can craft the
SSLRequest themselves and then hand it off to the real client.

But they shouldn't be able to attack the server-to-client communication
if the client is using implicit TLS, so it's still an overall
improvement? I wonder if there are any other protocols out there doing this.

--Jacob


Re: row filtering for logical replication

2021-12-16 Thread Peter Smith
On Tue, Dec 14, 2021 at 10:12 PM Amit Kapila  wrote:
>
> On Tue, Dec 14, 2021 at 10:50 AM Amit Kapila  wrote:
> >
> > On Tue, Dec 14, 2021 at 4:44 AM Peter Smith  wrote:
> >
> > Few other comments:
> > ===
>
> Few more comments:
> ==
> v46-0001/0002
> ===
> 1. After rowfilter_walker() why do we need
> EXPR_KIND_PUBLICATION_WHERE? I thought this is primarily to identify
> the expressions that are not allowed in rowfilter which we are now
> able to detect upfront with the help of a walker. Can't we instead use
> EXPR_KIND_WHERE?
>

Fixed in v47 [1]

> 2.
> +Node *
> +GetTransformedWhereClause(ParseState *pstate, PublicationRelInfo *pri,
> +   bool bfixupcollation)
>
> Can we add comments atop this function?
>

Fixed in v47 [1]

> 3. In GetTransformedWhereClause, can we change the name of variables
> (a) bfixupcollation to fixup_collation or assign_collation, (b)
> transformedwhereclause to whereclause. I think that will make the
> function more readable.
>

Fixed in v47 [1]

>
> v46-0002
> 
> 4.
> + else if (IsA(node, List) || IsA(node, Const) || IsA(node, BoolExpr)
> || IsA(node, NullIfExpr) ||
> + IsA(node, NullTest) || IsA(node, BooleanTest) || IsA(node, CoalesceExpr) ||
> + IsA(node, CaseExpr) || IsA(node, CaseTestExpr) || IsA(node, MinMaxExpr) ||
> + IsA(node, ArrayExpr) || IsA(node, ScalarArrayOpExpr) || IsA(node, XmlExpr))
>
> Can we move this to a separate function say IsValidRowFilterExpr() or
> something on those lines and use Switch (nodetag(node)) to identify
> these nodes?
>

Fixed in v47 [1]

--
[1] 
https://www.postgresql.org/message-id/CAHut%2BPtjsj_OVMWEdYp2Wq19%3DH5D4Vgta43FbFVDYr2LuS_djg%40mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia




Allow DELETE to use ORDER BY and LIMIT/OFFSET

2021-12-16 Thread Yugo NAGATA
Hello hackers,

We cannot use ORDER BY or LIMIT/OFFSET in the current
DELETE statement syntax, so all the rows matching the
WHERE condition are deleted. However, the tuple-retrieving
process of DELETE is basically the same as for a SELECT
statement, so I think that we can also allow DELETE to use
ORDER BY and LIMIT/OFFSET.

Attached is the concept patch. This enables the following
operations:


postgres=# select * from t order by i;
 i  

  1
  2
  2
  2
  2
  5
 10
 20
 33
 35
 53
(11 rows)

postgres=# delete from t where i = 2 limit 2;
DELETE 2
postgres=# select * from t order by i;
 i  

  1
  2
  2
  5
 10
 20
 33
 35
 53
(9 rows)

postgres=# delete from t order by i offset 3 limit 3;
DELETE 3
postgres=# select * from t order by i;
 i  

  1
  2
  2
 33
 35
 53
(6 rows)


Although we can do similar operations using ctid and a subquery,
such as

 DELETE FROM t WHERE ctid IN (SELECT ctid FROM t WHERE ... ORDER BY ... LIMIT 
...),

it is more user friendly and intuitive to allow it in the DELETE syntax
because ctid is a system column and most users may not be familiar with it.
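
To make the comparison concrete, the two examples above can be written
with the ctid workaround roughly like this (a sketch; note that a
ctid-based DELETE can behave differently under concurrent updates,
since an updated row gets a new ctid):

 DELETE FROM t
  WHERE ctid IN (SELECT ctid FROM t WHERE i = 2 LIMIT 2);

 DELETE FROM t
  WHERE ctid IN (SELECT ctid FROM t ORDER BY i OFFSET 3 LIMIT 3);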

Although this is not allowed in the SQL standard, it is supported
in MySQL[1]. DB2 also supports it although the syntax is somewhat
strange.[2]

Also, there seem to be some use cases. For example,
- when you want to delete a specified number of rows from a table
  that doesn't have a primary key and contains duplicated tuples.
- when you want to delete the bottom 10 items with bad scores
  (without using the rank() window function).
- when you want to delete only some of the rows because deleting
  all of them takes too long.

[1] https://dev.mysql.com/doc/refman/8.0/en/delete.html
[2] 
https://www.dba-db2.com/2015/04/delete-first-1000-rows-in-a-db2-table-using-fetch-first.html

What do you think?

-- 
Yugo NAGATA 
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index 297b6ee715..c596bd80ea 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -3241,6 +3241,11 @@ _copyDeleteStmt(const DeleteStmt *from)
 	COPY_NODE_FIELD(returningList);
 	COPY_NODE_FIELD(withClause);
 
+	COPY_NODE_FIELD(sortClause);
+	COPY_NODE_FIELD(limitOffset);
+	COPY_NODE_FIELD(limitCount);
+	COPY_SCALAR_FIELD(limitOption);
+
 	return newnode;
 }
 
diff --git a/src/backend/nodes/equalfuncs.c b/src/backend/nodes/equalfuncs.c
index f537d3eb96..84f509b57b 100644
--- a/src/backend/nodes/equalfuncs.c
+++ b/src/backend/nodes/equalfuncs.c
@@ -1050,6 +1050,11 @@ _equalDeleteStmt(const DeleteStmt *a, const DeleteStmt *b)
 	COMPARE_NODE_FIELD(returningList);
 	COMPARE_NODE_FIELD(withClause);
 
+	COMPARE_NODE_FIELD(sortClause);
+	COMPARE_NODE_FIELD(limitOffset);
+	COMPARE_NODE_FIELD(limitCount);
+	COMPARE_SCALAR_FIELD(limitOption);
+
 	return true;
 }
 
diff --git a/src/backend/nodes/nodeFuncs.c b/src/backend/nodes/nodeFuncs.c
index e276264882..8a201749bf 100644
--- a/src/backend/nodes/nodeFuncs.c
+++ b/src/backend/nodes/nodeFuncs.c
@@ -3668,6 +3668,12 @@ raw_expression_tree_walker(Node *node,
 	return true;
 if (walker(stmt->withClause, context))
 	return true;
+if (walker(stmt->sortClause, context))
+	return true;
+if (walker(stmt->limitOffset, context))
+	return true;
+if (walker(stmt->limitCount, context))
+	return true;
 			}
 			break;
 		case T_UpdateStmt:
diff --git a/src/backend/parser/analyze.c b/src/backend/parser/analyze.c
index 146ee8dd1e..015e879f7a 100644
--- a/src/backend/parser/analyze.c
+++ b/src/backend/parser/analyze.c
@@ -471,6 +471,12 @@ transformDeleteStmt(ParseState *pstate, DeleteStmt *stmt)
 	qual = transformWhereClause(pstate, stmt->whereClause,
 EXPR_KIND_WHERE, "WHERE");
 
+	qry->sortClause = transformSortClause(pstate,
+		  stmt->sortClause,
+		  &qry->targetList,
+		  EXPR_KIND_ORDER_BY,
+		  false /* allow SQL92 rules */ );
+
 	qry->returningList = transformReturningList(pstate, stmt->returningList);
 
 	/* done building the range table and jointree */
@@ -482,6 +488,15 @@ transformDeleteStmt(ParseState *pstate, DeleteStmt *stmt)
 	qry->hasTargetSRFs = pstate->p_hasTargetSRFs;
 	qry->hasAggs = pstate->p_hasAggs;
 
+	/* transform LIMIT */
+	qry->limitOffset = transformLimitClause(pstate, stmt->limitOffset,
+			EXPR_KIND_OFFSET, "OFFSET",
+			stmt->limitOption);
+	qry->limitCount = transformLimitClause(pstate, stmt->limitCount,
+		   EXPR_KIND_LIMIT, "LIMIT",
+		   stmt->limitOption);
+	qry->limitOption = stmt->limitOption;
+
 	assign_query_collations(pstate, qry);
 
 	/* this must be done after collations, for reliable comparison of exprs */
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 86ce33bd97..0c6d11c23c 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -11207,6 +11207,7 @@ returning_clause:
 
 DeleteStmt: 

Re: row filtering for logical replication

2021-12-16 Thread Peter Smith
On Tue, Dec 14, 2021 at 4:20 PM Amit Kapila  wrote:
>
> On Tue, Dec 14, 2021 at 4:44 AM Peter Smith  wrote:
> >
> > On Tue, Dec 7, 2021 at 5:48 PM tanghy.f...@fujitsu.com
> >  wrote:
> > >
> > > I think for "FOR ALL TABLE" publication(p2 in my case), table tbl should 
> > > be
> > > treated as no filter, and table tbl should have no filter in subscription 
> > > sub. Thoughts?
> > >
> > > But for now, the filter(a > 10) works both when copying initial data and 
> > > later changes.
> > >
> > > To fix it, I think we can check if the table is published in a 'FOR ALL 
> > > TABLES'
> > > publication or published as part of schema in function 
> > > pgoutput_row_filter_init
> > > (which was introduced in v44-0003 patch), also we need to make some 
> > > changes in
> > > tablesync.c.
> > >
> >
> > Partly fixed in v46-0005  [1]
> >
> > NOTE
> > - The initial COPY part of the tablesync does not take the publish
> > operation into account so it means that if any of the subscribed
> > publications have "puballtables" flag then all data will be copied
> > sans filters.
> >
>
> I think this should be okay but the way you have implemented it in the
> patch doesn't appear to be the optimal way. Can't we fetch
> allpubtables info and qual info as part of one query instead of using
> separate queries?

Fixed in v47 [1]. Now code uses a unified SQL query provided by Vignesh.

>
> > I guess this is consistent with the other decision to
> > ignore publication operations [2].
> >
> > TODO
> > - Documentation
> > - IIUC there is a similar case yet to be addressed - FOR ALL TABLES IN 
> > SCHEMA
> >
>
> Yeah, "FOR ALL TABLES IN SCHEMA" should also be addressed. In this
> case, the difference would be that we need to check the presence of
> schema corresponding to the table (for which we are fetching
> row_filter information) is there in pg_publication_namespace. If it
> exists then we don't need to apply row_filter for the table. I feel it
> is better to fetch all this information as part of the query which you
> are using to fetch row_filter info. The idea is to avoid the extra
> round-trip between subscriber and publisher.
>

Fixed in v47 [1]. Added code and TAP test case for ALL TABLES IN SCHEMA.

> Few other comments:
> ===
> 1.
> @@ -926,6 +928,22 @@ pgoutput_row_filter_init(PGOutputData *data,
> Relation relation, RelationSyncEntr
>   bool rfisnull;
>
>   /*
> + * If the publication is FOR ALL TABLES then it is treated same as if this
> + * table has no filters (even if for some other publication it does).
> + */
> + if (pub->alltables)
> + {
> + if (pub->pubactions.pubinsert)
> + no_filter[idx_ins] = true;
> + if (pub->pubactions.pubupdate)
> + no_filter[idx_upd] = true;
> + if (pub->pubactions.pubdelete)
> + no_filter[idx_del] = true;
> +
> + continue;
> + }
>
> Is there a reason to continue checking the other publications if
> no_filter is true for all kind of pubactions?
>

Fixed in v47 [1].

> 2.
> + * All row filter expressions will be discarded if there is one
> + * publication-relation entry without a row filter. That's because
> + * all expressions are aggregated by the OR operator. The row
> + * filter absence means replicate all rows so a single valid
> + * expression means publish this row.
>
> This same comment is at two places, remove from one of the places. I
> think keeping it atop for loop is better.
>

Fixed in v47 [1]

> 3.
> + {
> + int idx;
> + bool found_filters = false;
>
> I am not sure if starting such ad-hoc braces in the code to localize
> the scope of variables is a regular practice. Can we please remove
> this?
>

Fixed in v47 [1]

--
[1] 
https://www.postgresql.org/message-id/CAHut%2BPtjsj_OVMWEdYp2Wq19%3DH5D4Vgta43FbFVDYr2LuS_djg%40mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia




Re: Transparent column encryption

2021-12-16 Thread Jacob Champion
On Thu, 2021-12-09 at 11:04 +0100, Tomas Vondra wrote:
> On 12/9/21 01:12, Jacob Champion wrote:
> > 
> > The rabbit hole I led you down is one where we use the rest of the row
> > as AD, to try to freeze pieces of it in place. That might(?) have some
> > useful security properties (if the client defines its use and doesn't
> > defer to the server). But it's not what I intended to propose and I'd
> > have to think about that case some more.

So after thinking about it some more, in the case where the client is
relying on the server to return both the encrypted data and its
associated data -- and you don't trust the server -- then tying even
the entire row together doesn't help you.

I was briefly led astray by the idea that you could include a unique or
primary key column in the associated data, and then SELECT based on
that column -- but a motivated DBA could simply corrupt state so that
the row they wanted got returned regardless of the query. So the client
still has to have prior knowledge.

> > In my credit card example, I'm imagining something like (forgive the
> > contrived syntax):
> > 
> > SELECT address, :{aead(users.credit_card, 'u...@example.com')}
> >   FROM users WHERE email = 'u...@example.com';
> > 
> > UPDATE users
> >SET :{aead(users.credit_card, 'u...@example.com')} = '1234-...'
> >  WHERE email = 'u...@example.com';
> > 
> > The client explicitly links a table's column to its AD for the duration
> > of the query. This approach can't scale to
> > 
> > SELECT credit_card FROM users;
> > 
> > because in this case the AD for each row is different, but I'd argue
> > that's ideal for this particular case. The client doesn't need to (and
> > probably shouldn't) grab everyone's credit card details all at once, so
> > there's no reason to optimize for it.
> 
> Maybe, but it seems like a rather annoying limitation, as it restricts
> the client to single-row queries (or at least it looks like that to me).
> Yes, it may be fine for some use cases, but I'd bet a DBA who can modify
> data can do plenty other things - swapping "old" values, which will have
> the right AD, for example.

Resurrecting old data doesn't help the DBA read the values, right? I
view that as similar to the "increasing account balance" problem, in
that it's definitely a problem but not one we're trying to tackle here.

(And I'm not familiar with any solutions for resurrections -- other
than having data expire and tying the timestamp into the
authentication, which I think again requires AD. Revoking signed data
is one of those hard problems. Do you know a better way?)

> OK. In any case, I think we shouldn't require this capability from the
> get go - it's fine to get the simple version done first, which gives us
> privacy / protects against passive attacker. And then sometime in the
> future improve this further.

Agreed. (And I think the client should be able to enforce encryption in
the first place, before I distract you too much with other stuff.)

--Jacob


Re: Confused comment about drop replica identity index

2021-12-16 Thread Euler Taveira
On Thu, Dec 16, 2021, at 8:55 PM, Michael Paquier wrote:
> On Thu, Dec 16, 2021 at 03:08:46PM -0300, Alvaro Herrera wrote:
> > Hmm, so if a table has REPLICA IDENTITY INDEX and there is a publication
> > with an explicit column list, then we need to forbid the DROP INDEX for
> > that index.
> 
> Hmm.  I have not followed this thread very closely.
> 
> > I wonder why don't we just forbid DROP INDEX of an index that's been
> > defined as replica identity.  It seems quite silly an operation to
> > allow.
It would avoid pilot errors.

> The commit logs talk about b23b0f55 here for this code, to ease the
> handling of relcache entries for rd_replidindex.  07cacba is the
> origin of the logic (see RelationGetIndexList).  Andres?
> 
> I don't think that this is really an argument against putting more
> restrictions as anything that deals with an index drop, including the
> internal ones related to constraints, would need to go through
> index_drop(), and new features may want more restrictions in place as
> you say.
> 
> Now, I don't see a strong argument in changing this behavior either
> (aka I have not looked at what this implies for the new publication
> types), and we still need to do something for the comment/docs in
> existing branches, anyway.  So I would still fix this gap as a first
> step, then deal with the rest on HEAD as necessary.
> 
I've never understood the weak dependency between the REPLICA IDENTITY and the
index used by it. I'm afraid we will keep receiving complaints about this
unexpected behavior ("my logical replication setup is broken because I dropped
an index") as new logical replication features are added.  Row filtering imposes
some restrictions on UPDATEs and DELETEs (an error message is returned and
replication stops) if a column used in the expression isn't part of the REPLICA
IDENTITY anymore.

It seems we already have some code in RangeVarCallbackForDropRelation() that
deals with a system index error condition. We could save a syscall and provide
a test for indisreplident there.

If this restriction is undesirable, we should at least document this choice and
probably emit a WARNING for DROP INDEX.
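
For reference, the behavior in question is easy to reproduce (a minimal
sketch against a scratch database):

  CREATE TABLE t (id int NOT NULL);
  CREATE UNIQUE INDEX t_id_idx ON t (id);
  ALTER TABLE t REPLICA IDENTITY USING INDEX t_id_idx;
  DROP INDEX t_id_idx;  -- currently succeeds, silently losing the identity index
  SELECT relreplident FROM pg_class WHERE relname = 't';  -- still reports 'i'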


--
Euler Taveira
EDB   https://www.enterprisedb.com/


Re: Confused comment about drop replica identity index

2021-12-16 Thread Michael Paquier
On Thu, Dec 16, 2021 at 03:08:46PM -0300, Alvaro Herrera wrote:
> Hmm, so if a table has REPLICA IDENTITY INDEX and there is a publication
> with an explicit column list, then we need to forbid the DROP INDEX for
> that index.

Hmm.  I have not followed this thread very closely.

> I wonder why don't we just forbid DROP INDEX of an index that's been
> defined as replica identity.  It seems quite silly an operation to
> allow.

The commit logs talk about b23b0f55 here for this code, to ease the
handling of relcache entries for rd_replidindex.  07cacba is the
origin of the logic (see RelationGetIndexList).  Andres?

I don't think that this is really an argument against putting more
restrictions as anything that deals with an index drop, including the
internal ones related to constraints, would need to go through
index_drop(), and new features may want more restrictions in place as
you say.

Now, I don't see a strong argument in changing this behavior either
(aka I have not looked at what this implies for the new publication
types), and we still need to do something for the comment/docs in
existing branches, anyway.  So I would still fix this gap as a first
step, then deal with the rest on HEAD as necessary.
--
Michael


signature.asc
Description: PGP signature


[PoC] Delegating pg_ident to a third party

2021-12-16 Thread Jacob Champion
Hi all,

In keeping with my theme of expanding the authentication/authorization
options for the server, attached is an experimental patchset that lets
Postgres determine an authenticated user's allowed roles by querying an
LDAP server, and enables SASL binding for those queries.

This lets you delegate pieces of pg_ident.conf to a central server, so
that you don't have to run any synchronization scripts (or deal with
associated staleness problems, repeated load on the LDAP deployment,
etc.). And it lets you make those queries with a client certificate
instead of a bind password, or at the very least protect your bind
password with some SCRAM crypto. You don't have to use the LDAP auth
method for this to work; you can combine it with Kerberos or certs or
any auth method that already supports pg_ident.

The target users, in my mind, are admins who are already using an auth
method with user maps, but have many deployments and want easier
control over granting and revoking database access from one location.
This won't help you so much if you need to have exactly one role per
user -- there's no logic to automatically create roles, so it can't
fully replace the existing synchronization scripts that are out there.
But if all you need is "X, Y, and Z are allowed to log in as guest, and
A and B may connect as admins", then this is meant to simplify your
life.

This is a smaller step than my previous proof-of-concept, which handled
fully federated authentication and authorization via an OAuth provider
[1], and it should be a nice companion to my patch that adds user
mappings to the LDAP auth method [2], though I haven't tried them
together yet. (I've also been thinking about pulling group membership
information out of Kerberos authorization data, for those of you using
Active Directory. Things for later.)

= How-To =

If you want to try it out -- on a non-production system please -- take
a look at the test suite in src/test/ldap, which has been filled out
with some example usage. The core features are the "ldapmap" HBA option
(which you would use instead of "map" in your existing HBA) and the
"ldapsaslmechs" HBA option, which you can set to a list of SASL
mechanisms that you will accept. (The list of supported mechanisms is
determined by both systems' LDAP and SASL libraries, not by Postgres.)

The tricky part is writing the pg_ident line correctly, because it's
currently not a very good user experience. The query is in the form of
an LDAP URL. It needs to return exactly one entry for the user being
authorized; the attribute values contained in that entry will be
interpreted as the list of roles that the user is allowed to connect
as. Regex matching and substitution are supported as they are for
regular maps. Here's a sample:

pg_ident.conf:

  myldapmap  /^(.*)$  
ldap://example.com/dc=example,dc=com?postgresRole?sub?(uid=\1)

pg_hba.conf:

  hostssl all all all cert ldapmap=myldapmap ldaptls=1 
ldapsaslmechs=scram-sha-1 ldapbinddn=admin ldapbindpasswd=secret

This particular setup can be described as follows:

- Clients must use client certificates to authenticate to Postgres.
- Once the certificate is verified, Postgres will connect to the LDAP
server at example.com, issue StartTLS, and begin a SCRAM-SHA-1 exchange
using the bind username and password (admin/secret).
- Once that completes, Postgres will issue a query for the LDAP user
that has a uid matching the CN of the client certificate. (If more than
one user matches, authorization fails.)
- The client's PGUSER will be compared with the list of postgresRole
attributes belonging to that LDAP user, and if one matches,
authorization succeeds.

= Areas for Improvement =

I think it would be nice to support LDAP group membership in addition
to object attributes.

Settings for the LDAP connection are currently spread between pg_hba,
pg_ident, and environment variables like LDAPTLS_CERT. I made the
situation worse by allowing the pg_ident query to contain a scheme,
host, and port. That makes it seem like you could send different users
to different LDAP servers, but since they would all have to share
exactly the same TLS settings anyway, I think this was a mistake on my
part.

That mistake aside, I think the current URL query syntax is powerful
but unintuitive. I would rather see that as an option for power users,
and let other people just specify the user filter and role attribute
separately. And there needs to be more logging around the feature, to
help debug problems.

Regex substitution of user-controlled data into an LDAP query is
perilous, and I don't like it. For now I have restricted the allowed
characters as a first mitigation.

Is it safe to use listen_addresses in the test suite, as I have done,
as long as the HBA requires authentication? Or is that reopening a
security hole? I seem to recall discussion on this but my search-fu has
failed me.

There's a lot of code duplication in the current patchset that would
need to be undone.

...and more; see TODOs 

Re: row filtering for logical replication

2021-12-16 Thread Peter Smith
On Wed, Dec 15, 2021 at 3:50 PM Greg Nancarrow  wrote:
>
> On Mon, Dec 13, 2021 at 8:49 PM Peter Smith  wrote:
> >
> > PSA the v46* patch set.
> >
>
> 0001
>
...

> (2) In the 0001 patch comment, the term "publication filter" is used
> in one place, and in others "row filter" or "row-filter".
>

Fixed in v47 [1]

>
> src/backend/catalog/pg_publication.c
> (3) GetTransformedWhereClause() is missing a function comment.
>

Fixed in v47 [1]

> (4)
> The following comment seems incomplete:
>
> + /* Fix up collation information */
> + whereclause = GetTransformedWhereClause(pstate, pri, true);
>
>

Fixed in v47 [1]

> src/backend/parser/parse_relation.c
> (5)
> wording? consistent?
> Shouldn't it be "publication WHERE expression" for consistency?
>

In v47 [1]  this message is removed when the KIND is removed.

> + errmsg("publication row-filter WHERE invalid reference to table \"%s\"",
> + relation->relname),
>
>
> src/backend/replication/logical/tablesync.c
> (6)
>
> (i) Improve wording:
>
> BEFORE:
>  /*
>   * Get information about remote relation in similar fashion the RELATION
> - * message provides during replication.
> + * message provides during replication. This function also returns the 
> relation
> + * qualifications to be used in COPY command.
>   */
>
> AFTER:
>  /*
> - * Get information about remote relation in similar fashion the RELATION
> - * message provides during replication.
> + * Get information about a remote relation, in a similar fashion to
> how the RELATION
> + * message provides information during replication. This function
> also returns the relation
> + * qualifications to be used in the COPY command.
>   */
>

Fixed in v47 [1]

> (ii) fetch_remote_table_info() doesn't currently account for ALL
> TABLES and ALL TABLES IN SCHEMA.
>
>

Fixed in v47 [1]

...

>
>
> 0002
>
> src/backend/catalog/pg_publication.c
> (1) rowfilter_walker()
> One of the errdetail messages doesn't begin with an uppercase letter:
>
> +   errdetail_msg = _("user-defined types are not allowed");
>
>

Fixed in v47 [1]

> src/backend/executor/execReplication.c
> (2) CheckCmdReplicaIdentity()
>
> Strictly speaking, the following:
>
> + if (invalid_rfcolnum)
>
> should be:
>
> + if (invalid_rfcolnum != InvalidAttrNumber)
>
>
Fixed in v47 [1]

> 0003
>
> src/backend/replication/logical/tablesync.c
> (1)
> Column name in comment should be "puballtables" not "puballtable":
>
> + * If any publication has puballtable true then all row-filtering is
>

Fixed in v47 [1]

> (2) pgoutput_row_filter_init()
>
> There should be a space before the final "*/" (so the asterisks align).
> Also, should say "... treated the same".
>
>   /*
> + * If the publication is FOR ALL TABLES then it is treated same as if this
> + * table has no filters (even if for some other publication it does).
> + */
>
>
Fixed in v47 [1]

--
[1] 
https://www.postgresql.org/message-id/CAHut%2BPtjsj_OVMWEdYp2Wq19%3DH5D4Vgta43FbFVDYr2LuS_djg%40mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia




Re: row filtering for logical replication

2021-12-16 Thread Peter Smith
On Fri, Dec 17, 2021 at 7:11 AM Alvaro Herrera  wrote:
>
> Kindly do not change the mode of src/backend/parser/gram.y.
>

Oops. Sorry that was not deliberate.

I will correct that in the next version.

--
Kind Regards,
Peter Smith.
Fujitsu Australia




Re: Add index scan progress to pg_stat_progress_vacuum

2021-12-16 Thread Greg Stark
I had a similar question. And I'm still not clear from the response
what exactly index_blks_total is and whether it addresses it.

I think I agree that a user is likely to want to see the progress in a
way they can understand which means for a single index at a time.

I think what you're describing is that index_blks_total and
index_blks_scanned are the totals across all the indexes? That isn't
clear from the definitions but if that's what you intend then I think
that would work.

(For what it's worth what I was imagining was having a pair of
counters for blocks scanned and max blocks in this index and a second
counter for number of indexes processed and max number of indexes. But
I don't think that's necessarily any better than what you have)




Re: Add index scan progress to pg_stat_progress_vacuum

2021-12-16 Thread Imseih (AWS), Sami


On 12/15/21, 4:10 PM, "Bossart, Nathan"  wrote:

On 12/1/21, 3:02 PM, "Imseih (AWS), Sami"  wrote:
> The current implementation of pg_stat_progress_vacuum does not
> provide progress on which index is being vacuumed making it
> difficult for a user to determine if the "vacuuming indexes" phase
> is making progress. By exposing which index is being scanned as well
> as the total progress the scan has made for the current cycle, a
> user can make better estimations on when the vacuum will complete.

+1

> The proposed patch adds 4 new columns to pg_stat_progress_vacuum:
>
> 1. indrelid - the relid of the index being vacuumed
> 2. index_blks_total - total number of blocks to be scanned in the
> current cycle
> 3. index_blks_scanned - number of blocks scanned in the current
> cycle
> 4. leader_pid - if the pid for the pg_stat_progress_vacuum entry is
> a leader or a vacuum worker. This patch places an entry for every
> worker pid ( if parallel ) as well as the leader pid

nitpick: Shouldn't index_blks_scanned be index_blks_vacuumed?  IMO it
is more analogous to heap_blks_vacuumed.

No, what is being tracked is the number of index blocks scanned out of the
total index blocks. A block will be scanned regardless of whether it ends up
being vacuumed or not.

This will tell us which indexes are currently being vacuumed and the
current progress of those operations, but it doesn't tell us which
indexes have already been vacuumed or which ones are pending vacuum.
I think such information is necessary to truly understand the current
progress of vacuuming indexes, and I can think of a couple of ways we
might provide it:

  1. Make the new columns you've proposed return arrays.  This isn't
 very clean, but it would keep all the information for a given
 vacuum operation in a single row.  The indrelids column would be
 populated with all the indexes that have been vacuumed, need to
 be vacuumed, or are presently being vacuumed.  The other index-
 related columns would then have the associated stats and the
 worker PID (which might be the same as the pid column depending
 on whether parallel index vacuum was being done).  Alternatively,
 the index column could have an array of records, each containing
 all the information for a given index.
  2. Create a new view for just index vacuum progress information.
 This would have similar information as 1.  There would be an
 entry for each index that has been vacuumed, needs to be
 vacuumed, or is currently being vacuumed.  And there would be an
 easy way to join with pg_stat_progress_vacuum (e.g., leader_pid,
 which again might be the same as our index vacuum PID depending
 on whether we were doing parallel index vacuum).  Note that it
 would be possible for the PID of these entries to be null before
 and after we process the index.
  3. Instead of adding columns to pg_stat_progress_vacuum, adjust the
 current ones to be more general, and then add new entries for
 each of the indexes that have been, need to be, or currently are
 being vacuumed.  This is the most similar option to your current
 proposal, but instead of introducing a column like
 index_blks_total, we'd rename heap_blks_total to blks_total and
 use that for both the heap and indexes.  I think we'd still want
 to add a leader_pid column.  Again, we have to be prepared for
 the PID to be null in this case.  Or we could just make the pid
 column always refer to the leader, and we could introduce a
 worker_pid column.  That might create confusion, though.

I wish option #1 was cleaner, because I think it would be really nice
to have all this information in a single row.  However, I don't expect
much support for a 3-dimensional view, so I suspect option #2
(creating a separate view for index vacuum progress) is the way to go.
The other benefit of option #2 versus option #3 or your original
proposal is that it cleanly separates the top-level vacuum operations
and the index vacuum operations, which are related at the moment, but
which might not always be tied so closely together.

Option #1 is not clean, as you would need to unnest the array to make sense
of it. It would be too complex to use.
Option #3: I am reluctant to spend time on this option. It's more valuable
to see progress per index instead of a total.
Option #2 was one that I originally designed but backed away from because it
introduces a new view. Thinking about it a bit more, it is the cleaner
approach:
1. Having a view called pg_stat_progress_vacuum_worker to join with
pg_stat_progress_vacuum is clean.
2. No changes are required to pg_stat_progress_vacuum.
3. I'll lean towards calling the view 
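
For illustration, the kind of join such a companion view would allow might
look like this (the worker view name and its columns here are assumptions
for the example, not a settled design):

  SELECT v.pid, v.relid, v.phase,
         w.indrelid, w.index_blks_scanned, w.index_blks_total
    FROM pg_stat_progress_vacuum v
    JOIN pg_stat_progress_vacuum_worker w ON w.leader_pid = v.pid;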

Re: pg_stat_bgwriter.buffers_backend is pretty meaningless (and more?)

2021-12-16 Thread Alvaro Herrera
On 2021-Dec-15, Melanie Plageman wrote:

> I noticed after changing the docs on the "bgwriter" target for
> pg_stat_reset_shared to say "checkpointer", that it still said "bgwriter" in
>   src/backend/po/ko.po
>   src/backend/po/it.po
>   ...
> I presume these are automatically updated with some incantation, but I wasn't
> sure what it was nor could I find documentation on this.

Yes, feel free to ignore those files completely.  They are updated using
an external workflow that you don't need to concern yourself with.

-- 
Álvaro Herrera  Valdivia, Chile  —  https://www.EnterpriseDB.com/
"World domination is proceeding according to plan"(Andrew Morton)




Re: Buildfarm support for older versions

2021-12-16 Thread Andrew Dunstan


On 12/16/21 15:53, Larry Rosenman wrote:
>
>
> I get:
> ERROR for site owner:
> Invalid domain for site key
>
> on https://pgbuildfarm.org/cgi-bin/register-form.pl


try https://buildfarm.postgresql.org/cgi-bin/register-form.pl


cheers


andrew





Re: GUC flags

2021-12-16 Thread Justin Pryzby
On Thu, Dec 09, 2021 at 09:53:23AM -0600, Justin Pryzby wrote:
> On Thu, Dec 09, 2021 at 05:17:54PM +0900, Michael Paquier wrote:
> > On Wed, Dec 08, 2021 at 01:23:51PM +0100, Peter Eisentraut wrote:
> > > I wasn't really aware of this script either.  But I think it's a good idea
> > > to have it.  But only if it's run automatically as part of a test suite 
> > > run.
> > 
> > Okay.  If we do that, I am wondering whether it would be better to
> > rewrite this script in perl then, so as there is no need to worry
> > about the compatibility of grep.  And also, it would make sense to
> > return a non-zero exit code if an incompatibility is found for the
> > automation part.
> 
> One option is to expose the GUC flags in pg_settings, so this can all be done
> in SQL regression tests.
> 
> Maybe the flags should be text strings, so it's a nicer user-facing interface.
> But then the field would be pretty wide, even though we're only adding it for
> regression tests.  The only other alternative I can think of is to make a
> sql-callable function like pg_get_guc_flags(text guc).

Fixed the regression tests that were broken by other patches.
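
For illustration, exposing the flags at the SQL level would allow checks
along these lines (pg_get_guc_flags() returning text[] is an assumption
for the example, not an existing function):

  -- settings whose guc.c entry is marked GUC_NOT_IN_SAMPLE, to compare
  -- against what postgresql.conf.sample actually contains
  SELECT name
    FROM pg_settings
   WHERE 'GUC_NOT_IN_SAMPLE' = ANY (pg_get_guc_flags(name))
   ORDER BY name;
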
>From f18854a561d779c3d77689f9778dc175db1a2349 Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Sun, 5 Dec 2021 23:22:04 -0600
Subject: [PATCH v2 1/2] check_guc: fix absurd number of false positives

Avoid false positives;
Avoid various assumptions;
Avoid list of exceptions;
Simplify shell/awk/sed/grep;

This requires GNU awk for RS as a regex.
---
 src/backend/utils/misc/check_guc | 69 +---
 1 file changed, 10 insertions(+), 59 deletions(-)

diff --git a/src/backend/utils/misc/check_guc b/src/backend/utils/misc/check_guc
index b171ef0e4f..323ca13191 100755
--- a/src/backend/utils/misc/check_guc
+++ b/src/backend/utils/misc/check_guc
@@ -1,78 +1,29 @@
-#!/bin/sh
+#! /bin/sh
+set -e
 
-## currently, this script makes a lot of assumptions:
+## this script makes some assumptions about the formatting of files it parses.
 ## in postgresql.conf.sample:
 ##   1) the valid config settings may be preceded by a '#', but NOT '# '
 ##  (we use this to skip comments)
-##   2) the valid config settings will be followed immediately by  ' ='
-##  (at least one space preceding the '=')
-## in guc.c:
-##   3) the options have PGC_ on the same line as the option
-##   4) the options have '{' on the same line as the option
-
-##  Problems
-## 1) Don't know what to do with TRANSACTION ISOLATION LEVEL
-
-## if an option is valid but shows up in only one file (guc.c but not
-## postgresql.conf.sample), it should be listed here so that it
-## can be ignored
-INTENTIONALLY_NOT_INCLUDED="debug_deadlocks in_hot_standby \
-is_superuser lc_collate lc_ctype lc_messages lc_monetary lc_numeric lc_time \
-pre_auth_delay role seed server_encoding server_version server_version_num \
-session_authorization trace_lock_oidmin trace_lock_table trace_locks trace_lwlocks \
-trace_notify trace_userlocks transaction_isolation transaction_read_only \
-zero_damaged_pages"
 
 ### What options are listed in postgresql.conf.sample, but don't appear
 ### in guc.c?
 
-# grab everything that looks like a setting and convert it to lower case
-SETTINGS=`grep ' =' postgresql.conf.sample |
-grep -v '^# ' | # strip comments
-sed -e 's/^#//' |
-awk '{print $1}'`
-
-SETTINGS=`echo "$SETTINGS" | tr 'A-Z' 'a-z'`
+# grab everything that looks like a setting
+SETTINGS=`sed '/^#[[:alnum:]]/!d; s/^#//; s/ =.*//; /^include/d' postgresql.conf.sample`
 
 for i in $SETTINGS ; do
-  hidden=0
   ## it sure would be nice to replace this with an sql "not in" statement
-  ## it doesn't seem to make sense to have things in .sample and not in guc.c
-#  for hidethis in $INTENTIONALLY_NOT_INCLUDED ; do
-#if [ "$hidethis" = "$i" ] ; then
-#  hidden=1
-#fi
-#  done
-  if [ "$hidden" -eq 0 ] ; then
-grep -i '"'$i'"' guc.c > /dev/null
-if [ $? -ne 0 ] ; then
-  echo "$i seems to be missing from guc.c";
-fi;
-  fi
+  grep -i "\"$i\"" guc.c >/dev/null ||
+echo "$i seems to be missing from guc.c";
 done
 
 ### What options are listed in guc.c, but don't appear
 ### in postgresql.conf.sample?
 
 # grab everything that looks like a setting and convert it to lower case
-
-SETTINGS=`grep '{.* PGC_' guc.c | awk '{print $1}' | \
-  sed -e 's/{//g' -e 's/"//g' -e 's/,//'`
-
-SETTINGS=`echo "$SETTINGS" | tr 'A-Z' 'a-z'`
-
+SETTINGS=`gawk -F '[",]' 'BEGIN{RS="\n\t{\n"} /",[[:space:]]*PGC_.*.*gettext_noop/ && !/NOT_IN_SAMPLE/{print tolower($2)}' guc.c`
 for i in $SETTINGS ; do
-  hidden=0
-  ## it sure would be nice to replace this with an sql "not in" statement
-  for hidethis in $INTENTIONALLY_NOT_INCLUDED ; do
-if [ "$hidethis" = "$i" ] ; then
-  hidden=1
-fi
-  done
-  if [ "$hidden" -eq 0 ] ; then
-grep -i '#'$i' ' postgresql.conf.sample > /dev/null
-if [ $? -ne 0 ] ; then
-  echo "$i seems to be missing from postgresql.conf.sample";
-fi
-  fi
+  grep "#$i " 

Re: Buildfarm support for older versions

2021-12-16 Thread Larry Rosenman

On 12/16/2021 2:47 pm, Andrew Dunstan wrote:

> On 12/16/21 12:26, Larry Rosenman wrote:
>> On 12/16/2021 11:17 am, Andrew Dunstan wrote:
>>> On 12/16/21 11:11, Larry Rosenman wrote:
>>>
>>>>> A new animal, because we're not supporting every build option. On the
>>>>> non-live branches you really only want:
>>>>>
>>>>>     --enable-debug --enable-cassert --enable-nls
>>>>>     --enable-tap-tests --with-perl
>>>>>
>>>>> You can make it share the same storage as your existing animal (godwit
>>>>> and crake do this). The client is smart enough to manage locks of
>>>>> several animals appropriately.
>>>>
>>>> So just create a new animal / config file, and set those options?
>>>> and FreeBSD head / main would be useful?
>>>> (Currently FreeBSD 14 and clang 13).
>>>
>>> Sure. I think if we get coverage for modern Linux, FreeBSD and Windows
>>> we should be in good shape.
>>>
>>> I doubt we need a heck of a lot of animals - there's not going to be
>>> much going on here.
>>
>> Would you mind terribly giving me the exact steps?
>
>   * register a new animal with the same details
>   * copy your existing config file to $new_animal.conf
>   * edit the file and change the animal name and secret, the config_opts
>     as above, and remove TestUpgrade from the modules setting
>   * change branches_to_build to [qw(
>       REL9_2_STABLE REL9_3_STABLE REL9_4_STABLE
>       REL9_5_STABLE REL9_6_STABLE)]
>   * you should probably unset CCACHEDIR in both config files
>   * test with ./run_branches --test --config $newanimal.conf --run-all




I get:
ERROR for site owner:
Invalid domain for site key

on https://pgbuildfarm.org/cgi-bin/register-form.pl
--
Larry Rosenman http://www.lerctr.org/~ler
Phone: +1 214-642-9640 E-Mail: l...@lerctr.org
US Mail: 5708 Sabbia Dr, Round Rock, TX 78665-2106




Re: Buildfarm support for older versions

2021-12-16 Thread Andrew Dunstan


On 12/16/21 12:26, Larry Rosenman wrote:
> On 12/16/2021 11:17 am, Andrew Dunstan wrote:
>> On 12/16/21 11:11, Larry Rosenman wrote:
>>>

 A new animal, because we're not supporting every build option. On the
 non-live branches you really only want:

     --enable-debug --enable-cassert --enable-nls

     --enable-tap-tests --with-perl

 You can make it share the same storage as your existing animal (godwit
 and crake do this). The client is smart enough to manage locks of
 several animals appropriately.


>>>
>>> So just create a new animal / config file, and set those options?
>>> and FreeBSD head / main would be useful?
>>> (Currently FreeBSD 14 and clang 13).
>>>
>>
>> Sure. I think if we get coverage for modern Linux, FreeBSD and Windows
>> we should be in good shape.
>>
>> I doubt we need a heck of a lot of animals - there's not going to be
>> much going on here.
>>
>>
>
> Would you mind terribly giving me the exact steps?


  * register a new animal with the same details
  * copy your existing config file to $new_animal.conf
  * edit the file and change the animal name and secret, the config_opts
    as above, and remove TestUpgrade from the modules setting
  * change branches_to_build to [qw(
    REL9_2_STABLE REL9_3_STABLE REL9_4_STABLE
    REL9_5_STABLE REL9_6_STABLE)]
  * you should probably unset CCACHEDIR in both config files
  * test with ./run_branches --test --config $newanimal.conf --run-all

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: Apple's ranlib warns about protocol_openssl.c

2021-12-16 Thread Tom Lane
Daniel Gustafsson  writes:
>> On 16 Dec 2021, at 19:22, Tom Lane  wrote:
>> Having said that, I'm not seeing any such warning when I build
>> with openssl 1.1.1k on my own Mac, so I'm a bit confused why
>> Thomas sees it.

> Maybe it's dependant on macOS/XCode release?  I see the warning on my Catalina
> laptop.

Could be.  I tried it on Monterey, but not anything older.
(longfin is still on Big Sur, because I've been lazy about
updating it.)

regards, tom lane




Re: pg_stat_bgwriter.buffers_backend is pretty meaningless (and more?)

2021-12-16 Thread Andres Freund
Hi,

On 2021-12-15 16:40:27 -0500, Melanie Plageman wrote:
> > > +/*
> > > + * Before exiting, a backend sends its IO op statistics to the collector 
> > > so
> > > + * that they may be persisted.
> > > + */
> > > +void
> > > +pgstat_send_buffers(void)
> > > +{
> > > + PgStat_MsgIOPathOps msg;
> > > +
> > > + PgBackendStatus *beentry = MyBEEntry;
> > > +
> > > + /*
> > > +  * Though some backends with type B_INVALID (such as the 
> > > single-user mode
> > > +  * process) do initialize and increment IO operations stats, there 
> > > is no
> > > +  * spot in the array of IO operations for backends of type 
> > > B_INVALID. As
> > > +  * such, do not send these to the stats collector.
> > > +  */
> > > + if (!beentry || beentry->st_backendType == B_INVALID)
> > > + return;
> >
> > Why does single user mode use B_INVALID? That doesn't seem quite right.
> 
> I think PgBackendStatus->st_backendType is set from MyBackendType which
> isn't set for the single user mode process. What BackendType would you
> expect to see?

Either B_BACKEND or something new like B_SINGLE_USER_BACKEND?



> I also thought about having pgstat_sum_io_path_ops() return a value to
> indicate if everything was 0 -- which could be useful to future callers
> potentially?
> 
> I didn't do this because I am not sure what the return value would be.
> It could be a bool and be true if any IO was done and false if none was
> done -- but that doesn't really make sense given the function's name it
> would be called like
> if (!pgstat_sum_io_path_ops())
>   return
> which I'm not sure is very clear

Yea, I think it's ok to not do something fancier here for now.


> > > From 9f22da9041e1e1fbc0ef003f5f78f4e72274d438 Mon Sep 17 00:00:00 2001
> > > From: Melanie Plageman 
> > > Date: Wed, 24 Nov 2021 12:20:10 -0500
> > > Subject: [PATCH v17 6/7] Remove superfluous bgwriter stats
> > >
> > > Remove stats from pg_stat_bgwriter which are now more clearly expressed
> > > in pg_stat_buffers.
> > >
> > > TODO:
> > > - make pg_stat_checkpointer view and move relevant stats into it
> > > - add additional stats to pg_stat_bgwriter
> >
> > When do you think it makes sense to tackle these wrt committing some of the
> > patches?
> 
> Well, the new stats are a superset of the old stats (no stats have been
> removed that are not represented in the new or old views). So, I don't
> see that as a blocker for committing these patches.

> Since it is weird that pg_stat_bgwriter had mostly checkpointer stats,
> I've edited this commit to rename that view to pg_stat_checkpointer.

> I have not made a separate view just for maxwritten_clean (presumably
> called pg_stat_bgwriter), but I would not be opposed to doing this if
> you thought having a view with a single column isn't a problem (in the
> event that we don't get around to adding more bgwriter stats right
> away).

How about keeping the old bgwriter values in place in the view, but generated
from the new stats stuff?


> I noticed after changing the docs on the "bgwriter" target for
> pg_stat_reset_shared to say "checkpointer", that it still said "bgwriter" in
>   src/backend/po/ko.po
>   src/backend/po/it.po
>   ...
> I presume these are automatically updated with some incantation, but I wasn't
> sure what it was nor could I find documentation on this.

Yes, they are - and often some languages lag updating things.  There's a bit
of docs at https://www.postgresql.org/docs/devel/nls.html


Greetings,

Andres Freund




Re: Apple's ranlib warns about protocol_openssl.c

2021-12-16 Thread Daniel Gustafsson
> On 16 Dec 2021, at 19:22, Tom Lane  wrote:

> Having said that, I'm not seeing any such warning when I build
> with openssl 1.1.1k on my own Mac, so I'm a bit confused why
> Thomas sees it.

Maybe it's dependent on the macOS/Xcode release?  I see the warning on my Catalina
laptop.

--
Daniel Gustafsson   https://vmware.com/





Re: row filtering for logical replication

2021-12-16 Thread Alvaro Herrera
Kindly do not change the mode of src/backend/parser/gram.y.

-- 
Álvaro Herrera PostgreSQL Developer  —  https://www.EnterpriseDB.com/




Re: Column Filtering in Logical Replication

2021-12-16 Thread Alvaro Herrera
On 2021-Dec-14, Tomas Vondra wrote:

> 7) There's a couple places doing this
> 
> if (att_map != NULL &&
> !bms_is_member(att->attnum - FirstLowInvalidHeapAttributeNumber,
>att_map) &&
> !bms_is_member(att->attnum - FirstLowInvalidHeapAttributeNumber,
>idattrs) &&
> !replidentfull)
> 
> which is really hard to understand (even if we get rid of the offset), so
> maybe let's move that to a function with sensible name. Also, some places
> don't check indattrs - seems a bit suspicious.

It is indeed pretty hard to read ... but I think this is completely
unnecessary.  Any column that is part of the identity should have been
included in the column filter, so there is no need to check for the
identity attributes separately.  Testing just for the columns in the
filter ought to be sufficient; and the cases "if att_map NULL" and "is
replica identity FULL" are also equivalent, because in the case of FULL,
you're disallowed from setting a column list.  So this whole thing can
be reduced to just this:

if (att_map != NULL && !bms_is_member(att->attnum, att_map))
    continue;    /* that is, don't send this attribute */

so I don't think this merits a separate function.

[ says he, after already trying to write said function ]

-- 
Álvaro Herrera PostgreSQL Developer  —  https://www.EnterpriseDB.com/
"Before you were born your parents weren't as boring as they are now. They
got that way paying your bills, cleaning up your room and listening to you
tell them how idealistic you are."  -- Charles J. Sykes' advice to teenagers




Re: Support for NSS as a libpq TLS backend

2021-12-16 Thread Jacob Champion
On Wed, 2021-12-15 at 23:10 +0100, Daniel Gustafsson wrote:
> > On 30 Nov 2021, at 20:03, Jacob Champion  wrote:
> > 
> > On Mon, 2021-09-27 at 15:44 +0200, Daniel Gustafsson wrote:
> > > > Speaking of IP addresses in SANs, it doesn't look like our OpenSSL
> > > > backend can handle those. That's a separate conversation, but I might
> > > > take a look at a patch for next commitfest.
> > > 
> > > Please do.
> > 
> > Didn't get around to it for November, but I'm putting the finishing
> > touches on that now.
> 
> Cool, thanks!

Done and registered in Commitfest.

> Yeah, that's clearly bogus.  I followed the bouncing ball reading NSS code and
> from what I can tell the comment is correct.  I removed the dead code, only
> realizing after the fact that I might cause conflict with your tree doing so,
> in that case sorry.

No worries, there weren't any issues with the rebase.

> I've attached a v50 which fixes the issues found by Joshua upthread, as well 
> as
> rebases on top of all the recent SSL and pgcrypto changes.

Thanks!

--Jacob


Re: [PATCH] Document heuristics for parameterized paths

2021-12-16 Thread Steinar H. Gunderson
On Wed, Dec 15, 2021 at 11:23:12PM -0500, Greg Stark wrote:
>>> +one that must cannot be delayed right away (because of outer join
>> must cannot?
> Actually on further reading... "delayed right away"?

There are two ways of writing this sentence, and I see that I tried to use
both of them in the same sentence :-) What about something like the
following?

  This is enforced by refusing to join parameterized paths together unless
  the parameterization is resolved, *or* the remaining parameterization is
  one that cannot be resolved right away (because of outer join restrictions). 

/* Steinar */
-- 
Homepage: https://www.sesse.net/




Re: [PATCH] Accept IP addresses in server certificate SANs

2021-12-16 Thread Jacob Champion
On Thu, 2021-12-16 at 10:50 -0500, Andrew Dunstan wrote:
> Good job, this is certainly going to be useful.

Thanks!

> I don't think we should fall back on the CN. It would seem quite odd to
> do so for IP addresses but not for DNS names.

So there's at least one compatibility concern with disabling the
fallback, in that there could be existing users that are happily using
a certificate with an IP address CN, and libpq is just ignoring any
iPAddress SANs that the certificate has. Once libpq becomes aware of
those, it will stop accepting the CN and the certificate might stop
working.

Personally I think that's acceptable, but it would probably warrant a
release note or some such.

I will work on implementing behavior that's modeled off of the NSS
matching logic (see my reply to Horiguchi-san), which will at least
make it more logically consistent, and we can see what that looks like?

Thanks for the review!
--Jacob


Re: [PATCH] Accept IP addresses in server certificate SANs

2021-12-16 Thread Jacob Champion
On Thu, 2021-12-16 at 14:54 +0900, Kyotaro Horiguchi wrote:
> In RFC 2818 and 6125,
> 
> >   In some cases, the URI is specified as an IP address rather than a
> >   hostname.  In this case, the iPAddress subjectAltName must be present
> >   in the certificate and must exactly match the IP in the URI.

Ah, right, I misremembered. Disregard my statement that the spec is
"silent on the subject", sorry.

> It seems like saying that we must search for iPAddress and mustn't use
> CN nor dNSName if the client connected using IP address. Otherwise, if
> the host name is a domain name, we use only dNSName if present, and
> use CN otherwise.  That behavior seems agreeing to what you wrote as
> NSS's behavior.

NSS departs slightly from the spec and will additionally try to match
an IP address against the CN, but only if there are no iPAddresses in
the SAN. It roughly matches the logic for DNS names.

Here's the description of the NSS behavior and some of the reasoning
behind it, quoted from a developer on Bugzilla [1]:

> Elsewhere in RFC 2818, it says 
> 
>If a subjectAltName extension of type dNSName is present, that MUST
>be used as the identity. Otherwise, the (most specific) Common Name
>field in the Subject field of the certificate MUST be used. 
> 
> Notice that this section is not conditioned upon the URI being a hostname
> and not an IP address.  So this statement conflicts with the one cited 
> above.  
> 
> I implemented this policy:
> 
> if the URI contains a host name
> if the subject alt name is present and has one or more DNS names
> use the DNS names in that extension as the server identity
> else
> use the subject common name as the server identity
> else if the URI contains an IP address
> if the subject alt name is present and has one or more IP addresses
> use the IP addresses in that extension as the server identity
> else
> compare the URI IP address string with the subject common name.

It sounds like both you and Andrew might be comfortable with that same
behavior? I think it looks like a sane solution, so I'll implement that
and we can see what it looks like. (My work on this will be paused over
the end-of-year holidays.)

> That being said it seems to me we should preserve
> that behavior at least for OpenSSL as an established behavior.

That part is interesting. I'll talk more about that in my reply to
Andrew.

> In short, I think the current behavior of the patch is the direction
> we would go but some documentation is may be needed.

Great!

> I'm not sure about IPv4-compatible addresses.  However, I think we can
> identify IPv4-compatible addresses easily.

Yeah, it would probably not be a difficult feature to add later.

> +* pg_inet_net_pton() will accept CIDR masks, which we don't 
> want to
> +* match, so skip the comparison if the host string contains 
> a slash.
> +*/
> +   if (!strchr(host, '/')
> +   && pg_inet_net_pton(PGSQL_AF_INET6, host, addr, -1) 
> == 128)
> 
> If a cidr is given, pg_inet_net_pton returns a number less than 128 so
> we don't need to check '/' explicitly?  (I'm not sure '/128' is
> sensible but doesn't harm..)

Personally I think that, if someone wants your libpq to connect to a
server with a hostname of "some:ipv6::address/128", then they are
trying to pull something (evading a poorly coded blocklist, perhaps?)
and we should not allow that to match an IP. Thoughts?

Thanks for the review!
--Jacob

[1] https://bugzilla.mozilla.org/show_bug.cgi?id=103752



Re: Granting SET and ALTER SYSTE privileges for GUCs

2021-12-16 Thread Joshua Brindle
On Thu, Dec 16, 2021 at 12:53 PM Mark Dilger
 wrote:
>
>
>
> > On Dec 16, 2021, at 7:43 AM, Joshua Brindle 
> >  wrote:
> >
> > Ah, I understand now. Would it be possible to pass the
> > SettingAclRelationId if it exists or InvalidOid if not?
>
> SettingAclRelationId is always defined, so we can always pass that value.  
> But the settingId itself may sometimes be InvalidOid.

Yes, that is what I meant.

> > That way if a
> > MAC implementation cares about a particular GUC it'll ensure it's in
> > pg_setting_acl.
>
> A much cleaner solution would be to create new ObjectAccessTypes with a 
> corresponding new Invoke macro and Run function.  Those could take setting 
> names, not Oids, and include additional information about whether the 
> operation is SET, RESET or ALTER SYSTEM, what the new value is (if any), what 
> kind of setting it is (bool, int, ...), etc.  I don't think such a patch 
> would even be all that hard to write.
>
> What do you think?

Personally, I would be happy with that, but since it's a whole new
hooking method I suspect it'll be an uphill battle. That definitely
seems like another patchset, though; if you do submit it, I will test
and review.

Thank you.




Re: Apple's ranlib warns about protocol_openssl.c

2021-12-16 Thread Tom Lane
Thomas Munro  writes:
> On Fri, Dec 17, 2021 at 4:48 AM Peter Eisentraut
>  wrote:
>> These are not errors, right?  So why is this a problem?

> Yeah they're just warnings.  I don't personally care if we just ignore
> them until we drop OpenSSL < 1.1.0 or macOS < 10.something.  I
> mentioned it because in the past we worked to get rid of these sorts
> of  warnings (there have been a couple of rounds at least), and they
> show up more obviously in the CI scripts because they use -s, so the 3
> warning lines are the only output.

Yeah, "zero chatter from a successful build" has been a goal
for awhile now.

Having said that, I'm not seeing any such warning when I build
with openssl 1.1.1k on my own Mac, so I'm a bit confused why
Thomas sees it.

regards, tom lane




Re: Confused comment about drop replica identity index

2021-12-16 Thread Alvaro Herrera
On 2021-Dec-15, Michael Paquier wrote:

> On Tue, Dec 14, 2021 at 07:10:49PM +0530, Ashutosh Bapat wrote:
> > This code in RelationGetIndexList() is not according to that comment.
> > 
> >if (replident == REPLICA_IDENTITY_DEFAULT && OidIsValid(pkeyIndex))
> > relation->rd_replidindex = pkeyIndex;
> > else if (replident == REPLICA_IDENTITY_INDEX && 
> > OidIsValid(candidateIndex))
> > relation->rd_replidindex = candidateIndex;
> > else
> > relation->rd_replidindex = InvalidOid;
> 
> Yeah, the comment is wrong.  If the index of a REPLICA_IDENTITY_INDEX
> is dropped, I recall that the behavior is the same as
> REPLICA_IDENTITY_NOTHING.

Hmm, so if a table has REPLICA IDENTITY INDEX and there is a publication
with an explicit column list, then we need to forbid the DROP INDEX for
that index.

I wonder why don't we just forbid DROP INDEX of an index that's been
defined as replica identity.  It seems quite silly an operation to
allow.
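
For illustration, such a restriction might look roughly like the check
below at DROP INDEX time (a sketch only, assuming the index OID is
already at hand; this is not an actual patch):

    Oid         tableOid = IndexGetRelation(indexOid, false);
    Relation    heapRel = table_open(tableOid, AccessShareLock);

    /* Refuse to drop an index that is the table's replica identity */
    if (heapRel->rd_rel->relreplident == REPLICA_IDENTITY_INDEX &&
        RelationGetReplicaIndex(heapRel) == indexOid)
        ereport(ERROR,
                (errcode(ERRCODE_OBJECT_IN_USE),
                 errmsg("cannot drop index \"%s\" because it is the replica identity of table \"%s\"",
                        get_rel_name(indexOid),
                        RelationGetRelationName(heapRel))));

    table_close(heapRel, AccessShareLock);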

-- 
Álvaro Herrera   39°49'30"S 73°17'W  —  https://www.EnterpriseDB.com/
"Ed is the standard text editor."
  http://groups.google.com/group/alt.religion.emacs/msg/8d94ddab6a9b0ad3




Re: Column Filtering in Logical Replication

2021-12-16 Thread Alvaro Herrera
On 2021-Dec-16, houzj.f...@fujitsu.com wrote:

> The patch ensures all columns of RT are in column list when CREATE/ALTER
> publication, but it seems doesn't prevent user from changing the replica
> identity or dropping the index used in replica identity. Do we also need to
> check those cases ?

Yes, we do.  As it happens, I spent a couple of hours yesterday writing
code for that, at least partially.  I haven't yet checked what happens
with cases like REPLICA NOTHING, or REPLICA INDEX  and then
dropping that index.

My initial ideas were a bit wrong BTW: I thought we should check the
combination of column lists in all publications (a bitwise-OR of column
bitmaps, so to speak).  But conceptually that's wrong: we need to check
the column list of each publication individually instead.  Otherwise, if
you wanted to hide a column from some publication but that column was
part of the replica identity, there'd be no way to identify the tuple in
the replica.  (Or, if the pgoutput code disobeys the column list and
sends the replica identity even if it's not in the column list, then
you'd be potentially publishing data that you wanted to hide.)
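
In other words, the check would be per publication, along the lines of
this sketch (get_publication_column_list() and the other names here are
made up for illustration; the real code will differ):

    foreach(lc, publications)
    {
        Publication *pub = (Publication *) lfirst(lc);
        Bitmapset   *columns = get_publication_column_list(pub, relid);

        /* A NULL list means "all columns", which is always acceptable */
        if (columns != NULL &&
            !bms_is_subset(replident_cols, columns))
            ereport(ERROR,
                    (errcode(ERRCODE_INVALID_COLUMN_REFERENCE),
                     errmsg("column list for publication \"%s\" must include the replica identity columns",
                            pub->name)));
    }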

-- 
Álvaro Herrera   39°49'30"S 73°17'W  —  https://www.EnterpriseDB.com/




Re: Granting SET and ALTER SYSTEM privileges for GUCs

2021-12-16 Thread Mark Dilger



> On Dec 16, 2021, at 7:43 AM, Joshua Brindle  
> wrote:
> 
> Ah, I understand now. Would it be possible to pass the
> SettingAclRelationId if it exists or InvalidOid if not?

SettingAclRelationId is always defined, so we can always pass that value.  But 
the settingId itself may sometimes be InvalidOid.

> That way if a
> MAC implementation cares about a particular GUC it'll ensure it's in
> pg_setting_acl.

A much cleaner solution would be to create new ObjectAccessTypes with a 
corresponding new Invoke macro and Run function.  Those could take setting 
names, not Oids, and include additional information about whether the operation 
is SET, RESET or ALTER SYSTEM, what the new value is (if any), what kind of 
setting it is (bool, int, ...), etc.  I don't think such a patch would even be 
all that hard to write.

What do you think?
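
For illustration only, the new access type could carry the setting name
and operation directly, along the lines of this entirely hypothetical
sketch (nothing like this exists in objectaccess.h today):

    typedef enum GucAccessOp
    {
        GUC_ACCESS_SET,
        GUC_ACCESS_RESET,
        GUC_ACCESS_ALTER_SYSTEM
    } GucAccessOp;

    typedef struct ObjectAccessGucCheck
    {
        GucAccessOp op;         /* SET, RESET, or ALTER SYSTEM */
        const char *name;       /* setting name, e.g. "work_mem" */
        const char *value;      /* new value, or NULL for RESET */
    } ObjectAccessGucCheck;

    /* invoked for every GUC, whether or not it appears in pg_setting_acl */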

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: Apple's ranlib warns about protocol_openssl.c

2021-12-16 Thread Thomas Munro
On Fri, Dec 17, 2021 at 4:48 AM Peter Eisentraut
 wrote:
> These are not errors, right?  So why is this a problem?

Yeah they're just warnings.  I don't personally care if we just ignore
them until we drop OpenSSL < 1.1.0 or macOS < 10.something.  I
mentioned it because in the past we worked to get rid of these sorts
of  warnings (there have been a couple of rounds at least), and they
show up more obviously in the CI scripts because they use -s, so the 3
warning lines are the only output.




Re: Buildfarm support for older versions

2021-12-16 Thread Larry Rosenman

On 12/16/2021 11:17 am, Andrew Dunstan wrote:

On 12/16/21 11:11, Larry Rosenman wrote:




A new animal, because we're not supporting every build option. On the
non-live branches you really only want:

    --enable-debug --enable-cassert --enable-nls

    --enable-tap-tests --with-perl

You can make it share the same storage as your existing animal (godwit
and crake do this). The client is smart enough to manage locks of
several animals appropriately.




So just create a new animal / config file, and set those options?
and FreeBSD head / main would be useful?
(Currently FreeBSD 14 and clang 13).



Sure. I think if we get coverage for modern Linux, FreeBSD and Windows
we should be in good shape.

I doubt we need a heck of a lot of animals - there's not going to be
much going on here.


cheers


andrew



Would you mind terribly giving me the exact steps?
--
Larry Rosenman http://www.lerctr.org/~ler
Phone: +1 214-642-9640 E-Mail: l...@lerctr.org
US Mail: 5708 Sabbia Dr, Round Rock, TX 78665-2106




Re: incremental sort vs. gather paths

2021-12-16 Thread Robert Haas
On Thu, Dec 16, 2021 at 12:16 PM Tomas Vondra
 wrote:
> Maybe, but other places (predating incremental sort) creating Gather
> Merge do the same thing, and commit ba3e76cc57 merely copied this. For
> example generate_gather_paths() does this:
>
>  foreach(lc, rel->partial_pathlist)
>  {
>  Path   *subpath = (Path *) lfirst(lc);
>  GatherMergePath *path;
>
>  if (subpath->pathkeys == NIL)
>  continue;
>
>  rows = subpath->rows * subpath->parallel_workers;
>  path = create_gather_merge_path(root, rel, subpath,
>  rel->reltarget,
>  subpath->pathkeys, NULL, rowsp);
>  add_path(rel, &path->path);
>  }
>
> i.e. it's doing the same (rows * parallel_workers) calculation.

Ugh. I was hoping this mess wasn't my fault, but it seems that it is. :-(

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Buildfarm support for older versions

2021-12-16 Thread Andrew Dunstan


On 12/16/21 11:11, Larry Rosenman wrote:
>
>>
>> A new animal, because we're not supporting every build option. On the
>> non-live branches you really only want:
>>
>>     --enable-debug --enable-cassert --enable-nls
>>
>>     --enable-tap-tests --with-perl
>>
>> You can make it share the same storage as your existing animal (godwit
>> and crake do this). The client is smart enough to manage locks of
>> several animals appropriately.
>>
>>
>
> So just create a new animal / config file, and set those options?
> and FreeBSD head / main would be useful?
> (Currently FreeBSD 14 and clang 13).
>

Sure. I think if we get coverage for modern Linux, FreeBSD and Windows
we should be in good shape.

I doubt we need a heck of a lot of animals - there's not going to be
much going on here.


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: incremental sort vs. gather paths

2021-12-16 Thread Tomas Vondra




On 12/16/21 17:48, Robert Haas wrote:

This code introduced in ba3e76cc571eba3dea19c9465ff15ac3ac186576 looks
wrong to me:

 total_groups = cheapest_partial_path->rows *
 cheapest_partial_path->parallel_workers;
 path = (Path *)
 create_gather_merge_path(root, ordered_rel,
  path,
  path->pathtarget,
  root->sort_pathkeys, NULL,
  &total_groups);

This too:

 total_groups = input_path->rows *
 input_path->parallel_workers;

This came to my attention because Dave Page sent me a query plan that
looks like this:

Gather Merge  (cost=22617.94..22703.35 rows=732 width=97) (actual
time=2561.476..2561.856 rows=879 loops=1)
Workers Planned: 2
Workers Launched: 2
->  Sort  (cost=21617.92..21618.83 rows=366 width=97) (actual
time=928.329..928.378 rows=293 loops=3)
  Sort Key: aid
  Sort Method: quicksort  Memory: 155kB
  Worker 0:  Sort Method: quicksort  Memory: 25kB
  Worker 1:  Sort Method: quicksort  Memory: 25kB
  ->  Parallel Seq Scan on accounts_1m  (cost=0.00..21602.33
rows=366 width=97) (actual time=74.391..74.518 rows=293 loops=3)
Filter: (aid < 1000)
Rows Removed by Filter: 333040

If you look at the actual row count estimates, you see that the Gather
Merge produces 3 times the number of rows that the Parallel Seq Scan
produces, which is completely correct, because the raw number is 879
in both cases, but EXPLAIN unhelpfully divides the displayed row count
by the loop count, which in this case is 3. If you look at the
estimated row count, you see that the Gather Merge is estimated to
produce exactly 2 times the number of rows that the nodes under it
would produce. That's not a very good estimate, unless
parallel_leader_participation=off, which in this case it isn't.

What's happening here is that the actual number of rows produced by
accounts_1m is actually 879 and is estimated as 879.
get_parallel_divisor() decides to divide that number by 2.4, and gets
366, because it thinks the leader will do less work than the other
workers. Then the code above fires and, instead of letting the
original row count estimate for accounts_1m apply to the Gather Merge
path, it overrides it with 2 * 366 = 732. This cannot be right.
Really, I don't think it should be overriding the row count estimate
at all. But if it is, multiplying by the number of workers can't be
right, because the leader can also do stuff.



Maybe, but other places (predating incremental sort) creating Gather 
Merge do the same thing, and commit ba3e76cc57 merely copied this. For 
example generate_gather_paths() does this:


foreach(lc, rel->partial_pathlist)
{
Path   *subpath = (Path *) lfirst(lc);
GatherMergePath *path;

if (subpath->pathkeys == NIL)
continue;

rows = subpath->rows * subpath->parallel_workers;
path = create_gather_merge_path(root, rel, subpath,
rel->reltarget,
subpath->pathkeys, NULL, rowsp);
add_path(rel, &path->path);
}

i.e. it's doing the same (rows * parallel_workers) calculation.

It may not be the right way to estimate this, of course. But I'd say 
if ba3e76cc57 is doing it wrong, so are the older places.



regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Use generation context to speed up tuplesorts

2021-12-16 Thread Tomas Vondra

On 12/16/21 17:03, Ronan Dunklau wrote:

On Thursday, December 16, 2021 at 11:56:15 CET, Ronan Dunklau wrote:

I will follow up with a benchmark of the test sorting a table with a width
varying from 1 to 32 columns.



So please find attached another benchmark for that case.

The 3 different patchsets tested are:

  - master
  - fixed (David's original patch)
  - adjust (Thomas growing blocks patch)



Presumably Thomas is me, right?


So it looks like tuning malloc for this would be very beneficial for any kind
of allocation, and by doing so we reduce the problems seen with the growing
blocks patch to next to nothing, while keeping the ability to not allocate too
much memory from the get go.



Thanks for running those tests and investigating the glibc behavior! I 
find those results very interesting. My conclusions from this is that 
the interaction interaction between "our" allocator and the allocator in 
malloc (e.g. glibc) can be problematic. Which makes benchmarking and 
optimization somewhat tricky because code changes may trigger behavior 
change in glibc (or whatever allocator backs malloc).


I think it's worth exploring if we can tune this in a reasonable way, 
but I have a couple concerns related to that:


1) I wonder how glibc-specific this is - I'd bet it applies to other 
allocators (either on another OS or just different allocator on Linux) 
too. Tweaking glibc parameters won't affect those systems, of course, 
but maybe we should allow tweaking those systems too ...


2) In fact, I wonder if different glibc versions behave differently? 
Hopefully it's not changing that much, though. Ditto kernel versions, 
but the mmap/sbrk interface is likely more stable. We can test this.


3) If we bump the thresholds, won't that work against reusing the 
memory? I mean, if we free a whole block (from any allocator we have), 
glibc might return it to kernel, depending on mmap threshold value. It's 
not guaranteed, but increasing the malloc thresholds will make that even 
less likely. So we might just as well increase the minimum block size, 
with about the same effect, no?




I would like to try to implement some dynamic glibc malloc tuning, if that is
something we don't reject on principle from the get go.



+1 to that
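
For what it's worth, a minimal sketch of what such dynamic tuning could
look like with glibc's mallopt() (the threshold values below are
placeholders, not recommendations):

#include <malloc.h>

/*
 * Raise the mmap and trim thresholds so that blocks freed by our
 * allocators are more likely to be retained and reused by malloc
 * instead of being returned to the kernel.  The values would
 * presumably be derived from expected block sizes (e.g. work_mem)
 * at runtime.
 */
static void
tune_glibc_malloc(size_t expected_block_size)
{
#ifdef M_MMAP_THRESHOLD
	mallopt(M_MMAP_THRESHOLD, (int) expected_block_size);
#endif
#ifdef M_TRIM_THRESHOLD
	mallopt(M_TRIM_THRESHOLD, (int) (2 * expected_block_size));
#endif
}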


regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Emit a warning if the extension's GUC is set incorrectly

2021-12-16 Thread Fujii Masao




On 2021/12/16 16:31, Shinya Kato wrote:

Thank you for the review and sorry for the late reply.

On 2021-11-16 19:25, Bharath Rupireddy wrote:

> I observed an odd behaviour:
> 1) I set postgres_fdw.XXX = 'I_messed_up_conf_file' in postgresql.conf
> 2) With EmitWarningsOnPlaceholders("postgres_fdw"); in postgres_fdw
> contrib module, I created the extension, have seen the following
> warning:
> 2021-11-15 06:02:31.198 UTC [2018111] WARNING:  unrecognized
> configuration parameter "postgres_fdw.XXX"
> 3) I further did, "alter system set
> postgres_fdw.XXX='I_further_messed_up_conf_file';" and also "select
> pg_reload_conf();", it silently accepts.
>
> postgres=# create extension postgres_fdw ;
> WARNING:  unrecognized configuration parameter "postgres_fdw.XXX"
> CREATE EXTENSION
> postgres=# alter system set
> postgres_fdw.XXX='I_further_messed_up_conf_file';
> ALTER SYSTEM
> postgres=# select pg_reload_conf();
>  pg_reload_conf
> 
>  t
> (1 row)


I have made changes to achieve the above.


IMO this behavior change is not good, because it seems to break at least the 
following case. I think these are valid steps, but with the patch the server 
fails to start up at step #2 because pg_trgm's custom parameters are treated 
as invalid.

1. Add the following two pg_trgm parameters to postgresql.conf
- pg_trgm.similarity_threshold
- pg_trgm.strict_word_similarity_threshold

2. Start the server

3. Run "CREATE EXTENSION pg_trgm" and then use pg_trgm


When I compiled PostgreSQL with the patch, I got the following
compilation failure.

guc.c:5453:4: error: implicit declaration of function 
'EmitErrorsOnPlaceholders' is invalid in C99 
[-Werror,-Wimplicit-function-declaration]
EmitErrorsOnPlaceholders(placeholder);
^


-   ereport(WARNING,
+   ereport(ERROR,

I'm still not sure why you thought that ERROR is more appropriate here.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




incremental sort vs. gather paths

2021-12-16 Thread Robert Haas
This code introduced in ba3e76cc571eba3dea19c9465ff15ac3ac186576 looks
wrong to me:

total_groups = cheapest_partial_path->rows *
cheapest_partial_path->parallel_workers;
path = (Path *)
create_gather_merge_path(root, ordered_rel,
 path,
 path->pathtarget,
 root->sort_pathkeys, NULL,
 &total_groups);

This too:

total_groups = input_path->rows *
input_path->parallel_workers;

This came to my attention because Dave Page sent me a query plan that
looks like this:

Gather Merge  (cost=22617.94..22703.35 rows=732 width=97) (actual
time=2561.476..2561.856 rows=879 loops=1)
   Workers Planned: 2
   Workers Launched: 2
   ->  Sort  (cost=21617.92..21618.83 rows=366 width=97) (actual
time=928.329..928.378 rows=293 loops=3)
 Sort Key: aid
 Sort Method: quicksort  Memory: 155kB
 Worker 0:  Sort Method: quicksort  Memory: 25kB
 Worker 1:  Sort Method: quicksort  Memory: 25kB
 ->  Parallel Seq Scan on accounts_1m  (cost=0.00..21602.33
rows=366 width=97) (actual time=74.391..74.518 rows=293 loops=3)
   Filter: (aid < 1000)
   Rows Removed by Filter: 333040

If you look at the actual row count estimates, you see that the Gather
Merge produces 3 times the number of rows that the Parallel Seq Scan
produces, which is completely correct, because the raw number is 879
in both cases, but EXPLAIN unhelpfully divides the displayed row count
by the loop count, which in this case is 3. If you look at the
estimated row count, you see that the Gather Merge is estimated to
produce exactly 2 times the number of rows that the nodes under it
would produce. That's not a very good estimate, unless
parallel_leader_participation=off, which in this case it isn't.

What's happening here is that the actual number of rows produced by
accounts_1m is actually 879 and is estimated as 879.
get_parallel_divisor() decides to divide that number by 2.4, and gets
366, because it thinks the leader will do less work than the other
workers. Then the code above fires and, instead of letting the
original row count estimate for accounts_1m apply to the Gather Merge
path, it overrides it with 2 * 366 = 732. This cannot be right.
Really, I don't think it should be overriding the row count estimate
at all. But if it is, multiplying by the number of workers can't be
right, because the leader can also do stuff.
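
For reference, the divisor comes from get_parallel_divisor() in
costsize.c, which looks roughly like this (simplified): with 2 planned
workers and parallel_leader_participation on, it returns
2 + (1 - 0.3 * 2) = 2.4, hence 879 / 2.4 = ~366.

static double
get_parallel_divisor(Path *path)
{
	double		parallel_divisor = path->parallel_workers;

	if (parallel_leader_participation)
	{
		double		leader_contribution;

		/* the leader is assumed to spend part of its time helping out */
		leader_contribution = 1.0 - (0.3 * path->parallel_workers);
		if (leader_contribution > 0)
			parallel_divisor += leader_contribution;
	}

	return parallel_divisor;
}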

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Privilege required for IF EXISTS event if the object already exists

2021-12-16 Thread Shay Rojansky
> As I said before, your position seems reasonable.  I've also got a couple
> of reasonable complaints about IF EXISTS out there.  But there is little
> interest in changing the status quo with regards to the promises that IF
> EXISTS makes. And even with my less constrained views I find that doing
> anything but returning an error to a user that issues CREATE SCHEMA on a
> database for which they lack CREATE privileges is of limited benefit.
> Would I support a well-written patch that made this the new rule?
> Probably.  Would I write one or spend time trying to convince others to
> write one?  No.

Fair enough, thanks.


Re: Buildfarm support for older versions

2021-12-16 Thread Larry Rosenman

On 12/16/2021 10:02 am, Andrew Dunstan wrote:

On 12/15/21 21:36, Larry Rosenman wrote:

On 12/15/2021 11:15 am, Andrew Dunstan wrote:
OK, old_branches_of_interest.txt now exists on the buildfarm server, and
the code has been modified to take notice of it (i.e. to accept builds
for branches listed there). The contents are the non-live versions from
9.2 on.

I have set up a test buildfarm client (which will eventually report
under the name 'godwit') alongside crake (Fedora 34). So far testing has
run smoothly, there are only two glitches:

  * 9.3 and 9.2 don't have a show_dl_suffix make target. This would
    require backpatching b40cb99b85 and d9cdb1ba9e. That's a tiny
    change, and I propose to do it shortly unless there's an objection.

  * I need to undo the removal of client logic that supported 9.2's
    unix_socket_directory setting as opposed to the later
    unix_socket_directories.




Would a FreeBSD head (peripatus or a new animal) help?



A new animal, because we're not supporting every build option. On the
non-live branches you really only want:

    --enable-debug --enable-cassert --enable-nls

    --enable-tap-tests --with-perl

You can make it share the same storage as your existing animal (godwit
and crake do this). The client is smart enough to manage locks of
several animals appropriately.

cheers

andrew

--


So just create a new animal / config file, and set those options?
and FreeBSD head / main would be useful?
(Currently FreeBSD 14 and clang 13).

--
Larry Rosenman http://www.lerctr.org/~ler
Phone: +1 214-642-9640 E-Mail: l...@lerctr.org
US Mail: 5708 Sabbia Dr, Round Rock, TX 78665-2106




Re: Use generation context to speed up tuplesorts

2021-12-16 Thread Ronan Dunklau
On Thursday, December 16, 2021 at 11:56:15 CET, Ronan Dunklau wrote:
> I will follow up with a benchmark of the test sorting a table with a width
> varying from 1 to 32 columns.
> 

So please find attached another benchmark for that case.

The 3 different patchsets tested are:

 - master
 - fixed (David's original patch)
 - adjust (Thomas growing blocks patch)

So it looks like tuning malloc for this would be very beneficial for any kind 
of allocation, and by doing so we reduce the problems seen with the growing 
blocks patch to next to nothing, while keeping the ability to not allocate too 
much memory from the get go.

I would like to try to implement some dynamic glibc malloc tuning, if that is 
something we don't reject on principle from the get go. 


-- 
Ronan Dunklau

bench_results.ods
Description: application/vnd.oasis.opendocument.spreadsheet


Re: Buildfarm support for older versions

2021-12-16 Thread Andrew Dunstan


On 12/15/21 21:36, Larry Rosenman wrote:
> On 12/15/2021 11:15 am, Andrew Dunstan wrote:
>> OK, old_branches_of_interest.txt now exists on the buildfarm server, and
>> the code has been modified to take notice of it (i.e. to accept builds
>> for branches listed there). The contents are the non-live versions from
>> 9.2 on.
>>
>> I have set up a test buildfarm client (which will eventually report
>> under the name 'godwit') alongside crake (Fedora 34). So far testing has
>> run smoothly, there are only two glitches:
>>
>>   * 9.3 and 9.2 don't have a show_dl_suffix make target. This would
>>     require backpatching b40cb99b85 and d9cdb1ba9e. That's a tiny
>>     change, and I propose to do it shortly unless there's an objection.
>>   * I need to undo the removal of client logic that supported 9.2's
>>     unix_socket_directory setting as opposed to the later
>>     unix_socket_directories.
>>
>>
>
> Would a FreeBSD head (peripatus or a new animal) help?


A new animal, because we're not supporting every build option. On the
non-live branches you really only want:

    --enable-debug --enable-cassert --enable-nls

    --enable-tap-tests --with-perl

You can make it share the same storage as your existing animal (godwit and 
crake do this). The client is smart enough to manage locks of several animals 
appropriately.

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: Privilege required for IF EXISTS event if the object already exists

2021-12-16 Thread David G. Johnston
On Thu, Dec 16, 2021 at 3:38 AM Shay Rojansky  wrote:

> >> Now, before creating tables, the ORM generates CREATE SCHEMA IF NOT
> EXISTS, to ensure that the schema exists before CREATE TABLE; that's
> reasonable general-purpose behavior.
> >
> > If the user hasn’t specified they want the schema created it’s arguable
> that executing create schema anyway is reasonable.  The orm user should
> know whether they are expected/able to create the schema as part of their
> responsibilities and instruct the orm accordingly and expect it to only
> create what it has been explicitly directed to create.
>
> I think the point being missed here, is that the user isn't interacting
> directly with PostgreSQL - they're doing so via an ORM which isn't
> necessarily aware of everything. Yes, a switch could be added to the ORM
> where the user instructs it to not ensure that the schema exists, but
> that's placing unnecessary burden on the user - having the "ensure"
> operation silently no-op (instead of throwing) if the schema already exists
> just makes everything smoother.
>

I get that point, and even have sympathy for it.  But I'm also fond of the
position that "ensuring a schema exists" is not something the ORM should be
doing.  But, if you want to do it anyway you can, with a minimal amount of
pl/pgsql code.

> Is there any specific reason you think this shouldn't be done?
>

As I said before, your position seems reasonable.  I've also got a couple
of reasonable complaints about IF EXISTS out there.  But there is little
interest in changing the status quo with regards to the promises that IF
EXISTS makes. And even with my less constrained views I find that doing
anything but returning an error to a user that issues CREATE SCHEMA on a
database for which they lack CREATE privileges is of limited benefit.
Would I support a well-written patch that made this the new rule?
Probably.  Would I write one or spend time trying to convince others to
write one?  No.

David J.


Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints

2021-12-16 Thread Neha Sharma
Hi,

While testing the v8 patches in a hot-standby setup, it was observed that the
master crashes with the error below:

2021-12-16 19:32:47.757 +04 [101483] PANIC:  could not fsync file
"pg_tblspc/16385/PG_15_202112111/16386/16391": No such file or directory
2021-12-16 19:32:48.917 +04 [101482] LOG:  checkpointer process (PID
101483) was terminated by signal 6: Aborted

Parameters configured at master:
wal_level = hot_standby
max_wal_senders = 3
hot_standby = on
max_standby_streaming_delay= -1
wal_consistency_checking='all'
max_wal_size= 10GB
checkpoint_timeout= 1d
log_min_messages=debug1

Test Case:
create tablespace tab1 location
'/home/edb/PGsources/postgresql/inst/bin/test1';
create tablespace tab location
'/home/edb/PGsources/postgresql/inst/bin/test';
create database test tablespace tab;
\c test
create table t( a int PRIMARY KEY,b text);
CREATE OR REPLACE FUNCTION large_val() RETURNS TEXT LANGUAGE SQL AS 'select
array_agg(md5(g::text))::text from generate_series(1, 256) g';
insert into t values (generate_series(1,10), large_val());
alter table t set tablespace tab1 ;
\c postgres
create database test1 template test;
\c test1
alter table t set tablespace tab;
\c postgres
alter database test1 set tablespace tab1;

--cancel the below command
alter database test1 set tablespace pg_default; --press ctrl+c
\c test1
alter table t set tablespace tab1;


Log file attached for reference.

Thanks.
--
Regards,
Neha Sharma


On Thu, Dec 16, 2021 at 4:17 PM Dilip Kumar  wrote:

> On Thu, Dec 16, 2021 at 12:15 AM Bruce Momjian  wrote:
> >
> > On Thu, Dec  2, 2021 at 07:19:50PM +0530, Dilip Kumar wrote:
> > From the patch:
> >
> > > Currently, CREATE DATABASE forces a checkpoint, then copies all the
> files,
> > > then forces another checkpoint. The comments in the createdb() function
> > > explain the reasons for this. The attached patch fixes this problem by
> making
> > > create database completely WAL logged so that we can avoid the
> checkpoints.
> > >
> > > This can also be useful for supporting the TDE. For example, if we
> need different
> > > encryption for the source and the target database then we can not
> re-encrypt the
> > > page data if we copy the whole directory.  But with this patch, we are
> copying
> > > page by page so we have an opportunity to re-encrypt the page before
> copying that
> > > to the target database.
> >
> > Uh, why is this true?  Why can't we just copy the heap/index files 8k at
> > a time and reencrypt them during the file copy, rather than using shared
> > buffers?
>
> Hi Bruce,
>
> Yeah, you are right that if we copy in 8k block then we can re-encrypt
> the page, but in the current system, we are not copying block by
> block.  So the main effort for this patch is not only for TDE but to
> get rid of the checkpoint we are forced to do before and after create
> database.  So my point is that in this patch since we are copying page
> by page we get an opportunity to re-encrypt the page.  I agree that if
> the re-encryption would have been the main goal of this patch then
> true we can copy files in 8k blocks and re-encrypt those blocks, that
> time even if we have to access some page data for re-encryption (like
> nonce) then also we can do it, but that is not the main objective.
>
> --
> Regards,
> Dilip Kumar
> EnterpriseDB: http://www.enterprisedb.com
>
>
>


master_logfile
Description: Binary data


Re: [PATCH] Accept IP addresses in server certificate SANs

2021-12-16 Thread Andrew Dunstan


On 12/15/21 20:13, Jacob Champion wrote:
> Hello all,
>
> libpq currently supports server certificates with a single IP address
> in the Common Name. It's fairly brittle; as far as I can tell, the
> single name you choose has to match the client's address exactly.
>
> Attached is a patch for libpq to support IP addresses in the server's
> Subject Alternative Names, which would allow admins to issue certs for
> multiple IP addresses, both IPv4 and IPv6, and mix them with
> alternative DNS hostnames. These addresses are compared bytewise
> instead of stringwise, so the client can contact the server via
> alternative spellings of the same IP address.


Good job, this is certainly going to be useful.



>
> This patch arose because I was writing tests for the NSS implementation
> that used a server cert with both DNS names and IP addresses, and then
> they failed when I ran those tests against the OpenSSL implementation.
> NSS supports this functionality natively. Anecdotally, I've heard from
> at least one client group who is utilizing IP-based certificates in
> their cloud deployments. It seems uncommon but still useful.
>
> There are two open questions I have; they're based on NSS
> implementation details that I did not port here:
>
> - NSS allows an IPv4 SAN to match an IPv6 mapping of that same address,
>   and vice-versa. I chose not to implement that behavior, figuring it
>   is easy enough for people to issue a certificate with both addresses.
>   Is that okay?


Sure.


>
> - If a certificate contains only iPAddress SANs, and none of them
>   match, I fall back to check the certificate Common Name. OpenSSL will
>   not do this (its X509_check_ip considers only SANs). NSS will only do
>   this if the client's address is itself a DNS name. The spec says that
>   we can't fall back to Common Name if the SANs contain any DNS
>   entries, but it's silent on the subject of IP addresses. What should
>   the behavior be?


I don't think we should fall back on the CN. It would seem quite odd to
do so for IP addresses but not for DNS names.


cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: Apple's ranlib warns about protocol_openssl.c

2021-12-16 Thread Peter Eisentraut

On 16.12.21 16:25, Tom Lane wrote:

Thomas Munro  writes:

Apple's ranlib doesn't like empty translation units[1], but
protocol_openssl.c doesn't define any symbols (unless you have an
ancient EOL'd openssl), so longfin and CI say:



/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/ranlib:
file: libpgcommon.a(protocol_openssl.o) has no symbols



I guess we still can't switch to (Apple) libtool.  Maybe configure
should be doing a test and adding it to LIBOBJS or a similar variable
only if necessary, or something like that?


Hmm ... right now, with only one test to make, the configure change
wouldn't be that hard; but that might change in future, plus I'm
unsure how to do it in MSVC.

A lazy man's answer could be to ensure the translation unit isn't
empty, say by adding


These are not errors, right?  So why is this a problem?




Re: Granting SET and ALTER SYSTEM privileges for GUCs

2021-12-16 Thread Joshua Brindle
On Wed, Dec 15, 2021 at 1:18 PM Mark Dilger
 wrote:
>
> > On Dec 15, 2021, at 10:02 AM, Joshua Brindle 
> >  wrote:
> >
> > Ah, I was actually requesting a hook where the acl check was done for
> > setting a GUC, such that we could deny setting them in a hook,
> > something that would be useful for the set_user extension
> > (github.com/pgaudit/set_user)
>
> Hmm, this seems orthogonal to the patch under discussion.  This patch only 
> adds a pg_setting_acl_aclcheck in ExecSetVariableStmt() for settings which 
> have been explicitly granted, otherwise it works the traditional way 
> (checking whether the setting is suset/userset).  I don't think you'd get MAC 
> support without finding a way to fire the hook for all settings, regardless 
> of their presence in the new pg_setting_acl table.  That is hard, because 
> InvokeObjectPostAlterHook expects the classId (SettingAclRelationId) and the 
> objectId (pg_setting_acl.oid), but you don't have those for many (most?) 
> settings.  As discussed upthread, we *do not* want to force an entry into the 
> table for all settings, only for ones that have been explicitly granted.
>
> Do you agree?  I'm happy to support MAC in this patch if can explain a simple 
> way of doing so.

Ah, I understand now. Would it be possible to pass the
SettingAclRelationId if it exists or InvalidOid if not? That way if a
MAC implementation cares about a particular GUC it'll ensure it's in
pg_setting_acl.

I don't know if others will object to that but it seems like an okay
compromise.

> > but having a hook for grant/revoke is
> > also helpful.
>
> Yes, I see no reason to rip this out.
>




Re: Apple's ranlib warns about protocol_openssl.c

2021-12-16 Thread Tom Lane
Thomas Munro  writes:
> Apple's ranlib doesn't like empty translation units[1], but
> protocol_openssl.c doesn't define any symbols (unless you have an
> ancient EOL'd openssl), so longfin and CI say:

> /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/ranlib:
> file: libpgcommon.a(protocol_openssl.o) has no symbols

> I guess we still can't switch to (Apple) libtool.  Maybe configure
> should be doing a test and adding it to LIBOBJS or a similar variable
> only if necessary, or something like that?

Hmm ... right now, with only one test to make, the configure change
wouldn't be that hard; but that might change in future, plus I'm
unsure how to do it in MSVC.

A lazy man's answer could be to ensure the translation unit isn't
empty, say by adding

+#else
+
+int dummy_protocol_openssl_variable = 0;
+
 #endif/* !SSL_CTX_set_min_proto_version */


regards, tom lane




Re: pg_dump: Refactor getIndexes()

2021-12-16 Thread Tom Lane
Peter Eisentraut  writes:
> This rearranges the version-dependent pieces in the new more modular style.
> I had originally written this before pre-9.2 support was removed and it 
> had a few more branches then.  But I think it is still useful, and there 
> are some pending patches that might add more branches for newer versions.

I didn't double-check the details, but +1 for doing this (and similarly
elsewhere, whenever anyone feels motivated to do so).

regards, tom lane




Re: logical decoding and replication of sequences

2021-12-16 Thread Tomas Vondra




On 12/16/21 12:59, Amit Kapila wrote:
> On Wed, Dec 15, 2021 at 7:21 PM Tomas Vondra
>  wrote:
>>
>> On 12/15/21 14:20, Amit Kapila wrote:
>>> On Tue, Dec 14, 2021 at 7:02 AM Tomas Vondra
>>>  wrote:

 Hi,

 here's an updated version of the patches, dealing with almost all of the
 issues (at least in the 0001 and 0002 parts). The main changes:

 1) I've removed the  'created' flag from fill_seq_with_data, as
 discussed. I don't think it's needed by any of the parts (not even 0003,
 AFAICS). We still need it in xl_seq_rec, though.

 2) GetCurrentTransactionId() added to sequence.c are called only with
 wal_level=logical, to minimize the overhead.


 There's still one remaining problem, that I already explained in [1].
 The problem is that with this:

   BEGIN;
   SELECT nextval('s') FROM generate_series(1,100);
   ROLLBACK;


 The root cause is that pg_current_wal_lsn() uses the LogwrtResult.Write,
 which is updated by XLogFlush() - but only in RecordTransactionCommit.
 Which makes sense, because only the committed stuff is "visible".

 But the non-transactional behavior of sequence decoding disagrees with
 this, because now some of the changes from aborted transactions may be
 replicated. Which means the wait_for_catchup() ends up not waiting for
 the sequence change to be replicated. This is an issue for tests in
 patch 0003, at least.

 My concern is this actually affects other places waiting for things
 getting replicated :-/

>>>
>>> By any chance, will this impact synchronous replication as well which
>>> waits for commits to be replicated?
>>>
>>
>> Physical or logical replication?
>>
> 
> logical replication.
> 
>> Physical is certainly not replicated.
>>
>> For logical replication, it's more complicated.
>>
>>> How is this patch dealing with prepared transaction case where at
>>> prepare we will send transactional changes and then later if rollback
>>> prepared happens then the publisher will rollback changes related to
>>> new relfilenode but subscriber would have already replayed the updated
>>> seqval change which won't be rolled back?
>>>
>>
>> I'm not sure what exact scenario you are describing, but in general we
>> don't rollback sequence changes anyway, so this should not cause any
>> divergence between the publisher and subscriber.
>>
>> Or are you talking about something like ALTER SEQUENCE? I think that
>> should trigger the transactional behavior for the new relfilenode, so
>> the subscriber won't see that changes.
>>
> 
> Yeah, something like Alter Sequence and nextval combination. I see
> that it will be handled because of the transactional behavior for the
> new relfilenode as for applying each sequence change, a new
> relfilenode is created.

Right.

> I think on apply side, the patch always creates a new relfilenode
> irrespective of whether the sequence message is transactional or not.
> Do we need to do that for the non-transactional messages as well?
> 

Good question. I don't think that's necessary, I'll see if we can simply
update the tuple (mostly just like redo).

regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: parallel vacuum comments

2021-12-16 Thread Amit Kapila
On Thu, Dec 16, 2021 at 6:13 PM Masahiko Sawada  wrote:
>
> On Thu, Dec 16, 2021 at 1:56 PM Amit Kapila  wrote:
> >
> > On Wed, Dec 15, 2021 at 1:33 PM Masahiko Sawada  
> > wrote:
> > >
> > > I've attached an updated patch. The patch incorporated several changes
> > > from the last version:
> > >
> > > * Rename parallel_vacuum_begin() to parallel_vacuum_init()
> > > * Unify the terminology; use "index bulk-deletion" and "index cleanup"
> > > instead of "index vacuum" and "index cleanup".
> > >
> >
> > I am not sure it is a good idea to do this as part of the main patch
> > as the intention of that is to just refactor parallel vacuum code. I
> > suggest doing this as a separate patch.
>
> Okay.
>
> >  Also, can we move the common
> > code to be shared between vacuumparallel.c and vacuumlazy.c as a
> > separate patch?
>
> You mean vac_tid_reaped() and vac_cmp_itemptr() etc.? If so, do both
> vacuumparallel.c and vacuumlazy.c have the same functions?
>

Why would that be required? I think both can call the common exposed
function like the one you have in your patch bulkdel_one_index or if
we directly move lazy_vacuum_one_index as part of common code. Similar
for cleanup function.
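
A sketch of what such an exposed helper could look like (names are
illustrative, not committed code; vac_tid_reaped() is the existing
TID-lookup callback mentioned above):

/*
 * Shared helper in common vacuum code, callable from both vacuumlazy.c
 * and vacuumparallel.c instead of keeping duplicate copies.
 */
IndexBulkDeleteResult *
vac_bulkdel_one_index(IndexVacuumInfo *ivinfo, IndexBulkDeleteResult *istat,
					  void *dead_items)
{
	/* delete all index entries whose heap TIDs appear in dead_items */
	return index_bulk_delete(ivinfo, istat, vac_tid_reaped, dead_items);
}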

> >
> > Few other comments and questions:
> > 
> > 1. /* Outsource everything to parallel variant */
> > - parallel_vacuum_process_all_indexes(vacrel, true);
> > + LVSavedErrInfo saved_err_info;
> > +
> > + /*
> > + * Outsource everything to parallel variant. Since parallel vacuum will
> > + * set the error context on an error we temporarily disable setting our
> > + * error context.
> > + */
> > + update_vacuum_error_info(vacrel, &saved_err_info,
> > + VACUUM_ERRCB_PHASE_UNKNOWN,
> > + InvalidBlockNumber, InvalidOffsetNumber);
> > +
> > + parallel_vacuum_bulkdel_all_indexes(vacrel->pvs, vacrel->old_live_tuples);
> > +
> > + /* Revert to the previous phase information for error traceback */
> > + restore_vacuum_error_info(vacrel, &saved_err_info);
> >
> > Is this change because you want a separate error callback for parallel
> > vacuum? If so, I suggest we can discuss this as a separate patch from
> > the refactoring patch.
>
> Because it seems natural to me that the leader and worker use the same
> error callback.
>
> Okay, I'll remove that change in the next version patch.
>
> > 2. Is introducing bulkdel_one_index/cleanup_one_index related to new
> > error context, or "Unify the terminology" task? Is there any other
> > reason for the same?
>
> Because otherwise both vacuumlazy.c and vacuumparallel.c will have the
> same functions.
>
> >
> > 3. Why did you introduce
> > parallel_vacuum_bulkdel_all_indexes()/parallel_vacuum_cleanup_all_indexes()?
> > Is it because of your task "Unify the terminology"?
>
> This is because parallel bulk-deletion and cleanup require different
> numbers of inputs (num_table_tuples etc.) and the caller
> (vacuumlazy.c) cannot set them directly to ParallelVacuumState.
>

oh, yeah, the other possibility could be to have a common structure
that can be used for both cases. I am not sure if that is better than
what you have.
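
If we went the common-structure route, it could be as simple as
something like this (field names purely illustrative):

typedef struct ParallelVacuumIndexParams
{
	bool	bulkdel;			/* bulk-deletion, or cleanup? */
	double	num_table_tuples;	/* reltuples handed down to the index AMs */
	bool	estimated_count;	/* is num_table_tuples only an estimate? */
} ParallelVacuumIndexParams;

/* one entry point instead of two:
 *   parallel_vacuum_process_all_indexes(pvs, &params);
 */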

> >
> > 4.
> > @@ -3086,7 +2592,6 @@ lazy_cleanup_one_index(Relation indrel,
> > IndexBulkDeleteResult *istat,
> >   ivinfo.report_progress = false;
> >   ivinfo.estimated_count = estimated_count;
> >   ivinfo.message_level = elevel;
> > -
> >   ivinfo.num_heap_tuples = reltuples;
> >
> > This seems like an unrelated change.
>
> Yes, but I think it's an unnecessary break so we can change it
> together. Should it be done in a separate patch?
>

Isn't this just spurious line removal which shouldn't be part of any patch?

-- 
With Regards,
Amit Kapila.




RE: Failed transaction statistics to measure the logical replication progress

2021-12-16 Thread osumi.takami...@fujitsu.com
On Thursday, December 16, 2021 8:37 PM I wrote:
> Hi, created a new version v17 according to the recent discussion with changes
> to address other review comments.
FYI, in v17 I've removed the part of the commit message
about spool file statistics on the subscriber.
My intention is just to get the patches into a more committable shape.

Although I deleted it, I'd say there is still some room to discuss
whether it is necessary. First, we are interested in disk writes
(for logical replication, pg_stat_replication_slots is an example),
and second, there is a scenario where a user of logical replication
who wants to suppress unnecessary file writes on the subscriber
(STREAM ABORT causes truncation of the file with changes, IIUC)
can increase logical_decoding_work_mem on the publisher.
I'll postpone this discussion until it becomes necessary,
or abandon the idea if it's rejected. Anyway, I detached the
discussion by removing it from the commit message.

Best Regards,
Takamichi Osumi



RE: Optionally automatically disable logical replication subscriptions on error

2021-12-16 Thread osumi.takami...@fujitsu.com
On Thursday, December 16, 2021 2:32 PM Greg Nancarrow  
wrote:
> On Tue, Dec 14, 2021 at 4:34 PM osumi.takami...@fujitsu.com
>  wrote:
> >
> > Besides all of those changes, I've removed the obsolete comment of
> > DisableSubscriptionOnError in v12.
> >
> 
> I have a few minor comments, otherwise the patch LGTM at this point:
Thank you for your review !

> doc/src/sgml/catalogs.sgml
> (1)
> Current comment says:
> 
> +   If true, the subscription will be disabled when subscription's
> +   worker detects any errors
> 
> However, in create_subscription.sgml, it says "disabled if any errors are
> detected by subscription workers ..."
> 
> For consistency, I think it should be:
> 
> +   If true, the subscription will be disabled when subscription
> +   workers detect any errors
Okay. Fixed.
 
> src/bin/psql/describe.c
> (2)
> I think that:
> 
> +  gettext_noop("Disable On Error"));
> 
> should be:
> 
> +  gettext_noop("Disable on error"));
> 
> for consistency with the uppercase/lowercase usage on other similar entries?
> (e.g. "Two phase commit")
Agreed. Fixed.

> src/include/catalog/pg_subscription.h
> (3)
> 
> +  bool subdisableonerr; /* True if apply errors should disable the
> +   * subscription upon error */
> 
> The comment should just say "True if occurrence of apply errors should disable
> the subscription"
Fixed.

Attached the updated patch v14.

Best Regards,
Takamichi Osumi



v14-0001-Optionally-disable-subscriptions-on-error.patch
Description: v14-0001-Optionally-disable-subscriptions-on-error.patch


Re: parallel vacuum comments

2021-12-16 Thread Masahiko Sawada
On Thu, Dec 16, 2021 at 1:56 PM Amit Kapila  wrote:
>
> On Wed, Dec 15, 2021 at 1:33 PM Masahiko Sawada  wrote:
> >
> > I've attached an updated patch. The patch incorporated several changes
> > from the last version:
> >
> > * Rename parallel_vacuum_begin() to parallel_vacuum_init()
> > * Unify the terminology; use "index bulk-deletion" and "index cleanup"
> > instead of "index vacuum" and "index cleanup".
> >
>
> I am not sure it is a good idea to do this as part of the main patch
> as the intention of that is to just refactor parallel vacuum code. I
> suggest doing this as a separate patch.

Okay.

>  Also, can we move the common
> code to be shared between vacuumparallel.c and vacuumlazy.c as a
> separate patch?

You mean vac_tid_reaped() and vac_cmp_itemptr() etc.? If so, do both
vacuumparallel.c and vacuumlazy.c have the same functions?

>
> Few other comments and questions:
> 
> 1. /* Outsource everything to parallel variant */
> - parallel_vacuum_process_all_indexes(vacrel, true);
> + LVSavedErrInfo saved_err_info;
> +
> + /*
> + * Outsource everything to parallel variant. Since parallel vacuum will
> + * set the error context on an error we temporarily disable setting our
> + * error context.
> + */
> + update_vacuum_error_info(vacrel, &saved_err_info,
> + VACUUM_ERRCB_PHASE_UNKNOWN,
> + InvalidBlockNumber, InvalidOffsetNumber);
> +
> + parallel_vacuum_bulkdel_all_indexes(vacrel->pvs, vacrel->old_live_tuples);
> +
> + /* Revert to the previous phase information for error traceback */
> + restore_vacuum_error_info(vacrel, &saved_err_info);
>
> Is this change because you want a separate error callback for parallel
> vacuum? If so, I suggest we can discuss this as a separate patch from
> the refactoring patch.

Because it seems natural to me that the leader and worker use the same
error callback.

Okay, I'll remove that change in the next version patch.

> 2. Is introducing bulkdel_one_index/cleanup_one_index related to new
> error context, or "Unify the terminology" task? Is there any other
> reason for the same?

Because otherwise both vacuumlazy.c and vacuumparallel.c will have the
same functions.

>
> 3. Why did you introduce
> parallel_vacuum_bulkdel_all_indexes()/parallel_vacuum_cleanup_all_indexes()?
> Is it because of your task "Unify the terminology"?

This is because parallel bulk-deletion and cleanup require different
numbers of inputs (num_table_tuples etc.) and the caller
(vacuumlazy.c) cannot set them directly to ParallelVacuumState.

>
> 4.
> @@ -3086,7 +2592,6 @@ lazy_cleanup_one_index(Relation indrel,
> IndexBulkDeleteResult *istat,
>   ivinfo.report_progress = false;
>   ivinfo.estimated_count = estimated_count;
>   ivinfo.message_level = elevel;
> -
>   ivinfo.num_heap_tuples = reltuples;
>
> This seems like an unrelated change.

Yes, but I think it's an unnecessary break so we can change it
together. Should it be done in a separate patch?

Regards,

-- 
Masahiko Sawada
EDB:  https://www.enterprisedb.com/




Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints

2021-12-16 Thread Dilip Kumar
On Thu, Dec 16, 2021 at 12:15 AM Bruce Momjian  wrote:
>
> On Thu, Dec  2, 2021 at 07:19:50PM +0530, Dilip Kumar wrote:
> From the patch:
>
> > Currently, CREATE DATABASE forces a checkpoint, then copies all the files,
> > then forces another checkpoint. The comments in the createdb() function
> > explain the reasons for this. The attached patch fixes this problem by 
> > making
> > create database completely WAL logged so that we can avoid the checkpoints.
> >
> > This can also be useful for supporting the TDE. For example, if we need 
> > different
> > encryption for the source and the target database then we can not 
> > re-encrypt the
> > page data if we copy the whole directory.  But with this patch, we are 
> > copying
> > page by page so we have an opportunity to re-encrypt the page before 
> > copying that
> > to the target database.
>
> Uh, why is this true?  Why can't we just copy the heap/index files 8k at
> a time and reencrypt them during the file copy, rather than using shared
> buffers?

Hi Bruce,

Yeah, you are right that if we copy in 8k block then we can re-encrypt
the page, but in the current system, we are not copying block by
block.  So the main effort for this patch is not only for TDE but to
get rid of the checkpoint we are forced to do before and after create
database.  So my point is that in this patch since we are copying page
by page we get an opportunity to re-encrypt the page.  I agree that if
the re-encryption would have been the main goal of this patch then
true we can copy files in 8k blocks and re-encrypt those blocks, that
time even if we have to access some page data for re-encryption (like
nonce) then also we can do it, but that is not the main objective.
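
To illustrate the point, the page-by-page copy boils down to something
like the sketch below (approximate use of the smgr/XLOG interfaces,
not the patch's actual code); the spot where the page sits in local
memory is where a TDE build could decrypt with the source key and
re-encrypt with the destination key:

static void
copy_fork_block_by_block(SMgrRelation src, SMgrRelation dst,
						 RelFileNode dst_rnode, ForkNumber forknum)
{
	BlockNumber		nblocks = smgrnblocks(src, forknum);
	BlockNumber		blkno;
	PGAlignedBlock	buf;

	for (blkno = 0; blkno < nblocks; blkno++)
	{
		smgrread(src, forknum, blkno, buf.data);

		/* ... decrypt / re-encrypt buf.data here if needed ... */

		/* WAL-log the page so no checkpoint is needed afterwards */
		log_newpage(&dst_rnode, forknum, blkno, (Page) buf.data, true);
		smgrextend(dst, forknum, blkno, buf.data, false);
	}
}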

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: logical decoding and replication of sequences

2021-12-16 Thread Amit Kapila
On Wed, Dec 15, 2021 at 7:21 PM Tomas Vondra
 wrote:
>
> On 12/15/21 14:20, Amit Kapila wrote:
> > On Tue, Dec 14, 2021 at 7:02 AM Tomas Vondra
> >  wrote:
> >>
> >> Hi,
> >>
> >> here's an updated version of the patches, dealing with almost all of the
> >> issues (at least in the 0001 and 0002 parts). The main changes:
> >>
> >> 1) I've removed the  'created' flag from fill_seq_with_data, as
> >> discussed. I don't think it's needed by any of the parts (not even 0003,
> >> AFAICS). We still need it in xl_seq_rec, though.
> >>
> >> 2) GetCurrentTransactionId() added to sequence.c are called only with
> >> wal_level=logical, to minimize the overhead.
> >>
> >>
> >> There's still one remaining problem, that I already explained in [1].
> >> The problem is that with this:
> >>
> >>   BEGIN;
> >>   SELECT nextval('s') FROM generate_series(1,100);
> >>   ROLLBACK;
> >>
> >>
> >> The root cause is that pg_current_wal_lsn() uses the LogwrtResult.Write,
> >> which is updated by XLogFlush() - but only in RecordTransactionCommit.
> >> Which makes sense, because only the committed stuff is "visible".
> >>
> >> But the non-transactional behavior of sequence decoding disagrees with
> >> this, because now some of the changes from aborted transactions may be
> >> replicated. Which means the wait_for_catchup() ends up not waiting for
> >> the sequence change to be replicated. This is an issue for tests in
> >> patch 0003, at least.
> >>
> >> My concern is this actually affects other places waiting for things
> >> getting replicated :-/
> >>
> >
> > By any chance, will this impact synchronous replication as well which
> > waits for commits to be replicated?
> >
>
> Physical or logical replication?
>

logical replication.

> Physical is certainly not replicated.
>
> For logical replication, it's more complicated.
>
> > How is this patch dealing with prepared transaction case where at
> > prepare we will send transactional changes and then later if rollback
> > prepared happens then the publisher will rollback changes related to
> > new relfilenode but subscriber would have already replayed the updated
> > seqval change which won't be rolled back?
> >
>
> I'm not sure what exact scenario you are describing, but in general we
> don't rollback sequence changes anyway, so this should not cause any
> divergence between the publisher and subscriber.
>
> Or are you talking about something like ALTER SEQUENCE? I think that
> should trigger the transactional behavior for the new relfilenode, so
> the subscriber won't see that changes.
>

Yeah, something like Alter Sequence and nextval combination. I see
that it will be handled because of the transactional behavior for the
new relfilenode as for applying each sequence change, a new
relfilenode is created. I think on apply side, the patch always
creates a new relfilenode irrespective of whether the sequence message
is transactional or not. Do we need to do that for the
non-transactional messages as well?

--
With Regards,
Amit Kapila.




RE: Failed transaction statistics to measure the logical replication progress

2021-12-16 Thread osumi.takami...@fujitsu.com
On Tuesday, December 14, 2021 12:45 AM vignesh C  wrote:
> Thanks for the updated patch, few comments:
> 1) Can we change this:
> /*
> +* Report the success of table sync as one commit to consolidate all
> +* transaction stats into one record.
> +*/
> +   pgstat_report_subworker_xact_end(MyLogicalRepWorker->subid,
> +
>   LOGICAL_REP_MSG_COMMIT);
> +
> To:
> /* Report the success of table sync */
> +   pgstat_report_subworker_xact_end(MyLogicalRepWorker->subid,
> +
>   LOGICAL_REP_MSG_COMMIT);
> +
This function call, by which the table sync worker reports a stats
update, has been removed per the recent discussion.


> 2) Typo: ealier should be earlier
> +   /*
> +* Report ealier than the call of process_syncing_tables() not
> to miss an
> +* increment of commit_count in case it leads to the process exit. See
> +* process_syncing_tables_for_apply().
> +*/
> +   pgstat_report_subworker_xact_end(MyLogicalRepWorker->subid,
> +
>   LOGICAL_REP_MSG_COMMIT);
> +
Thanks ! Fixed.

 
> 3) Should we add an Assert for subwentry:
> +   /*
> +* If this is a new error reported by table sync worker,
> consolidate this
> +* error count into the entry of apply worker, by swapping the stats
> +* entries.
> +*/
> +   if (OidIsValid(msg->m_subrelid))
> +   subwentry = pgstat_get_subworker_entry(dbentry,
> + msg->m_subid,
> +
> InvalidOid, true);
> +   subwentry->error_count++;
The latest implementation doesn't require
the call of pgstat_get_subworker_entry().
So, I skipped.

> 4) Can we slightly change it? We can change it from:
> +# Check the update of stats counters.
> +confirm_transaction_stats_update(
> +   $node_subscriber,
> +   'commit_count = 1',
> +   'the commit_count increment by table sync');
> +
> +confirm_transaction_stats_update(
> +   $node_subscriber,
> +   'error_count = 1',
> +   'the error_count increment by table sync');
> to:
> +# Check updation of subscription worker transaction count statistics.
> +confirm_transaction_stats_update(
> +   $node_subscriber,
> +   'commit_count = 1',
> +   'check table sync worker commit count is updated');
> +
> +confirm_transaction_stats_update(
> +   $node_subscriber,
> +   'error_count = 1',
> +   'check table sync worker error count is updated');
I've removed the corresponding tests for table sync workers in the patch.

But I partly adopted the comment suggestion for the apply worker tests.
On the other hand, I didn't change the 3rd argument of
confirm_transaction_stats_update().
It needs to be a noun phrase, because it is concatenated with the string
"Timed out while waiting for " in that function.
See the definition of the function.

The new patch v17 is shared in [1].

[1] - 
https://www.postgresql.org/message-id/TYCPR01MB83734A7A0596AC7ADB0DCB51ED779%40TYCPR01MB8373.jpnprd01.prod.outlook.com


Best Regards,
Takamichi Osumi



RE: Failed transaction statistics to measure the logical replication progress

2021-12-16 Thread osumi.takami...@fujitsu.com
On Monday, December 13, 2021 6:19 PM Amit Kapila  
wrote:
> On Tue, Dec 7, 2021 at 3:12 PM osumi.takami...@fujitsu.com
>  wrote:
> >
> 
> Few questions and comments:
> 
> 1.
> The pg_stat_subscription_workers view will
> contain
> one row per subscription worker on which errors have occurred, for workers
> applying logical replication changes and workers handling the initial data
> -   copy of the subscribed tables.  The statistics entry is removed when the
> -   corresponding subscription is dropped.
> +   copy of the subscribed tables. Also, the row corresponding to the apply
> +   worker shows all transaction statistics of both types of workers on the
> +   subscription. The statistics entry is removed when the corresponding
> +   subscription is dropped.
> 
> Why did you choose to show stats for both types of workers in one row?
Now the added stats show only the statistics of the apply worker,
as we agreed.


> 2.
> + PGSTAT_MTYPE_SUBWORKERXACTEND,
>  } StatMsgType;
> 
> I don't think we need a comma after the last message type.
Fixed.

 
> 3.
> + Oid m_subrelid;
> +
> + /* necessary to determine column to increment */
> + LogicalRepMsgType m_command;
> +
> +} PgStat_MsgSubWorkerXactEnd;
> 
> Is m_subrelid used in this patch? If not, why did you keep it? I think if you
> choose to show separate stats for table sync and apply worker then probably it
> will be used.
Removed.


> 4.
>   /*
> + * Cumulative transaction statistics of subscription worker
> + */
> + PgStat_Counter commit_count;
> + PgStat_Counter error_count;
> + PgStat_Counter abort_count;
> +
> 
> I think it is better to keep the order of columns as commit_count, 
> abort_count,
> error_count in the entire patch.
Fixed.


The new patch is shared in [1].

[1] - 
https://www.postgresql.org/message-id/TYCPR01MB83734A7A0596AC7ADB0DCB51ED779%40TYCPR01MB8373.jpnprd01.prod.outlook.com


Best Regards,
Takamichi Osumi



RE: Failed transaction statistics to measure the logical replication progress

2021-12-16 Thread osumi.takami...@fujitsu.com
On Thursday, December 16, 2021 4:00 PM I wrote:
> Thank you, everyone for confirming the direction.
> I'll follow the consensus of the community and fix the patch, including other
> comments.
> I'll treat only the stats for apply workers.
Hi, I've created a new version v17 according to the recent discussion,
with changes to address the other review comments.

Kindly have a look at it.

Best Regards,
Takamichi Osumi



v17-0002-Extend-pg_stat_subscription_workers-to-include-g.patch
Description:  v17-0002-Extend-pg_stat_subscription_workers-to-include-g.patch


v17-0001-Fix-alignments-of-TAP-tests-for-pg_stat_subscrip.patch
Description:  v17-0001-Fix-alignments-of-TAP-tests-for-pg_stat_subscrip.patch


Re: Transparent column encryption

2021-12-16 Thread Peter Eisentraut

On 16.12.21 05:47, Greg Stark wrote:

In the server, the encrypted datums are stored in types called
encryptedr and encryptedd (for randomized and deterministic
encryption).  These are essentially cousins of bytea.


Does that mean someone could go in with psql and select out the data
without any keys and just get a raw bytea-like representation? That
seems like a natural and useful thing to be able to do. For example to
allow dumping a table and loading it elsewhere and transferring keys
through some other channel (perhaps only as needed).


Yes to all of that.

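As a rough illustration of that point (not taken from the patch): a plain libpq
client with no column-encryption keys configured would simply get back the
stored bytea-like value. The table and column names below are made up, and the
exact textual output depends on how the encryptedr/encryptedd types end up
being rendered.

#include <stdio.h>
#include <libpq-fe.h>

int
main(void)
{
	/* Placeholder connection string; no column-encryption keys are set up. */
	PGconn	   *conn = PQconnectdb("dbname=postgres");
	PGresult   *res;

	if (PQstatus(conn) != CONNECTION_OK)
	{
		fprintf(stderr, "%s", PQerrorMessage(conn));
		PQfinish(conn);
		return 1;
	}

	/*
	 * "customers" and "ssn" are hypothetical; the column would be one of the
	 * proposed encrypted types on the server side.
	 */
	res = PQexec(conn, "SELECT ssn FROM customers LIMIT 1");
	if (PQresultStatus(res) == PGRES_TUPLES_OK && PQntuples(res) > 0)
		printf("raw stored value: %s\n", PQgetvalue(res, 0, 0));
	else
		fprintf(stderr, "%s", PQerrorMessage(conn));

	PQclear(res);
	PQfinish(conn);
	return 0;
}

The same property is what would allow pg_dump to move the ciphertext around
without ever seeing the keys.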



Re: pg_upgrade should truncate/remove its logs before running

2021-12-16 Thread Daniel Gustafsson
> On 16 Dec 2021, at 12:11, Peter Eisentraut 
>  wrote:

> Could we make it write just one log file?  Is having multiple log files 
> better?

Having individual .txt files from the checks, with additional information
on how to handle each error, is quite convenient when writing wrappers around
pg_upgrade (speaking from experience of having written multiple pg_upgrade
frontends).  Parsing a single logfile is more work, and will break existing
scripts.

I'm in favor of a predictable default log path, with a parameter to override it,
as mentioned upthread.

--
Daniel Gustafsson   https://vmware.com/





Re: pg_upgrade should truncate/remove its logs before running

2021-12-16 Thread Peter Eisentraut

On 16.12.21 02:39, Michael Paquier wrote:

On Wed, Dec 15, 2021 at 04:13:10PM -0600, Justin Pryzby wrote:

On Wed, Dec 15, 2021 at 05:04:54PM -0500, Andrew Dunstan wrote:

The directory name needs to be predictable somehow, or maybe optionally
set as a parameter. Having just a timestamped directory name would make
life annoying for a poor buildfarm maintainer. Also, please don't change
anything before I have a chance to adjust the buildfarm code to what is
going to be done.


Feel free to suggest the desirable behavior.
It could write to pg_upgrade.log/* and refuse to run if the dir already exists.


Andrew's point looks rather sensible to me.  So, this stuff should
have a predictable name (pg_upgrade.log, pg_upgrade_log or upgrade_log
would be fine).  But I would also add an option to be able to define a
custom log path.  The latter would be useful for the regression tests
so that everything could get redirected to a path that is already filtered
out.


Could we make it write just one log file?  Is having multiple log files 
better?





Re: Use generation context to speed up tuplesorts

2021-12-16 Thread Ronan Dunklau
On Wednesday, December 8, 2021 at 22:07:12 CET, Tomas Vondra wrote:
> On 12/8/21 16:51, Ronan Dunklau wrote:
> > On Thursday, September 9, 2021 at 15:37:59 CET, Tomas Vondra wrote:
> >> And now comes the funny part - if I run it in the same backend as the
> >> 
> >> "full" benchmark, I get roughly the same results:
> >>  block_size | chunk_size | mem_allocated | alloc_ms | free_ms
> >> ------------+------------+---------------+----------+---------
> >>       32768 |        512 |     806256640 |    37159 |   76669
> >> 
> >> but if I reconnect and run it in the new backend, I get this:
> >>  block_size | chunk_size | mem_allocated | alloc_ms | free_ms
> >> ------------+------------+---------------+----------+---------
> >>       32768 |        512 |     806158336 |   233909 |  100785
> >> (1 row)
> >> 
> >> It does not matter if I wait a bit before running the query, if I run it
> >> repeatedly, etc. The machine is not doing anything else, the CPU is set
> >> to use "performance" governor, etc.
> > 
> > I've reproduced the behaviour you mention.
> > I also noticed asm_exc_page_fault showing up in the perf report in that
> > case.
> > 
> > Running an strace on it shows that in one case, we have a lot of brk
> > calls,
> > while when we run in the same process as the previous tests, we don't.
> > 
> > My suspicion is that the previous workload makes glibc malloc change its
> > trim_threshold and possibly other dynamic options, which leads to
> > constantly moving the brk pointer in one case and not the other.
> > 
> > Running your fifo test with absurd malloc options shows that indeed that
> > might be the case (I needed to change several, because changing one
> > disables the dynamic adjustment for every single one of them, and malloc
> > would fall back to using mmap and freeing it on each iteration):
> > 
> > mallopt(M_TOP_PAD, 1024 * 1024 * 1024);
> > mallopt(M_TRIM_THRESHOLD, 256 * 1024 * 1024);
> > mallopt(M_MMAP_THRESHOLD, 4*1024*1024*sizeof(long));
> > 
> > I get the following results for your self contained test. I ran the query
> > twice, in each case, seeing the importance of the first allocation and the
> > subsequent ones:
> > 
> > With default malloc options:
> >  block_size | chunk_size | mem_allocated | alloc_ms | free_ms
> > ------------+------------+---------------+----------+---------
> >       32768 |        512 |     795836416 |   300156 |  207557
> >
> >  block_size | chunk_size | mem_allocated | alloc_ms | free_ms
> > ------------+------------+---------------+----------+---------
> >       32768 |        512 |     795836416 |   211942 |   77207
> > 
> > With the oversized values above:
> >  block_size | chunk_size | mem_allocated | alloc_ms | free_ms
> > ------------+------------+---------------+----------+---------
> >       32768 |        512 |     795836416 |   219000 |   36223
> >
> >  block_size | chunk_size | mem_allocated | alloc_ms | free_ms
> > ------------+------------+---------------+----------+---------
> >       32768 |        512 |     795836416 |    75761 |   78082
> >
> > (1 row)
> > 
> > I can't tell how representative your benchmark extension would be of real
> > life allocation / free patterns, but there is probably something we can
> > improve here.
> 
> Thanks for looking at this. I think those allocation / free patterns are
> fairly extreme, and there probably are no workloads doing exactly this.
> The idea is the actual workloads are likely some combination of these
> extreme cases.
> 
> > I'll try to see if I can understand more precisely what is happening.
> 
> Thanks, that'd be helpful. Maybe we can learn something about tuning
> malloc parameters to get significantly better performance.
> 

Apologies for the long email; maybe what I state here is obvious to others,
but I learnt a lot, so...

I think I understood what the problem was in your generation tests: depending 
on the sequence of allocations we will raise a different maximum for 
mmap_threshold and trim_threshold. When an mmap'ed block is freed, malloc will 
raise its threshold, as it considers the freed memory to be better served by 
regular movement of the sbrk pointer instead of by mapping memory with mmap.  This 
threshold is doubled every time we free an mmap'ed block.

At the same time, the trim_threshold is raised to double the mmap_threshold, 
considering  that this amount of memory should not be released to the OS 
because we have a good chance of reusing it. 

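As a rough, glibc-specific illustration of that mechanism (not part of any
patch here): explicitly setting the M_* parameters disables the dynamic
adjustment, so pinning the thresholds up front keeps the heap top from being
trimmed and re-grown on every iteration. The values below are arbitrary
examples, not recommendations.

#include <malloc.h>
#include <stdio.h>
#include <stdlib.h>

int
main(void)
{
	/*
	 * Setting these parameters explicitly switches off glibc's dynamic
	 * threshold adjustment, so the values stay fixed for the whole process.
	 */
	if (mallopt(M_MMAP_THRESHOLD, 4 * 1024 * 1024) == 0 ||
		mallopt(M_TRIM_THRESHOLD, 32 * 1024 * 1024) == 0)
	{
		fprintf(stderr, "mallopt failed\n");
		return 1;
	}

	/*
	 * Churn allocations below the mmap threshold: with the thresholds pinned,
	 * the freed memory stays in the arena instead of being trimmed back to
	 * the kernel and requested again via brk on the next iteration.
	 */
	for (int i = 0; i < 100000; i++)
	{
		void	   *p = malloc(512 * 1024);

		free(p);
	}

	malloc_stats();				/* glibc: prints arena statistics to stderr */
	return 0;
}

Whether anything like this should ever be set by PostgreSQL itself is a
separate question; the point is only that the doubling described above can be
taken out of the picture when benchmarking.
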
This can be demonstrated using the attached systemtap script, along with a 
patch adding new traces to generation_context for this purpose:
When running your query:

select block_size, chunk_size, x.* from
(values (512)) AS a(chunk_size),
(values (32768)) AS b(block_size),
 lateral generation_bench_fifo(100, block_size, chunk_size,
   

Re: Privilege required for IF EXISTS event if the object already exists

2021-12-16 Thread Shay Rojansky
>> Now, before creating tables, the ORM generates CREATE SCHEMA IF NOT
EXISTS, to ensure that the schema exists before CREATE TABLE; that's
reasonable general-purpose behavior.
>
> If the user hasn’t specified they want the schema created it’s arguable
that executing create schema anyway is reasonable.  The orm user should
know whether they are expected/able to create the schema as part of their
responsibilities and instruct the orm accordingly and expect it to only
create what it has been explicitly directed to create.

I think the point being missed here is that the user isn't interacting
directly with PostgreSQL - they're doing so via an ORM which isn't
necessarily aware of everything. Yes, a switch could be added to the ORM
where the user instructs it to not ensure that the schema exists, but
that's placing unnecessary burden on the user - having the "ensure"
operation silently no-op (instead of throwing) if the schema already exists
just makes everything smoother.

Put another way, let's say I introduce a user-facing flag in my ORM to opt
out of CREATE SCHEMA IF NOT EXISTS. If the user forgets to pre-create the
schema, they would still get an error when trying to create the tables
(since the schema doesn't exist). So a failure to set things up properly would
generate an error in any case, either when trying to create the schema (if
the flag is not set) or when trying to create the table (if the flag is
set). This makes the flag pretty useless and an unnecessary extra burden
on the user, when the database could simply ignore CREATE SCHEMA IF
NOT EXISTS in the case where the schema already exists.

Is there any specific reason you think this shouldn't be done?

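To make the workaround concrete, the kind of client-side dance an ORM is
pushed towards today looks roughly like the sketch below. The connection
string and schema name are placeholders, and the check/create sequence is
inherently racy, which is part of why a server-side no-op would be smoother.

#include <stdio.h>
#include <string.h>
#include <libpq-fe.h>

/*
 * Create "schema" only if it does not already exist, to avoid needing
 * CREATE privilege on the database when the schema was pre-created.
 * Note the window between the lookup and the CREATE SCHEMA.
 */
static int
ensure_schema(PGconn *conn, const char *schema)
{
	const char *params[1] = {schema};
	PGresult   *res;
	int			found;

	res = PQexecParams(conn,
					   "SELECT 1 FROM pg_catalog.pg_namespace WHERE nspname = $1",
					   1, NULL, params, NULL, NULL, 0);
	if (PQresultStatus(res) != PGRES_TUPLES_OK)
	{
		fprintf(stderr, "lookup failed: %s", PQerrorMessage(conn));
		PQclear(res);
		return -1;
	}
	found = (PQntuples(res) > 0);
	PQclear(res);

	if (!found)
	{
		char	   *ident = PQescapeIdentifier(conn, schema, strlen(schema));
		char		sql[256];

		if (ident == NULL)
			return -1;
		snprintf(sql, sizeof(sql), "CREATE SCHEMA %s", ident);
		PQfreemem(ident);

		res = PQexec(conn, sql);
		if (PQresultStatus(res) != PGRES_COMMAND_OK)
		{
			fprintf(stderr, "create failed: %s", PQerrorMessage(conn));
			PQclear(res);
			return -1;
		}
		PQclear(res);
	}
	return 0;
}

int
main(void)
{
	PGconn	   *conn = PQconnectdb("dbname=postgres");	/* placeholder */

	if (PQstatus(conn) != CONNECTION_OK)
	{
		fprintf(stderr, "%s", PQerrorMessage(conn));
		PQfinish(conn);
		return 1;
	}
	ensure_schema(conn, "app_schema");	/* hypothetical schema name */
	PQfinish(conn);
	return 0;
}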

Re: Unnecessary delay in streaming replication due to replay lag

2021-12-16 Thread Kyotaro Horiguchi
At Wed, 15 Dec 2021 17:01:24 -0800, Soumyadeep Chakraborty 
 wrote in 
> Sure, that makes more sense. Fixed.

I played with this briefly: I started a standby from a backup that
has access to an archive, and I steadily got the following log lines.


[139535:postmaster] LOG:  database system is ready to accept read-only connections
[139542:walreceiver] LOG:  started streaming WAL from primary at 0/200 on timeline 1
cp: cannot stat '/home/horiguti/data/arc_work/00010003': No such file or directory
[139542:walreceiver] FATAL:  could not open file "pg_wal/00010003": No such file or directory
cp: cannot stat '/home/horiguti/data/arc_work/0002.history': No such file or directory
cp: cannot stat '/home/horiguti/data/arc_work/00010003': No such file or directory
[139548:walreceiver] LOG:  started streaming WAL from primary at 0/300 on timeline 1

The "FATAL:  could not open file" message from walreceiver means that
the walreceiver was operationally prohibited to install a new wal
segment at the time.  Thus the walreceiver ended as soon as started.
In short, the eager replication is not working at all.


I have a comment on the behavior and objective of this feature.

In the case where archive recovery is started from a backup, this
feature lets walreceiver start while the archive recovery is ongoing.
If walreceiver (or the eager replication) worked as expected, it would
write WAL files while archive recovery writes the same set of WAL
segments to the same directory. I don't think that is sane behavior,
or, to put it more modestly, it is an unintended behavior.

In common cases, I believe archive recovery is faster than
replication.  If a segment is available from the archive, we don't need to
prefetch it via streaming.

If this feature is intended to be used only for crash recovery of a
standby, it should fire only when it is needed.

If not, that is, if it is intended to also work for archive recovery,
I think the eager replication should start from the segment after
the last WAL in the archive, but that would invite more complex problems.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




pg_dump: Refactor getIndexes()

2021-12-16 Thread Peter Eisentraut

This rearranges the version-dependent pieces in the new more modular style.

I had originally written this before pre-9.2 support was removed and it 
had a few more branches then.  But I think it is still useful, and there 
are some pending patches that might add more branches for newer versions.

From e4b049ec68995f1fda50b545e42ef2d2a7d8f8df Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Thu, 16 Dec 2021 06:21:04 +0100
Subject: [PATCH] pg_dump: Refactor getIndexes()

Rearrange the version-dependent pieces in the new more modular style.
---
 src/bin/pg_dump/pg_dump.c | 119 ++
 1 file changed, 44 insertions(+), 75 deletions(-)

diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index 15dae8bd88..c368808dd9 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -6508,6 +6508,50 @@ getIndexes(Archive *fout, TableInfo tblinfo[], int numTables)
 	}
 	appendPQExpBufferChar(tbloids, '}');
 
+	resetPQExpBuffer(query);
+
+	appendPQExpBuffer(query,
+					  "SELECT t.tableoid, t.oid, i.indrelid, "
+					  "t.relname AS indexname, "
+					  "pg_catalog.pg_get_indexdef(i.indexrelid) AS indexdef, "
+					  "i.indkey, i.indisclustered, "
+					  "c.contype, c.conname, "
+					  "c.condeferrable, c.condeferred, "
+					  "c.tableoid AS contableoid, "
+					  "c.oid AS conoid, "
+					  "pg_catalog.pg_get_constraintdef(c.oid, false) AS condef, "
+					  "(SELECT spcname FROM pg_catalog.pg_tablespace s WHERE s.oid = t.reltablespace) AS tablespace, "
+					  "t.reloptions AS indreloptions, ");
+
+
+	if (fout->remoteVersion >= 90400)
+		appendPQExpBuffer(query,
+						  "i.indisreplident, ");
+	else
+		appendPQExpBuffer(query,
+						  "false AS indisreplident, ");
+
+	if (fout->remoteVersion >= 11)
+		appendPQExpBuffer(query,
+						  "inh.inhparent AS parentidx, "
+						  "i.indnkeyatts AS indnkeyatts, "
+						  "i.indnatts AS indnatts, "
+						  "(SELECT pg_catalog.array_agg(attnum ORDER BY attnum) "
+						  "  FROM pg_catalog.pg_attribute "
+						  "  WHERE attrelid = i.indexrelid AND "
+						  "attstattarget >= 0) AS indstatcols, "
+						  "(SELECT pg_catalog.array_agg(attstattarget ORDER BY attnum) "
+						  "  FROM pg_catalog.pg_attribute "
+						  "  WHERE attrelid = i.indexrelid AND "
+						  "attstattarget >= 0) AS indstatvals ");
+	else
+		appendPQExpBuffer(query,
+						  "0 AS parentidx, "
+						  "i.indnatts AS indnkeyatts, "
+						  "i.indnatts AS indnatts, "
+						  "'' AS indstatcols, "
+						  "'' AS indstatvals ");
+
 	/*
 	 * The point of the messy-looking outer join is to find a constraint that
 	 * is related by an internal dependency link to the index. If we find one,
@@ -6520,29 +6564,6 @@ getIndexes(Archive *fout, TableInfo tblinfo[], int numTables)
 	if (fout->remoteVersion >= 11)
 	{
 		appendPQExpBuffer(query,
-						  "SELECT t.tableoid, t.oid, i.indrelid, "
-						  "t.relname AS indexname, "
-						  "inh.inhparent AS parentidx, "
-						  "pg_catalog.pg_get_indexdef(i.indexrelid) AS indexdef, "
-						  "i.indnkeyatts AS indnkeyatts, "
-						  "i.indnatts AS indnatts, "
-						  "i.indkey, i.indisclustered, "
-						  "i.indisreplident, "
-						  "c.contype, c.conname, "
-						  "c.condeferrable, c.condeferred, "
-

  1   2   >