Re: PoC Refactor AM analyse API

2021-09-30 Thread Michael Paquier
On Wed, Sep 08, 2021 at 10:06:25AM -0500, Jaime Casanova wrote:
> This doesn't apply anymore because of c6fc50c, can you resubmit a new
> patch?

Activity has stalled here, so I have marked the entry as RwF in the CF
app.
--
Michael


signature.asc
Description: PGP signature


Re: Numeric x^y for negative x

2021-09-30 Thread Dean Rasheed
On Thu, 30 Sept 2021 at 18:25, Jaime Casanova
 wrote:
>
> Are you planning to commit this soon?
>

Yes, I'll take a look at it next week.

I think it's worth backpatching, despite the fact that it's a pretty
obscure corner case that probably isn't affecting anyone -- similar
fixes in this area have been backpatched, and keeping the code in the
back branches in sync will help with future maintenance and testing,
if any other bugs are found.

Regards,
Dean




Re: Support for NSS as a libpq TLS backend

2021-09-30 Thread Daniel Gustafsson
> On 1 Oct 2021, at 02:02, Jacob Champion  wrote:

> On my machine, at least, exit() is coming in due to a few calls to
> psprintf(), pstrdup(), and pg_malloc() in the new NSS code.
> (Disassembly via `objdump -S libpq.so` helped me track those down.) I'm
> working on a patch.

Ah, that makes perfect sense.  I was so focused on hunting through what
was newly linked against that I overlooked the obvious.  Thanks for
finding these.

--
Daniel Gustafsson   https://vmware.com/





Re: [UNVERIFIED SENDER] Re: Challenges preventing us moving to 64 bit transaction id (XID)?

2021-09-30 Thread Michael Paquier
On Wed, Sep 08, 2021 at 03:08:28PM +0900, Michael Paquier wrote:
> On top of the tests for needed for custom GUCs, this needs tests for
> the new int64 reloption.  I would suggest to add something in
> dummy_index_am, where we test all the reloption APIs.

My review here was three weeks ago, and there has been no replies from
the author, so I am marking this patch set as RwF.
--
Michael




Re: Autovacuum (analyze) on partitioned tables for ATTACH/DETACH/DROP commands

2021-09-30 Thread Michael Paquier
On Wed, Sep 01, 2021 at 11:11:08AM +0200, Daniel Gustafsson wrote:
> This patch no longer applies to HEAD, can you please submit a rebased version
> for the commitfest?

Four weeks later, nothing has happened.  So I have marked the patch as
RwF.
--
Michael




Re: Add statistics refresh materialized view

2021-09-30 Thread Michael Paquier
On Tue, Sep 07, 2021 at 06:11:14PM +0900, Seino Yuki wrote:
> I would like to withdraw this proposal.

This was registered in the CF, so marked as RwF.
--
Michael




Re: [PATCH] Error out if SKIP LOCKED and WITH TIES are both specified

2021-09-30 Thread Michael Paquier
On Mon, Aug 30, 2021 at 03:55:10PM -0500, David Christensen wrote:
> On Mon, Aug 30, 2021 at 3:51 PM Alvaro Herrera  
> wrote:
>> I think we should do this, given that it has shown potential to bite
>> people.  We should also add a small mention of this in the docs, as in
>> the attached.
>>
>> What do others think?

There is a CF entry for this patch:
https://commitfest.postgresql.org/34/3286/

Alvaro, are you planning to wrap that?
--
Michael




Re: Slow standby snapshot

2021-09-30 Thread Michael Paquier
On Tue, Aug 10, 2021 at 12:45:17AM +0300, Michail Nikolaev wrote:
> Thanks for the feedback again.

From what I can see, there has been some feedback from Andres here,
and the thread has been idle for six weeks now, so I have marked this patch
as RwF in the CF app.
--
Michael




Re: pgbench bug candidate: negative "initial connection time"

2021-09-30 Thread Michael Paquier
On Wed, Sep 29, 2021 at 10:11:53PM +0900, Fujii Masao wrote:
> BTW, when logfile fails to be opened, pgbench gets stuck due to commit
> aeb57af8e6. So even if we decided not to back-patch those changes,
> we should improve the handling of logfile open failure, to fix the issue.

There is an entry in the CF for this thread:
https://commitfest.postgresql.org/34/3219/

I have moved that to the next one as some pieces are missing.  If you
are planning to handle the rest, could you register your name as a
committer?
--
Michael




Re: when the startup process doesn't (logging startup delays)

2021-09-30 Thread Michael Paquier
On Thu, Sep 30, 2021 at 05:08:17PM -0400, Robert Haas wrote:
> It's certainly less of an issue than it used to be back in my day.
> 
> Any thoughts on the patch I attached?

I don't know.  Anyway, this is being actively worked on, so I have
taken the liberty of moving it to the next CF.
--
Michael




Re: Rewriting the test of pg_upgrade as a TAP test - take three - remastered set

2021-09-30 Thread Michael Paquier
On Thu, May 20, 2021 at 03:07:56PM +0900, Michael Paquier wrote:
> This stuff still needs to be expanded depending on how PostgresNode is
> made backward-compatible, but I'll wait for that to happen before
> going further down here.  I have also spent some time testing all that
> with MSVC, and the installation paths used for pg_regress&co make the
> script a tad more confusing, so I have dropped this part for now.

Andrew, as this is a bit tied to the buildfarm code and any
simplifications that could happen there, do you have any comments
and/or suggestions for this patch?

This still applies on HEAD, and it holds all the properties of the
existing test by using PostgresNodes that point to older installations
for the business with binaries and libraries.  There is one
part where pg_upgrade logs into src/test/regress/, which is not good,
but that should be easily fixable.
--
Michael




Re: Skipping logical replication transactions on subscriber side

2021-09-30 Thread Amit Kapila
On Fri, Oct 1, 2021 at 6:30 AM Masahiko Sawada  wrote:
>
> On Fri, Oct 1, 2021 at 5:05 AM Peter Eisentraut
>  wrote:
> >
> > Also, what happens when you forget to reset the xid after it has passed?
> >   Will it get skipped again after wraparound?
>
> Yes.
>

Aren't we resetting the skip_xid once we skip that transaction in
stop_skipping_changes()? If so, it shouldn't be possible to skip it
again after the wraparound. Am I missing something?

Now, if the user has wrongly set some XID which we can't skip because
it is already in the past, or something like that, then I think it is
the user's problem, and that's why it can be done only by superusers. I
think we have even thought of protecting against that by cross-checking
with the information in the view, but as the view data is lossy, we
can't rely on that. I think users can even set some valid XID that never
hits any error and we will still skip it, which is what can be done
today with pg_replication_origin_advance(). I am not sure if we can do
much about such scenarios except to carefully document them.

-- 
With Regards,
Amit Kapila.




Re: Incorrect snapshots while promoting hot standby node when 2PC is used

2021-09-30 Thread Michael Paquier
On Mon, May 31, 2021 at 09:37:17PM +0900, Michael Paquier wrote:
> I have been looking at all that for the last couple of days, and
> checked the code to make sure that relying on RecoveryInProgress() as
> the tipping point is logically correct in terms of virtual XID,
> snapshot build and KnownAssignedXids cleanup.  This stuff is tricky
> enough that I may have missed something, but my impression (and
> testing) is that we should be safe.

A couple of months later, I have looked back at this thread and this
report.  I have rechecked all the standby handling and snapshot builds
involving KnownAssignedXids and looked at all the phases that are
getting called until we call ShutdownRecoveryTransactionEnvironment()
to release these, and I don't think that there is a problem with the
solution proposed here.  So I propose to move on and apply this
patch.  Please let me know if there are any objections.
--
Michael




Re: pgsql: Document XLOG_INCLUDE_XID a little better

2021-09-30 Thread Amit Kapila
On Fri, Oct 1, 2021 at 12:32 AM Robert Haas  wrote:
>
> On Thu, Sep 30, 2021 at 6:08 AM Amit Kapila  wrote:
> > I think we can do better than using XLOG_INCLUDE_XID flag in the
> > record being inserted. We need this flag so that we can mark
> > SubTransaction assigned after XLogInsertRecord()  is successful. We
> > can instead output a flag (say sub_xact_assigned) from
> > XLogRecordAssemble() and pass it to XLogInsertRecord(). Then in
> > XLogInsertRecord(), we can mark SubTransactionAssigned once the record
> > is inserted (after or before calling
> > MarkCurrentTransactionIdLoggedIfAny()).
>
> Isn't there other communication between these routines that just uses
> global variables?
>

AFAICS, there are two possibilities w.r.t. global variables: (a) use
curinsert_flags, which we are doing now; (b) introduce a new global
variable, set it after we make the association, and then reset it after
we mark the SubTransaction assigned, and on error. I have also thought
of passing it via XLogCtlInsert, but as that is shared by different
processes, it could be set by one process and read by another, which we
don't want here.

I am not sure if any of these is better than doing this communication
via a local variable. Do you have something specific in mind?

-- 
With Regards,
Amit Kapila.




Re: non-superusers are allowed to drop the replication user, but are not allowed to alter or even create them, is that ok?

2021-09-30 Thread Ashutosh Sharma
On Thu, Sep 30, 2021 at 7:45 PM Bharath Rupireddy <
bharath.rupireddyforpostg...@gmail.com> wrote:

> On Thu, Sep 30, 2021 at 3:37 PM Ashutosh Sharma 
> wrote:
> >
> > Hi All,
> >
> > While working on one of the internal projects I noticed that currently
> in Postgres, we do not allow normal users to alter attributes of the
> replication user. However we do allow normal users to drop replication
> users or to even rename it using the alter command. Is that behaviour ok?
> If yes, can someone please help me understand how and why this is okay.
> >
> > Here is an example illustrating this behaviour:
> >
> > supusr@postgres=# create user repusr with password 'repusr' replication;
> > CREATE ROLE
> >
> > supusr@postgres=# create user nonsu with password 'nonsu' createrole
> createdb;
> > CREATE ROLE
> >
> > supusr@postgres=# \c postgres nonsu;
> > You are now connected to database "postgres" as user "nonsu".
> >
> > nonsu@postgres=> alter user repusr nocreatedb;
> > ERROR:  42501: must be superuser to alter replication roles or change
> replication attribute
> >
> > nonsu@postgres=> alter user repusr rename to refusr;
> > ALTER ROLE
> >
> > nonsu@postgres=> drop user refusr;
> > DROP ROLE
> >
> > nonsu@postgres=> create user repusr2 with password 'repusr2'
> replication;
> > ERROR:  42501: must be superuser to create replication users
>
> I think having createrole allows a non-superuser to rename/drop a
> user with a replication role, because renaming/creating/dropping roles
> is what createrole/nocreaterole is meant for.
>

Well, if we go by this theory then the CREATE ROLE command shouldn't have
failed, right?

--
With Regards,
Ashutosh Sharma.


Re: Reserve prefixes for loaded libraries proposal

2021-09-30 Thread Laurenz Albe
On Thu, 2021-09-30 at 18:26 -0400, Chapman Flack wrote:
> On 09/30/21 17:54, Florin Irion wrote:
> > We could also help users get a warning if they set a parameter with the
> > `SET` command.
> 
> So I am in favor of patching this.

+1 on the idea.

Yours,
Laurenz Albe





Re: non-superusers are allowed to drop the replication user, but are not allowed to alter or even create them, is that ok?

2021-09-30 Thread Ashutosh Sharma
On Thu, Sep 30, 2021 at 8:40 PM Mark Dilger 
wrote:

>
>
> > On Sep 30, 2021, at 3:07 AM, Ashutosh Sharma 
> wrote:
> >
> > While working on one of the internal projects I noticed that currently
> in Postgres, we do not allow normal users to alter attributes of the
> replication user. However we do allow normal users to drop replication
> users or to even rename it using the alter command. Is that behaviour ok?
> If yes, can someone please help me understand how and why this is okay.
>
> The definition of CREATEROLE is a bit of a mess.  Part of the problem is
> that roles do not have owners, which makes the permissions to drop roles
> work differently than for other object types.  I have a patch pending [1]
> for the version 15 development cycle that fixes this and other problems.
> I'd appreciate feedback on the design and whether it addresses your
> concerns.
>
> [1] https://commitfest.postgresql.org/34/3223/


Thanks, Mark. I'll take a look at this thread in detail to see if
it addresses the issue raised here. Although from the first email it seems
like the proposal is about allowing normal users to set some of the GUC
params that can only be set by superusers.

With Regards,
Ashutosh Sharma.


Re: Diagnostic comment in LogicalIncreaseXminForSlot

2021-09-30 Thread Amit Kapila
On Fri, Oct 1, 2021 at 6:36 AM Masahiko Sawada  wrote:
>
> On Fri, Oct 1, 2021 at 1:45 AM Jaime Casanova
>  wrote:
> >
> > On Tue, Sep 07, 2021 at 11:14:23AM +0530, Amit Kapila wrote:
> > > On Mon, Sep 6, 2021 at 5:29 PM Ashutosh Bapat
> > >  wrote:
> > > >
> > > > Yeah, I agree. Sorry for missing that.
> > > >
> > > > The updated patch looks good to me.
> > > >
> > >
> > > Pushed!
> > >
> >
> > This patch is still on "Needs review"!
> > Should we change it to Committed or is expected something else
> > about it?
>
> Yes, the patch has already been committed (4c347885). So I also think we
> should mark it as Committed.
>

Right, I have changed the status to Committed.

-- 
With Regards,
Amit Kapila.




minor gripe about lax reloptions parsing for views

2021-09-30 Thread Mark Dilger
Does this bother anyone else:

CREATE INDEX uses an amoptions parser specific to the index type and, at least 
for btree, rejects relation options from the "toast" namespace:

+-- Bad reloption for index draws an error
+CREATE INDEX idx ON test_tbl USING btree (i) WITH (toast.nonsense=insanity);
+ERROR:  unrecognized parameter namespace "toast"

Not so for CREATE VIEW, which shares logic with CREATE TABLE:

+-- But not for views, where "toast" namespace relopts are ignored
+CREATE VIEW nonsense_1 WITH (toast.nonsense=insanity, toast.foo="bar baz")
+   AS SELECT * FROM test_tbl;
+SELECT relname, reloptions FROM pg_class WHERE relname = 'nonsense_1';
+  relname   | reloptions 
+------------+------------
+ nonsense_1 | 
+(1 row)
+
+-- Well-formed but irrelevant toast options are also silently ignored
+CREATE VIEW vac_opts_1 WITH (toast.autovacuum_enabled=false)
+   AS SELECT * FROM test_tbl;
+SELECT relname, reloptions FROM pg_class WHERE relname = 'vac_opts_1';
+  relname   | reloptions 
+------------+------------
+ vac_opts_1 | 
+(1 row)

So far as I can see, this does no harm other than to annoy me.  It might 
confuse new users, though, as changing to a MATERIALIZED VIEW makes the toast 
options relevant, but the user feedback for the command is no different:

+-- But if we upgrade to a materialized view, they are not ignored, but
+-- they attach to the toast table, not the view, so users might not notice
+-- the difference
+CREATE MATERIALIZED VIEW vac_opts_2 WITH (toast.autovacuum_enabled=false)
+   AS SELECT * FROM test_tbl;
+SELECT relname, reloptions FROM pg_class WHERE relname = 'vac_opts_2';
+  relname   | reloptions 
+------------+------------
+ vac_opts_2 | 
+(1 row)
+
+-- They can find the difference if they know where to look
+SELECT rel.relname, toast.relname, toast.reloptions
+   FROM pg_class rel LEFT JOIN pg_class toast ON rel.reltoastrelid = toast.oid 
+   WHERE rel.relname IN ('nonsense_1', 'vac_opts_1', 'vac_opts_2');
+  relname   |    relname     |         reloptions         
+------------+----------------+----------------------------
+ nonsense_1 |                | 
+ vac_opts_1 |                | 
+ vac_opts_2 | pg_toast_19615 | {autovacuum_enabled=false}
+(3 rows)

The solution is simple enough:  stop using HEAP_RELOPT_NAMESPACES when parsing 
reloptions for views and instead create a VIEW_RELOPT_NAMESPACES array which 
does not include "toast".

I've already fixed this, mixed into some other work.  I'll pull it out as its 
own patch if there is any interest.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: Allow escape in application_name

2021-09-30 Thread Kyotaro Horiguchi
At Mon, 27 Sep 2021 04:10:50 +, "kuroda.hay...@fujitsu.com" 
 wrote in 
> I'm sorry for sending a bad patch...

Thank you for the new version, and sorry for making the discussion go
back and forth:p

> > + * Note: StringInfo datatype cannot be accepted
> > + * because elog.h should not include postgres-original header file.
> > 
> > How about moving the function to guc.c from elog.c because it's for
> > the parameters, i.e., log_line_prefix and postgres_fdw.application_name?
> > This allows us to use StringInfo in the function?
> 
> Yeah, StringInfo can be used in guc.c. Hence I moved it.
> Some variables and functions for timestamps became non-static,
> because they are used by both normal logging and log_line_prefix.
> I think their names are not too generic, so I did not rename them.
> 
> > +   parse_pgfdw_appname(buf, values[i]);
> > +   /*
> > +* Note that appname may become an empty
> > +* string if the input string has a wrong format.
> > +*/
> > +   values[i] = *buf;
> > 
> > If postgres_fdw.application_name contains only invalid escape characters 
> > like
> > "%b", parse_pgfdw_appname() returns an empty string. We discussed
> > there are four options to handle this case and we concluded (4) is better.
> > Right? But ISTM that the patch uses (2).
> > 
> > > (1) Use the original postgres_fdw.application_name like "%b"
> > > (2) Use the application_name of the server object (if set)
> > > (3) Use fallback_application_name
> > > (4) Use empty string as application_name
> 
> Yeah, currently my patch uses case (2). I tried to implement (4),
> but I found that a libpq function (maybe conninfo_array_parse()) must be
> modified in order to do that.
> We moved the functionality to the libpq layer because we wanted to avoid
> some side effects, so we started to think case (4) might be wrong.
> 
> Now we propose the following scenario:
> 1. Use postgres_fdw.application_name when it is set and the parsing result is 
> not empty
> 2. If not, use the foreign-server option when it is set and the parsing 
> result is not empty
> 3. If not, use fallback_application_name
> 
> How do you think?

I think we don't have a precedent for a case like this, where a
behavior is decided from both an object option and a GUC.

I'm a bit uncomfortable with the .conf configuration overriding server
options, but I also think a GUC set in-session should override server
options.  So, it's slightly against POLA, but from the standpoint of
usability, +0.5 to that prioritization since I cannot come up with a
better idea.


I thought it would be nice to share process_format_string, but the
function is too tightly coupled with the ErrorData detail, as you
pointed out off-list.  So I came to think we cannot use the function
from outside.  One possibility could be to make the function just a
skeleton that takes a list of pairs of an escape character and its
associated callback function, but that is apparently too complex.  (It
would be worth doing if we shared the same backbone processing with
archive_command, restore_command, recovery_end_command and so on, but
that is necessarily accompanied by the need to change the behavior of
either log_line_prefix or the others.)

I personally don't mind if this feature implements the padding-reading
logic differently from process_format_string, but if we don't like
that, I would go back to suggesting the use of strtol in both
functions.

As Fujii-san pointed out upthread, strtol behaves a bit differently
than we expect here, but that can be handled by checking the starting
characters.

>   if (*p == '-' ? isdigit(p[1]) : isdigit(p[0]))
>   {
>   char *endptr;
>   padding = strtol(p, &endptr, 10);
>   if (p == endptr)
>   break;
>   p = endptr;
>   }
>   else
>   padding = 0;

The code gets a bit more complex, but the simplification from removing
the helper function wins.  strtol is slower than the original function,
but the difference is probably lost in the noise.  isdigit on some
platforms follows the locale, but it is already widely used, for
example in numeric parsing, so I don't think that matters.  (Of course
we can get rid of isdigit by using bare comparisons.)

I think it can be a separate preparatory patch of this patch.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Record a Bitmapset of non-pruned partitions

2021-09-30 Thread David Rowley
On Fri, 1 Oct 2021 at 13:37, Amit Langote  wrote:
> > I've attached a patch that updates the comments to mention this.
>
> Looks good to me.  Thanks.

Thanks. Pushed.

David




Re: Addition of authenticated ID to pg_stat_activity

2021-09-30 Thread Michael Paquier
On Wed, Jul 21, 2021 at 01:21:17PM +0900, Michael Paquier wrote:
> The authenticated ID could be a SSL DN longer than the default of
> 128kB that this patch is proposing.  I think that it is a good idea to
> provide some way to the user to be able to control that without a
> recompilation.

I got to think about this patch more over the last couple of days, and
I still think we need a GUC to control how much shared memory is used
for the authenticated ID in each BackendStatusArray.  Now, the thread
has been idle for two months and does not seem to attract much
attention.  This also includes a split of pg_stat_activity's
client_addr, client_hostname and client_port into a new catalog, which
may be hard to justify for this feature.  So I am dropping the patch.
--
Michael




Re: Diagnostic comment in LogicalIncreaseXminForSlot

2021-09-30 Thread Masahiko Sawada
On Fri, Oct 1, 2021 at 1:45 AM Jaime Casanova
 wrote:
>
> On Tue, Sep 07, 2021 at 11:14:23AM +0530, Amit Kapila wrote:
> > On Mon, Sep 6, 2021 at 5:29 PM Ashutosh Bapat
> >  wrote:
> > >
> > > Yeah, I agree. Sorry for missing that.
> > >
> > > The updated patch looks good to me.
> > >
> >
> > Pushed!
> >
>
> This patch is still on "Needs review"!
> Should we change it to Committed or is expected something else
> about it?

Yes, the patch has already been committed (4c347885). So I also think we
should mark it as Committed.

Regards,

-- 
Masahiko Sawada
EDB:  https://www.enterprisedb.com/




Re: Skipping logical replication transactions on subscriber side

2021-09-30 Thread Masahiko Sawada
On Fri, Oct 1, 2021 at 5:05 AM Peter Eisentraut
 wrote:
>
> On 30.09.21 07:45, Masahiko Sawada wrote:
> > I've attached updated patches that incorporate all comments I got so
> > far. Please review them.
>
> I'm uneasy about the way the xids-to-be-skipped are presented as
> subscriptions options, similar to settings such as "binary".  I see how
> that is convenient, but it's not really the same thing, in how you use
> it, is it?  Even if we share some details internally, I feel that there
> should be a separate syntax somehow.

Since I was thinking that ALTER SUBSCRIPTION ... SET is used to alter
parameters originally set by CREATE SUBSCRIPTION, the first several
versions of the patch added a separate syntax for this feature, like
ALTER SUBSCRIPTION ... SET SKIP TRANSACTION xxx. But Amit was concerned
about an additional syntax and about consistency with disable_on_error[1],
which is proposed by Mark Dilger[2], so I’ve changed it to a
subscription option. I tried to find a policy on this by checking the
existing syntaxes, but I could not find one, and interestingly, when it
comes to ALTER SUBSCRIPTION syntax, we support both the ENABLE/DISABLE
syntax and the SET (enabled = on/off) option.

> Also, what happens when you forget to reset the xid after it has passed?
>   Will it get skipped again after wraparound?

Yes. Currently it's the user's responsibility. We thoroughly documented
the risk of this feature: it should be used as a last resort, since it
may easily make the subscriber inconsistent, especially if the user
specifies the wrong transaction ID.

Regards,

[1] 
https://www.postgresql.org/message-id/CAA4eK1LjrU8x%2Bx%3DbFazVD10pgOVy0PEE8mpz3nQhDG%2BmmU8ivQ%40mail.gmail.com
[2] 
https://www.postgresql.org/message-id/DB35438F-9356-4841-89A0-412709EBD3AB%40enterprisedb.com

-- 
Masahiko Sawada
EDB:  https://www.enterprisedb.com/




Re: Record a Bitmapset of non-pruned partitions

2021-09-30 Thread Amit Langote
On Fri, Oct 1, 2021 at 7:07 AM David Rowley  wrote:
> On Thu, 30 Sept 2021 at 20:25, Amit Langote  wrote:
> > Related to the above, I noticed while looking at
> > build_merged_partition_bounds() that db632fbca3 missed adding a line
> > to that function to set interleaved_parts to NULL.  Because the
> > PartitionBoundInfo is only palloc'd (not palloc0'd), interleaved_parts
> > of a "merged" bounds struct ends up pointing to garbage, so let's fix
> > that.  Attached a patch.
>
> Thanks for the patch.
>
> I think we also need to document that interleaved_parts is not set for
> join relations, otherwise someone may in the future try to use that
> field for an optimisation for join relations.  At the moment, per
> generate_orderedappend_paths, we only handle IS_SIMPLE_REL type
> relations.
>
> I've attached a patch that updates the comments to mention this.

Looks good to me.  Thanks.

-- 
Amit Langote
EDB: http://www.enterprisedb.com




Re: Support for NSS as a libpq TLS backend

2021-09-30 Thread Jacob Champion
On Thu, 2021-09-30 at 16:04 +, Jacob Champion wrote:
> On Thu, 2021-09-30 at 14:17 +0200, Daniel Gustafsson wrote:
> > The libpq libnss implementation doesn't call either of these, and neither 
> > does
> > libnss.
> 
> I thought the refs check only searched for direct symbol dependencies;
> is that piece of NSPR being statically included somehow?

On my machine, at least, exit() is coming in due to a few calls to
psprintf(), pstrdup(), and pg_malloc() in the new NSS code.
(Disassembly via `objdump -S libpq.so` helped me track those down.) I'm
working on a patch.

--Jacob


Re: 002_types.pl fails on some timezones on windows

2021-09-30 Thread Tom Lane
Thomas Munro  writes:
> Yes, it's been a while but IIRC Windows in the UK uses confusing
> terminology here even in user interfaces, so that in summer it appears
> to be wrong, which is annoying to anyone brought up on Eggert's
> system.  The CLDR windowsZones.xml file shows this.

Oh, thanks for the pointer to CLDR!  I tried re-generating our data
based on theirs, and ended up with the attached draft patch.
My notes summarizing the changes say:


Choose Europe/London for "Greenwich Standard Time"
(CLDR doesn't do this, but all their mappings for it are insane)

Alphabetize a bit better


Zone name changes:

Jerusalem Standard Time -> Israel Standard Time

Numerous Russian zones slightly renamed

Should we preserve the old spellings of the above?  It's not clear
how long-obsolete the old spellings are.


Maybe politically sensitive:

Asia/Hong_Kong -> Asia/Shanghai

I think the latter has way better claim on "China Standard Time",
and CLDR agrees.


Resolve Links to underlying real zones:

Asia/Kuwait -> Asia/Riyadh
Asia/Muscat -> Asia/Dubai
Australia/Canberra -> Australia/Sydney
Canada/Atlantic -> America/Halifax
Canada/Newfoundland -> America/St_Johns
Canada/Saskatchewan -> America/Regina
US/Alaska -> America/Anchorage
US/Arizona -> America/Phoenix
US/Central -> America/Chicago
US/Eastern -> America/New_York
US/Hawaii -> Pacific/Honolulu
US/Mountain -> America/Denver
US/Pacific -> America/Los_Angeles


Just plain wrong:

US/Aleutan (misspelling of US/Aleutian, which is a link anyway)

America/Salvador does not exist; tzdb says
# There are too many Salvadors elsewhere, so use America/Bahia instead
# of America/Salvador.

Etc/UTC+12 doesn't exist in tzdb

Indiana (East) is not the regular US/Eastern zone

Asia/Baku -> Asia/Yerevan (Baku is in Azerbaijan, Yerevan is in Armenia)

Asia/Dhaka -> Asia/Almaty (Dhaka has its own zone, and it's in Bangladesh
not Astana)

Europe/Sarajevo is a link to Europe/Belgrade these days, so use Warsaw

Chisinau is in Moldova not Romania

Chetumal is in Quintana Roo, which is represented by Cancun not Mexico City

Haiti has its own zone

America/Araguaina seems to just be a mistake; use Sao_Paulo

America/Buenos_Aires for SA Eastern Standard Time is a mistake
(it has its own zone)
likewise America/Caracas for SA Western Standard Time

Africa/Harare seems to be obsoleted by Africa/Johannesburg

Karachi is in Pakistan, not Tashkent


New Windows zones:

"South Sudan Standard Time" -> Africa/Juba

"West Bank Standard Time" -> Asia/Hebron
(CLDR seem to have this replacing Gaza, but I kept that one too)

"Yukon Standard Time" -> America/Whitehorse

uncomment "W. Central Africa Standard Time" as Africa/Lagos

regards, tom lane

diff --git a/src/bin/initdb/findtimezone.c b/src/bin/initdb/findtimezone.c
index 3c2b8d4e29..5ae5a576c9 100644
--- a/src/bin/initdb/findtimezone.c
+++ b/src/bin/initdb/findtimezone.c
@@ -750,12 +750,12 @@ static const struct
 	{
 		/* (UTC-09:00) Alaska */
 		"Alaskan Standard Time", "Alaskan Daylight Time",
-		"US/Alaska"
+		"America/Anchorage"
 	},
 	{
 		/* (UTC-10:00) Aleutian Islands */
 		"Aleutian Standard Time", "Aleutian Daylight Time",
-		"US/Aleutan"
+		"America/Adak"
 	},
 	{
 		/* (UTC+07:00) Barnaul, Gorno-Altaysk */
@@ -765,12 +765,12 @@ static const struct
 	{
 		/* (UTC+03:00) Kuwait, Riyadh */
 		"Arab Standard Time", "Arab Daylight Time",
-		"Asia/Kuwait"
+		"Asia/Riyadh"
 	},
 	{
 		/* (UTC+04:00) Abu Dhabi, Muscat */
 		"Arabian Standard Time", "Arabian Daylight Time",
-		"Asia/Muscat"
+		"Asia/Dubai"
 	},
 	{
 		/* (UTC+03:00) Baghdad */
@@ -795,7 +795,7 @@ static const struct
 	{
 		/* (UTC-04:00) Atlantic Time (Canada) */
 		"Atlantic Standard Time", "Atlantic Daylight Time",
-		"Canada/Atlantic"
+		"America/Halifax"
 	},
 	{
 		/* (UTC+09:30) Darwin */
@@ -810,7 +810,7 @@ static const struct
 	{
 		/* (UTC+10:00) Canberra, Melbourne, Sydney */
 		"AUS Eastern Standard Time", "AUS Eastern Daylight Time",
-		"Australia/Canberra"
+		"Australia/Sydney"
 	},
 	{
 		/* (UTC+04:00) Baku */
@@ -825,37 +825,32 @@ static const struct
 	{
 		/* (UTC-03:00) Salvador */
 		"Bahia Standard Time", "Bahia Daylight Time",
-		"America/Salvador"
+		"America/Bahia"
 	},
 	{
 		/* (UTC+06:00) Dhaka */
 		"Bangladesh Standard Time", "Bangladesh Daylight Time",
 		"Asia/Dhaka"
 	},
-	{
-		/* (UTC+11:00) Bougainville Island */
-		"Bougainville Standard Time", "Bougainville Daylight Time",
-		"Pacific/Bougainville"
-	},
 	{
 		/* (UTC+03:00) Minsk */
 		"Belarus Standard Time", "Belarus Daylight Time",
 		"Europe/Minsk"
 	},
+	{
+		/* (UTC+11:00) Bougainville Island */
+		"Bougainville Standard Time", "Bougainville Daylight Time",
+		"Pacific/Bougainville"
+	},
 	{
 		/* (UTC-01:00) Cabo Verde Is. */
 		"Cabo Verde Standard Time", "Cabo Verde Daylight Time",
 		"Atlantic/Cape_Verde"
 	},
-	{
-		/* (UTC+12:45) Chatham Islands */
-		"Chatham Islands Standard Time", "Chatham Islands Daylight Time",
-		"Pacific/Chatham"
-	},
 	{
 		/* (UT

Re: pg_stat_bgwriter.buffers_backend is pretty meaningless (and more?)

2021-09-30 Thread Alvaro Herrera
Can you say more about 0001?

-- 
Álvaro Herrera PostgreSQL Developer  —  https://www.EnterpriseDB.com/
"Use it up, wear it out, make it do, or do without"




Re: Parallel Full Hash Join

2021-09-30 Thread Thomas Munro
On Tue, Sep 21, 2021 at 9:29 AM Jaime Casanova
 wrote:
> Do you intend to commit 0001 soon? Especially if this applies to 14, it
> should be committed in the next few days.

Thanks for the reminder.  Yes, I'm looking at this now, and looking
into the crash of this patch set on CI:

https://cirrus-ci.com/task/5282889613967360

Unfortunately, cfbot is using very simple and old CI rules which don't
have a core dump analysis step on that OS.  :-(  (I have a big upgrade
to all this CI stuff in the pipeline to fix that, get full access to
all logs, go faster, and many other improvements, after learning a lot
of tricks about running these types of systems over the past year --
more soon.)

> 0003: i'm testing this now, not at a big scale but just to try to find
> problems

Thanks!




Re: Enabling deduplication with system catalog indexes

2021-09-30 Thread Peter Geoghegan
On Wed, Sep 29, 2021 at 3:32 PM Peter Geoghegan  wrote:
> I decided to run a simple experiment, to give us some idea of what
> benefits my proposal gives users: I ran "make installcheck" on a newly
> initdb'd database (master branch), and then with the attached patch
> (which enables deduplication with system catalog indexes) applied.

I will commit this patch in a few days, barring objections.

-- 
Peter Geoghegan




Re: Reserve prefixes for loaded libraries proposal

2021-09-30 Thread Chapman Flack
On 09/30/21 17:54, Florin Irion wrote:

> We could also help users get a warning if they set a parameter with the
> `SET` command.

This is funny. For years I have been so confident I knew how this worked
that I, obviously, hadn't tried it. :)

My first setting of a made-up variable gets no warning, as I already expected:

postgres=# set plpgsql.no_such_setting = false;
SET

Then as soon as I do the first thing in the session involving plpgsql,
I get the warning for that one:

postgres=# do language plpgsql
postgres-# 'begin delete from pg_class where false; end';
WARNING:  unrecognized configuration parameter "plpgsql.no_such_setting"
DO


But then, I have always assumed I would get warnings thereafter:

postgres=# set plpgsql.not_this_one_neither = false;
SET

But no!

So I am in favor of patching this.

Regards,
-Chap




Re: Record a Bitmapset of non-pruned partitions

2021-09-30 Thread David Rowley
On Thu, 30 Sept 2021 at 20:25, Amit Langote  wrote:
> Related to the above, I noticed while looking at
> build_merged_partition_bounds() that db632fbca3 missed adding a line
> to that function to set interleaved_parts to NULL.  Because the
> PartitionBoundInfo is only palloc'd (not palloc0'd), interleaved_parts
> of a "merged" bounds struct ends up pointing to garbage, so let's fix
> that.  Attached a patch.

Thanks for the patch.

I think we also need to document that interleaved_parts is not set for
join relations, otherwise someone may in the future try to use that
field for an optimisation for join relations.  At the moment, per
generate_orderedappend_paths, we only handle IS_SIMPLE_REL type
relations.

I've attached a patch that updates the comments to mention this.

David


initialize-inteleaved_parts_v2.patch
Description: Binary data


Reserve prefixes for loaded libraries proposal

2021-09-30 Thread Florin Irion
Hello,

If we set a parameter in the postgresql.conf that the loaded library doesn't
recognize at startup, it throws a warning.
For example if one sets `plpgsql.no_such_setting` for plpgsql:

```
WARNING: unrecognized configuration parameter "plpgsql.no_such_setting"
```

We could also help users get a warning if they set a parameter with the
`SET` command. I've seen many cases where users make typos and break
things badly; consider the following example:

```
postgres=# BEGIN;
BEGIN
postgres=*# SET plpgsql.no_such_setting = false;
SET
postgres=*# -- do critical queries taking into account that
plpgsql.no_such_setting is false;
postgres=*# COMMIT;
COMMIT
```

I propose to make the user aware of such mistakes. I also made the patch
only warn the user but still correctly `SET` the parameter, so that the
user is the one who chooses whether to continue or `ROLLBACK`. I don't
know if this last part is correct, but at least it doesn't break any
previous behavior.

This is what I mean:

```
postgres=# BEGIN;
BEGIN
postgres=*# SET plpgsql.no_such_setting = false;
WARNING: unrecognized configuration parameter "plpgsql.no_such_setting"
DETAIL: "plpgsql" is a reserved prefix.
HINT: If you need to create a custom placeholder use a different prefix.
SET
postgres=*# -- choose to continue or not based on the warning
postgres=*# ROLLBACK or COMMIT
```

The patch I'm attaching registers the prefixes of all the loaded
libraries, and then uses them to check whether a parameter is
recognized, just as we do at startup.

Please, let me know what you think.

Cheers,
Florin Irion


reservePrefixWarnUser.patch
Description: Binary data


PATH manipulation in 001_libpq_pipeline.pl fails on windows

2021-09-30 Thread Andres Freund
Hi,

For me 001_libpq_pipeline.pl doesn't reliably work on Windows, because it
tries to prepend something to PATH using the Unix separator ':' (vs. ';'
used on Windows):
$ENV{PATH} = "$ENV{TESTDIR}:$ENV{PATH}";

If the first two elements in PATH are something needed, this can cause the
test to fail... I'm surprised this doesn't cause problems on the buildfarm - a
plain
  perl src\tools\msvc\vcregress.pl taptest src\test\modules\libpq_pipeline\
fails for me.

Greetings,

Andres Freund




Re: pg_stat_bgwriter.buffers_backend is pretty meaningless (and more?)

2021-09-30 Thread Melanie Plageman
On Wed, Sep 29, 2021 at 4:46 PM Melanie Plageman
 wrote:
>
> On Mon, Sep 27, 2021 at 2:58 PM Melanie Plageman
>  wrote:
> >
> > On Fri, Sep 24, 2021 at 5:58 PM Melanie Plageman
> >  wrote:
> > >
> > > On Thu, Sep 23, 2021 at 5:05 PM Melanie Plageman
> > >  wrote:
> > > The only remaining TODOs are described in the commit message.
> > > most critical one is that the reset message doesn't work.
> >
> > v10 is attached with updated comments and some limited code refactoring
>
> v11 has fixed the oversize message issue by sending a reset message for
> each backend type. Now, we will call GetCurrentTimestamp
> BACKEND_NUM_TYPES times, so maybe I should add some kind of flag to the
> reset message that indicates the first message so that all the "do once"
> things can be done at that point.
>
> I've also fixed a few style/cosmetic issues and updated the commit
> message with a link to the thread [1] where I proposed smgrwrite() and
> smgrextend() wrappers (which is where I propose to call
> pgstat_incremement_buffer_access_type() for unbuffered writes and
> extends).
>
> - Melanie
>
> [1] 
> https://www.postgresql.org/message-id/CAAKRu_aw72w70X1P%3Dba20K8iGUvSkyz7Yk03wPPh3f9WgmcJ3g%40mail.gmail.com


v12 (attached) has various style and code clarity updates (it is
pgindented as well). I also added a new commit which creates a utility
function to make a tuplestore for views that need one in pgstatfuncs.c.

Having received some offlist feedback about the names BufferAccessType
and BufferType being confusing, I am planning to rename these variables
and all of the associated functions. I agree that BufferType and
BufferAccessType are confusing for the following reasons:
  - They sound similar.
  - They aren't very precise.
  - One of the types of buffers is not using a Postgres buffer.

So far, the proposed alternative is IO_Op or IOOp for BufferAccessType
and IOPath for BufferType.

- Melanie
From 5c3f382ba4eef310fc82b2b676029097eb99cd70 Mon Sep 17 00:00:00 2001
From: Melanie Plageman 
Date: Wed, 29 Sep 2021 15:44:51 -0400
Subject: [PATCH v12 4/4] Remove superfluous bgwriter stats

Remove stats from pg_stat_bgwriter which are now more clearly expressed
in pg_stat_buffers.

TODO:
- make pg_stat_checkpointer view and move relevant stats into it
- add additional stats to pg_stat_bgwriter
---
 doc/src/sgml/monitoring.sgml  | 47 ---
 src/backend/catalog/system_views.sql  |  6 +---
 src/backend/postmaster/checkpointer.c | 26 ---
 src/backend/postmaster/pgstat.c   |  5 ---
 src/backend/storage/buffer/bufmgr.c   |  6 
 src/backend/utils/adt/pgstatfuncs.c   | 30 -
 src/include/catalog/pg_proc.dat   | 22 -
 src/include/pgstat.h  | 10 --
 src/test/regress/expected/rules.out   |  5 ---
 9 files changed, 1 insertion(+), 156 deletions(-)

diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 75753c3339..5852c45246 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -3416,24 +3416,6 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i
   
  
 
- 
-  
-   buffers_checkpoint bigint
-  
-  
-   Number of buffers written during checkpoints
-  
- 
-
- 
-  
-   buffers_clean bigint
-  
-  
-   Number of buffers written by the background writer
-  
- 
-
  
   
maxwritten_clean bigint
@@ -3444,35 +3426,6 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i
   
  
 
- 
-  
-   buffers_backend bigint
-  
-  
-   Number of buffers written directly by a backend
-  
- 
-
- 
-  
-   buffers_backend_fsync bigint
-  
-  
-   Number of times a backend had to execute its own
-   fsync call (normally the background writer handles those
-   even when the backend does its own write)
-  
- 
-
- 
-  
-   buffers_alloc bigint
-  
-  
-   Number of buffers allocated
-  
- 
-
  
   
stats_reset timestamp with time zone
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 30280d520b..c45c261f4b 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -1058,18 +1058,14 @@ CREATE VIEW pg_stat_archiver AS
 s.stats_reset
 FROM pg_stat_get_archiver() s;
 
+-- TODO: make separate pg_stat_checkpointer view
 CREATE VIEW pg_stat_bgwriter AS
 SELECT
 pg_stat_get_bgwriter_timed_checkpoints() AS checkpoints_timed,
 pg_stat_get_bgwriter_requested_checkpoints() AS checkpoints_req,
 pg_stat_get_checkpoint_write_time() AS checkpoint_write_time,
 pg_stat_get_checkpoint_sync_time() AS checkpoint_sync_time,
-pg_stat_get_bgwriter_buf_written_checkpoints() AS buffers_checkpoint,
-pg_stat_get_bgwriter_buf_wri

Re: when the startup process doesn't (logging startup delays)

2021-09-30 Thread Robert Haas
On Thu, Sep 30, 2021 at 3:10 PM Tom Lane  wrote:
> That would be lovely, certainly.  But aren't you moving the goalposts
> rather far?  I don't think we make any promises about such things
> today, so why has the issue suddenly gotten more pressing?

Yeah, perhaps it's best not to worry about it. I dislike failing to
worry about that case on general principle, but I agree with you that
it seems to be moving the goalposts a disproportionate distance.

> In particular,
> why do you think Nitin's patch is proof against this?  Seems to me it's
> probably got *more* failure cases, not fewer, if the system clock is
> acting funny.

You might be right. I sort of assumed that timeout.c had some defense
against this, but since that seems not to be the case, I suppose no
facility that depends on it can hope to stay out of trouble either.

> On the whole, in these days of NTP, I'm not sure I care to spend
> large amounts of effort on dealing with a bogus system clock.

It's certainly less of an issue than it used to be back in my day.

Any thoughts on the patch I attached?

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: 002_types.pl fails on some timezones on windows

2021-09-30 Thread Andres Freund
Hi,

On 2021-09-30 16:31:33 -0400, Tom Lane wrote:
> Andres Freund  writes:
> > It turns out to be dependent on the current timezone. I have just about zero
> > understanding how timezones work on windows, so I can't really interpret why
> > that causes a problem on windows, but apparently not on linux.
> 
> As of 20f8671ef, "TZ=Africa/Casablanca make check-world" passes here,
> so your CI should be okay.  We still oughta fix the Windows
> translation, though.

Indeed, it just passed (after reverting my timezone workaround):
https://cirrus-ci.com/task/5899963000422400?logs=check#L129

It still fails in t/026_overwrite_contrecord.pl though. But that's another
thread.


Thanks!

Andres




Re: Atomic rename feature for Windows.

2021-09-30 Thread Victor Spirin

Thanks.

IsWindowsVersionOrGreater(10,0,1607) always returns false

Only IsWindowsVersionOrGreater(10,0,0) is a valid call. (There are no 
service packs in Windows 10.)


I haven't found a way to determine the Windows 10 release ID.
The RtlGetVersion function returns dwBuildNumber = 19042 on my Windows.

I heard that Microsoft does not support older versions of Windows 10 and 
requires a mandatory update.



Victor Spirin
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

23.09.2021 14:18, Juan José Santamaría Flecha wrote:


On Tue, Sep 7, 2021 at 11:40 PM Victor Spirin wrote:



I checked the pgrename_windows_posix_semantics() function on
Windows 7.
It returns error 87: Parameter is incorrect. Hence, it is
necessary to
check the Windows version and call the old pgrename function for old
Windows.

The FILE_RENAME_FLAGs are available starting from Windows 10 Release 
id 1607, NTDDI_WIN10_RS1. The check should be using something like 
IsWindowsVersionOrGreater(10, 0, 1607). Or you could test this 
using RtlGetVersion(), loading it from ntdll infrastructure coming 
from stat() patch [1], which doesn't need a manifest.


[1] 
https://www.postgresql.org/message-id/flat/CA%2BhUKG%2BoLqfBVJ_j3C03QgoshrX1KxYq0LB1vJV0OXPOcZZfhA%40mail.gmail.com#bfcc256e4eda369e369275f5b4e38185 



Regards,

Juan José Santamaría Flecha


Re: [Patch] ALTER SYSTEM READ ONLY

2021-09-30 Thread Robert Haas
On Thu, Sep 30, 2021 at 7:59 AM Amul Sul  wrote:
> To find the value of InRecovery after we clear it, patch still uses
> ControlFile's DBState, but now the check condition changed to a more
> specific one which is less confusing.
>
> In casual off-list discussion, the point was made to check
> SharedRecoveryState to find out the InRecovery value afterward, and
> check that using RecoveryInProgress().  But we can't depend on
> SharedRecoveryState because at the start it gets initialized to
> RECOVERY_STATE_CRASH irrespective of InRecovery that happens later.
> Therefore, we can't use RecoveryInProgress() which always returns
> true if SharedRecoveryState != RECOVERY_STATE_DONE.

Uh, this change has crept into 0002, but it should be in 0004 with the
rest of the changes to remove dependencies on variables specific to
the startup process. Like I said before, we should really be trying to
separate code movement from functional changes. Also, 0002 doesn't
actually apply for me. Did you generate these patches with 'git
format-patch'?

[rhaas pgsql]$ patch -p1 <
~/Downloads/v36-0001-Refactor-some-end-of-recovery-code-out-of-Startu.patch
patching file src/backend/access/transam/xlog.c
Hunk #1 succeeded at 889 (offset 9 lines).
Hunk #2 succeeded at 939 (offset 12 lines).
Hunk #3 succeeded at 5734 (offset 37 lines).
Hunk #4 succeeded at 8038 (offset 70 lines).
Hunk #5 succeeded at 8248 (offset 70 lines).
[rhaas pgsql]$ patch -p1 <
~/Downloads/v36-0002-Postpone-some-end-of-recovery-operations-relatin.patch
patching file src/backend/access/transam/xlog.c
Reversed (or previously applied) patch detected!  Assume -R? [n]
Apply anyway? [n] y
Hunk #1 FAILED at 7954.
Hunk #2 succeeded at 8079 (offset 70 lines).
1 out of 2 hunks FAILED -- saving rejects to file
src/backend/access/transam/xlog.c.rej
[rhaas pgsql]$ git reset --hard
HEAD is now at b484ddf4d2 Treat ETIMEDOUT as indicating a
non-recoverable connection failure.
[rhaas pgsql]$ patch -p1 <
~/Downloads/v36-0002-Postpone-some-end-of-recovery-operations-relatin.patch
patching file src/backend/access/transam/xlog.c
Reversed (or previously applied) patch detected!  Assume -R? [n]
Apply anyway? [n]
Skipping patch.
2 out of 2 hunks ignored -- saving rejects to file
src/backend/access/transam/xlog.c.rej

It seems to me that the approach you're pursuing here can't work,
because the long-term goal is to get to a place where, if the system
starts up read-only, XLogAcceptWrites() might not be called until
later, after StartupXLOG() has exited. But in that case the control
file state would be DB_IN_PRODUCTION. But my idea of using
RecoveryInProgress() won't work either, because we set
RECOVERY_STATE_DONE just after we set DB_IN_PRODUCTION. Put
differently, the question we want to answer is not "are we in recovery
now?" but "did we perform recovery?". After studying the code a bit, I
think a good test might be
!XLogRecPtrIsInvalid(XLogCtl->lastReplayedEndRecPtr). If InRecovery
gets set to true, then we're certain to enter the if (InRecovery)
block that contains the main redo loop. And that block unconditionally
does XLogCtl->lastReplayedEndRecPtr = XLogCtl->replayEndRecPtr. I
think that replayEndRecPtr can't be 0 because it's supposed to
represent the record we're pretending to have last replayed, as
explained by the comments. And while lastReplayedEndRecPtr will get
updated later as we replay more records, I think it will never be set
back to 0. It's only going to increase, as we replay more records. On
the other hand if InRecovery = false then we'll never change it, and
it seems that it starts out as 0.

I was hoping to have more time today to comment on 0004, but the day
seems to have gotten away from me. One quick thought is that it looks
a bit strange to be getting EndOfLog from GetLastSegSwitchData() which
returns lastSegSwitchLSN while getting EndOfLogTLI from replayEndTLI
... because there's also replayEndRecPtr, which seems to go with
replayEndTLI. It feels like we should use a source for the TLI that
clearly matches the source for the corresponding LSN, unless there's
some super-good reason to do otherwise.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Atomic rename feature for Windows.

2021-09-30 Thread Victor Spirin

Thank you,

I fixed the FILE_RENAME_INFO structure.

I prepared 2 versions of the patch:

1) with manifest and IsWindows10OrGreater() function
2) without manifest and RtlGetVersion function from ntdll.dll

Which is better?

Victor Spirin
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

23.09.2021 14:46, Thomas Munro wrote:

On Wed, Sep 8, 2021 at 9:40 AM Victor Spirin  wrote:

Is this code better? Maybe there is another correct method?

Hmm, if we want to use the system header's struct definition, add some
space for a path at the end, and avoid heap allocation, perhaps we
could do something like:

struct {
 FILE_RENAME_INFO fri;
 WCHAR extra_space[MAX_PATH];
} x;


diff --git a/src/include/common/checksum_helper.h 
b/src/include/common/checksum_helper.h
index cac7570ea1..2d533c93a6 100644
--- a/src/include/common/checksum_helper.h
+++ b/src/include/common/checksum_helper.h
@@ -26,6 +26,13 @@
  * MD5 here because any new that does need a cryptographically strong checksum
  * should use something better.
  */
+
+ /*
+ * CHECKSUM_TYPE_NONE defined in the winioctl.h when _WIN32_WINNT >= 
_WIN32_WINNT_WIN10
+ */
+#ifdef CHECKSUM_TYPE_NONE
+#undef CHECKSUM_TYPE_NONE
+#endif
 typedef enum pg_checksum_type
 {
CHECKSUM_TYPE_NONE,
diff --git a/src/include/port/win32.h b/src/include/port/win32.h
index d8ae49e22d..d91555f5c0 100644
--- a/src/include/port/win32.h
+++ b/src/include/port/win32.h
@@ -12,12 +12,13 @@
 /*
  * Make sure _WIN32_WINNT has the minimum required value.
  * Leave a higher value in place. When building with at least Visual
- * Studio 2015 the minimum requirement is Windows Vista (0x0600) to
- * get support for GetLocaleInfoEx() with locales. For everything else
+ * Studio 2015 the minimum requirement is Windows 10 (0x0A00) to get support 
for SetFileInformationByHandle.
+ * The minimum requirement is Windows Vista (0x0600) get support for 
GetLocaleInfoEx() with locales.
+ * For everything else
  * the minimum version is Windows XP (0x0501).
  */
 #if defined(_MSC_VER) && _MSC_VER >= 1900
-#define MIN_WINNT 0x0600
+#define MIN_WINNT 0x0A00
 #else
 #define MIN_WINNT 0x0501
 #endif
diff --git a/src/port/dirmod.c b/src/port/dirmod.c
index 763bc5f915..548b450b41 100644
--- a/src/port/dirmod.c
+++ b/src/port/dirmod.c
@@ -39,6 +39,87 @@
 #endif
 #endif
 
+
+#if defined(WIN32) && !defined(__CYGWIN__) && defined(_WIN32_WINNT_WIN10) && 
_WIN32_WINNT >= _WIN32_WINNT_WIN10
+
+#include 
+
+typedef struct _FILE_RENAME_INFO_EXT {
+   FILE_RENAME_INFO fri;
+   WCHAR extra_space[MAX_PATH];
+} FILE_RENAME_INFO_EXT;
+
+
+/*
+ * pgrename_windows_posix_semantics  - uses SetFileInformationByHandle 
function with FILE_RENAME_FLAG_POSIX_SEMANTICS flag for atomic rename file
+ * working only on Windows 10 or later and  _WIN32_WINNT must be >= 
_WIN32_WINNT_WIN10
+ */
+static int
+pgrename_windows_posix_semantics(const char *from, const char *to)
+{
+   int err;
+   FILE_RENAME_INFO_EXT rename_info;
+   PFILE_RENAME_INFO prename_info;
+   HANDLE f_handle;
+   wchar_t from_w[MAX_PATH];
+
+
+   prename_info = (PFILE_RENAME_INFO)&rename_info;
+
+   if (MultiByteToWideChar(CP_ACP, 0, (LPCCH)from, -1, (LPWSTR)from_w, 
MAX_PATH) == 0) {
+   err = GetLastError();
+   _dosmaperr(err);
+   return -1;
+   }
+   if (MultiByteToWideChar(CP_ACP, 0, (LPCCH)to, -1, 
(LPWSTR)prename_info->FileName, MAX_PATH) == 0) {
+   err = GetLastError();
+   _dosmaperr(err);
+   return -1;
+   }
+
+   f_handle = CreateFileW(from_w,
+   GENERIC_READ | GENERIC_WRITE | DELETE,
+   FILE_SHARE_READ | FILE_SHARE_WRITE | FILE_SHARE_DELETE,
+   NULL,
+   OPEN_EXISTING,
+   FILE_ATTRIBUTE_NORMAL,
+   NULL);
+
+
+   if (f_handle == INVALID_HANDLE_VALUE)
+   {
+   err = GetLastError();
+
+   _dosmaperr(err);
+
+   return -1;
+   }
+
+   
+   prename_info->ReplaceIfExists = TRUE;
+   prename_info->Flags = FILE_RENAME_FLAG_POSIX_SEMANTICS | 
FILE_RENAME_FLAG_REPLACE_IF_EXISTS;
+
+   prename_info->RootDirectory = NULL;
+   prename_info->FileNameLength = wcslen(prename_info->FileName);
+   
+   if (!SetFileInformationByHandle(f_handle, FileRenameInfoEx, 
prename_info, sizeof(FILE_RENAME_INFO_EXT)))
+   {
+   err = GetLastError();
+
+   _dosmaperr(err);
+   CloseHandle(f_handle);
+
+   return -1;
+   }
+
+   CloseHandle(f_handle);
+   return 0;
+
+}
+
+#endif /* #if 
defined(WIN32) && !defined(__CYGWIN__) && defined(_WIN32_WINNT_WIN10) && 
_WIN32_WINNT >= _WIN32_WINNT_WIN10 */
+
+
 #if defined(WIN32) || defined(__CYGWIN__)
 
 /*
@@ -49,6 +130,16 @@ pgrename(const char *from, const char *to)
 {
int loops = 0;
 
+  

Re: 002_types.pl fails on some timezones on windows

2021-09-30 Thread Andres Freund
Hi,

On 2021-09-30 16:03:15 -0400, Tom Lane wrote:
> I wrote:
> > ... sure enough, 002_types.pl
> > falls over with TZ=Africa/Casablanca on my Linux machine, too.
> 
> Independently of whether Africa/Casablanca is a sane translation of
> that Windows zone name, it'd be nice if 002_types.pl weren't so
> sensitive to the prevailing zone.  I looked into exactly why it's
> falling over, and the answer seems to be this bit:

>   (2, tstzrange('Mon Aug 04 00:00:00 2014 CEST'::timestamptz - 
> interval '2 days', 'Mon Aug 04 00:00:00 2014 CEST'::timestamptz), '{"[2,3]", 
> "[20,30]"}'),
>   (3, tstzrange('Mon Aug 04 00:00:00 2014 CEST'::timestamptz - 
> interval '3 days', 'Mon Aug 04 00:00:00 2014 CEST'::timestamptz), 
> '{"[3,4]"}'),
>   (4, tstzrange('Mon Aug 04 00:00:00 2014 CEST'::timestamptz - 
> interval '4 days', 'Mon Aug 04 00:00:00 2014 CEST'::timestamptz), '{"[4,5]", 
> NULL, "[40,50]"}'),
> 
> The problem with this is the blithe assumption that "minus N days"
> is an immutable computation.  It ain't.  As bad luck would have it,
> these intervals all manage to cross a Moroccan DST boundary
> (Ramadan, I assume):

For a minute I was confused, because of course we should still get the same
result on the subscriber as on the publisher. But then I re-re-re-realized
that the comparison data is a constant in the test script...


> What I'm inclined to do about that is get rid of the totally-irrelevant-
> to-this-test interval subtractions, and just write the desired timestamps
> as constants.

Sounds like a plan.

Greetings,

Andres Freund




Re: 002_types.pl fails on some timezones on windows

2021-09-30 Thread Tom Lane
Andres Freund  writes:
> It turns out to be dependent on the current timezone. I have just about zero
> understanding how timezones work on windows, so I can't really interpret why
> that causes a problem on windows, but apparently not on linux.

As of 20f8671ef, "TZ=Africa/Casablanca make check-world" passes here,
so your CI should be okay.  We still oughta fix the Windows
translation, though.

regards, tom lane




Re: prevent immature WAL streaming

2021-09-30 Thread Tom Lane
Alvaro Herrera  writes:
> Hmm.  Well, as I said, maybe this part of the test isn't worth much
> anyway.  Rather than spending time trying to figure out why isn't this
> triggering the WAL overwriting, I compared the coverage report for
> running only the first test to the coverage report of running only the
> second test.  It turns out that there's no relevant coverage increase in
> the second test.  So I propose just removing that part.

Seems reasonable.  We don't need to spend buildfarm cycles forevermore
on a test that's not adding useful coverage.

regards, tom lane




Re: 002_types.pl fails on some timezones on windows

2021-09-30 Thread Thomas Munro
On Fri, Oct 1, 2021 at 8:38 AM Tom Lane  wrote:
> But the option of "Greenwich Daylight Time" suggests that Windows thinks
> this means UK civil time, not UTC.

Yes, it's been a while but IIRC Windows in the UK uses confusing
terminology here even in user interfaces, so that in summer it appears
to be wrong, which is annoying to anyone brought up on Eggert's
system.  The CLDR windowsZones.xml file shows this.




Re: 002_types.pl fails on some timezones on windows

2021-09-30 Thread Andres Freund
Hi,

On 2021-09-30 15:19:30 -0400, Andrew Dunstan wrote:
> On 9/30/21 2:36 PM, Andres Freund wrote:
> > Hi,
> >
> > CI showed me a failure in 002_types.pl on windows. I only just now noticed
> > that because the subscription tests aren't run by any of the vcregress.pl
> > steps :(

> We have windows buildfarm animals running the subscription tests, e.g.
> 
> and they do it by calling vcregress.pl.

The point I was trying to make is that there's no "target" in vcregress.pl for
it. You have to know that you need to call
src/tools/msvc/vcregress.pl taptest src\test\subscription
to run them. Contrast that with, say, recoverycheck, which has its own
vcregress.pl target.

Greetings,

Andres Freund




Re: Skipping logical replication transactions on subscriber side

2021-09-30 Thread Peter Eisentraut

On 30.09.21 07:45, Masahiko Sawada wrote:

I've attached updated patches that incorporate all comments I got so
far. Please review them.


I'm uneasy about the way the xids-to-be-skipped are presented as 
subscription options, similar to settings such as "binary".  I see how 
that is convenient, but it's not really the same thing, in how you use 
it, is it?  Even if we share some details internally, I feel that there 
should be a separate syntax somehow.


Also, what happens when you forget to reset the xid after it has passed?
Will it get skipped again after wraparound?





Re: prevent immature WAL streaming

2021-09-30 Thread Alvaro Herrera
On 2021-Sep-30, Tom Lane wrote:

> Just when you thought it was safe to go back in the water:
> 
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=prairiedog&dt=2021-09-29%2022%3A05%3A44
> 
> which is complaining that the (misspelled, BTW)

Ah, the case of the missing juxtaposed consonants.  Les Luthiers have
something to say on that matter.
https://www.youtube.com/watch?v=ptorPqV7D5s

> log message 'sucessfully skipped missing contrecord at' doesn't show
> up.

Hmm.  Well, as I said, maybe this part of the test isn't worth much
anyway.  Rather than spending time trying to figure out why this isn't
triggering the WAL overwriting, I compared the coverage report for
running only the first test to the coverage report of running only the
second test.  It turns out that there's no relevant coverage increase in
the second test.  So I propose just removing that part.

(The reason I added that test in the first place was to try to reproduce
the problem without having to physically unlink a WAL file from the
primary's pg_wal subdir.  But maybe it's just make-work.)

-- 
Álvaro Herrera PostgreSQL Developer  —  https://www.EnterpriseDB.com/
>From 759846455920d1886a154b2b159363d353ba0def Mon Sep 17 00:00:00 2001
From: Alvaro Herrera 
Date: Thu, 30 Sep 2021 16:55:17 -0300
Subject: [PATCH] Remove test, fix typo

---
 src/backend/access/transam/xlog.c |   2 +-
 .../recovery/t/026_overwrite_contrecord.pl| 116 +-
 src/test/recovery/t/idiosyncratic_copy|  20 ---
 3 files changed, 3 insertions(+), 135 deletions(-)
 delete mode 100755 src/test/recovery/t/idiosyncratic_copy

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 1388afdfb0..f8c714b7b7 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -10598,7 +10598,7 @@ VerifyOverwriteContrecord(xl_overwrite_contrecord *xlrec, XLogReaderState *state
 			 LSN_FORMAT_ARGS(state->overwrittenRecPtr));
 
 	ereport(LOG,
-			(errmsg("sucessfully skipped missing contrecord at %X/%X, overwritten at %s",
+			(errmsg("successfully skipped missing contrecord at %X/%X, overwritten at %s",
 	LSN_FORMAT_ARGS(xlrec->overwritten_lsn),
 	timestamptz_to_str(xlrec->overwrite_time;
 
diff --git a/src/test/recovery/t/026_overwrite_contrecord.pl b/src/test/recovery/t/026_overwrite_contrecord.pl
index acf9bd08d1..1725a97531 100644
--- a/src/test/recovery/t/026_overwrite_contrecord.pl
+++ b/src/test/recovery/t/026_overwrite_contrecord.pl
@@ -10,7 +10,7 @@ use PostgresNode;
 use TestLib;
 use Test::More;
 
-plan tests => 5;
+plan tests => 3;
 
 # Test: Create a physical replica that's missing the last WAL file,
 # then restart the primary to create a divergent WAL file and observe
@@ -89,120 +89,8 @@ ok($node_standby->safe_psql('postgres', 'select * from foo') eq 'hello',
 my $log = slurp_file($node_standby->logfile);
 like(
 	$log,
-	qr[sucessfully skipped missing contrecord at],
+	qr[successfully skipped missing contrecord at],
 	"found log line in standby");
 
 $node->stop;
 $node_standby->stop;
-
-
-# Second test: a standby that receives WAL via archive/restore commands.
-$node = PostgresNode->new('primary2');
-$node->init(
-	has_archiving => 1,
-	extra => ['--wal-segsize=1']);
-$node->set_replication_conf;
-
-# Note: consistent use of forward slashes here avoids any escaping problems
-# that arise from use of backslashes. That means we need to double-quote all
-# the paths in the archive_command
-my $perlbin = TestLib::perl2host($^X);
-$perlbin =~ s!\\!/!g if $TestLib::windows_os;
-my $archivedir = $node->archive_dir;
-$archivedir =~ s!\\!/!g if $TestLib::windows_os;
-$node->append_conf(
-	'postgresql.conf',
-	qq{
-archive_command = '"$perlbin" "$FindBin::RealBin/idiosyncratic_copy" "%p" "$archivedir/%f"'
-wal_level = replica
-max_wal_senders = 2
-wal_keep_size = 1GB
-});
-# Make sure that Msys perl doesn't complain about difficulty in setting locale
-# when called from the archive_command.
-local $ENV{PERL_BADLANG} = 0;
-$node->start;
-$node->backup('backup');
-
-$node_standby = PostgresNode->new('standby2');
-$node_standby->init_from_backup($node, 'backup', has_restoring => 1);
-
-$node_standby->start;
-
-$node->safe_psql('postgres', 'create table filler (a int)');
-# First, measure how many bytes does the insertion of 1000 rows produce
-$start_lsn =
-  $node->safe_psql('postgres', q{select pg_current_wal_insert_lsn() - '0/0'});
-$node->safe_psql('postgres',
-	'insert into filler select * from generate_series(1, 1000)');
-$end_lsn =
-  $node->safe_psql('postgres', q{select pg_current_wal_insert_lsn() - '0/0'});
-$rows_walsize = $end_lsn - $start_lsn;
-
-# Now consume all remaining room in the current WAL segment, leaving
-# space enough only for the start of a largish record.
-$node->safe_psql(
-	'postgres', qq{
-WITH setting AS (
-  SELECT setting::int AS wal_segsize
-FROM pg_settings WHERE name = 'wal_segment_size'
-)
-INSERT INTO filler
-SE

Re: 002_types.pl fails on some timezones on windows

2021-09-30 Thread Tom Lane
I wrote:
> ... sure enough, 002_types.pl
> falls over with TZ=Africa/Casablanca on my Linux machine, too.

Independently of whether Africa/Casablanca is a sane translation of
that Windows zone name, it'd be nice if 002_types.pl weren't so
sensitive to the prevailing zone.  I looked into exactly why it's
falling over, and the answer seems to be this bit:

(2, tstzrange('Mon Aug 04 00:00:00 2014 CEST'::timestamptz - 
interval '2 days', 'Mon Aug 04 00:00:00 2014 CEST'::timestamptz), '{"[2,3]", 
"[20,30]"}'),
(3, tstzrange('Mon Aug 04 00:00:00 2014 CEST'::timestamptz - 
interval '3 days', 'Mon Aug 04 00:00:00 2014 CEST'::timestamptz), '{"[3,4]"}'),
(4, tstzrange('Mon Aug 04 00:00:00 2014 CEST'::timestamptz - 
interval '4 days', 'Mon Aug 04 00:00:00 2014 CEST'::timestamptz), '{"[4,5]", 
NULL, "[40,50]"}'),

The problem with this is the blithe assumption that "minus N days"
is an immutable computation.  It ain't.  As bad luck would have it,
these intervals all manage to cross a Moroccan DST boundary
(Ramadan, I assume):

Rule  Morocco  2014  only  -  Jun 28  3:00  0     -
Rule  Morocco  2014  only  -  Aug  2  2:00  1:00  -

Thus, in GMT or most other zones, we get 24-hour-spaced times of day for
these calculations:

regression=# set timezone to 'GMT';
SET
regression=# select n, 'Mon Aug 04 00:00:00 2014 CEST'::timestamptz - n * 
interval '1 day' from generate_series(0,4) n;
 n |?column?
---+
 0 | 2014-08-03 22:00:00+00
 1 | 2014-08-02 22:00:00+00
 2 | 2014-08-01 22:00:00+00
 3 | 2014-07-31 22:00:00+00
 4 | 2014-07-30 22:00:00+00
(5 rows)

but not so much in Morocco:

regression=# set timezone to 'Africa/Casablanca';
SET
regression=# select n, 'Mon Aug 04 00:00:00 2014 CEST'::timestamptz - n * 
interval '1 day' from generate_series(0,4) n;
 n |?column?
---+
 0 | 2014-08-03 23:00:00+01
 1 | 2014-08-02 23:00:00+01
 2 | 2014-08-01 23:00:00+00
 3 | 2014-07-31 23:00:00+00
 4 | 2014-07-30 23:00:00+00
(5 rows)

What I'm inclined to do about that is get rid of the totally-irrelevant-
to-this-test interval subtractions, and just write the desired timestamps
as constants.
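
The zone-dependence Tom describes can be reproduced outside the server, for
instance with Python's zoneinfo, which applies similar wall-clock "day"
arithmetic (a sketch, assuming the system tz database carries the 2014
Morocco rules):

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo

# 'Mon Aug 04 00:00:00 2014 CEST' is 2014-08-03 22:00:00 UTC
base_utc = datetime(2014, 8, 3, 22, 0, tzinfo=timezone.utc)

def elapsed_hours(zone, days):
    """True elapsed time covered by "minus N days" evaluated in `zone`."""
    local = base_utc.astimezone(ZoneInfo(zone))
    earlier = local - timedelta(days=days)        # wall-clock arithmetic
    # Compare as instants (convert to UTC first; Python's same-tzinfo
    # subtraction would otherwise ignore the offset change).
    delta = local.astimezone(timezone.utc) - earlier.astimezone(timezone.utc)
    return delta.total_seconds() / 3600

assert elapsed_hours("UTC", 2) == 48                # 24-hour-spaced days
assert elapsed_hours("Africa/Casablanca", 2) == 47  # crosses the Aug 2 DST jump
```

The 47-hour result is exactly the one-hour shift visible in the psql output
above, which is why constant timestamps are the zone-proof choice for the test.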

regards, tom lane




Re: 002_types.pl fails on some timezones on windows

2021-09-30 Thread Tom Lane
Andrew Dunstan  writes:
> On 9/30/21 2:36 PM, Andres Freund wrote:
>> CI showed me a failure in 002_types.pl on windows. I only just now noticed
>> that because the subscription tests aren't run by any of the vcregress.pl
>> steps :(

> We have windows buildfarm animals running the subscription tests, e.g.
> 
> and they do it by calling vcregress.pl.

But are they running with the prevailing zone set to "Greenwich Standard
Time"?

I dug around to see exactly how we handle that, and was somewhat
gobsmacked to find this mapping in findtimezone.c:

/* (UTC+00:00) Monrovia, Reykjavik */
"Greenwich Standard Time", "Greenwich Daylight Time",
"Africa/Casablanca"

According to current tzdb,

# Zone  NAME    STDOFF  RULES   FORMAT  [UNTIL]
Zone Africa/Casablanca  -0:30:20 -  LMT 1913 Oct 26
 0:00   Morocco +00/+01 1984 Mar 16
 1:00   -   +01 1986
 0:00   Morocco +00/+01 2018 Oct 28  3:00
 1:00   Morocco +01/+00

Morocco has had weird changes-every-year DST rules since 2008, which'd
go a long way towards explaining funny behavior with this zone, even
without the "reverse DST" since 2018.  And sure enough, 002_types.pl
falls over with TZ=Africa/Casablanca on my Linux machine, too.

I'm inclined to think we ought to be translating that zone name to
Europe/London instead.  Or maybe we should translate to straight-up UTC?
But the option of "Greenwich Daylight Time" suggests that Windows thinks
this means UK civil time, not UTC.

I wonder if findtimezone.c has any other surprising Windows mappings.
I've never dug through that list particularly.

regards, tom lane




Re: 002_types.pl fails on some timezones on windows

2021-09-30 Thread Andrew Dunstan


On 9/30/21 2:36 PM, Andres Freund wrote:
> Hi,
>
> CI showed me a failure in 002_types.pl on windows. I only just now noticed
> that because the subscription tests aren't run by any of the vcregress.pl
> steps :(



We have windows buildfarm animals running the subscription tests, e.g.

and they do it by calling vcregress.pl.


cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: 002_types.pl fails on some timezones on windows

2021-09-30 Thread Tom Lane
Andres Freund  writes:
> It turns out to be dependent on the current timezone. I have just about zero
> understanding how timezones work on windows, so I can't really interpret why
> that causes a problem on windows, but apparently not on linux.

Weird.  Unless you're using --with-system-tzdata, I wouldn't expect that
code to work any differently on Windows.

regards, tom lane




Re: when the startup process doesn't (logging startup delays)

2021-09-30 Thread Tom Lane
Robert Haas  writes:
> ...  When I say I want my handler to be
> fired in 3 s, I don't mean that I want it to be fired when the system
> time is 3 seconds greater than it is right now. I mean I want it to be
> fired in 3 actual seconds, regardless of what dumb thing the system
> clock may choose to do.

That would be lovely, certainly.  But aren't you moving the goalposts
rather far?  I don't think we make any promises about such things
today, so why has the issue suddenly gotten more pressing?  In particular,
why do you think Nitin's patch is proof against this?  Seems to me it's
probably got *more* failure cases, not fewer, if the system clock is
acting funny.

BTW, one could imagine addressing this concern by having timeout.c work
with CLOCK_MONOTONIC instead of the regular wall clock.  But I fear
we'd have to drop enable_timeout_at(), for lack of ability to translate
between CLOCK_MONOTONIC timestamps and those used by anybody else.
Also get_timeout_start_time/get_timeout_finish_time would become
problematic.  Maybe we only really care about deltas, so the more
restrictive API would be workable, but it seems like a nontrivial
amount of work.
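
For reference, the CLOCK_MONOTONIC idea — measuring deltas against a clock
that never moves with wall-time adjustments — looks like this in miniature.
Python's monotonic clock stands in here for clock_gettime(CLOCK_MONOTONIC);
this is only an illustration of the clock distinction, not timeout.c's API:

```python
import time

def elapsed_since(start_mono):
    """True elapsed seconds; immune to wall-clock (system time) changes."""
    return time.monotonic() - start_mono

t0 = time.monotonic()
time.sleep(0.05)
# Even if an administrator set the wall clock back four minutes during the
# sleep, the monotonic delta still reports ~0.05 s, never a negative value.
assert elapsed_since(t0) >= 0.04
```

The cost is exactly the one noted above: such deltas cannot be translated
back into wall-clock timestamps for callers like enable_timeout_at().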

On the whole, in these days of NTP, I'm not sure I care to spend
large amounts of effort on dealing with a bogus system clock.

regards, tom lane




Re: pgsql: Document XLOG_INCLUDE_XID a little better

2021-09-30 Thread Robert Haas
On Thu, Sep 30, 2021 at 6:08 AM Amit Kapila  wrote:
> I think we can do better than using XLOG_INCLUDE_XID flag in the
> record being inserted. We need this flag so that we can mark
> SubTransaction assigned after XLogInsertRecord()  is successful. We
> can instead output a flag (say sub_xact_assigned) from
> XLogRecordAssemble() and pass it to XLogInsertRecord(). Then in
> XLogInsertRecord(), we can mark SubTransactionAssigned once the record
> is inserted (after or before calling
> MarkCurrentTransactionIdLoggedIfAny()).

Isn't there other communication between these routines that just uses
global variables?

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: prevent immature WAL streaming

2021-09-30 Thread Tom Lane
Just when you thought it was safe to go back in the water:

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=prairiedog&dt=2021-09-29%2022%3A05%3A44

which is complaining that the (misspelled, BTW) log message
'sucessfully skipped missing contrecord at' doesn't show up.

This machine is old, slow, and 32-bit bigendian.  I first thought
the problem might be "didn't wait long enough", but it seems like
waiting for replay ought to be sufficient.  What I'm now guessing
is that the test case is making unwarranted assumptions about how
much WAL will be generated, such that no page-crossing contrecord
actually appears.

Also, digging around, I see hornet showed the same problem:

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hornet&dt=2021-09-29%2018%3A19%3A55

hornet is 64-bit bigendian ... so maybe this actually reduces to
an endianness question?

regards, tom lane




Re: when the startup process doesn't (logging startup delays)

2021-09-30 Thread Robert Haas
On Wed, Sep 29, 2021 at 5:12 PM Tom Lane  wrote:
> I didn't claim there are any other places that could use the feature
> *today*.  But once we've got one, it seems like there could be more
> tomorrow.  In any case, I dislike keeping timeout state data outside
> timeout.c, because it's so likely to get out-of-sync that way.

Well, I had a quick go at implementing this (attached).

It seems to do a satisfactory job preventing drift over time, but it
doesn't behave nicely if you set the system clock backward. With a bit
of extra debugging output:

2021-09-30 14:23:50.291 EDT [2279] LOG:  scheduling wakeup in 2 secs,
998727 usecs
2021-09-30 14:23:53.291 EDT [2279] LOG:  scheduling wakeup in 2 secs,
998730 usecs
2021-09-30 14:23:56.291 EDT [2279] LOG:  scheduling wakeup in 2 secs,
998728 usecs
2021-09-30 14:20:01.154 EDT [2279] LOG:  scheduling wakeup in 238
secs, 135532 usecs
2021-09-30 14:23:59.294 EDT [2279] LOG:  scheduling wakeup in 2 secs, 995922 use

The issue here is that fin_time is really the driving force behind
everything timeout.c does. In particular, it determines the order of
active_timeouts[]. And that's not really correct either for
enable_timeout_after(), or for the new function I added in this draft
patch, enable_timeout_every(). When I say I want my handler to be
fired in 3 s, I don't mean that I want it to be fired when the system
time is 3 seconds greater than it is right now. I mean I want it to be
fired in 3 actual seconds, regardless of what dumb thing the system
clock may choose to do. I don't really think that precise behavior can
be implemented, but ideally if a timeout that was supposed to happen
after 3 s is now scheduled for a time that is more than 3 seconds
beyond the current value of the system clock, we'd move the firing
time backwards to 3 seconds beyond the current system clock. That way,
if you set the system time backward by 4 minutes, you might see a few
seconds of delay before the next firing, but you wouldn't go into the
tank for 4 minutes.
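
The clamping behaviour described here can be stated compactly. This is only
an illustration of the invariant (never let a firing time sit more than the
requested interval beyond the current clock), not a patch for timeout.c:

```python
def clamp_fin_time(now, fin_time, interval):
    """If fin_time sits more than `interval` seconds past `now` (e.g. the
    system clock was set backward after scheduling), pull the firing time
    back so the timeout still fires within `interval` seconds."""
    return min(fin_time, now + interval)

# Normal case: scheduled 3 s out, clock unchanged.
assert clamp_fin_time(100.0, 103.0, 3.0) == 103.0
# Clock set back ~4 minutes: fire within 3 s instead of waiting it out.
assert clamp_fin_time(100.0, 340.0, 3.0) == 103.0
```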

I don't see an obvious way of making timeout.c behave like that
without major surgery, though. If nobody else does either, then we
could (1) stick with something closer to Nitin's current patch, which
tries to handle this concern outside of timeout.c, (2) adopt something
like the attached 0001 and leave the question of improved behavior in
case of backwards system clock adjustments for another day, or (3)
undertake to rewrite timeout.c as a precondition of being able to log
messages about why startup is slow. I'm not a huge fan of (3) but I'm
open to suggestions.

-- 
Robert Haas
EDB: http://www.enterprisedb.com


0002-Quick-testing-hack.patch
Description: Binary data


0001-Add-enable_timeout_every-to-fire-the-same-timeout-re.patch
Description: Binary data


002_types.pl fails on some timezones on windows

2021-09-30 Thread Andres Freund
Hi,

CI showed me a failure in 002_types.pl on windows. I only just now noticed
that because the subscription tests aren't run by any of the vcregress.pl
steps :(

It turns out to be dependent on the current timezone. I have just about zero
understanding how timezones work on windows, so I can't really interpret why
that causes a problem on windows, but apparently not on linux.

The CI instance not unreasonably runs with the timezone set to GMT. With that
the tests fail. If I set it to PST, they work. For the detailed (way too long)
output see [1]. The relevant excerpt:

tzutil /s "Pacific Standard Time"
...
timeout -k60s 30m perl src/tools/msvc/vcregress.pl taptest 
.\src\test\subscription\   || true
t/002_types.pl . ok
..

tzutil /s "Greenwich Standard Time"
timeout -k60s 30m perl src/tools/msvc/vcregress.pl taptest 
.\src\test\subscription\   || true
..
#   Failed test 'check replicated inserts on subscriber'
#   at t/002_types.pl line 278.
#  got: '1|{1,2,3}
...
# 5|[5,51)
# 1|["2014-08-04 00:00:00+02",infinity)|{"[1,3)","[10,21)"}
# 2|["2014-08-02 01:00:00+02","2014-08-04 00:00:00+02")|{"[2,4)","[20,31)"}
# 3|["2014-08-01 01:00:00+02","2014-08-04 00:00:00+02")|{"[3,5)"}
# 4|["2014-07-31 01:00:00+02","2014-08-04 00:00:00+02")|{"[4,6)",NULL,"[40,51)"}
...
# expected: '1|{1,2,3}
...
# 1|["2014-08-04 00:00:00+02",infinity)|{"[1,3)","[10,21)"}
# 2|["2014-08-02 00:00:00+02","2014-08-04 00:00:00+02")|{"[2,4)","[20,31)"}
# 3|["2014-08-01 00:00:00+02","2014-08-04 00:00:00+02")|{"[3,5)"}
# 4|["2014-07-31 00:00:00+02","2014-08-04 00:00:00+02")|{"[4,6)",NULL,"[40,51)"}
...

Greetings,

Andres Freund

[1] https://api.cirrus-ci.com/v1/task/5800120848482304/logs/check_tz_sub.log




Re: Empty string in lexeme for tsvector

2021-09-30 Thread Jean-Christophe Arnu
Thank you Tom for your review.

On Wed, Sep 29, 2021 at 9:36 PM, Tom Lane  wrote:

> Jean-Christophe Arnu  writes:
> > [ empty_string_in_tsvector_v4.patch ]
>
> I looked through this patch a bit.  I don't agree with adding
> these new error conditions to tsvector_setweight_by_filter and
> tsvector_delete_arr.  Those don't prevent bad lexemes from being
> added to tsvectors, so AFAICS they can have no effect other than
> breaking existing applications.  In fact, tsvector_delete_arr is
> one thing you could use to fix existing bad tsvectors, so making
> it throw an error seems actually counterproductive.
>

Agreed.
The new patch included here no longer changes tsvector_setweight_by_filter().
Tests and docs are also updated.
This patch is not ready yet.


> (By the same token, I think there's a good argument for
> tsvector_delete_arr to just ignore nulls, not throw an error.
> That's a somewhat orthogonal issue, though.)
>

Nulls are now ignored in tsvector_delete_arr().


> What I'm wondering about more than that is whether array_to_tsvector
> is the only place that can inject an empty lexeme ... don't we have
> anything else that can add lexemes without going through the parser?
>

I crawled the docs [1] in order to check the tsvector output of each
function and operator:

 * The only remaining functions that may lead to empty strings seem to be
   json_to_tsvector() and jsonb_to_tsvector(). Both use parsetext()
   (in ts_parse.c), which seems to behave correctly and does not create
   empty-string lexemes.
 * The concatenation operator "||" allows computing a tsvector containing
   an empty string if one of its operands itself contains an empty-string
   lexeme. This seems perfectly correct from the operator's point of
   view... Should we change this behaviour to filter out empty strings?

I also wonder whether we should consider changing COPY FROM behaviour
on empty string lexemes. The current version just crashes on them. Should
we allow them anyway as COPY FROM input (it seems weird not to be able
to re-import dumped data), skip them just like array_to_tsvector()
does in the patched version (which may lead to database content changes),
or leave COPY behaviour unchanged?

I admit this is a tricky bunch of questions I'm too rookie to answer.

[1] https://www.postgresql.org/docs/14/functions-textsearch.html

-- 
Jean-Christophe Arnu
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 78812b2dbe..74e25db03c 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -12896,7 +12896,9 @@ CREATE TYPE rainbow AS ENUM ('red', 'orange', 'yellow', 'green', 'blue', 'purple


 Converts an array of lexemes to a tsvector.
-The given strings are used as-is without further processing.
+The given strings are used as-is. Some checks are performed
+on array elements. Empty strings and NULL values
+will cause an error to be raised.


 array_to_tsvector('{fat,cat,rat}'::text[])
@@ -13079,6 +13081,8 @@ CREATE TYPE rainbow AS ENUM ('red', 'orange', 'yellow', 'green', 'blue', 'purple
 Assigns the specified weight to elements
 of the vector that are listed
 in lexemes.
+Some checks are performed on lexemes.
+NULL values will cause an error to be raised.


 setweight('fat:2,4 cat:3 rat:5,6B'::tsvector, 'A', '{cat,rat}')
@@ -13256,6 +13260,8 @@ CREATE TYPE rainbow AS ENUM ('red', 'orange', 'yellow', 'green', 'blue', 'purple
 Removes any occurrences of the lexemes
 in lexemes
 from the vector.
+NULL values in lexemes
+will be ignored.


 ts_delete('fat:2,4 cat:3 rat:5A'::tsvector, ARRAY['fat','rat'])
diff --git a/src/backend/utils/adt/tsvector_op.c b/src/backend/utils/adt/tsvector_op.c
index 9236ebcc8f..e50d8a84e2 100644
--- a/src/backend/utils/adt/tsvector_op.c
+++ b/src/backend/utils/adt/tsvector_op.c
@@ -602,10 +602,9 @@ tsvector_delete_arr(PG_FUNCTION_ARGS)
 		int			lex_len,
 	lex_pos;
 
+		/* Ignore null values */
 		if (nulls[i])
-			ereport(ERROR,
-	(errcode(ERRCODE_NULL_VALUE_NOT_ALLOWED),
-	 errmsg("lexeme array may not contain nulls")));
+			continue;
 
 		lex = VARDATA(dlexemes[i]);
 		lex_len = VARSIZE(dlexemes[i]) - VARHDRSZ;
@@ -761,13 +760,18 @@ array_to_tsvector(PG_FUNCTION_ARGS)
 
 	deconstruct_array(v, TEXTOID, -1, false, TYPALIGN_INT, &dlexemes, &nulls, &nitems);
 
-	/* Reject nulls (maybe we should just ignore them, instead?) */
+	/* Reject nulls and zero length strings (maybe we should just ignore them, instead?) */
 	for (i = 0; i < nitems; i++)
 	{
 		if (nulls[i])
 			ereport(ERROR,
 	(errcode(ERRCODE_NULL_VALUE_NOT_ALLOWED),
 	 errmsg("lexeme array may not contain nulls")));
+
+		if (VARSIZE(dlexemes[i]) - VARHDRSZ == 0)
+			ereport(ERROR,
+	(errcode(ERRCODE_ZERO_LENGTH_CHARACTER_STRING),
+	 errmsg("lexeme array may not co

Re: FETCH FIRST clause PERCENT option

2021-09-30 Thread Jaime Casanova
On Tue, Jan 26, 2021 at 07:29:11PM +0300, Surafel Temesgen wrote:
> On Mon, Jan 25, 2021 at 2:39 PM Kyotaro Horiguchi 
> wrote:
> 
> > Sorry for the dealy. I started to look this.
> >
> >
> Hi kyotaro,
> Thanks for looking into this but did we agree to proceed
> on this approach? I fear that it will be a waste of effort if
> Andrew comes up with the patch for his approach.
> Andrew Gierth: Are you working on your approach?
> 

Hi Surafel,

I'm marking this one as "returned with feedback". Then you or Andrew can
submit a new version of the patch for the next CF.

-- 
Jaime Casanova
Director de Servicios Profesionales
SystemGuards - Consultores de PostgreSQL




Re: [BUG] failed assertion in EnsurePortalSnapshotExists()

2021-09-30 Thread Tom Lane
"Drouvot, Bertrand"  writes:
> On 9/30/21 7:16 PM, Tom Lane wrote:
>> PS: Memo to self: in the back branches, the new field has to be
>> added at the end of struct Portal.

> out of curiosity, why?

Sticking it into the middle would create an ABI break for any
extension code that's looking at struct Portal, due to movement
of existing field offsets.  In HEAD that's fine, so we should
put the field where it makes the most sense.  But we have to
be careful about changing globally-visible structs in released
branches.
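
The offset-movement hazard is easy to visualise with ctypes; the field
names below are illustrative stand-ins, not the real struct Portal layout:

```python
import ctypes

class PortalOld(ctypes.Structure):        # released-branch layout
    _fields_ = [("name", ctypes.c_char_p),
                ("cleanup", ctypes.c_void_p)]

class PortalAppended(ctypes.Structure):   # new field added at the end
    _fields_ = [("name", ctypes.c_char_p),
                ("cleanup", ctypes.c_void_p),
                ("createLevel", ctypes.c_int)]

class PortalInserted(ctypes.Structure):   # new field added in the middle
    _fields_ = [("name", ctypes.c_char_p),
                ("createLevel", ctypes.c_int),
                ("cleanup", ctypes.c_void_p)]

# Appending preserves every pre-existing offset, so extension code compiled
# against the old layout still finds `cleanup` where it expects it ...
assert PortalAppended.cleanup.offset == PortalOld.cleanup.offset
# ... while inserting in the middle shifts it: an ABI break.
assert PortalInserted.cleanup.offset != PortalOld.cleanup.offset
```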

regards, tom lane




Re: [EXTERNAL] Re: Add ETIMEDOUT to ALL_CONNECTION_FAILURE_ERRNOS

2021-09-30 Thread Tom Lane
Jelte Fennema  writes:
> Attached is a new patch that I think addresses your concerns.

You missed TranslateSocketError ...

Pushed to HEAD only with that fix.

regards, tom lane




Re: Document spaces in .pgpass need to be escaped

2021-09-30 Thread James Coleman
On Wed, Sep 29, 2021 at 12:13 PM Tom Lane  wrote:
>
> James Coleman  writes:
> > A coworker has a space in a Postgres password and noticed .pgpass
> > didn't work; escaping it fixed the issue. That requirement wasn't
> > documented (despite other escaping requirements being documented), so
> > I've attached a patch to add that comment.
>
> I looked at passwordFromFile() and I don't see any indication that
> it treats spaces specially.  Nor does a quick test here confirm
> this report.  So I'm pretty certain that this proposed doc change
> is wrong.  Perhaps there's some other issue to investigate, though?
>
> regards, tom lane

Thanks for taking a look.

I'm honestly not sure what happened here. I couldn't reproduce again
either, and on another box with this coworker we could reproduce, but
then realized the pgpass entry was missing a field. I imagine it must
have been similar on the original box we observed the error on, but
both of our memories were of just adding teh escape characters...

I'll mark the CF entry as withdrawn.

Thanks,
James Coleman




Re: Numeric x^y for negative x

2021-09-30 Thread Jaime Casanova
On Mon, Sep 13, 2021 at 07:29:13PM +0100, Dean Rasheed wrote:
> On Mon, 13 Sept 2021 at 17:51, Alvaro Herrera  wrote:
> >
> > I came here just to opine that there should be a comment about there not
> > being a clamp to the maximum scale.  For example, log_var says "Set the
> > scales .. so that they each have more digits ..." which seems clear
> > enough; I think the new comment is a bit on the short side.
> >
> 
> OK, that's a fair point. Updated version attached.
> 

Hi Dean,

Are you planning to commit this soon?

-- 
Jaime Casanova
Director de Servicios Profesionales
SystemGuards - Consultores de PostgreSQL




Re: [BUG] failed assertion in EnsurePortalSnapshotExists()

2021-09-30 Thread Tom Lane
"Drouvot, Bertrand"  writes:
> [ v2-0003-EnsurePortalSnapshotExists-failed-assertion.patch ]

Looking through this, I think you were overenthusiastic about applying
PushActiveSnapshotWithLevel.  We don't really need to use it except in
the places where we're setting portalSnapshot, because other than those
cases we don't have a risk of portalSnapshot becoming a dangling pointer.
Also, I'm quite nervous about applying PushActiveSnapshotWithLevel to
snapshots that aren't created by the portal machinery itself, because
we don't know very much about where passed-in snapshots came from or
what the caller thinks their lifespan is.

The attached revision therefore backs off to only using the new code
in the two places where we really need it.  I made a number of
more-cosmetic revisions too.  Notably, I think it's useful to frame
the testing shortcoming as "we were not testing COMMIT/ROLLBACK
inside a plpgsql exception block".  So I moved the test code to the
plpgsql tests and made it check ROLLBACK too.

regards, tom lane

PS: Memo to self: in the back branches, the new field has to be
added at the end of struct Portal.

diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index 6597ec45a9..4cc38f0d85 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -4863,6 +4863,7 @@ CommitSubTransaction(void)
 	AfterTriggerEndSubXact(true);
 	AtSubCommit_Portals(s->subTransactionId,
 		s->parent->subTransactionId,
+		s->parent->nestingLevel,
 		s->parent->curTransactionOwner);
 	AtEOSubXact_LargeObject(true, s->subTransactionId,
 			s->parent->subTransactionId);
diff --git a/src/backend/tcop/pquery.c b/src/backend/tcop/pquery.c
index b5797042ab..960f3fadce 100644
--- a/src/backend/tcop/pquery.c
+++ b/src/backend/tcop/pquery.c
@@ -480,7 +480,9 @@ PortalStart(Portal portal, ParamListInfo params,
  * We could remember the snapshot in portal->portalSnapshot,
  * but presently there seems no need to, as this code path
  * cannot be used for non-atomic execution.  Hence there can't
- * be any commit/abort that might destroy the snapshot.
+ * be any commit/abort that might destroy the snapshot.  Since
+ * we don't do that, there's also no need to force a
+ * non-default nesting level for the snapshot.
  */
 
 /*
@@ -1136,9 +1138,15 @@ PortalRunUtility(Portal portal, PlannedStmt *pstmt,
 			snapshot = RegisterSnapshot(snapshot);
 			portal->holdSnapshot = snapshot;
 		}
-		/* In any case, make the snapshot active and remember it in portal */
-		PushActiveSnapshot(snapshot);
-		/* PushActiveSnapshot might have copied the snapshot */
+
+		/*
+		 * In any case, make the snapshot active and remember it in portal.
+		 * Because the portal now references the snapshot, we must tell
+		 * snapmgr.c that the snapshot belongs to the portal's transaction
+		 * level, else we risk portalSnapshot becoming a dangling pointer.
+		 */
+		PushActiveSnapshotWithLevel(snapshot, portal->createLevel);
+		/* PushActiveSnapshotWithLevel might have copied the snapshot */
 		portal->portalSnapshot = GetActiveSnapshot();
 	}
 	else
@@ -1784,8 +1792,13 @@ EnsurePortalSnapshotExists(void)
 		elog(ERROR, "cannot execute SQL without an outer snapshot or portal");
 	Assert(portal->portalSnapshot == NULL);
 
-	/* Create a new snapshot and make it active */
-	PushActiveSnapshot(GetTransactionSnapshot());
-	/* PushActiveSnapshot might have copied the snapshot */
+	/*
+	 * Create a new snapshot, make it active, and remember it in portal.
+	 * Because the portal now references the snapshot, we must tell snapmgr.c
+	 * that the snapshot belongs to the portal's transaction level, else we
+	 * risk portalSnapshot becoming a dangling pointer.
+	 */
+	PushActiveSnapshotWithLevel(GetTransactionSnapshot(), portal->createLevel);
+	/* PushActiveSnapshotWithLevel might have copied the snapshot */
 	portal->portalSnapshot = GetActiveSnapshot();
 }
diff --git a/src/backend/utils/mmgr/portalmem.c b/src/backend/utils/mmgr/portalmem.c
index 5c30e141f5..58674d611d 100644
--- a/src/backend/utils/mmgr/portalmem.c
+++ b/src/backend/utils/mmgr/portalmem.c
@@ -210,6 +210,7 @@ CreatePortal(const char *name, bool allowDup, bool dupSilent)
 	portal->cleanup = PortalCleanup;
 	portal->createSubid = GetCurrentSubTransactionId();
 	portal->activeSubid = portal->createSubid;
+	portal->createLevel = GetCurrentTransactionNestLevel();
 	portal->strategy = PORTAL_MULTI_QUERY;
 	portal->cursorOptions = CURSOR_OPT_NO_SCROLL;
 	portal->atStart = true;
@@ -657,6 +658,7 @@ HoldPortal(Portal portal)
 	 */
 	portal->createSubid = InvalidSubTransactionId;
 	portal->activeSubid = InvalidSubTransactionId;
+	portal->createLevel = 0;
 }
 
 /*
@@ -940,6 +942,7 @@ PortalErrorCleanup(void)
 void
 AtSubCommit_Portals(SubTransactionId mySubid,
 	SubTransactionId parentSubid,
+	int parentLevel,
 	ResourceOwner parentXactOwner)
 {
 	HASH_SEQ_STATUS status;
@@ -954,

Re: Diagnostic comment in LogicalIncreaseXminForSlot

2021-09-30 Thread Jaime Casanova
On Tue, Sep 07, 2021 at 11:14:23AM +0530, Amit Kapila wrote:
> On Mon, Sep 6, 2021 at 5:29 PM Ashutosh Bapat
>  wrote:
> >
> > Yeah, I agree. Sorry for missing that.
> >
> > The updated patch looks good to me.
> >
> 
> Pushed!
> 

This patch is still on "Needs review"!
Should we change it to Committed, or is something else expected
here?

-- 
Jaime Casanova
Director de Servicios Profesionales
SystemGuards - Consultores de PostgreSQL




Re: prevent immature WAL streaming

2021-09-30 Thread Tom Lane
Alvaro Herrera  writes:
> On 2021-Sep-30, Tom Lane wrote:
>> There's still the issue of these tests overriding postgresql.conf
>> entries made by init(), but maybe we can live with that?

> I vote to at least have has_archiving=>1 set wal_level=replica, and
> potentially max_wal_senders=2 too (allows pg_basebackup).

I think this requires a bit more investigation.  I looked quickly
through the pre-existing tests that set has_archiving=>1, and every
single one of them also sets allows_streaming=>1, which no doubt
explains why this issue hasn't come up before.  So now I'm wondering
if any of those other tests is setting allows_streaming only because
of this configuration issue.

More to the point, since we've not previously used the combination
of has_archiving without allows_streaming, I wonder exactly how
we want to define it to work.  I'm not really convinced that it
should be defined as "allows basebackup even though replication
is supposed to be off".

Perhaps a compromise could be to invent a third option
"allows_basebackup", so that init() actually knows what's going on?

regards, tom lane




Re: Support for NSS as a libpq TLS backend

2021-09-30 Thread Jacob Champion
On Thu, 2021-09-30 at 14:17 +0200, Daniel Gustafsson wrote:
> The libpq libnss implementation doesn't call either of these, and neither does
> libnss.

I thought the refs check only searched for direct symbol dependencies;
is that piece of NSPR being statically included somehow?

> I'm not entirely sure what to do here, it clearly requires an exception in the
> Makefile check of sorts if we deem we can live with this.
> 
> @Jacob: how did you configure your copy of NSPR?

I use the Ubuntu 20.04 builtin (NSPR 4.25.0), but it looks like the
reason I haven't been seeing this is because I've always used --enable-
coverage. If I take that out, I see the same exit check failure.

--Jacob


Re: non-superusers are allowed to drop the replication user, but are not allowed to alter or even create them, is that ok?

2021-09-30 Thread Mark Dilger



> On Sep 30, 2021, at 3:07 AM, Ashutosh Sharma  wrote:
> 
> While working on one of the internal projects I noticed that currently in 
> Postgres, we do not allow normal users to alter attributes of the replication 
> user. However we do allow normal users to drop replication users or to even 
> rename it using the alter command. Is that behaviour ok? If yes, can someone 
> please help me understand how and why this is okay.

The definition of CREATEROLE is a bit of a mess.  Part of the problem is that 
roles do not have owners, which makes the permissions to drop roles work 
differently than for other object types.  I have a patch pending [1] for the 
version 15 development cycle that fixes this and other problems.  I'd 
appreciate feedback on the design and whether it addresses your concerns.

[1] https://commitfest.postgresql.org/34/3223/

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: pg_stat_statements and "IN" conditions

2021-09-30 Thread Dmitry Dolgov
> On Thu, Sep 30, 2021 at 08:03:16AM -0700, Zhihong Yu wrote:
> On Thu, Sep 30, 2021 at 6:49 AM Dmitry Dolgov <9erthali...@gmail.com> wrote:
>
> > >On Wed, Jun 16, 2021 at 04:02:12PM +0200, Dmitry Dolgov wrote:
> > >
> > > > I've prepared a new rebased version to deal with the new way of
> > > > computing query id, but as always there is one tricky part. From what I
> > > > understand, now an external module can provide custom implementation
> > for
> > > > query id computation algorithm. It seems natural to think this
> > machinery
> > > > could be used instead of patch in the thread, i.e. one could create a
> > > > custom logic that will enable constants collapsing as needed, so that
> > > > same queries with different number of constants in an array will be
> > > > hashed into the same record.
> > > >
> > > > But there is a limitation in how such queries will be normalized
> > > > afterwards — to reduce level of surprise it's necessary to display the
> > > fact that a certain query in fact had more constants than are shown in
> > > > pgss record. Ideally LocationLen needs to carry some bits of
> > information
> > > > on what exactly could be skipped, and generate_normalized_query needs
> > to
> > > > understand that, both are not reachable for an external module with
> > > > custom query id logic (without replicating significant part of the
> > > > existing code). Hence, a new version of the patch.
> > >
> > > Forgot to mention a couple of people who already reviewed the patch.
> >
> > And now for something completely different, here is a new patch version.
> > It contains a small fix for one problem we've found during testing (one
> > path code was incorrectly assuming find_const_walker results).
> >
> Hi,
>
> bq. and at position further that specified threshold.
>
>  that specified threshold -> than specified threshold

You mean in the patch commit message, nowhere else, right? Yep, my spell
checker didn't catch that, thanks for noticing!
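
The collapsing behaviour under discussion — queries that differ only in the
number of IN-list constants normalising to the same entry — can be sketched
outside the server. The function name and threshold here are illustrative,
not pg_stat_statements' actual API:

```python
import re

def normalize_in_lists(query, threshold=2):
    """Collapse IN (...) lists with more than `threshold` constants so
    that queries differing only in list length normalise identically."""
    def collapse(m):
        items = [s.strip() for s in m.group(1).split(",")]
        return "IN (...)" if len(items) > threshold else m.group(0)
    return re.sub(r"IN\s*\(([^()]*)\)", collapse, query)

q1 = normalize_in_lists("SELECT * FROM t WHERE x IN (1, 2, 3)")
q2 = normalize_in_lists("SELECT * FROM t WHERE x IN (1, 2, 3, 4, 5)")
assert q1 == q2 == "SELECT * FROM t WHERE x IN (...)"
```

The hard part the patch addresses is the reverse direction: recording enough
in LocationLen that the normalized text can still show what was skipped.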




Re: pg_stat_statements and "IN" conditions

2021-09-30 Thread Zhihong Yu
On Thu, Sep 30, 2021 at 6:49 AM Dmitry Dolgov <9erthali...@gmail.com> wrote:

> >On Wed, Jun 16, 2021 at 04:02:12PM +0200, Dmitry Dolgov wrote:
> >
> > > I've prepared a new rebased version to deal with the new way of
> > > computing query id, but as always there is one tricky part. From what I
> > > understand, now an external module can provide custom implementation
> for
> > > query id computation algorithm. It seems natural to think this
> machinery
> > > could be used instead of patch in the thread, i.e. one could create a
> > > custom logic that will enable constants collapsing as needed, so that
> > > same queries with different number of constants in an array will be
> > > hashed into the same record.
> > >
> > > But there is a limitation in how such queries will be normalized
> > > afterwards — to reduce level of surprise it's necessary to display the
> > > fact that a certain query in fact had more constants than are shown in
> > > pgss record. Ideally LocationLen needs to carry some bits of
> information
> > > on what exactly could be skipped, and generate_normalized_query needs
> to
> > > understand that, both are not reachable for an external module with
> > > custom query id logic (without replicating significant part of the
> > > existing code). Hence, a new version of the patch.
> >
> > Forgot to mention a couple of people who already reviewed the patch.
>
> And now for something completely different, here is a new patch version.
> It contains a small fix for one problem we've found during testing (one
> code path was making incorrect assumptions about find_const_walker results).
>
Hi,

bq. and at position further that specified threshold.

 that specified threshold -> than specified threshold

Cheers


Re: [EXTERNAL] Re: Add ETIMEDOUT to ALL_CONNECTION_FAILURE_ERRNOS

2021-09-30 Thread Jelte Fennema
Attached is a new patch that I think addresses your concerns.

From: Tom Lane 
Sent: Thursday, September 30, 2021 16:04
To: Jelte Fennema 
Cc: pgsql-hack...@postgresql.org 
Subject: [EXTERNAL] Re: Add ETIMEDOUT to ALL_CONNECTION_FAILURE_ERRNOS

Jelte Fennema  writes:
> Previously successfully opened TCP connections can still fail on reads
> with ETIMEDOUT. This should be considered a connection failure, so that
> the connection in libpq is marked as CONNECTION_BAD. The reason I got an
> ETIMEDOUT was because I had set a low tcp_user_timeout in the
> connection string. However, it can probably also happen due to
> keepalive limits being reached.

I'm dubious about the portability of this patch, because we don't
use ETIMEDOUT elsewhere.  strerror.c thinks it may not exist,
which is probably overly conservative because POSIX has required
it since SUSv2.  The bigger problem is that it's not accounted for in
the WSAxxx mapping done in port/win32_port.h and TranslateSocketError.
That'd have to be fixed for this to behave reasonably on Windows,
I think.

regards, tom lane


0001-Consider-ETIMEDOUT-a-connection-failure.patch
Description: 0001-Consider-ETIMEDOUT-a-connection-failure.patch


Re: PQcancel does not use tcp_user_timeout, connect_timeout and keepalive settings

2021-09-30 Thread Zhihong Yu
On Thu, Sep 30, 2021 at 7:45 AM Jelte Fennema 
wrote:

> The new connection made by PQcancel does not use the tcp_user_timeout,
> connect_timeout or any of the keepalive settings that are provided in the
> connection string. This means that a call to PQcancel can block for a much
> longer time than intended if there are network issues. This can be
> especially impactful, because PQcancel is a blocking function and there is
> no non-blocking version of it.
>
> I attached a proposed patch to use the tcp_user_timeout from the
> connection string when connecting to Postgres in PQcancel. This resolves
> the issue for me, since this will make connecting timeout after a
> configurable time. So the other options are not strictly needed. It might
> still be nice for completeness to support them too though. I didn't do this
> yet, because I first wanted some feedback and also because implementing
> connect_timeout would require using non blocking TCP to connect and then
> use select to have a timeout.


Hi,

int be_key; /* key of backend --- needed for cancels */
+   int pgtcp_user_timeout; /* tcp user timeout */

The other field names are quite short. How about naming the field
tcp_timeout ?

Cheers


PQcancel does not use tcp_user_timeout, connect_timeout and keepalive settings

2021-09-30 Thread Jelte Fennema
The new connection made by PQcancel does not use the tcp_user_timeout, 
connect_timeout or any of the keepalive settings that are provided in the 
connection string. This means that a call to PQcancel can block for a much 
longer time than intended if there are network issues. This can be especially 
impactful, because PQcancel is a blocking function and there is no non-blocking 
version of it.

I attached a proposed patch to use the tcp_user_timeout from the connection 
string when connecting to Postgres in PQcancel. This resolves the issue for me, 
since this will make connecting timeout after a configurable time. So the other 
options are not strictly needed. It might still be nice for completeness to 
support them too though. I didn't do this yet, because I first wanted some 
feedback and also because implementing connect_timeout would require using non 
blocking TCP to connect and then use select to have a timeout.

0001-Use-tcp_user_timeout-in-PQcancel.patch
Description: 0001-Use-tcp_user_timeout-in-PQcancel.patch
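The message above notes that supporting connect_timeout would require a non-blocking connect() followed by select(). A minimal POSIX sketch of that pattern (the helper name and error handling are invented for illustration; this is not the attached patch's code):

```c
#include <arpa/inet.h>
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/select.h>
#include <sys/socket.h>

/* Hypothetical helper: a connect() that gives up after timeout_secs.
 * Sketch only; a real libpq implementation would differ. */
static int
connect_with_timeout(int sock, const struct sockaddr *addr,
					 socklen_t addrlen, int timeout_secs)
{
	fd_set		wfds;
	struct timeval tv;
	int			err = 0;
	socklen_t	errlen = sizeof(err);
	int			flags = fcntl(sock, F_GETFL, 0);

	fcntl(sock, F_SETFL, flags | O_NONBLOCK);

	if (connect(sock, addr, addrlen) == 0)
		return 0;				/* connected immediately */
	if (errno != EINPROGRESS)
		return -1;				/* hard failure, e.g. ECONNREFUSED */

	FD_ZERO(&wfds);
	FD_SET(sock, &wfds);
	tv.tv_sec = timeout_secs;
	tv.tv_usec = 0;

	/* wait for the connect to complete, but no longer than the timeout */
	if (select(sock + 1, NULL, &wfds, NULL, &tv) <= 0)
		return -1;				/* timed out, or select() error */

	/* connect finished: check whether it actually succeeded */
	getsockopt(sock, SOL_SOCKET, SO_ERROR, &err, &errlen);
	return (err == 0) ? 0 : -1;
}
```

Once connected, the same descriptor could be handed to the rest of the cancel path unchanged.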


Re: prevent immature WAL streaming

2021-09-30 Thread Alvaro Herrera
On 2021-Sep-30, Tom Lane wrote:

> Andrew Dunstan  writes:
> > Regardless of this problem, I think we should simply call
> > set_replication_conf unconditionally in init().  Replication connections
> > are now allowed by default on Unix, this would just bring Windows nodes
> > into line with that.
> 
> Yeah, I was thinking along the same lines yesterday.  The fact that
> pre-commit testing failed to note the problem is exactly because we
> have this random difference between what works by default on Unix
> and what works by default on Windows.  Let's close that gap before
> it bites us again.

+1

> There's still the issue of these tests overriding postgresql.conf
> entries made by init(), but maybe we can live with that?

I vote to at least have has_archiving=>1 set wal_level=replica, and
potentially max_wal_senders=2 too (allows pg_basebackup).

-- 
Álvaro Herrera PostgreSQL Developer  —  https://www.EnterpriseDB.com/




Re: prevent immature WAL streaming

2021-09-30 Thread Tom Lane
Andrew Dunstan  writes:
> Regardless of this problem, I think we should simply call
> set_replication_conf unconditionally in init().  Replication connections
> are now allowed by default on Unix, this would just bring Windows nodes
> into line with that.

Yeah, I was thinking along the same lines yesterday.  The fact that
pre-commit testing failed to note the problem is exactly because we
have this random difference between what works by default on Unix
and what works by default on Windows.  Let's close that gap before
it bites us again.

There's still the issue of these tests overriding postgresql.conf
entries made by init(), but maybe we can live with that?

regards, tom lane




Re: prevent immature WAL streaming

2021-09-30 Thread Andrew Dunstan


On 9/29/21 5:29 PM, Tom Lane wrote:
> Andrew Dunstan  writes:
>> On 9/29/21 4:33 PM, Tom Lane wrote:
>>> which looks like a pretty straightforward bogus-connection-configuration
>>> problem, except why wouldn't other BF members show it?
>> This:
>> ...
>> doesn't have "allows_streaming => 1".
> Oh, and that only breaks things on Windows, cf set_replication_conf.
>
> ... although I wonder why the fact that sub init otherwise sets
> wal_level = minimal doesn't cause a problem for this test.
> Maybe the test script is cowboy-ishly overriding that?
>
>   



Regardless of this problem, I think we should simply call
set_replication_conf unconditionally in init().  Replication connections
are now allowed by default on Unix, this would just bring Windows nodes
into line with that.


The function does have this:

$self->host eq $test_pghost
   or die "set_replication_conf only works with the default host";

I'm not sure when that wouldn't be true.


cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: non-superusers are allowed to drop the replication user, but are not allowed to alter or even create them, is that ok?

2021-09-30 Thread Bharath Rupireddy
On Thu, Sep 30, 2021 at 3:37 PM Ashutosh Sharma  wrote:
>
> Hi All,
>
> While working on one of the internal projects I noticed that currently in 
> Postgres, we do not allow normal users to alter attributes of the replication 
> user. However we do allow normal users to drop replication users or to even 
> rename it using the alter command. Is that behaviour ok? If yes, can someone 
> please help me understand how and why this is okay.
>
> Here is an example illustrating this behaviour:
>
> supusr@postgres=# create user repusr with password 'repusr' replication;
> CREATE ROLE
>
> supusr@postgres=# create user nonsu with password 'nonsu' createrole createdb;
> CREATE ROLE
>
> supusr@postgres=# \c postgres nonsu;
> You are now connected to database "postgres" as user "nonsu".
>
> nonsu@postgres=> alter user repusr nocreatedb;
> ERROR:  42501: must be superuser to alter replication roles or change 
> replication attribute
>
> nonsu@postgres=> alter user repusr rename to refusr;
> ALTER ROLE
>
> nonsu@postgres=> drop user refusr;
> DROP ROLE
>
> nonsu@postgres=> create user repusr2 with password 'repusr2' replication;
> ERROR:  42501: must be superuser to create replication users

I think having createrole allows a non-superuser to rename/drop a user
with the replication role, because renaming/creating/dropping roles is
what createrole/nocreaterole is meant for.

postgres=# create user nonsu_nocreterole with createdb;
CREATE ROLE
postgres=# set role nonsu_nocreterole;
SET
postgres=> alter user repusr rename to refusr;
ERROR:  permission denied to rename role
postgres=> drop user refusr;
ERROR:  permission denied to drop role
postgres=>

Regards,
Bharath Rupireddy.




Re: Add ETIMEDOUT to ALL_CONNECTION_FAILURE_ERRNOS

2021-09-30 Thread Tom Lane
Jelte Fennema  writes:
> Previously successfully opened TCP connections can still fail on reads
> with ETIMEDOUT. This should be considered a connection failure, so that
> the connection in libpq is marked as CONNECTION_BAD. The reason I got an
> ETIMEDOUT was because I had set a low tcp_user_timeout in the
> connection string. However, it can probably also happen due to
> keepalive limits being reached.

I'm dubious about the portability of this patch, because we don't
use ETIMEDOUT elsewhere.  strerror.c thinks it may not exist,
which is probably overly conservative because POSIX has required
it since SUSv2.  The bigger problem is that it's not accounted for in
the WSAxxx mapping done in port/win32_port.h and TranslateSocketError.
That'd have to be fixed for this to behave reasonably on Windows,
I think.

regards, tom lane
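As a toy illustration of the libpq-side idea — treating ETIMEDOUT as one of the "connection failure" errnos — here is a hedged sketch. The helper name is invented; the real list is the ALL_CONNECTION_FAILURE_ERRNOS macro in libpq-int.h and contains more members than shown here:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Invented helper mirroring how libpq classifies errnos via
 * ALL_CONNECTION_FAILURE_ERRNOS; illustrative only, not project code. */
static bool
errno_means_connection_failure(int e)
{
	switch (e)
	{
		case EPIPE:
		case ECONNRESET:
		case ETIMEDOUT:			/* the addition this thread proposes */
			return true;
		default:
			return false;
	}
}
```

On Windows, the WSAETIMEDOUT-to-ETIMEDOUT mapping Tom mentions (win32_port.h / TranslateSocketError) would also be needed for this classification to fire.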




Re: pg_stat_statements and "IN" conditions

2021-09-30 Thread Dmitry Dolgov
>On Wed, Jun 16, 2021 at 04:02:12PM +0200, Dmitry Dolgov wrote:
>
> > I've prepared a new rebased version to deal with the new way of
> > computing query id, but as always there is one tricky part. From what I
> > understand, now an external module can provide custom implementation for
> > query id computation algorithm. It seems natural to think this machinery
> > could be used instead of patch in the thread, i.e. one could create a
> > custom logic that will enable constants collapsing as needed, so that
> > same queries with different number of constants in an array will be
> > hashed into the same record.
> >
> > But there is a limitation in how such queries will be normalized
> > afterwards — to reduce level of surprise it's necessary to display the
> > fact that a certain query in fact had more constants than are shown in
> > pgss record. Ideally LocationLen needs to carry some bits of information
> > on what exactly could be skipped, and generate_normalized_query needs to
> > understand that, both are not reachable for an external module with
> > custom query id logic (without replicating significant part of the
> > existing code). Hence, a new version of the patch.
>
> Forgot to mention a couple of people who already reviewed the patch.

And now for something completely different, here is a new patch version.
It contains a small fix for one problem we've found during testing (one
code path was making incorrect assumptions about find_const_walker results).
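The core idea of the patch — contribute only the first merge_threshold constants of a list to the jumble hash, so longer IN-lists collapse into the same pgss entry — can be modeled in a few lines. Both the hash (FNV-1a) and the names below are invented for illustration; the real patch works on parse nodes in queryjumble.c:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MERGE_THRESHOLD 5		/* stand-in for pg_stat_statements.merge_threshold */

/* Toy jumble: constants past the threshold contribute nothing, so
 * IN (1..6) and IN (1..8) hash identically. FNV-1a is used only for
 * illustration; it is not the server's jumble algorithm. */
static uint64_t
jumble_in_list(const int *consts, size_t n)
{
	uint64_t	h = 1469598103934665603ULL;
	size_t		limit = (n < MERGE_THRESHOLD) ? n : MERGE_THRESHOLD;

	for (size_t i = 0; i < limit; i++)
	{
		h ^= (uint64_t) consts[i];
		h *= 1099511628211ULL;
	}
	return h;
}
```

With this model, `IN (1,2,3,4,5,6)` and `IN (1,2,3,4,5,6,7,8)` produce the same hash, matching the `IN ($1, $2, $3, $4, ...)` normalization the patch's regression tests show.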
From 8025228206518ce0db0562174932da9b9ab63c78 Mon Sep 17 00:00:00 2001
From: Dmitrii Dolgov <9erthali...@gmail.com>
Date: Thu, 10 Jun 2021 13:15:35 +0200
Subject: [PATCH v5] Prevent jumbling of every element in ArrayExpr

pg_stat_statements produces multiple entries for queries like

SELECT something FROM table WHERE col IN (1, 2, 3, ...)

depending on number of parameters, because every element of ArrayExpr is
jumbled. Make Consts (or any expression that could be reduced to a
Const) contribute nothing to the jumble hash if they're part of a series
and at position further that specified threshold. Do the same for
similar queries with VALUES as well.

Reviewed-by: Zhihong Yu, Sergey Dudoladov
Tested-by: Chengxi Sun
---
 .../expected/pg_stat_statements.out   | 845 +-
 .../pg_stat_statements/pg_stat_statements.c   |  42 +-
 .../sql/pg_stat_statements.sql| 170 
 src/backend/utils/misc/guc.c  |  13 +
 src/backend/utils/misc/queryjumble.c  | 287 +-
 src/include/utils/queryjumble.h   |  11 +-
 6 files changed, 1353 insertions(+), 15 deletions(-)

diff --git a/contrib/pg_stat_statements/expected/pg_stat_statements.out b/contrib/pg_stat_statements/expected/pg_stat_statements.out
index b52d187722..140b227fb4 100644
--- a/contrib/pg_stat_statements/expected/pg_stat_statements.out
+++ b/contrib/pg_stat_statements/expected/pg_stat_statements.out
@@ -205,7 +205,7 @@ SELECT query, calls, rows FROM pg_stat_statements ORDER BY query COLLATE "C";
  INSERT INTO test VALUES(generate_series($1, $2), $3) | 1 |   10
  SELECT * FROM test ORDER BY a| 1 |   12
  SELECT * FROM test WHERE a > $1 ORDER BY a   | 2 |4
- SELECT * FROM test WHERE a IN ($1, $2, $3, $4, $5)   | 1 |8
+ SELECT * FROM test WHERE a IN ($1, $2, $3, $4, ...)  | 1 |8
  SELECT pg_stat_statements_reset()| 1 |1
  SELECT query, calls, rows FROM pg_stat_statements ORDER BY query COLLATE "C" | 0 |0
  UPDATE test SET b = $1 WHERE a = $2  | 6 |6
@@ -1077,4 +1077,847 @@ SELECT COUNT(*) FROM pg_stat_statements WHERE query LIKE '%SELECT GROUPING%';
  2
 (1 row)
 
+--
+-- Consts merging
+--
+SET pg_stat_statements.merge_threshold = 5;
+CREATE TABLE test_merge (id int, data int);
+-- IN queries
+-- Normal
+SELECT pg_stat_statements_reset();
+ pg_stat_statements_reset 
+--
+ 
+(1 row)
+
+SELECT * FROM test_merge WHERE id IN (1, 2, 3);
+ id | data 
++--
+(0 rows)
+
+SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
+ query  | calls 
++---
+ SELECT * FROM test_merge WHERE id IN ($1, $2, $3)  | 1
+ SELECT pg_stat_statements_reset()  | 1
+ SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C" | 0
+(3 rows)
+
+SELECT * FROM test_merge WHERE id IN (1, 2, 3, 4, 5, 6);
+ id | data 
++--
+(0 rows)
+
+SELECT * FROM test_merge WHERE id IN (1, 2, 3, 4, 5, 6, 7);
+ id | data 
++--
+(0 rows)
+
+SELECT * FROM test_merge WHERE id IN (1, 2, 3, 4, 5, 6, 7, 8);
+ id | data 
++---

Re: prevent immature WAL streaming

2021-09-30 Thread Alvaro Herrera
Hello

On 2021-Sep-30, Andres Freund wrote:

> FWIW, with that fixed I see the test hanging (e.g. [1]):
> 
> can't unlink 
> c:/cirrus/src/test/recovery/tmp_check/t_026_overwrite_contrecord_primary2_data/archives/00010008.fail:
>  Permission denied at t/026_overwrite_contrecord.pl line 189.
> ### Stopping node "primary2" using mode immediate
> # Running: pg_ctl -D 
> c:/cirrus/src/test/recovery/tmp_check/t_026_overwrite_contrecord_primary2_data/pgdata
>  -m immediate stop
> waiting for server to shut 
> down...
>  failed
> pg_ctl: server does not shut down
> Bail out!  command "pg_ctl -D 
> c:/cirrus/src/test/recovery/tmp_check/t_026_overwrite_contrecord_primary2_data/pgdata
>  -m immediate stop" exited with value 1
> Warning: unable to close filehandle GEN6 properly: Bad file descriptor during 
> global destruction.
> 
> The hang seems to be fixed by uncommenting the $h->finish(). Was there a
> reason you commented that out?

Hmm, no -- I was experimenting to see what effect it had, and because I
saw none, I left it like that.  Let me fix both things and see what
happens next.

> The test still fails, but at least it doesn't hang anymore.

I'll try and get it fixed, but ultimately we can always "fix" this test
by removing it.  It is only in 14 and master; the earlier branches only
have the first part of this test file.  (That's because I ran out of
patience trying to port it to versions lacking LSN operators and such.)
I think the second half doesn't provide all that much additional
coverage anyway.

-- 
Álvaro Herrera  Valdivia, Chile  —  https://www.EnterpriseDB.com/
"Ellos andaban todos desnudos como su madre los parió, y también las mujeres,
aunque no vi más que una, harto moza, y todos los que yo vi eran todos
mancebos, que ninguno vi de edad de más de XXX años" (Cristóbal Colón)




Re: [Patch] ALTER SYSTEM READ ONLY

2021-09-30 Thread Amul Sul
On Fri, Sep 24, 2021 at 5:07 PM Amul Sul  wrote:
>
> On Thu, Sep 23, 2021 at 11:56 PM Robert Haas  wrote:
> >
> > On Mon, Sep 20, 2021 at 11:20 AM Amul Sul  wrote:
> > > Ok, understood, I have separated my changes into 0001 and 0002 patch,
> > > and the refactoring patches start from 0003.
> >
> > I think it would be better in the other order, with the refactoring
> > patches at the beginning of the series.
> >
>
> Ok, will do that. I did this other way to minimize the diff e.g.
> deletion diff of RecoveryXlogAction enum and
> DetermineRecoveryXlogAction(), etc.
>

I have reversed the patch order. Now refactoring patches will be
first, and the patch that removes the dependencies on global & local
variables will be the last. I did the necessary modification in the
refactoring patches too, e.g. removed DetermineRecoveryXlogAction() and
the RecoveryXlogAction enum, which are no longer needed (thanks to commit
1d919de5eb3fffa7cc9479ed6d2915fb89794459, which made the code simpler).

To find the value of InRecovery after we clear it, the patch still uses
ControlFile's DBState, but the check condition has been changed to a
more specific one, which is less confusing.

In casual off-list discussion, the point was made to check
SharedRecoveryState to find out the InRecovery value afterward, and
check that using RecoveryInProgress().  But we can't depend on
SharedRecoveryState because at the start it gets initialized to
RECOVERY_STATE_CRASH irrespective of InRecovery that happens later.
Therefore, we can't use RecoveryInProgress() which always returns
true if SharedRecoveryState != RECOVERY_STATE_DONE.

I am posting only refactoring patches for now.

Regards,
Amul
From 730e8331fefc882b4cab7112adf0f4d8da1ea831 Mon Sep 17 00:00:00 2001
From: Amul Sul 
Date: Thu, 30 Sep 2021 06:29:06 -0400
Subject: [PATCH v36 4/4] Remove dependencies on startup-process-specific
 variables.

To make XLogAcceptWrites() callable outside the startup process, remove
its dependency on a few global and local variables specific to the
startup process.

The global variables are ArchiveRecoveryRequested and
LocalPromoteIsTriggered. LocalPromoteIsTriggered can already be
accessed from any other process via the existing PromoteIsTriggered();
ArchiveRecoveryRequested is made accessible by copying it into shared
memory.

XLogAcceptWrites() takes two arguments, EndOfLogTLI and EndOfLog, which
are local to StartupXLOG(). Instead of passing them as arguments,
XLogCtl->replayEndTLI and XLogCtl->lastSegSwitchLSN from shared memory
can be used as replacements for EndOfLogTLI and EndOfLog respectively.
XLogCtl->lastSegSwitchLSN is not going to change before we use it: it
changes only when the current WAL segment gets full, which cannot happen
here for two reasons. First, WAL writes are disabled for other processes
until XLogAcceptWrites() finishes. Second, before lastSegSwitchLSN is
used, XLogAcceptWrites() writes only fixed-size WAL records (a full-page
write and a record for either recovery end or a checkpoint), which will
not fill up the 16MB WAL segment.

EndOfLogTLI in StartupXLOG() is the timeline ID of the last record that
xlogreader reads. If we were in recovery, that xlogreader was simply
re-fetching the last record we had already replayed in the redo loop; if
we were not in recovery, we don't need to worry, since this value is
needed only when ArchiveRecoveryRequested = true, which implicitly
forces redo and sets XLogCtl->replayEndTLI.
---
 src/backend/access/transam/xlog.c | 36 ++-
 1 file changed, 26 insertions(+), 10 deletions(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 91cdd7d9ff2..5b4e5ac379f 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -659,6 +659,13 @@ typedef struct XLogCtlData
 	 */
 	bool		SharedPromoteIsTriggered;
 
+	/*
+	 * SharedArchiveRecoveryRequested exports the value of the
+	 * ArchiveRecoveryRequested flag to be shared; it is otherwise valid only
+	 * in the startup process.
+	 */
+	bool		SharedArchiveRecoveryRequested;
+
 	/*
 	 * WalWriterSleeping indicates whether the WAL writer is currently in
 	 * low-power mode (and hence should be nudged if an async commit occurs).
@@ -880,8 +887,7 @@ static MemoryContext walDebugCxt = NULL;
 static void readRecoverySignalFile(void);
 static void validateRecoveryParameters(void);
 static void exitArchiveRecovery(TimeLineID endTLI, XLogRecPtr endOfLog);
-static void CleanupAfterArchiveRecovery(TimeLineID EndOfLogTLI,
-		XLogRecPtr EndOfLog);
+static void CleanupAfterArchiveRecovery(void);
 static bool recoveryStopsBefore(XLogReaderState *record);
 static bool recoveryStopsAfter(XLogReaderState *record);
 static char *getRecoveryStopReason(void);
@@ -927,7 +933,7 @@ static void UpdateMinRecoveryPoint(XLogRecPtr lsn, bool force);
 static XLogRecord *ReadRecord(XLogReaderState *xlogreader,
 			  int emode, bool fetching_ckpt);
 static void CheckRecoveryConsistency(void);
-static bool XLogAcceptWrites(TimeLineID EndOfLogT

Re: Logical replication keepalive flood

2021-09-30 Thread Amit Kapila
On Thu, Sep 30, 2021 at 3:41 PM Kyotaro Horiguchi
 wrote:
>
> At Thu, 30 Sep 2021 17:08:35 +0900 (JST), Kyotaro Horiguchi 
>  wrote in
> > At Thu, 30 Sep 2021 17:21:03 +1000, Greg Nancarrow  
> > wrote in
> > > Actually, with the patch applied, I find that "make check-world" fails
> > > (006_logical_decoding, test 7).
> >
> > Mmm..
> >
> > t/006_logical_decoding.pl .. 4/14
> > #   Failed test 'pg_recvlogical acknowledged changes'
> > #   at t/006_logical_decoding.pl line 117.
> > #  got: 'BEGIN
> > # table public.decoding_test: INSERT: x[integer]:5 y[text]:'5''
> > # expected: ''
> >
> > I'm not sure what the test is checking for now, though.
>
> It's checking that endpos works correctly? The logically decoded WAL
> looks like this.
>
> 0/1528F10|table public.decoding_test: INSERT: x[integer]:1 y[text]:'1'
> 0/15290F8|table public.decoding_test: INSERT: x[integer]:2 y[text]:'2'
> 0/1529138|table public.decoding_test: INSERT: x[integer]:3 y[text]:'3'
> 0/1529178|table public.decoding_test: INSERT: x[integer]:4 y[text]:'4'
> 0/15291E8|COMMIT 709
> 0/15291E8|BEGIN 710
> 0/15291E8|table public.decoding_test: INSERT: x[integer]:5 y[text]:'5'
> 0/1529228|table public.decoding_test: INSERT: x[integer]:6 y[text]:'6'
>
> The COMMIT and BEGIN share the same LSN, and I don't understand how that can be.
>

This is because end_lsn is always the commit record + 1, which makes it
equal to the start of the next record, and we use end_lsn here for the
commit. See the comments in the code below.
/*
* LSN pointing to the end of the commit record + 1.
*/
XLogRecPtr end_lsn;

> The previous read by pg_recvlogical proceeded up to the COMMIT record, and
> the following command, run after that, behaves differently.
>
> pg_recvlogical -S test_slot --dbname postgres --endpos '0/15291E8' -f - 
> --no-loop --start
>
> Before the patch it ends before reading a record, and after the patch
> it reads into the "table ..." line.  pg_recvlogical seems using the
> endpos as the beginning of the last record. In that meaning the three
> lines (COMMIT 709/BEGIN 710/table ...'5') are falls under the end of
> data.
>
> The difference seems to come from the timing at which the keepalive
> comes. pg_recvlogical checks the endpos only when a keepalive comes.  In
> other words, it needs keepalive for every data line so that it stops
> exactly at the specified endpos.
>
> 1. Is it the correct behavior that the three data lines share the same
>   LSN?  I think BEGIN and the next line should do, but COMMIT and next
>   BEGIN shouldn't.
>
> 2. Is it the designed behavior that pg_recvlogical does check endpos
>   only when a keepalive comes? If it is the correct behavior, we
>   shouldn't avoid the keepalive flood.
>

If anything, I think this is a test case issue, as explained by me in email [1].

[1] - 
https://www.postgresql.org/message-id/CAA4eK1Ja2XmK59Czv1V%2BtfOgU4mcFfDwTtTgO02Wd%3DrO02JbiQ%40mail.gmail.com

-- 
With Regards,
Amit Kapila.




Re: Logical replication keepalive flood

2021-09-30 Thread Amit Kapila
On Thu, Sep 30, 2021 at 1:26 PM Kyotaro Horiguchi
 wrote:
>
> At Thu, 30 Sep 2021 16:21:25 +1000, Greg Nancarrow  
> wrote in
> > On Thu, Sep 30, 2021 at 3:56 PM Amit Kapila  wrote:
> > >
>
> > > Also, more to the point this special keep_alive seems to be sent for
> > > synchronous replication and walsender shutdown as they can expect
> > > updated locations. You haven't given any reason (theory) why those two
> > > won't be impacted due to this change? I am aware that for synchronous
> > > replication, we wake waiters while ProcessStandbyReplyMessage but I am
> > > not sure how it helps with wal sender shutdown? I think we need to
> > > know the reasons for this message and then need to see if the change
> > > has any impact on the same.
> > >
> >
> > Yes, I'm not sure about the possible impacts, still looking at it.
>
> If the comment describes the objective correctly, the only possible
> impact would be that there may be a case where server responds a bit
> slowly for a shutdown request.  But I'm not sure it is definitely
> true.
>

So, we should try to find out how walsender shutdown depends on sending
the keepalive, and, secondly, what about the sync rep case? I think in
the worst case that also might be delayed. Why do you think that would
be acceptable?

-- 
With Regards,
Amit Kapila.




Re: pgcrypto support for bcrypt $2b$ hashes

2021-09-30 Thread Andreas Karlsson

On 9/28/21 11:58 PM, Daniel Fone wrote:

On 29/09/2021, at 2:33 AM, Daniel Gustafsson  wrote:
I don't see why not, the best first patches are those scratching an itch.  If
you feel up for it then give it a go, I - and the rest of pgsql-hackers - can
help if you need to bounce ideas.


I’m glad you said that. I couldn’t resist trying and have attached a patch. By 
referencing the respective git logs, I didn’t have too much difficulty 
identifying the material changes in each codebase. I’ve documented all the 
postgres-specific changes to upstream in the header comment for each file.


I took a quick look and on a cursory glance it looks good, but I got 
these compilation warnings:


crypt-blowfish.c: In function ‘BF_crypt’:
crypt-blowfish.c:789:3: warning: ISO C90 forbids mixed declarations and 
code [-Wdeclaration-after-statement]

  789 |   int   done;
  |   ^~~
crypt-blowfish.c: In function ‘_crypt_blowfish_rn’:
crypt-blowfish.c:897:8: warning: variable ‘save_errno’ set but not used 
[-Wunused-but-set-variable]

  897 |  int   save_errno,
  |^~
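For the first warning, the conventional fix is to hoist the declaration to the top of the enclosing block. A sketch with an invented function (not the actual crypt-blowfish.c code):

```c
#include <assert.h>

/* Triggers -Wdeclaration-after-statement: a declaration appears
 * after an executable statement inside the block. */
static int
before(int x)
{
	x += 1;
	int			done = x * 2;	/* warning: mixed declarations and code */

	return done;
}

/* C90-clean equivalent: declaration first, assignment later. */
static int
after(int x)
{
	int			done;

	x += 1;
	done = x * 2;
	return done;
}
```

Behavior is identical; only the placement of the declaration changes, which silences the warning under `-Wdeclaration-after-statement`.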

Andreas





Re: Failed transaction statistics to measure the logical replication progress

2021-09-30 Thread Amit Kapila
On Thu, Sep 30, 2021 at 1:02 PM Osumi, Takamichi/大墨 昂道
 wrote:
>
> On Thursday, September 30, 2021 1:15 PM Amit Kapila  
> wrote:
> > On Thu, Sep 30, 2021 at 8:22 AM Hou, Zhijie/侯 志杰
> >  wrote:
> > >
> > > On Tues, Sep 28, 2021 6:05 PM Amit Kapila 
> > wrote:
> > > Adding a new view seems reasonable, but it will bring another
> > > subscription-related view, which might be too much. OTOH, I can see
> > > there are already some different views[1] including xact stat, maybe 
> > > adding
> > another one is acceptable?
> > All these views are related to data untransmitted to the collector, but
> > what we really
> > need is a view similar to pg_stat_archiver or pg_stat_bgwriter which gives
> > information about background workers.
> > Now, the problem as I see it is that if we go that route then
> > pg_stat_subscription will
> > no longer remain a dynamic view and one might consider that as a compatibility
> > break. The other idea I have shared is that we display these stats under 
> > the new
> > view introduced by Sawada-San's patch [1] and probably rename that view as
> > pg_stat_subscription_worker where all the stats (xact info and last failure
> > information) about each worker will be displayed.
> Sorry, all the stats in pg_stat_subscription_worker view ?
>
> There was a discussion that
> the xact info should be displayed from pg_stat_subscription
> together with the existing stats there (which will be changed to persist),
> but if your above proposal is adopted,
> the list of pg_stat_subscription_worker's columns
> will be something like below (listing only the major columns):
>
> - subid, subrelid and some other relation attributes required
> - 5 stats values moved from pg_stat_subscription
>   received_lsn, last_msg_send_time, last_msg_receipt_time,
>   latest_end_lsn, latest_end_time
>

If we go with the view as pg_stat_subscription_worker, we don't need
to move the existing stats of pg_stat_subscription into it. There is a
clear difference between them, all the stats displayed via
pg_stat_subscription are dynamic and won't persist whereas all the
stats corresponding to pg_stat_subscription_worker view will persist.


-- 
With Regards,
Amit Kapila.




Re: Column Filtering in Logical Replication

2021-09-30 Thread Amit Kapila
On Wed, Sep 29, 2021 at 6:49 PM Alvaro Herrera  wrote:
>
> On 2021-Sep-28, Amit Kapila wrote:
>
> > But it is not allowed to create schema or table with the name
> > CURRENT_SCHEMA, so not sure if we need to do anything special for it.
>
> Oh?  You certainly can.
>
> alvherre=# create schema "CURRENT_SCHEMA";
> CREATE SCHEMA
> alvherre=# \dn
> Listado de esquemas
>  Nombre |   Dueño
> +---
>  CURRENT_SCHEMA | alvherre
>  public | pg_database_owner
>  temp   | alvherre
> (3 filas)
>
> alvherre=# create table "CURRENT_SCHEMA"."CURRENT_SCHEMA" ("bother amit for a 
> while" int);
> CREATE TABLE
> alvherre=# \d "CURRENT_SCHEMA".*
>   Tabla «CURRENT_SCHEMA.CURRENT_SCHEMA»
>  Columna |  Tipo   | Ordenamiento | Nulable | Por omisión
> -+-+--+-+-
>  bother amit for a while | integer |  | |
>

oops, I was trying without quotes.

>
> > Now, during post-processing, the PUBLICATIONOBJ_CONTINUATION will be
> > distinguished as CURRENT_SCHEMA because both rangeVar and name will be
> > NULL. Do you have other ideas to deal with it?
>
> That sounds plausible.  There's no need for a name-free object of any other
> kind AFAICS, so there should be no conflict.  If we ever do find a
> conflict, we can add another struct member to disambiguate.
>

Okay, thanks. I feel now we are in agreement on the grammar rules.


-- 
With Regards,
Amit Kapila.




Re: rand48 replacement

2021-09-30 Thread Fabien COELHO



I guess the declaration needs PGDLLIMPORT.


Indeed, thanks!

Attached v16 adds that.
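For readers following along, the patch below replaces direct random()/srandom() use with a small explicit-state PRNG API (pg_prng_u64, pg_prng_f64). A rough stand-in showing the shape of such an API — splitmix64 here is purely illustrative and is not the generator the patch implements:

```c
#include <assert.h>
#include <stdint.h>

/* Invented stand-in for the pg_prng_* API: explicit per-state
 * generator instead of the global libc one, plus a double helper. */
typedef struct
{
	uint64_t	s;
} demo_prng_state;

static uint64_t
demo_prng_u64(demo_prng_state *state)
{
	uint64_t	z = (state->s += 0x9E3779B97F4A7C15ULL);

	z = (z ^ (z >> 30)) * 0xBF58476D1CE4E5B9ULL;
	z = (z ^ (z >> 27)) * 0x94D049BB133111EBULL;
	return z ^ (z >> 31);
}

/* Uniform double in [0, 1), built from the top 53 bits. */
static double
demo_prng_f64(demo_prng_state *state)
{
	return (demo_prng_u64(state) >> 11) * (1.0 / 9007199254740992.0);
}
```

The explicit state is what lets call sites like auto_explain compare `pg_prng_f64(&pg_global_prng_state) < auto_explain_sample_rate` directly, instead of scaling `random()` against MAX_RANDOM_VALUE.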

--
Fabien.

diff --git a/contrib/amcheck/verify_nbtree.c b/contrib/amcheck/verify_nbtree.c
index d19f73127c..b250ae912b 100644
--- a/contrib/amcheck/verify_nbtree.c
+++ b/contrib/amcheck/verify_nbtree.c
@@ -32,6 +32,7 @@
 #include "catalog/index.h"
 #include "catalog/pg_am.h"
 #include "commands/tablecmds.h"
+#include "common/pg_prng.h"
 #include "lib/bloomfilter.h"
 #include "miscadmin.h"
 #include "storage/lmgr.h"
@@ -465,8 +466,8 @@ bt_check_every_level(Relation rel, Relation heaprel, bool heapkeyspace,
 		total_pages = RelationGetNumberOfBlocks(rel);
 		total_elems = Max(total_pages * (MaxTIDsPerBTreePage / 3),
 		  (int64) state->rel->rd_rel->reltuples);
-		/* Random seed relies on backend srandom() call to avoid repetition */
-		seed = random();
+		/* Random seed relies on backend prng initialization to avoid repetition */
+		seed = pg_prng_u64(&pg_global_prng_state);
 		/* Create Bloom filter to fingerprint index */
 		state->filter = bloom_create(total_elems, maintenance_work_mem, seed);
 		state->heaptuplespresent = 0;
diff --git a/contrib/auto_explain/auto_explain.c b/contrib/auto_explain/auto_explain.c
index e9092ba359..0c4279447c 100644
--- a/contrib/auto_explain/auto_explain.c
+++ b/contrib/auto_explain/auto_explain.c
@@ -16,6 +16,7 @@
 
 #include "access/parallel.h"
 #include "commands/explain.h"
+#include "common/pg_prng.h"
 #include "executor/instrument.h"
 #include "jit/jit.h"
 #include "utils/guc.h"
@@ -275,8 +276,7 @@ explain_ExecutorStart(QueryDesc *queryDesc, int eflags)
 	if (nesting_level == 0)
 	{
 		if (auto_explain_log_min_duration >= 0 && !IsParallelWorker())
-			current_query_sampled = (random() < auto_explain_sample_rate *
-	 ((double) MAX_RANDOM_VALUE + 1));
+			current_query_sampled = pg_prng_f64(&pg_global_prng_state) < auto_explain_sample_rate;
 		else
 			current_query_sampled = false;
 	}
diff --git a/contrib/file_fdw/file_fdw.c b/contrib/file_fdw/file_fdw.c
index 2c2f149fb0..146b524076 100644
--- a/contrib/file_fdw/file_fdw.c
+++ b/contrib/file_fdw/file_fdw.c
@@ -1188,7 +1188,7 @@ file_acquire_sample_rows(Relation onerel, int elevel,
  * Found a suitable tuple, so save it, replacing one old tuple
  * at random
  */
-int			k = (int) (targrows * sampler_random_fract(rstate.randstate));
+int			k = (int) (targrows * sampler_random_fract(&rstate.randstate));
 
 Assert(k >= 0 && k < targrows);
 heap_freetuple(rows[k]);
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index 76d4fea21c..c139382170 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -5157,7 +5157,7 @@ analyze_row_processor(PGresult *res, int row, PgFdwAnalyzeState *astate)
 		if (astate->rowstoskip <= 0)
 		{
 			/* Choose a random reservoir element to replace. */
-			pos = (int) (targrows * sampler_random_fract(astate->rstate.randstate));
+			pos = (int) (targrows * sampler_random_fract(&astate->rstate.randstate));
 			Assert(pos >= 0 && pos < targrows);
 			heap_freetuple(astate->rows[pos]);
 		}
diff --git a/contrib/tablefunc/tablefunc.c b/contrib/tablefunc/tablefunc.c
index 52b272f298..ac6db426ac 100644
--- a/contrib/tablefunc/tablefunc.c
+++ b/contrib/tablefunc/tablefunc.c
@@ -36,6 +36,7 @@
 
 #include "access/htup_details.h"
 #include "catalog/pg_type.h"
+#include "common/pg_prng.h"
 #include "executor/spi.h"
 #include "funcapi.h"
 #include "lib/stringinfo.h"
@@ -290,8 +291,8 @@ get_normal_pair(float8 *x1, float8 *x2)
 
 	do
 	{
-		u1 = (float8) random() / (float8) MAX_RANDOM_VALUE;
-		u2 = (float8) random() / (float8) MAX_RANDOM_VALUE;
+		u1 = pg_prng_f64(&pg_global_prng_state);
+		u2 = pg_prng_f64(&pg_global_prng_state);
 
 		v1 = (2.0 * u1) - 1.0;
 		v2 = (2.0 * u2) - 1.0;
diff --git a/contrib/tsm_system_rows/tsm_system_rows.c b/contrib/tsm_system_rows/tsm_system_rows.c
index 4996612902..1a46d4b143 100644
--- a/contrib/tsm_system_rows/tsm_system_rows.c
+++ b/contrib/tsm_system_rows/tsm_system_rows.c
@@ -69,7 +69,7 @@ static BlockNumber system_rows_nextsampleblock(SampleScanState *node, BlockNumbe
 static OffsetNumber system_rows_nextsampletuple(SampleScanState *node,
 BlockNumber blockno,
 OffsetNumber maxoffset);
-static uint32 random_relative_prime(uint32 n, SamplerRandomState randstate);
+static uint32 random_relative_prime(uint32 n, pg_prng_state *randstate);
 
 
 /*
@@ -213,25 +213,25 @@ system_rows_nextsampleblock(SampleScanState *node, BlockNumber nblocks)
 		if (sampler->step == 0)
 		{
 			/* Initialize now that we have scan descriptor */
-			SamplerRandomState randstate;
+			pg_prng_state randstate;
 
 			/* If relation is empty, there's nothing to scan */
 			if (nblocks == 0)
 return InvalidBlockNumber;
 
 			/* We only need an RNG during this setup step */
-			sampler_random_init_state(sampler->seed, randstate);
+			sampler_random_init_state(sampler->seed, &ra

Re: Make relfile tombstone files conditional on WAL level

2021-09-30 Thread Thomas Munro
On Wed, Sep 29, 2021 at 4:07 PM Thomas Munro  wrote:
> Hmm, on closer inspection, isn't the lack of real interlocking with
> checkpoints a bit suspect already?  What stops bgwriter from writing
> to the previous relfilenode generation's fd, if a relfilenode is
> recycled while BgBufferSync() is running?  Not sinval, and not the
> above code that only runs between BgBufferSync() invocations.

I managed to produce a case where live data is written to an unlinked
file and lost, with a couple of tweaks to get the right timing and
simulate OID wraparound.  See attached.  If you run the following
commands repeatedly with shared_buffers=256kB and
bgwriter_lru_multiplier=10, you should see a number lower than 10,000
from the last query in some runs, depending on timing.

create extension if not exists chaos;
create extension if not exists pg_prewarm;

drop table if exists t1, t2;
checkpoint;
vacuum pg_class;

select clobber_next_oid(20);
create table t1 as select 42 i from generate_series(1, 10000);
select pg_prewarm('t1'); -- fill buffer pool with t1
update t1 set i = i; -- dirty t1 buffers so bgwriter writes some
select pg_sleep(2); -- give bgwriter some time

drop table t1;
checkpoint;
vacuum pg_class;

select clobber_next_oid(20);
create table t2 as select 0 i from generate_series(1, 10000);
select pg_prewarm('t2'); -- fill buffer pool with t2
update t2 set i = 1 where i = 0; -- dirty t2 buffers so bgwriter writes some
select pg_sleep(2); -- give bgwriter some time

select pg_prewarm('pg_attribute'); -- evict all clean t2 buffers
select sum(i) as t2_sum_should_be_10000 from t2; -- have any updates been lost?
From b116b80b2775b004e35a9e7be0a057ee2724041b Mon Sep 17 00:00:00 2001
From: Thomas Munro 
Date: Thu, 30 Sep 2021 15:47:23 +1300
Subject: [PATCH 1/2] HACK: A function to control the OID allocator.

---
 src/test/modules/chaos/Makefile   | 21 +
 src/test/modules/chaos/chaos--1.0.sql |  8 
 src/test/modules/chaos/chaos.c| 26 ++
 src/test/modules/chaos/chaos.control  |  4 
 4 files changed, 59 insertions(+)
 create mode 100644 src/test/modules/chaos/Makefile
 create mode 100644 src/test/modules/chaos/chaos--1.0.sql
 create mode 100644 src/test/modules/chaos/chaos.c
 create mode 100644 src/test/modules/chaos/chaos.control

diff --git a/src/test/modules/chaos/Makefile b/src/test/modules/chaos/Makefile
new file mode 100644
index 0000000000..ac69721af6
--- /dev/null
+++ b/src/test/modules/chaos/Makefile
@@ -0,0 +1,21 @@
+# src/test/modules/chaos/Makefile
+
+MODULE_big = chaos
+OBJS = \
+	$(WIN32RES) \
+	chaos.o
+PGFILEDESC = "chaos - module in which to write throwaway fault-injection hacks"
+
+EXTENSION = chaos
+DATA = chaos--1.0.sql
+
+ifdef USE_PGXS
+PG_CONFIG = pg_config
+PGXS := $(shell $(PG_CONFIG) --pgxs)
+include $(PGXS)
+else
+subdir = src/test/modules/chaos
+top_builddir = ../../../..
+include $(top_builddir)/src/Makefile.global
+include $(top_srcdir)/contrib/contrib-global.mk
+endif
diff --git a/src/test/modules/chaos/chaos--1.0.sql b/src/test/modules/chaos/chaos--1.0.sql
new file mode 100644
index 0000000000..5016f7e586
--- /dev/null
+++ b/src/test/modules/chaos/chaos--1.0.sql
@@ -0,0 +1,8 @@
+/* src/test/modules/chaos/chaos--1.0.sql */
+
+-- complain if script is sourced in psql, rather than via CREATE EXTENSION
+\echo Use "CREATE EXTENSION chaos" to load this file. \quit
+
+CREATE FUNCTION clobber_next_oid(size BIGINT)
+	RETURNS pg_catalog.void STRICT
+	AS 'MODULE_PATHNAME' LANGUAGE C;
diff --git a/src/test/modules/chaos/chaos.c b/src/test/modules/chaos/chaos.c
new file mode 100644
index 0000000000..f1052f865e
--- /dev/null
+++ b/src/test/modules/chaos/chaos.c
@@ -0,0 +1,26 @@
+#include "postgres.h"
+
+#include "access/transam.h"
+#include "fmgr.h"
+#include "storage/lwlock.h"
+
+#include <limits.h>
+
+PG_MODULE_MAGIC;
+
+PG_FUNCTION_INFO_V1(clobber_next_oid);
+
+Datum
+clobber_next_oid(PG_FUNCTION_ARGS)
+{
+	int64		oid = PG_GETARG_INT64(0);
+
+	if (oid < FirstNormalObjectId || oid > UINT_MAX)
+		elog(ERROR, "invalid oid");
+
+	LWLockAcquire(OidGenLock, LW_EXCLUSIVE);
+	ShmemVariableCache->nextOid = oid;
+	LWLockRelease(OidGenLock);
+
+	PG_RETURN_VOID();
+}
diff --git a/src/test/modules/chaos/chaos.control b/src/test/modules/chaos/chaos.control
new file mode 100644
index 0000000000..137ab8a58d
--- /dev/null
+++ b/src/test/modules/chaos/chaos.control
@@ -0,0 +1,4 @@
+comment = 'Test module for throwaway fault-injection hacks...'
+default_version = '1.0'
+module_pathname = '$libdir/chaos'
+relocatable = true
-- 
2.30.2

From 2acc2ad31c1db268de0e8927d5c10ba2bb06e33c Mon Sep 17 00:00:00 2001
From: Thomas Munro 
Date: Thu, 30 Sep 2021 17:16:01 +1300
Subject: [PATCH 2/2] HACK: Slow the bgwriter down a bit.

---
 src/backend/postmaster/bgwriter.c   | 2 ++
 src/backend/storage/buffer/bufmgr.c | 2 ++
 2 files changed, 4 insertions(+)

diff --git a/src/backend/postmaster/bgwriter.c b/src/backend/postmaster/bgwriter.c
index 5584f4bc24..b65284b1f6

Re: Added schema level support for publication.

2021-09-30 Thread vignesh C
On Wed, Sep 29, 2021 at 11:49 AM Greg Nancarrow  wrote:
>
> On Wed, Sep 29, 2021 at 3:16 PM Amit Kapila  wrote:
> >
> > 4.
> > + /*
> > + * Check if setting the relation to a different schema will result in the
> > + * publication having schema and same schema's table in the publication.
> > + */
> > + if (stmt->objectType == OBJECT_TABLE)
> > + {
> > + ListCell   *lc;
> > + List*schemaPubids = GetSchemaPublications(nspOid);
> > + foreach(lc, schemaPubids)
> > + {
> > + Oid pubid = lfirst_oid(lc);
> > + if (list_member_oid(GetPublicationRelations(pubid, PUBLICATION_PART_ALL),
> > + relid))
> > + ereport(ERROR,
> > + errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
> > + errmsg("cannot move table \"%s\" to schema \"%s\"",
> > +RelationGetRelationName(rel), stmt->newschema),
> > + errdetail("Altering table will result in having schema \"%s\" and
> > same schema's table \"%s\" in the publication \"%s\" which is not
> > supported.",
> > +   stmt->newschema,
> > +   RelationGetRelationName(rel),
> > +   get_publication_name(pubid, false)));
> > + }
> > + }
> >
> > Let's slightly change the comment as: "Check that setting the relation
> > to a different schema won't result in the publication having schema
> > and the same schema's table." and errdetail as: "The schema \"%s\" and
> > same schema's table \"%s\" cannot be part of the same publication
> > \"%s\"."
> >
>
> Since this code is in AlterTableNamespace() and the relation being
> checked may or may not be part of a publication, I'd use "a
> publication" instead of "the publication" in the comment.
> Also, I'd say that we're doing the check because the mentioned
> combination is not supported.
>
> i.e. "Check that setting the relation to a different schema won't
> result in a publication having both a schema and the same schema's
> table, as this is not supported."

Thanks for the comment, I have handled it in the v35 version patch
attached at [1]
[1] - 
https://www.postgresql.org/message-id/CALDaNm2yJOEPCqR%3DgTMEwveJujH9c9_z4LhKmk2T3vZH7T1DLQ%40mail.gmail.com

Regards,
Vignesh




Re: Added schema level support for publication.

2021-09-30 Thread vignesh C
On Wed, Sep 29, 2021 at 10:46 AM Amit Kapila  wrote:
>
> On Tue, Sep 28, 2021 at 8:15 PM vignesh C  wrote:
> >
> > On Mon, Sep 27, 2021 at 12:15 PM houzj.f...@fujitsu.com
> >  wrote:
> >
> > Attached v34 patch has the changes for the same.
> >
>
> Few comments on v34-0001-Added-schema-level-support-for-publication
> ==
> 1.
> + * This rule parses publication object with and without keyword prefix.
>
> I think we should write it as: "This rule parses publication objects
> with and without keyword prefixes."

Modified

> 2.
> + * For the object without keyword prefix, we cannot just use
> relation_expr here,
> + * because some extended expression in relation_expr cannot be used as a
>
> /expression/expressions

Modified

> 3.
> +/*
> + * Process pubobjspec_list to check for errors in any of the objects and
> + * convert PUBLICATIONOBJ_CONTINUATION into appropriate 
> PublicationObjSpecType
> + * type.

Modified to remove type as discussed offline.

> 4.
> + /*
> + * Check if setting the relation to a different schema will result in the
> + * publication having schema and same schema's table in the publication.
> + */
> + if (stmt->objectType == OBJECT_TABLE)
> + {
> + ListCell   *lc;
> + List*schemaPubids = GetSchemaPublications(nspOid);
> + foreach(lc, schemaPubids)
> + {
> + Oid pubid = lfirst_oid(lc);
> + if (list_member_oid(GetPublicationRelations(pubid, PUBLICATION_PART_ALL),
> + relid))
> + ereport(ERROR,
> + errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
> + errmsg("cannot move table \"%s\" to schema \"%s\"",
> +RelationGetRelationName(rel), stmt->newschema),
> + errdetail("Altering table will result in having schema \"%s\" and
> same schema's table \"%s\" in the publication \"%s\" which is not
> supported.",
> +   stmt->newschema,
> +   RelationGetRelationName(rel),
> +   get_publication_name(pubid, false)));
> + }
> + }
>
> Let's slightly change the comment as: "Check that setting the relation
> to a different schema won't result in the publication having schema
> and the same schema's table." and errdetail as: "The schema \"%s\" and
> same schema's table \"%s\" cannot be part of the same publication
> \"%s\"."
>
> Maybe it is better to specify that this will disallow the partition table 
> case.

Modified, I did not add the above as we are allowing it.

> 5.
> ObjectsInPublicationToOids()
> {
> ..
> + pubobj = (PublicationObjSpec *) lfirst(cell);
> + if (pubobj->pubobjtype == PUBLICATIONOBJ_TABLE)
>
> It is better to keep an empty line between above two lines.

Modified

> 6.
> List*schemaPubids = GetSchemaPublications(nspOid);
> foreach(lc, schemaPubids)
> ..
> Oid pubid = lfirst_oid(lc);
> if (list_member_oid(GetPublicationRelations(pubid, PUBLICATION_PART_ALL),
>
> Add an empty line between each of the above two lines.

Modified

> 7.
> + /*
> + * Schema lock is held until the publication is altered to prevent
> + * concurrent schema deletion. No need to unlock the schemas, the locks
> + * will be released automatically at the end of alter publication command.
> + */
> + LockSchemaList(schemaidlist);
>
> I think it is better to add a similar comment at other places where
> this function is called. And we can shorten the comment atop
> LockSchemaList to something like: "The schemas specified in the schema
> list are locked in AccessShareLock mode in order to prevent concurrent
> schema deletion."

Modified

> 8. In CreatePublication(), the check if (stmt->for_all_tables) can be
> the first check and then in else if we can process tables and schemas.

Modified

> 9.
> AlterPublication()
> {
> ..
> + /* Lock the publication so nobody else can do anything with it. */
> + LockDatabaseObject(PublicationRelationId, pubform->oid, 0,
> +AccessExclusiveLock);
>
> I think it is better to say why we need this lock. So, can we change
> the comment to something like: "Lock the publication so nobody else
> can do anything with it. This prevents concurrent alter to add
> table(s) that were already going to become part of the publication by
> adding corresponding schema(s) via this command and similarly it will
> prevent the concurrent addition of schema(s) for which there is any
> corresponding table being added by this command."

Modified

These comments are handled in the v35 version patch attached at [1]
[1] - 
https://www.postgresql.org/message-id/CALDaNm2yJOEPCqR%3DgTMEwveJujH9c9_z4LhKmk2T3vZH7T1DLQ%40mail.gmail.com

Regards,
Vignesh




Re: Added schema level support for publication.

2021-09-30 Thread vignesh C
On Wed, Sep 29, 2021 at 9:07 AM houzj.f...@fujitsu.com
 wrote:
>
> On Tues, Sep 28, 2021 10:46 PM vignesh C  wrote:
> > Attached v34 patch has the changes for the same.
>
> Thanks for updating the patch.
> Here are a few comments.
>
> 1)
> + * ALL TABLES IN SCHEMA schema [[, ...]
>
> [[ -> [

Modified

> 2)
> +   /* ALTER PUBLICATION ... ADD/DROP TABLE/ALL TABLES IN SCHEMA 
> parameters */
>
> The two '/' seems a bit unclear and it doesn't mention the SET case.
> Maybe we can write like:
>
> /* parameters used for ALTER PUBLICATION ... ADD/DROP/SET publication objects 
> */

Modified

> 3)
> +   /*
> +* Check if setting the relation to a different schema will result in 
> the
> +* publication having schema and same schema's table in the 
> publication.
> +*/
> +   if (stmt->objectType == OBJECT_TABLE)
> +   {
> +   ListCell   *lc;
> +   List   *schemaPubids = GetSchemaPublications(nspOid);
> +   foreach(lc, schemaPubids)
> +   {
> +   Oid pubid = lfirst_oid(lc);
> +   if (list_member_oid(GetPublicationRelations(pubid, 
> PUBLICATION_PART_ALL),
> +   relid))
> +   ereport(ERROR,
>
> How about we check this case like the following ?
>
> List   *schemaPubids = GetSchemaPublications(nspOid);
> List   *relPubids = GetRelationPublications(RelationGetRelid(rel));
> if (list_intersection(schemaPubids, relPubids))
> ereport(ERROR, ...

Modified it with slight changes without using list_intersection.

These comments are handled in the v35 version patch attached at [1]
[1] - 
https://www.postgresql.org/message-id/CALDaNm2yJOEPCqR%3DgTMEwveJujH9c9_z4LhKmk2T3vZH7T1DLQ%40mail.gmail.com

Regards,
Vignesh




Re: Logical replication keepalive flood

2021-09-30 Thread Kyotaro Horiguchi
At Thu, 30 Sep 2021 17:08:35 +0900 (JST), Kyotaro Horiguchi 
 wrote in 
> At Thu, 30 Sep 2021 17:21:03 +1000, Greg Nancarrow  
> wrote in 
> > Actually, with the patch applied, I find that "make check-world" fails
> > (006_logical_decoding, test 7).
> 
> Mmm..
> 
> t/006_logical_decoding.pl .. 4/14 
> #   Failed test 'pg_recvlogical acknowledged changes'
> #   at t/006_logical_decoding.pl line 117.
> #  got: 'BEGIN
> # table public.decoding_test: INSERT: x[integer]:5 y[text]:'5''
> # expected: ''
> 
> I'm not sure what the test is checking for now, though.

Is it checking that endpos works correctly? The logically decoded WAL
looks like this.

0/1528F10|table public.decoding_test: INSERT: x[integer]:1 y[text]:'1'
0/15290F8|table public.decoding_test: INSERT: x[integer]:2 y[text]:'2'
0/1529138|table public.decoding_test: INSERT: x[integer]:3 y[text]:'3'
0/1529178|table public.decoding_test: INSERT: x[integer]:4 y[text]:'4'
0/15291E8|COMMIT 709
0/15291E8|BEGIN 710
0/15291E8|table public.decoding_test: INSERT: x[integer]:5 y[text]:'5'
0/1529228|table public.decoding_test: INSERT: x[integer]:6 y[text]:'6'

The COMMIT and BEGIN share the same LSN, which I don't understand.

The previous read by pg_recvlogical proceeded up to the COMMIT record, and
the following command, run after that, behaves differently.

pg_recvlogical -S test_slot --dbname postgres --endpos '0/15291E8' -f - --no-loop --start

Before the patch it ends before reading a record, and after the patch
it reads into the "table ..." line.  pg_recvlogical seems to use the
endpos as the beginning of the last record.  In that sense the three
lines (COMMIT 709/BEGIN 710/table ...'5') all fall at the end of the
data.

The difference seems to come from the timing at which the keepalive
arrives. pg_recvlogical checks the endpos only when a keepalive comes.  In
other words, it needs a keepalive for every data line so that it stops
exactly at the specified endpos.

1. Is it the correct behavior that the three data lines share the same
  LSN?  I think BEGIN and the next line should, but COMMIT and the next
  BEGIN shouldn't.

2. Is it the designed behavior that pg_recvlogical checks endpos
  only when a keepalive comes? If it is the correct behavior, we
  shouldn't avoid the keepalive flood.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: pgsql: Document XLOG_INCLUDE_XID a little better

2021-09-30 Thread Amit Kapila
On Wed, Sep 29, 2021 at 8:50 PM Robert Haas  wrote:
>
> On Tue, Sep 21, 2021 at 6:48 PM Alvaro Herrera  
> wrote:
> > Document XLOG_INCLUDE_XID a little better
> >
> > I noticed that commit 0bead9af484c left this flag undocumented in
> > XLogSetRecordFlags, which led me to discover that the flag doesn't
> > actually do what the one comment on it said it does.  Improve the
> > situation by adding some more comments.
> >
> > Backpatch to 14, where the aforementioned commit appears.
>
> I'm not sure that saying something is a "hack" is really all that
> useful as documentation.
>
> But more to the point, I think this hack is ugly and needs to be
> replaced with something less hacky.
>

I think we can do better than using the XLOG_INCLUDE_XID flag in the
record being inserted. We need this flag so that we can mark the
SubTransaction assigned after XLogInsertRecord() is successful. We
can instead output a flag (say sub_xact_assigned) from
XLogRecordAssemble() and pass it to XLogInsertRecord(). Then in
XLogInsertRecord(), we can mark the SubTransaction assigned once the record
is inserted (after or before calling
MarkCurrentTransactionIdLoggedIfAny()).

The other idea could be that in XLogInsertRecord(), we check
IsSubTransactionAssignmentPending() after the record is successfully
inserted and then mark SubTransaction assigned but I think the
previous one is better.

What do you think?

-- 
With Regards,
Amit Kapila.




non-superusers are allowed to drop the replication user, but are not allowed to alter or even create them, is that ok?

2021-09-30 Thread Ashutosh Sharma
Hi All,

While working on one of our internal projects, I noticed that currently in
Postgres we do not allow normal users to alter attributes of a
replication user. However, we do allow normal users to drop replication
users, or even to rename them using the ALTER command. Is that behaviour ok?
If yes, can someone please help me understand how and why this is okay.

Here is an example illustrating this behaviour:

supusr@postgres=# create user repusr with password 'repusr' replication;
CREATE ROLE

supusr@postgres=# create user nonsu with password 'nonsu' createrole
createdb;
CREATE ROLE

supusr@postgres=# \c postgres nonsu;
You are now connected to database "postgres" as user "nonsu".

nonsu@postgres=> alter user repusr nocreatedb;
ERROR:  42501: must be superuser to alter replication roles or change
replication attribute

nonsu@postgres=> alter user repusr rename to refusr;
ALTER ROLE

nonsu@postgres=> drop user refusr;
DROP ROLE

nonsu@postgres=> create user repusr2 with password 'repusr2' replication;
ERROR:  42501: must be superuser to create replication users

--
With Regards,
Ashutosh Sharma.


Add ETIMEDOUT to ALL_CONNECTION_FAILURE_ERRNOS

2021-09-30 Thread Jelte Fennema
TCP connections that were previously opened successfully can still fail
on reads with ETIMEDOUT. This should be considered a connection failure, so
that the connection in libpq is marked as CONNECTION_BAD. The reason I got an
ETIMEDOUT was that I had set a low tcp_user_timeout in the
connection string. However, it can probably also happen due to
keepalive limits being reached.

0001-Consider-ETIMEDOUT-a-connection-failure.patch
Description: 0001-Consider-ETIMEDOUT-a-connection-failure.patch


Re: Logical replication keepalive flood

2021-09-30 Thread Greg Nancarrow
On Thu, Sep 30, 2021 at 6:08 PM Kyotaro Horiguchi
 wrote:
>
> At Thu, 30 Sep 2021 17:21:03 +1000, Greg Nancarrow  
> wrote in
> > Actually, with the patch applied, I find that "make check-world" fails
> > (006_logical_decoding, test 7).
>
> Mmm..
>
> t/006_logical_decoding.pl .. 4/14
> #   Failed test 'pg_recvlogical acknowledged changes'
> #   at t/006_logical_decoding.pl line 117.
> #  got: 'BEGIN
> # table public.decoding_test: INSERT: x[integer]:5 y[text]:'5''
> # expected: ''
>
> I'm not sure what the test is checking for now, though.
>

I think it's trying to check that pg_recvlogical doesn't read "past" a
specified "--endpos" LSN. The test is invoking pg_recvlogical with the
same --endpos LSN value multiple times.
After first getting the LSN (to use for the --endpos value) once 4
rows have been inserted, some additional rows are inserted, which the test
expects pg_recvlogical won't read because it shouldn't read past
--endpos.
The problem is that the test seems to be relying on a keepalive between the
WAL record of the first transaction and the WAL record of the next
transaction.
As Amit previously explained on this thread "I think here the reason
is that the first_lsn of a transaction is always equal to end_lsn of
the previous transaction (See comments
above first_lsn and end_lsn fields of ReorderBufferTXN)."
When the patch is applied, pg_recvlogical doesn't read a keepalive
when it is invoked with the same --endpos for a second time here, and
it ends up reading the first WAL record for the next transaction
(those additional rows that the test expects it won't read).


Regards,
Greg Nancarrow
Fujitsu Australia




Re: rand48 replacement

2021-09-30 Thread Thomas Munro
On Thu, Sep 30, 2021 at 9:23 PM Fabien COELHO  wrote:
> The missing symbol is really defined in common/pg_prng.c which AFAICT is
> linked with postgres.
>
> If someone experienced with the windows compilation chain could give a
> hint of what is needed, I'd appreciate it!

I guess the declaration needs PGDLLIMPORT.




Re: On login trigger: take three

2021-09-30 Thread Daniel Gustafsson
> On 30 Sep 2021, at 04:15, Greg Nancarrow  wrote:
> 
> On Wed, Sep 29, 2021 at 10:14 PM Teodor Sigaev  wrote:
>> 
>> Nice feature, but, sorry, I see some design problem in suggested feature. 
>> AFAIK,
>> there are two use cases for this feature:
>> 1 A permission / prohibition to login some users
>> 2 Just a logging of facts of user's login
>> 
>> Suggested patch proposes prohibition of login only by failing of login 
>> trigger
>> and it has at least two issues:
>> 1 In case of prohibition to login, there is no clean way to store information
>> about unsuccessful login. OK, it could be solved by the dblink module, but it
>> seems too scary.
> 
> It's an area that could be improved, but the patch is more intended to
> allow additional actions on successful login, as described by the
> following (taken from the doc updates in the patch):
> 
> +   
> +The event trigger on the login event can be
> +useful for logging user logins, for verifying the connection and
> +assigning roles according to current circumstances, or for some
> session data
> +initialization.
> +   

Running user code with potential side effects on unsuccessful logins also opens
up the risk of (D)DoS attacks.

--
Daniel Gustafsson   https://vmware.com/





Re: rand48 replacement

2021-09-30 Thread Fabien COELHO



Attached v15 also calls srandom if it is there, and fixes yet another
remaining random call.


I think that I have now removed all references to "random" from the pg source.
However, the test still fails on Windows, because the linker does not find
a global variable when compiling extensions, but it seems to find the
functions defined in the very same file...


Link:
 4130  C:\Program Files (x86)\Microsoft Visual Studio 12.0\VC\bin\x86_amd64\link.exe /ERRORREPORT:QUEUE 
/OUT:".\Release\tablefunc\tablefunc.dll" /INCREMENTAL:NO /NOLOGO Release/postgres/postgres.lib kernel32.lib user32.lib gdi32.lib 
winspool.lib comdlg32.lib advapi32.lib shell32.lib ole32.lib oleaut32.lib uuid.lib odbc32.lib odbccp32.lib /NODEFAULTLIB:libc 
/DEF:"./Release/tablefunc/tablefunc.def" /MANIFEST /MANIFESTUAC:"level='asInvoker' uiAccess='false'" /manifest:embed 
/DEBUG /PDB:".\Release\tablefunc\tablefunc.pdb" /SUBSYSTEM:CONSOLE /STACK:"4194304" /TLBID:1 /DYNAMICBASE:NO /NXCOMPAT 
/IMPLIB:"Release/tablefunc/tablefunc.lib" /MACHINE:X64 /ignore:4197 /DLL .\Release\tablefunc\win32ver.res
 4131  .\Release\tablefunc\tablefunc.obj
 4132 Creating library Release/tablefunc/tablefunc.lib and object 
Release/tablefunc/tablefunc.exp
 4133 tablefunc.obj : error LNK2001: unresolved external symbol 
pg_global_prng_state [C:\projects\postgresql\tablefunc.vcxproj]
 4134 .\Release\tablefunc\tablefunc.dll : fatal error LNK1120: 1 unresolved 
externals [C:\projects\postgresql\tablefunc.vcxproj]
 4135 Done Building Project "C:\projects\postgresql\tablefunc.vcxproj" (default 
targets) -- FAILED.

The missing symbol is really defined in common/pg_prng.c which AFAICT is 
linked with postgres.


If someone experienced with the windows compilation chain could give a 
hint of what is needed, I'd appreciate it!


--
Fabien.




Re: Logical replication keepalive flood

2021-09-30 Thread Kyotaro Horiguchi
At Thu, 30 Sep 2021 17:21:03 +1000, Greg Nancarrow  wrote 
in 
> Actually, with the patch applied, I find that "make check-world" fails
> (006_logical_decoding, test 7).

Mmm..

t/006_logical_decoding.pl .. 4/14 
#   Failed test 'pg_recvlogical acknowledged changes'
#   at t/006_logical_decoding.pl line 117.
#  got: 'BEGIN
# table public.decoding_test: INSERT: x[integer]:5 y[text]:'5''
# expected: ''

I'm not sure what the test is checking for now, though.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




RE: Failed transaction statistics to measure the logical replication progress

2021-09-30 Thread Osumi , Takamichi/大墨 昂道
On Thursday, September 30, 2021 1:15 PM Amit Kapila  
wrote:
> On Thu, Sep 30, 2021 at 8:22 AM Hou, Zhijie/侯 志杰
>  wrote:
> >
> > On Tues, Sep 28, 2021 6:05 PM Amit Kapila 
> wrote:
> > Adding a new view seems reasonable, but it will bring another
> > subscription-related view, which might be too much. OTOH, I can see
> > there are already some different views[1] including xact stats; maybe adding
> > another one is acceptable?
> All these views are related to stats not yet transmitted to the collector,
> but what we really need is a view similar to pg_stat_archiver or
> pg_stat_bgwriter, which gives information about background workers.
> Now, the problem as I see it is that if we go that route then
> pg_stat_subscription will no longer remain a dynamic view, and one might
> consider that a compatibility break. The other idea I have shared is that we
> display these stats under the new view introduced by Sawada-San's patch [1]
> and probably rename that view as pg_stat_subscription_worker, where all the
> stats (xact info and last failure information) about each worker will be
> displayed.
Sorry, all the stats in the pg_stat_subscription_worker view?

There was a discussion that
the xact info should be displayed in pg_stat_subscription
together with the existing stats (which would be changed to persist),
but if your above proposal is adopted,
the list of pg_stat_subscription_worker's columns
will be something like below (listing only the major columns).

- subid, subrelid and some other relation attributes required
- 5 stats values moved from pg_stat_subscription
  received_lsn, last_msg_send_time, last_msg_receipt_time,
  latest_end_lsn, latest_end_time
- xact stats
  xact_commit, xact_commit_bytes,
  xact_error, xact_error_bytes,
  xact_abort, xact_abort_bytes,
- error stats
  datname, command, xid, failure_source,
  failure_count, last_failure, last_failure_message, etc

If this is what you imagined,
the only remaining use of pg_stat_subscription's values
would be finding the subscription worker's process id.
Is that OK, and is this view layout acceptable?


Best Regards,
Takamichi Osumi


