Re: Logical replication keepalive flood

2021-09-18 Thread Amit Kapila
On Fri, Sep 17, 2021 at 3:03 PM Greg Nancarrow  wrote:
>
> On Thu, Sep 16, 2021 at 10:36 PM Amit Kapila  wrote:
> >
> > I think here the reason is that the first_lsn of a transaction is
> > always equal to end_lsn of the previous transaction (See comments
> > above first_lsn and end_lsn fields of ReorderBufferTXN).
>
> That may be the case, but those comments certainly don't make this clear.
>
> > I have not
> > debugged but I think in StreamLogicalLog() the cur_record_lsn after
> > receiving 'w' message, in this case, will be equal to endpos whereas
> > we expect to be greater than endpos to exit. Before the patch, it will
> > always get the 'k' message where we expect the received lsn to be
> > equal to endpos to conclude that we can exit. Do let me know if your
> > analysis differs?
> >
>
> Yes, pg_recvlogical seems to be relying on receiving a keepalive for
> its "--endpos" logic to work (and the 006 test is relying on '' record
> output from pg_recvlogical in this case).
> But is it correct to be relying on a keepalive for this?
>

I don't think this experiment/test indicates that pg_recvlogical's
"--endpos" relies on keepalive. It would just print the records till
--endpos and then exit. In the test under discussion, as per
confirmation by Hou-San, the BEGIN record received has the same LSN as
--endpos, so it would just output that and exit which is what is
mentioned in pg_recvlogical docs as well (If there's a record with LSN
exactly equal to lsn, the record will be output).

I think here the test case could be the culprit. In the original commit
eb2a6131be [1], where this second use of pg_recvlogical was added, there
were no additional Inserts (# Insert some rows after $endpos, which we
won't read.); those were added later by a different commit 8222a9d9a1 [2].
I am not sure the test added by commit [2] was a good idea. It seems to be
working due to the way keepalives are being sent, but otherwise it can
fail as per the current design of pg_recvlogical.

[1]:
commit eb2a6131beccaad2b39629191508062b70d3a1c6
Author: Simon Riggs 
Date:   Tue Mar 21 14:04:49 2017 +

Add a pg_recvlogical wrapper to PostgresNode

[2]:
commit 8222a9d9a12356349114ec275b01a1a58da2b941
Author: Noah Misch 
Date:   Wed May 13 20:42:09 2020 -0700

In successful pg_recvlogical, end PGRES_COPY_OUT cleanly.

-- 
With Regards,
Amit Kapila.




Improved PostgreSQL Mathematics Support.

2021-09-18 Thread A Z
I have been trying to get a reply about, or interest in, either updating
PostgreSQL itself to support the following, or the release of a public,
free-for-any-use extension that supports the following:



# High Precision Numeric and Elementary Functions Support. #


-Integer Z (HPZ) and Rational Decimal Q (HPQ) number support.

-Recurring Rational Numbers and recurring Irrational Numbers can be
appropriately truncated, by a precision value, to obtain an approximating
value.  The latter phenomenon is a finite Rational value, possibly with
integer and/or decimal parts at the same time.  These may be positive or
negative, standard number line, values.

-Forward and Inverse operations accuracy, withstanding truncation,
can be maintained by storing and normalising the expression behind a value,
(or just include pointers to the value) and displaying the evaluation.
This system will uphold any precision.

-A defaulting number of significant figures (precision), held in one copy
of one field in memory, as a filter for all HPZ and HPQ numbers.  For
example, 20 significant figures as a default, to start with.

-A function that varies the precision filter for every HPZ and HPQ number
at once.

-Value assignment to a typed variable by =.

-Base 10 Arithmetic and comparisons support on Base 10 Integer and
Rational Decimal numbers.
+,-,*,/,%,^,=,!=,<>,>,<,>=,<=, ::
These include full finite division and integer-only division, with no
remainder.  The defaulting ability of number data in lower types to
automatically be cast up to HPZ or HPQ, where specified and occurring in
PostgreSQL code.
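The exact-rational behaviour being asked for here (values stored exactly,
with truncation applied only as a display-time precision filter) can be
sketched with Python's standard fractions and decimal modules.  This is an
illustrative analogy only; the 20-digit default and variable names are
assumptions taken from the wish-list above, not any existing extension:

```python
from fractions import Fraction
from decimal import Decimal, getcontext

# Exact rational value: 1/3 stays exact; no truncation occurs in storage.
x = Fraction(1, 3)
y = Fraction(2, 7)

# Forward and inverse operations stay exact because the rational
# representation is kept, not a truncated decimal expansion.
z = (x / y) * y
assert z == x

# A global "precision filter" applies only when rendering the value.
getcontext().prec = 20  # e.g. 20 significant figures, as in the proposal
approx = Decimal(x.numerator) / Decimal(x.denominator)
print(approx)  # 0.33333333333333333333
```

The same separation (exact expression kept internally, precision applied
on output) is what would let "forward and inverse operations accuracy"
survive truncation.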

-Reified support with broader syntax and operations within PostgreSQL, in
all the obvious and less-than-obvious places.  Tables and related
phenomena, Indexing, the Window type, the Record type, direct
compatibility with Aggregate and Window Functions, and the Recursive
keyword are all parts of a larger subset that may re-interact with HPZ or
HPQ.

#

-Mathematical and Operational functions support:

precision(BIGINT input)

cast(TEXT as HPZ) returns HPZ;
cast(TEXT as HPQ) returns HPQ;
cast(HPQ as TEXT) returns TEXT;
cast(HPZ as TEXT) returns TEXT;
cast(HPZ as HPQ) returns HPQ;
cast(HPQ as HPZ) returns HPZ;
cast(HPZ as SMALLINT) returns SMALLINT;
cast(SMALLINT as HPZ) returns HPZ;
cast(HPZ as INTEGER) returns INTEGER;
cast(INTEGER as HPZ) returns HPZ;
cast(HPZ as BIGINT) returns BIGINT;
cast(BIGINT as HPZ) returns HPZ;
cast(HPQ as REAL) returns REAL;
cast(REAL as HPQ) returns HPQ;
cast(DOUBLE PRECISION as HPQ) returns HPQ;
cast(HPQ as DOUBLE PRECISION) returns DOUBLE PRECISION;
cast(HPQ as DECIMAL) returns DECIMAL;
cast(DECIMAL as HPQ) returns HPQ;

sign(HPQ input) returns HPQ;
abs(HPQ input) returns HPQ;
ceil(HPQ input) returns HPQ;
floor(HPQ input) returns HPQ;
round(HPQ input) returns HPZ;
pi() returns HPQ;
e() returns HPQ;
power(HPQ base, HPQ exponent) returns HPQ;
sqrt(HPQ input) returns HPQ;
nroot(HPZ theroot, HPQ input) returns HPQ;
log10(HPQ input) returns HPQ;
loge(HPQ input) returns HPQ;
log2(HPQ input) returns HPQ;
factorial(HPZ input) returns HPZ;
nCr(HPZ objects, HPZ selectionSize) returns HPZ;
nPr(HPZ objects, HPZ selectionSize) returns HPZ;
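As a point of reference for the factorial/nCr/nPr entries above, Python's
unbounded integers already give the HPZ-style behaviour being requested;
a quick sketch (illustrative only, not proposed SQL semantics):

```python
import math

# factorial, nCr and nPr over arbitrary-precision integers: Python ints
# are unbounded, mirroring the HPZ behaviour the wish-list asks for.
f = math.factorial(20)   # exact, no overflow
ncr = math.comb(52, 5)   # nCr: unordered 5-card hands from a 52-card deck
npr = math.perm(52, 5)   # nPr: ordered draws of 5 from 52

print(f, ncr, npr)  # 2432902008176640000 2598960 311875200
```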

degrees(HPQ input) returns HPQ;
radians(HPQ input) returns HPQ;
sind(HPQ input) returns HPQ;
cosd(HPQ input) returns HPQ;
tand(HPQ input) returns HPQ;
asind(HPQ input) returns HPQ;
acosd(HPQ input) returns HPQ;
atand(HPQ input) returns HPQ;
sinr(HPQ input) returns HPQ;
cosr(HPQ input) returns HPQ;
tanr(HPQ input) returns HPQ;
asinr(HPQ input) returns HPQ;
acosr(HPQ input) returns HPQ;
atanr(HPQ input) returns HPQ;
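The paired "d"/"r" trigonometric names above apparently differ only in
their angle unit.  In a sketch, the degree forms can simply convert and
delegate to the radian forms; these definitions are an assumption about
the intended semantics, not taken from any existing implementation:

```python
import math

# "d" variants take degrees, "r" variants take radians; the degree forms
# convert and delegate, matching the sind/sinr naming split above.
def sind(x): return math.sin(math.radians(x))
def cosd(x): return math.cos(math.radians(x))
def tand(x): return math.tan(math.radians(x))
def asind(x): return math.degrees(math.asin(x))

# Unit-circle spot checks
assert abs(sind(30) - 0.5) < 1e-12
assert abs(cosd(60) - 0.5) < 1e-12
assert abs(asind(1.0) - 90.0) < 1e-12
```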

##

-Informative articles on all these things exist at:
Comparison Operators: https://en.wikipedia.org/wiki/Relational_operator
Floor and Ceiling Functions: https://en.wikipedia.org/wiki/Floor_and_ceiling_functions
Arithmetic Operations: https://en.wikipedia.org/wiki/Arithmetic
Integer Division: https://en.wikipedia.org/wiki/Division_(mathematics)#Of_integers
Modulus Operation: https://en.wikipedia.org/wiki/Modulo_operation
Rounding (Commercial Rounding): https://en.wikipedia.org/wiki/Rounding
Factorial Operation: https://en.wikipedia.org/wiki/Factorial
Degrees: https://en.wikipedia.org/wiki/Degree_(angle)
Radians: https://en.wikipedia.org/wiki/Radian
Elementary Functions: https://en.wikipedia.org/wiki/Elementary_function

-Ease of installation support, particularly for Windows and Linux: *.exe,
*.msi or *.rpm, *.deb, *.bin installers, alongside a PostgreSQL standard
installation.  Installation and activation instructions included.

The following chart could be used to help test trigonometry outputs,
under further consideration of the unit circle:

Re: POC: Cleaning up orphaned files using undo logs

2021-09-18 Thread Amit Kapila
On Fri, Sep 17, 2021 at 9:50 PM Dmitry Dolgov <9erthali...@gmail.com> wrote:
>
> > On Tue, Sep 14, 2021 at 10:51:42AM +0200, Antonin Houska wrote:
>
> * What happened with the idea of abandoning discard worker for the sake
>   of simplicity? From what I see limiting everything to foreground undo
>   could reduce the core of the patch series to the first four patches
>   (forgetting about test and docs, but I guess it would be enough at
>   least for the design review), which is already less overwhelming.
>

I think the discard worker would be required even if we decide to
apply all the undo in the foreground. We need to forget/remove the
undo of committed transactions as well which we can't remove
immediately after the commit.

-- 
With Regards,
Amit Kapila.




Re: Logical replication timeout problem

2021-09-18 Thread Amit Kapila
On Fri, Sep 17, 2021 at 8:08 PM Fabrice Chapuis  wrote:
>
> the publisher and the subscriber run on the same postgres instance.
>

Okay, but there is no log corresponding to operations being performed
by the publisher. By looking at current logs it is not very clear to
me what might have caused this. Did you try increasing
wal_sender_timeout and wal_receiver_timeout?

-- 
With Regards,
Amit Kapila.




Re: So, about that cast-to-typmod-minus-one business

2021-09-18 Thread DEVOPS_WwIT
+1 backporting

Tony

On 2021/9/19 01:06, Tom Lane wrote:
> We had left it as an open issue whether or not to risk back-patching
> 5c056b0c2 into stable branches [1].  While updating the v14 release notes,
> I realized that we can't put off that decision any longer, because we
> have to decide now whether to document that as a new behavior in v14.
>
> I'm inclined to back-patch, since nobody has complained about this
> in 14beta3.  Thoughts?
>
>   regards, tom lane
>
> [1] 
> https://www.postgresql.org/message-id/flat/CABNQVagu3bZGqiTjb31a8D5Od3fUMs7Oh3gmZMQZVHZ%3DuWWWfQ%40mail.gmail.com




Undocumented AT TIME ZONE INTERVAL syntax

2021-09-18 Thread Corey Huinker
In reviewing Paul's application period patch, I noticed some very curious
syntax in the test cases. I learned that Paul is equally confused by it,
and has asked about it in his PgCon 2020 presentation

> SELECT '2018-03-04' AT TIME ZONE INTERVAL '2' HOUR TO MINUTE;
      timezone
---------------------
 2018-03-04 05:02:00
(1 row)

Searching around, I found several instances of this syntax being used
[1][2][3], but with one important clarifying difference: the expected
syntax was

> SELECT '2018-03-04' AT TIME ZONE INTERVAL '2:00' HOUR TO MINUTE;
      timezone
---------------------
 2018-03-04 07:00:00
(1 row)

Now I understand that the user probably meant to do this:

# SELECT '2018-03-04' AT TIME ZONE INTERVAL '2' HOUR;
      timezone
---------------------
 2018-03-04 07:00:00
(1 row)

But none of this is in our own documentation.

Before I write a patch to add this to the documentation, I'm curious what
level of sloppiness we should tolerate in the interval calculation. Should
we enforce the time string to actually conform to the format laid out in
the X TO Y spec? If we don't require that, is it correct to say that the
values will be filled from order of least significance to greatest?

[1]
https://www.vertica.com/docs/9.2.x/HTML/Content/Authoring/SQLReferenceManual/DataTypes/Date-Time/TIMESTAMPATTIMEZONE.htm
[2]
https://docs.teradata.com/r/kmuOwjp1zEYg98JsB8fu_A/aWY6mGNJ5CYJlSDrvgDQag
[3]
https://community.snowflake.com/s/question/0D50Z9AqIaSSAV/is-it-possible-to-add-an-interval-of-5-hours-to-the-session-timezone-
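If the fill-from-least-significance reading is right, the behavior in
question can be modelled with a toy parser: a bare value lands in the
least significant field of the X TO Y spec, which is why INTERVAL '2'
HOUR TO MINUTE acts as two minutes while INTERVAL '2:00' HOUR TO MINUTE
acts as two hours.  This sketch is an illustrative model of that rule,
not the actual PostgreSQL interval parser:

```python
from datetime import timedelta

FIELD_UNITS = {"HOUR": 3600, "MINUTE": 60, "SECOND": 1}

def parse_interval(text, fields):
    # fields, e.g. ["HOUR", "MINUTE"], ordered most- to least-significant
    parts = [int(p) for p in text.split(":")]
    # right-align the supplied values against the field list, so a bare
    # number fills the least significant field
    filled = [0] * (len(fields) - len(parts)) + parts
    seconds = sum(v * FIELD_UNITS[f] for f, v in zip(fields, filled))
    return timedelta(seconds=seconds)

assert parse_interval("2", ["HOUR", "MINUTE"]) == timedelta(minutes=2)
assert parse_interval("2:00", ["HOUR", "MINUTE"]) == timedelta(hours=2)
assert parse_interval("2", ["HOUR"]) == timedelta(hours=2)
```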


Re: SQL:2011 application time

2021-09-18 Thread Corey Huinker
> In IBM DB2 you can only have one because application-time periods must
> be named "business_time" (not joking).
>

I saw that as well, and it made me think that someone at IBM is a fan of
Flight Of The Conchords.


> Personally I feel like it's a weird limitation and I wouldn't mind
> supporting more, but my current implementation only allows for one,
> and I'd have to rethink some things to do it differently.
>

I'm satisfied that it's not something we need to do in the first MVP.


>
> Yes. Even though the name "SYSTEM_TIME" is technically enough, I'd
> still include a pertype column to make distinguishing system vs
> application periods easier and more obvious.
>

SYSTEM_TIME seems to allow for DATE values in the start_time and end_time
fields, though I cannot imagine how that would ever be practical, unless it
were somehow desirable to reject subsequent updates within a 24 hour
timeframe. I have seen instances where home-rolled application periods used
date values, which had similar problems where certain intermediate updates
would simply have to be discarded in favor of the one that was still
standing at midnight.


>
> > 2. The system versioning effort has chosen 'infinity' as their end-time
> > value, whereas you have chosen NULL as that makes sense for an unbounded
> > range. Other databases seem to leverage '9999-12-31 23:59:59' (SQLServer,
> > IIRC) whereas some others seem to use '2999-12-31 23:59:59' but those
> > might have been home-rolled temporal implementations. To further add to
> > the confusion, the syntax seems to specify the keyword of MAXVALUE, which
> > further muddies things. The system versioning people went with 'infinity'
> > seemingly because it doesn't prescribe an end to the world like SQLServer
> > did, but also because it allowed for a primary key based on (id, endtime)
> > and that's just not possible with NULL endtime values.
>
> I think it's a little weird that our system-time patch mutates your
> primary key. None of the other RDMBSes do that. I don't think it's
> incompatible (as long as the system time patch knows how to preserve
> the extra period/range data in an application-time temporal key), but
> it feels messy to me.
>

Per outline below, I'm proposing an alternate SYSTEM_TIME implementation
that would leave the PK as-is.


> I would prefer if system-time and application-time used the same value
> to mean "unbounded". Using null means we can support any type (not
> just types with +-Infinity). And it pairs nicely with range types. If
> the only reason for system-time to use Infinity is the primary key, I
> think it would be better not to mutate the primary key (and store the
> historical records in a separate table as other RDMSes do).
>

The two "big wins" of infinity seemed (to me) to be:

1. the ability to add "AND end_time = 'infinity'" as a cheap way to get
current rows
2. clauses like "WHERE CURRENT_DATE - 3 BETWEEN start_time AND end_time"
would work. Granted, there's very specific new syntax to do that properly,
but you know somebody's gonna see the columns and try to do it that way.


>
> Btw Oracle also uses NULL to mean "unbounded".
>

Huh, I missed that one. That is good in that it gives some precedence to
how you've approached it.


>
> We presently forbid PKs from including expressions, but my patch lifts
> that exception so it can index a rangetype expression built from the
> period start & end columns. So even if we must include the system-time
> end column in a PK, perhaps it can use a COALESCE expression to store
> Infinity even while using NULL to signify "currently true" from a user
> perspective.
>

Either way seems viable, but I understand why you want to leverage ranges
in this way.


>
> > 3. I noticed some inconsistency in the results from various "SELECT *
> > FROM portion_of_test" examples. In some, the "valid_at" range is shown but
> > not columns that make it up, and in some others, the "valid_from" and
> > "valid_to" columns are shown, with no mention of the period. From what I've
> > seen, the period column should be invisible unless invoked, like ctid or
> > xmin.
>
> In most cases the tests test the same functionality with both PERIODs
> and rangetype columns. For FKs they test all four combinations of
> PERIOD-referencing-PERIOD, PERIOD-referencing-range,
> range-referencing-PERIOD, and range-referencing-range. If valid_at is
> a genuine column, it is included in SELECT *, but not if it is a
> PERIOD.
>

Ok, I'll have to look back over the test coverage to make sure that I
understand the behavior now.


>
> > 4. The syntax '2018-03-04' AT TIME ZONE INTERVAL '2' HOUR TO MINUTE
> > simply confounded me.
>
> Me too! I have no idea what that is supposed to mean. But that
> behavior predates my patch. I only had to deal with it because it
> creates a shift-reduce conflict with `FOR PORTION OF valid_at FROM x
> TO y`, where x & y are expressions. I asked about this syntax at my
> PgCon 2020 talk, but I haven't ever received an answer. Perhaps
> someone else knows 

Re: WIP: System Versioned Temporal Table

2021-09-18 Thread Corey Huinker
>
>
>
> 1. Much of what I have read about temporal tables seemed to imply or
> almost assume that system temporal tables would be implemented as two
> actual separate tables. Indeed, SQLServer appears to do it that way [1]
> with syntax like
>
> WITH (SYSTEM_VERSIONING = ON (HISTORY_TABLE = dbo.WebsiteUserInfoHistory));
>
>
> Q 1.1. Was that implementation considered and if so, what made this
> implementation more appealing?
>
>
I've been digging some more on this point, and I've reached the conclusion
that a separate history table is the better implementation. It would make
the act of removing system versioning into little more than a DROP TABLE,
plus adjusting the base table to reflect that it is no longer system
versioned.

What do you think of this method:

1. The regular table remains unchanged, but a pg_class attribute named
"relissystemversioned" would be set to true
2. I'm unsure if the standard allows dropping a column from a table while
it is system versioned, and the purpose behind system versioning makes me
believe the answer is a strong "no" and requiring DROP COLUMN to fail
on relissystemversioned = 't' seems pretty straightforward.
3. The history table would be given a default name of $FOO_history (space
permitting), but could be overridden with the history_table option.
4. The history table would have relkind = 'h'
5. The history table will only have rows that are not current, so it is
created empty.
6. As such, the table is effectively append-only, in a way that vacuum can
actually leverage, and likewise the fill factor of such a table should
never be less than 100.
7. The history table could only be updated only via system defined triggers
(insert,update,delete, alter to add columns), or row migration similar to
that found in partitioning. It seems like this would work as the two tables
working as partitions of the same table, but presently we can't have
multi-parent partitions.
8. The history table would be indexed the same as the base table, except
that all unique indexes would be made non-unique, and an index of pk +
start_time + end_time would be added
9. The primary key of the base table would remain the existing pk vals, and
would basically function normally, with triggers to carry forth changes to
the history table. The net effect of this is that the end_time value of all
rows in the main table would always be the chosen "current" value
(infinity, null, 9999-12-31, etc.) and as such might not actually _need_ to
be stored.
10. Queries that omit the FOR SYSTEM_TIME clause, as well as ones that use
FOR SYSTEM_TIME AS OF CURRENT_TIMESTAMP, would simply use the base table
directly with no quals to add.
11. Queries that use FOR SYSTEM_TIME and not FOR SYSTEM_TIME AS
OF CURRENT_TIMESTAMP, then the query would do a union of the base table and
the history table with quals applied to both.
12. It's a fair question whether the history table would be something that
could be queried directly. I'm inclined to say no, because that allows for
things like SELECT FOR UPDATE, which of course we'd have to reject.
13. If a history table is directly referenceable, then SELECT permission
can be granted or revoked as normal, but all insert/update/delete/truncate
options would raise an error.
14. DROP SYSTEM VERSIONING from a table would be quite straightforward -
the history table would be dropped along with the triggers that reference
it, setting relissystemversioned = 'f' on the base table.

I think this would have some key advantages:

1. MVCC bloat is no worse than it was before.
2. No changes whatsoever to referential integrity.
3. DROP SYSTEM VERSIONING becomes an O(1) operation.
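As a rough illustration of points 5, 9 and 11 above (history table starts
empty, trigger-style row migration on update, and AS OF queries unioning
base and history), here is a toy in-memory sketch; the function names and
row layout are invented for illustration and carry none of the MVCC or
permission details:

```python
from datetime import datetime, timezone

base = {}      # pk -> current row; end_time is implicit ("current")
history = []   # append-only closed versions, as in the proposal

def upsert(pk, data, now):
    if pk in base:
        # trigger-style row migration: close the old version into history
        history.append({**base[pk], "end_time": now})
    base[pk] = {"pk": pk, "data": data, "start_time": now}

def as_of(pk, ts):
    # FOR SYSTEM_TIME AS OF: check the base table, then the history table
    row = base.get(pk)
    if row and row["start_time"] <= ts:
        return row["data"]
    for h in history:
        if h["pk"] == pk and h["start_time"] <= ts < h["end_time"]:
            return h["data"]
    return None

t1 = datetime(2021, 1, 1, tzinfo=timezone.utc)
t2 = datetime(2021, 6, 1, tzinfo=timezone.utc)
upsert(1, "v1", t1)
upsert(1, "v2", t2)
assert as_of(1, t1) == "v1"   # old version served from history
assert as_of(1, t2) == "v2"   # current version served from base
```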

Thoughts?

I'm going to be making a similar proposal to the people doing the
application time effort, but I'm very much hoping that we can reach some
consensus and combine efforts.


Re: So, about that cast-to-typmod-minus-one business

2021-09-18 Thread Dean Rasheed
On Sat, 18 Sep 2021, 18:06 Tom Lane,  wrote:

>
> I'm inclined to back-patch
>

+1

Regards,
Dean

>


Re: PG 14 release notes, first draft

2021-09-18 Thread Justin Pryzby
On Sat, Sep 18, 2021 at 05:15:39PM -0400, Tom Lane wrote:
> Justin Pryzby  writes:
> > I think the release notes for the autovacuum item (which was first reverted 
> > and
> > then partially un-reverted) should say something like "Partitioned tables 
> > are
> > now included in pg_stat_all_tables":
> > | e1efc5b465 Keep stats up to date for partitioned tables
> 
> Hmm.  If I'm reading the commit message properly, the actual change there
> is not that, but that analyze count and last analyze time are now tracked
> correctly for partitioned tables.  Might be worth mentioning, not sure.

The reverted patch to autoanalyze included partitioned tables in
pg_stat_all_tables, and the revert specifically avoided changing that (to avoid
a catbump).  But last_analyzed and analyze_count were always shown as "0".  So
the e1 commit addresses that by tracking that information and showing correct
value instead of always 0.

The relevant portion starts here:
https://www.postgresql.org/message-id/flat/202108161700.d4eh6a7n2lki%40alvherre.pgsql#b2e426eb19dbbddee0adf9bb1bcbbcf1

I suggest that this *should* be included in the release notes, since I
specifically requested that partitioned tables be included in 2018.

> In 20200418050815(dot)GE26953(at)telsasoft(dot)com I wrote:
> |This patch includes partitioned tables in pg_stat_*_tables, which is great; I
> |complained awhile ago that they were missing [0].  It might be useful if that
> |part was split out into a separate 0001 patch (?).
> | [0] 
> https://www.postgresql.org/message-id/20180601221428.GU5164%40telsasoft.com

Also, I've patched my analyze script to use that field (same as for
nonpartitioned tables) rather than needing to do a subquery involving
max(last_analyzed) of the partitions, since it's still necessary to
manually analyze parent tables.

-- 
Justin




Re: PG 14 release notes, first draft

2021-09-18 Thread Tom Lane
Justin Pryzby  writes:
> I think the release notes for the autovacuum item (which was first reverted 
> and
> then partially un-reverted) should say something like "Partitioned tables are
> now included in pg_stat_all_tables":
> | e1efc5b465 Keep stats up to date for partitioned tables

Hmm.  If I'm reading the commit message properly, the actual change there
is not that, but that analyze count and last analyze time are now tracked
correctly for partitioned tables.  Might be worth mentioning, not sure.

> Remove some internal question/marks:
> ACCURATE?
> REMOVE?
> ALTER TABLE ... ALTER COLUMN ... TYPE RESETS STASTISTICS? (was never intended 
> to have markup added?)

Did that in the copy-editing I just pushed.

> Also, I'm not sure if this one should be included.
> 9a4c0e36fb Dump ALTER TABLE ... ATTACH PARTITION as a separate ArchiveEntry.
> See: 
> https://www.postgresql.org/message-id/flat/20210830154249.gx26...@telsasoft.com#9ad6fd4c36e13deea1c5f92f5114320e

Probably not worth listing --- the amount of explanation needed seems
to outweigh the probability of users caring.

> What about this ?
> ce6b662aae psql: Fix name quoting on extended statistics

Seems way too minor to bother with here.

regards, tom lane




Re: PG 14 release notes, first draft

2021-09-18 Thread Justin Pryzby
I think the release notes for the autovacuum item (which was first reverted and
then partially un-reverted) should say something like "Partitioned tables are
now included in pg_stat_all_tables":
| e1efc5b465 Keep stats up to date for partitioned tables

Remove some internal question/marks:
ACCURATE?
REMOVE?
ALTER TABLE ... ALTER COLUMN ... TYPE RESETS STASTISTICS? (was never intended 
to have markup added?)

Also, I'm not sure if this one should be included.
9a4c0e36fb Dump ALTER TABLE ... ATTACH PARTITION as a separate ArchiveEntry.
See: 
https://www.postgresql.org/message-id/flat/20210830154249.gx26...@telsasoft.com#9ad6fd4c36e13deea1c5f92f5114320e

What about this ?
ce6b662aae psql: Fix name quoting on extended statistics

-- 
Justin




Re: postgres.h included from relcache.h - but removing it breaks pg_upgrade

2021-09-18 Thread Alvaro Herrera
On 2021-Sep-18, Alexander Korotkov wrote:

> I see now.  I think I'm rather favoring splitting visibilitymap.h.

Agreed, this looks sane to me.  However, I think the
VM_ALL_{VISIBLE,FROZEN} macros should remain in visibilitymap.h, since
they depend on the visibilitymap_get_status function (and pg_upgrade
doesn't use them).

There's a typo "maros" for "macros" in the new header file.  (Also, why
does the copyright line say "portions" if no portion under another
copyright?  I think we don't say "portions" when there is only one
copyright statement line.)

-- 
Álvaro Herrera  Valdivia, Chile  —  https://www.EnterpriseDB.com/




Re: Timeout failure in 019_replslot_limit.pl

2021-09-18 Thread Alvaro Herrera
On 2021-Sep-18, Michael Paquier wrote:

> Could it be possible to rely on a combination of wait events set in WAL
> senders and pg_stat_replication to assume that a WAL sender is in a
> stopped state?

Hmm, that sounds like a possibly useful idea to explore, but I would only
do so if the other ideas prove fruitless, because it sounds like it'd have
more moving parts.  Can you please first test whether the idea of sending
the signal twice is enough?  If that doesn't work, let's try
Horiguchi-san's idea of using some `ps` flags to find the process.

-- 
Álvaro Herrera   39°49'30"S 73°17'W  —  https://www.EnterpriseDB.com/
Al principio era UNIX, y UNIX habló y dijo: "Hello world\n".
No dijo "Hello New Jersey\n", ni "Hello USA\n".




Re: postgres.h included from relcache.h - but removing it breaks pg_upgrade

2021-09-18 Thread Alexander Korotkov
Hi,

On Sat, Sep 18, 2021 at 3:06 AM Andres Freund  wrote:
> On 2021-09-18 02:51:09 +0300, Alexander Korotkov wrote:
> > On Tue, Sep 14, 2021 at 6:53 AM Tom Lane  wrote:
> > > Without having looked at the details, I think using a forward-declare
> > > to avoid including relcache.h in visibilitymap.h might be a reasonably
> > > non-painful fix.
> >
> > I like that idea, but I didn't find an appropriate existing header for
> > forward-declaration of Relation.  relation.h isn't suitable, because
> > it includes primnodes.h.  A separate header for just
> > forward-definition of Relation seems too much.
>
> I was just thinking of doing something like the attached.

I see now.  I think I'm rather favoring splitting visibilitymap.h.

> > > TOH, in the long run it might be worth the effort
> > > to split visibilitymap.h to separate useful file-contents knowledge
> > > from backend function declarations.
> >
> > I've drafted a patch splitting visibilitymap_maros.h from
> > visibilitymap.h.  What do you think?
>
> I'd name it visibilitymapdefs.h or such, mostly because that's what other
> headers are named like...

Good point.  The revised patch is attached.

--
Regards,
Alexander Korotkov
From 798431948da5374ad947fc5f00ef3c87b1bb03ba Mon Sep 17 00:00:00 2001
From: Alexander Korotkov 
Date: Sat, 18 Sep 2021 02:45:41 +0300
Subject: [PATCH] Split macros from visibilitymap.h into a separate header

That allows to include just visibilitymapdefs.h from file.c, and in turn
remove include of postgres.h from relcache.h.
---
 src/bin/pg_upgrade/file.c  |  2 +-
 src/include/access/visibilitymap.h | 16 +
 src/include/access/visibilitymapdefs.h | 31 ++
 src/include/utils/relcache.h   |  1 -
 4 files changed, 33 insertions(+), 17 deletions(-)
 create mode 100644 src/include/access/visibilitymapdefs.h

diff --git a/src/bin/pg_upgrade/file.c b/src/bin/pg_upgrade/file.c
index 9b0cc16e452..1b34ee09fa6 100644
--- a/src/bin/pg_upgrade/file.c
+++ b/src/bin/pg_upgrade/file.c
@@ -19,7 +19,7 @@
 #include 
 #endif
 
-#include "access/visibilitymap.h"
+#include "access/visibilitymapdefs.h"
 #include "common/file_perm.h"
 #include "pg_upgrade.h"
 #include "storage/bufpage.h"
diff --git a/src/include/access/visibilitymap.h b/src/include/access/visibilitymap.h
index 57362c36876..3f546667337 100644
--- a/src/include/access/visibilitymap.h
+++ b/src/include/access/visibilitymap.h
@@ -14,26 +14,12 @@
 #ifndef VISIBILITYMAP_H
 #define VISIBILITYMAP_H
 
+#include "access/visibilitymapdefs.h"
 #include "access/xlogdefs.h"
 #include "storage/block.h"
 #include "storage/buf.h"
 #include "utils/relcache.h"
 
-/* Number of bits for one heap page */
-#define BITS_PER_HEAPBLOCK 2
-
-/* Flags for bit map */
-#define VISIBILITYMAP_ALL_VISIBLE	0x01
-#define VISIBILITYMAP_ALL_FROZEN	0x02
-#define VISIBILITYMAP_VALID_BITS	0x03	/* OR of all valid visibilitymap
-			 * flags bits */
-
-/* Macros for visibilitymap test */
-#define VM_ALL_VISIBLE(r, b, v) \
-	((visibilitymap_get_status((r), (b), (v)) & VISIBILITYMAP_ALL_VISIBLE) != 0)
-#define VM_ALL_FROZEN(r, b, v) \
-	((visibilitymap_get_status((r), (b), (v)) & VISIBILITYMAP_ALL_FROZEN) != 0)
-
 extern bool visibilitymap_clear(Relation rel, BlockNumber heapBlk,
 Buffer vmbuf, uint8 flags);
 extern void visibilitymap_pin(Relation rel, BlockNumber heapBlk,
diff --git a/src/include/access/visibilitymapdefs.h b/src/include/access/visibilitymapdefs.h
new file mode 100644
index 000..01d0611bb6b
--- /dev/null
+++ b/src/include/access/visibilitymapdefs.h
@@ -0,0 +1,31 @@
+/*-
+ *
+ * visibilitymapdefs.h
+ *		maros for accessing contents of visibility map pages
+ *
+ *
+ * Portions Copyright (c) 2021, PostgreSQL Global Development Group
+ *
+ * src/include/access/visibilitymapdefs.h
+ *
+ *-
+ */
+#ifndef VISIBILITYMAPDEFS_H
+#define VISIBILITYMAPDEFS_H
+
+/* Number of bits for one heap page */
+#define BITS_PER_HEAPBLOCK 2
+
+/* Flags for bit map */
+#define VISIBILITYMAP_ALL_VISIBLE	0x01
+#define VISIBILITYMAP_ALL_FROZEN	0x02
+#define VISIBILITYMAP_VALID_BITS	0x03	/* OR of all valid visibilitymap
+			 * flags bits */
+
+/* Macros for visibilitymap test */
+#define VM_ALL_VISIBLE(r, b, v) \
+	((visibilitymap_get_status((r), (b), (v)) & VISIBILITYMAP_ALL_VISIBLE) != 0)
+#define VM_ALL_FROZEN(r, b, v) \
+	((visibilitymap_get_status((r), (b), (v)) & VISIBILITYMAP_ALL_FROZEN) != 0)
+
+#endif			/* VISIBILITYMAPDEFS_H */
diff --git a/src/include/utils/relcache.h b/src/include/utils/relcache.h
index f772855ac69..d2c17575f65 100644
--- a/src/include/utils/relcache.h
+++ b/src/include/utils/relcache.h
@@ -14,7 +14,6 @@
 #ifndef RELCACHE_H
 #define RELCACHE_H
 
-#include "postgres.h"
 #include "access/tupdesc.h"
 #include "nodes/bitmapset.h"
 
-- 
2.24.3 (Apple Git-128)



Re: So, about that cast-to-typmod-minus-one business

2021-09-18 Thread Zhihong Yu
On Sat, Sep 18, 2021 at 10:06 AM Tom Lane  wrote:

> We had left it as an open issue whether or not to risk back-patching
> 5c056b0c2 into stable branches [1].  While updating the v14 release notes,
> I realized that we can't put off that decision any longer, because we
> have to decide now whether to document that as a new behavior in v14.
>
> I'm inclined to back-patch, since nobody has complained about this
> in 14beta3.  Thoughts?
>
> regards, tom lane
>
> [1]
> https://www.postgresql.org/message-id/flat/CABNQVagu3bZGqiTjb31a8D5Od3fUMs7Oh3gmZMQZVHZ%3DuWWWfQ%40mail.gmail.com
>
>
> Hi,
+1 to backporting.

Thanks


Re: Release 14 Schedule

2021-09-18 Thread Tom Lane
Andrew Dunstan  writes:
> The Release Management Team (Michael Paquier, Peter Geoghegan and
> myself) in consultation with the release team proposes the following
> release schedule:
> * PostgreSQL 14 Release Candidate 1 (RC1) will be released on September 23, 
> 2021.
> * In the absence of any critical issues, PostgreSQL 14 will become generally 
> available on September 30, 2021.

We don't yet have a list-of-major-features for the v14 release notes.
Anybody care to propose one?

regards, tom lane




So, about that cast-to-typmod-minus-one business

2021-09-18 Thread Tom Lane
We had left it as an open issue whether or not to risk back-patching
5c056b0c2 into stable branches [1].  While updating the v14 release notes,
I realized that we can't put off that decision any longer, because we
have to decide now whether to document that as a new behavior in v14.

I'm inclined to back-patch, since nobody has complained about this
in 14beta3.  Thoughts?

regards, tom lane

[1] 
https://www.postgresql.org/message-id/flat/CABNQVagu3bZGqiTjb31a8D5Od3fUMs7Oh3gmZMQZVHZ%3DuWWWfQ%40mail.gmail.com




Re: Gather performance analysis

2021-09-18 Thread Tomas Vondra




On 9/8/21 8:05 AM, Dilip Kumar wrote:
On Tue, Sep 7, 2021 at 8:41 PM Tomas Vondra <tomas.von...@enterprisedb.com> wrote:


Hi,

The numbers presented in this thread seem very promising - clearly
there's significant potential for improvements. I'll run similar
benchmarks too, to get a better understanding of this.


Thanks for showing interest.


Can you share some basic details about the hardware you used?
Particularly the CPU model - I guess this might explain some of the
results, e.g. if CPU caches are ~1MB, that'd explain why setting
tup_queue_size to 1MB improves things, but 4MB is a bit slower.
Similarly, number of cores might explain why 4 workers perform better
than 8 or 16 workers.


I have attached the output of lscpu.  I think batching the data 
before updating the shared memory counter will win because we avoid 
frequent cache misses, and IMHO the benefit will be even larger on 
machines with more CPU sockets.


Now, this is mostly expected, but the consequence is that maybe things
like queue size should be tunable/dynamic, not hard-coded?


Actually, my intention behind the tuple queue size was just to observe 
the behavior: do we really have a problem of workers stalling on the 
queue while sending tuples?  The perf report showed some load on 
WaitLatch on the worker side, so I ran this experiment.  I saw some 
benefit, but it was not really huge.  I am not sure whether we want to 
just increase the tuple queue size or make it tunable, but if we want 
to support redistribute operators sometime in the future then maybe we 
should make it grow dynamically at runtime, perhaps using dsa or dsa + 
shared files.




Thanks. I ran a couple more benchmarks, with different queue sizes 
(16kB, 64kB, 256kB and 1MB) and according to the results the queue size 
really makes almost no difference. It might make a difference for some 
queries, but I wouldn't bother tuning this until we have a plausible 
example - increasing the queue size is not free either.


So it was worth checking, but I'd just leave it as 64kB for now. We may 
revisit this later in a separate patch/thread.



As for the patches, I think the proposed changes are sensible, but I
wonder what queries might get slower. For example with the batching
(updating the counter only once every 4kB), that pretty much transfers
data in larger chunks with higher latency. So what if the query needs
only a small chunk, like a LIMIT query? Similarly, this might mean the
upper parts of the plan have to wait longer for the data, and thus
can't start some async operation (like sending them to an FDW, or
something like that). I do admit those are theoretical queries; I
haven't tried creating such a query.


Yeah, I was thinking about such cases. Basically, this design can 
increase the startup cost of the Gather node; I will also try to derive 
such cases and test them.
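The batching idea under discussion (publishing the shared read/write position to the consumer only after accumulating a chunk, rather than after every tuple) can be modeled with a toy sketch. The class and names here are made up for illustration and are not the actual shm_mq code:

```python
class BatchedQueue:
    """Toy model of a producer that publishes its write position
    to the consumer only after accumulating BATCH_SIZE bytes,
    instead of after every tuple."""
    BATCH_SIZE = 4096  # publish every 4kB, as in the discussion

    def __init__(self):
        self.published_bytes = 0   # what the consumer can see
        self.pending_bytes = 0     # written but not yet published

    def send(self, tup: bytes):
        self.pending_bytes += len(tup)
        if self.pending_bytes >= self.BATCH_SIZE:
            self.flush()

    def flush(self):
        # a single update of the shared counter covers many tuples
        self.published_bytes += self.pending_bytes
        self.pending_bytes = 0

q = BatchedQueue()
for _ in range(100):
    q.send(b"x" * 100)   # 10,000 bytes in total
q.flush()                # final flush publishes the remainder
```

This also illustrates the LIMIT concern above: the consumer sees nothing until the first batch fills (or a final flush happens), so the first rows become visible later than with per-tuple updates.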



FWIW I've tried applying both patches at the same time, but there's a
conflict in shm_mq_sendv - not a complex one, but I'm not sure what's
the correct solution. Can you share a "combined" patch?


Actually, both patches are the same; 
"v1-0001-Optimize-parallel-tuple-send-shm_mq_send_bytes.patch" is the 
cleaner version of the first patch.  For the configurable tuple queue 
size I did not send a patch, because I just used that for testing 
purposes and never intended to propose anything.  Most of my latest 
performance data was collected with only 
"v1-0001-Optimize-parallel-tuple-send-shm_mq_send_bytes.patch" and the 
default tuple queue size.


But I am attaching both the patches in case you want to play around.



Ah, silly me. I should have noticed the second patch is just a refined 
version of the first one.



regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [PATCH] Add `verify-system` sslmode to use system CA pool for server cert

2021-09-18 Thread Cameron Murdoch
On Sat, 18 Sep 2021 at 12:57, Thomas Habets  wrote:

>
> But these are two changes:
> 1. Actually verify against a CA
> 2. Actually check the CN/altnames
>
> Anything short of "verify-full" is in my view "not checking". Even with a
> private CA this allows for a lot of lateral movement in an org, as if you
> have one cert you have them all, for impersonation purposes.
>

100% agree. I suspect that many postgres users are not completely aware
that by default their ssl connections do not check the CA or CN/altnames.


> Changing such a default is a big change.
>

Agreed. It is going to break existing installs that rely on the current
behaviour.

There are two defaults to worry about here:

sslmode=prefer
sslrootcert=~/.postgresql/root.crt

Having sslrootcert use the system trust store if ~/.postgresql/root.crt
doesn’t exist would seem like a good change.
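That fallback could be sketched like so. This is hypothetical resolution logic, not current libpq behavior, and `effective_sslrootcert` is a made-up name:

```python
import os

def effective_sslrootcert(sslrootcert, home="~"):
    """Hypothetical resolution order for the suggestion above:
    an explicit sslrootcert wins; otherwise use the per-user
    default file if it exists; otherwise fall back to the
    system trust store."""
    if sslrootcert:
        return sslrootcert
    default = os.path.join(os.path.expanduser(home),
                           ".postgresql", "root.crt")
    if os.path.exists(default):
        return default
    return "system"  # stand-in for "use the system CA pool"
```

The nice property is that nobody who already has a `~/.postgresql/root.crt` sees any behavior change.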

Changing sslmode to default to something else would most likely break a
ton of existing installations, and there are plenty of use cases where ssl
isn’t used. Trying ssl first and falling back to no ssl probably is still a
sensible default. However…

I haven’t completely thought this through, but what if the sslmode=prefer
logic was:

1. Try ssl first, with both CA and CN checking (ie same as verify-full)
2. Print warnings appropriate to what type of ssl connection can be made
3. If all else fails, try without ssl.

In other words, start with verify-full and downgrade gracefully to prefer,
but actually tell the user that this has happened.

Essentially sslmode=prefer is a type of opportunistic encryption. I’m
suggesting making it try stronger levels of ssl opportunistically. Require,
verify-ca and verify-full can keep their semantics, or rather, they should
all try verify-full first and then downgrade (with warnings logged) to the
level they actually enforce.
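The opportunistic downgrade described above could be sketched as follows. This is a toy model; `connect_prefer` and `try_connect` are made-up stand-ins for the real handshake, not libpq API:

```python
import warnings

# Strongest first: the downgrade order suggested above.
LEVELS = ["verify-full", "verify-ca", "require", "none"]

def connect_prefer(try_connect, floor="none"):
    """Sketch of sslmode=prefer as opportunistic encryption:
    attempt the strongest level first and downgrade with a
    warning, but never below the level the user actually asked
    for ('floor').  try_connect(level) stands in for the real
    handshake and returns True on success."""
    for level in LEVELS:
        if try_connect(level):
            if level != LEVELS[0]:
                warnings.warn(f"connection downgraded to sslmode={level}")
            return level
        if level == floor:
            break
    raise ConnectionError(f"could not connect at or above sslmode={floor}")
```

With `floor="require"` this keeps require's existing guarantee while still trying verify-full first; with `floor="none"` it degenerates to today's prefer, plus warnings.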

Thanks
C


Re: [PATCH] Add `verify-system` sslmode to use system CA pool for server cert

2021-09-18 Thread Thomas Habets
On Sat, 18 Sept 2021 at 00:10, Cameron Murdoch  wrote:

> I also agree that the proposed patch is not the right way to go as it is
> essentially the same as verify-full, and I think that the correct fix would
> be to change the default.
>

But these are two changes:
1. Actually verify against a CA
2. Actually check the CN/altnames

Anything short of "verify-full" is in my view "not checking". Even with a
private CA this allows for a lot of lateral movement in an org, as if you
have one cert you have them all, for impersonation purposes.

Changing such a default is a big change. Maybe long term it's worth the
short term pain, though. Long term it'd be the config of least surprise, in
my opinion.
But note that one has to think about all the settings, such that the
default is not more checking than "require", which might also be surprising.

A magic setting of the file to be "system" sounds good for my use cases, at
least.



-- 
typedef struct me_s {
 char name[]  = { "Thomas Habets" };
 char email[] = { "tho...@habets.se " };
 char kernel[]= { "Linux" };
 char *pgpKey[]   = { "http://www.habets.pp.se/pubkey.txt" };
 char pgp[] = { "9907 8698 8A24 F52F 1C2E  87F6 39A4 9EEA 460A 0169" };
 char coolcmd[]   = { "echo '. ./_&. ./_'>_;. ./_" };
} me_t;


Re: Timeout failure in 019_replslot_limit.pl

2021-09-18 Thread Michael Paquier
On Fri, Sep 17, 2021 at 08:41:00PM -0700, Noah Misch wrote:
> If this fixes things for the OP, I'd like it slightly better than the "ps"
> approach.  It's less robust, but I like the brevity.
> 
> Another alternative might be to have walreceiver reach walsender via a proxy
> Perl script.  Then, make that proxy able to accept an instruction to pause
> passing data until further notice.  However, I like two of your options better
> than this one.

Could it be possible to rely on a combination of wait events set in WAL
senders and pg_stat_replication to assume that a WAL sender is in a
stopped state?  I would think about something like that in the top of
my mind (perhaps this would need 2 WAL senders, one stopped and one
still running):
1) SIGSTOP WAL sender 1.
2) Check WAL sender 1 is in WalSenderMain.  If not retry 1) after a
SIGCONT.
3) Generate some WAL, and look at pg_stat_replication to see if there
has been some progress after 1), once 2) is done.
--
Michael




Re: Teach pg_receivewal to use lz4 compression

2021-09-18 Thread Michael Paquier
On Fri, Sep 17, 2021 at 08:12:42AM +, gkokola...@pm.me wrote:
> Yeah, I was considering it to split them over to a separate commit,
> then decided against it. Will do so in the future.

I have been digging into the issue I saw in the TAP tests when closing
a segment, and found the problem.  The way you manipulate
frameInfo.contentSize by just setting it to WalSegSz when *opening*
a segment causes problems on LZ4F_compressEnd(), making the code
throw an ERROR_frameSize_wrong error.  In lz4frame.c, the end of
LZ4F_compressEnd() triggers this check and the error:
if (cctxPtr->prefs.frameInfo.contentSize) {
if (cctxPtr->prefs.frameInfo.contentSize != cctxPtr->totalInSize)
return err0r(LZ4F_ERROR_frameSize_wrong);
}

We don't really care about contentSize as long as a segment is not
completed.  Rather than filling contentSize every time we write
something, we'd better update frameInfo once the segment is
completed and closed.  That would also take care of the error, as this
is not checked if contentSize is 0.  It seems to me that we should
fill in the information when doing a CLOSE_NORMAL.
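A minimal model of the contentSize bookkeeping (plain Python mirroring the lz4frame.c check quoted below, not real LZ4 code) shows why declaring WalSegSz at open time fails on a partial segment:

```python
class FrameCompressor:
    """Toy model of LZ4F's contentSize check.  If the frame header
    promised a content size, compress_end() errors out unless
    exactly that many bytes were fed in -- which is why setting
    contentSize = WalSegSz when opening a segment breaks if the
    segment is closed before being completely filled."""
    def __init__(self, content_size=0):
        self.content_size = content_size  # 0 means "unknown": never checked
        self.total_in = 0

    def compress_update(self, data: bytes):
        self.total_in += len(data)

    def compress_end(self):
        if self.content_size and self.content_size != self.total_in:
            raise ValueError("LZ4F_ERROR_frameSize_wrong")

WAL_SEG_SZ = 16 * 1024 * 1024  # default WAL segment size
```

Leaving contentSize at 0 while the segment is open and filling it in only on CLOSE_NORMAL (when total input really equals WalSegSz) sidesteps the error entirely.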

-   if (stream->walmethod->compression() == 0 &&
+   if (stream->walmethod->compression() == COMPRESSION_NONE &&
stream->walmethod->existsfile(fn))
This one was a more serious issue, as the compression() callback would
return an integer for the compression level but v5 compared it to a
WalCompressionMethod.  In order to take care of this issue, mainly for
pg_basebackup, I think that we have to update the compression()
callback to compression_method(), and it is cleaner to save the
compression method as well as the compression level for the tar data.
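The int-versus-enum confusion can be illustrated with a small self-contained example. The names approximate walmethods.h, but the enum values and their ordering here are assumptions for illustration, not the patch's actual definitions:

```c
#include <assert.h>

/* Illustrative approximation of the method enum. */
typedef enum
{
	COMPRESSION_GZIP,			/* happens to be 0 */
	COMPRESSION_LZ4,
	COMPRESSION_NONE
} WalCompressionMethod;

/*
 * The v5 callback returned the compression *level* (an int), so
 * comparing its result against a WalCompressionMethod compiled
 * silently but asked the wrong question: level 0 is numerically
 * equal to whichever method happens to be enum value 0.
 */
static int
compression_level(void)
{
	return 0;					/* e.g. gzip level 0 */
}

/* The fix: a separate callback that reports the *method*. */
static WalCompressionMethod
compression_method(void)
{
	return COMPRESSION_LZ4;
}
```

Saving the method and the level separately means callers like pg_basebackup never have to infer one from the other.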

I am attaching a new patch, on which I have done many tweaks and
adjustments while reviewing it.  The attached patch fixes the second
issue, and I have done nothing about the first issue yet, but that
should be simple enough to address as this needs an update of the
frame info when closing a completed segment.  Could you look at it?

Thanks,
--
Michael
From 37e3800d279566445864ed82f29e8d650c72d8cd Mon Sep 17 00:00:00 2001
From: Michael Paquier 
Date: Sat, 18 Sep 2021 15:11:49 +0900
Subject: [PATCH v6] Teach pg_receivewal to use LZ4 compression

The program pg_receivewal can use gzip compression to store the received WAL.
This commit teaches it to also be able to use LZ4 compression. This requires
the binary to be built with the -llz4 flag, enabled via the --with-lz4 flag
at configure time.

Previously, the user had to use the option --compress with a value between 0 and 9
to denote that gzip compression was required. This specific behaviour has not been
maintained. A newly introduced option --compression-method=[LZ4|gzip] can be
used to ask for the logs to be compressed. Compression levels can be selected
only when the compression method is gzip. A compression level of 0 now returns
an error.

Under the hood there is nothing exceptional to note. Tar-based archives have
not yet been taught to use LZ4 compression. If that is felt to be useful, it
can easily be added in the future.

Tests have been added to verify the creation and correctness of the generated
LZ4 files. The latter is achieved by using the lz4 program, if present in the
installation.
---
 src/bin/pg_basebackup/Makefile   |   1 +
 src/bin/pg_basebackup/pg_basebackup.c|   7 +-
 src/bin/pg_basebackup/pg_receivewal.c| 278 +++
 src/bin/pg_basebackup/receivelog.c   |   2 +-
 src/bin/pg_basebackup/t/020_pg_receivewal.pl |  81 +-
 src/bin/pg_basebackup/walmethods.c   | 216 --
 src/bin/pg_basebackup/walmethods.h   |  16 +-
 doc/src/sgml/ref/pg_receivewal.sgml  |  28 +-
 src/Makefile.global.in   |   1 +
 src/tools/pgindent/typedefs.list |   1 +
 10 files changed, 546 insertions(+), 85 deletions(-)

diff --git a/src/bin/pg_basebackup/Makefile b/src/bin/pg_basebackup/Makefile
index 459d514183..387d728345 100644
--- a/src/bin/pg_basebackup/Makefile
+++ b/src/bin/pg_basebackup/Makefile
@@ -24,6 +24,7 @@ export TAR
 # used by the command "gzip" to pass down options, so stick with a different
 # name.
 export GZIP_PROGRAM=$(GZIP)
+export LZ4
 
 override CPPFLAGS := -I$(libpq_srcdir) $(CPPFLAGS)
 LDFLAGS_INTERNAL += -L$(top_builddir)/src/fe_utils -lpgfeutils $(libpq_pgport)
diff --git a/src/bin/pg_basebackup/pg_basebackup.c 
b/src/bin/pg_basebackup/pg_basebackup.c
index 669aa207a3..18c6a93cec 100644
--- a/src/bin/pg_basebackup/pg_basebackup.c
+++ b/src/bin/pg_basebackup/pg_basebackup.c
@@ -555,10 +555,13 @@ LogStreamerMain(logstreamer_param *param)
stream.replication_slot = replication_slot;
 
if (format == 'p')
-   stream.walmethod = CreateWalDirectoryMethod(param->xlog, 0,
+   stream.walmethod = CreateWalDirectoryMethod(param->xlog,
+