Re: Logical replication keepalive flood
On Fri, Sep 17, 2021 at 3:03 PM Greg Nancarrow wrote:
>
> On Thu, Sep 16, 2021 at 10:36 PM Amit Kapila wrote:
> >
> > I think here the reason is that the first_lsn of a transaction is
> > always equal to end_lsn of the previous transaction (See comments
> > above first_lsn and end_lsn fields of ReorderBufferTXN).
>
> That may be the case, but those comments certainly don't make this clear.
>
> > I have not debugged but I think in StreamLogicalLog() the
> > cur_record_lsn after receiving 'w' message, in this case, will be
> > equal to endpos whereas we expect it to be greater than endpos to
> > exit. Before the patch, it would always get the 'k' message where we
> > expect the received lsn to be equal to endpos to conclude that we can
> > exit. Do let me know if your analysis differs?
>
> Yes, pg_recvlogical seems to be relying on receiving a keepalive for
> its "--endpos" logic to work (and the 006 test is relying on '' record
> output from pg_recvlogical in this case).
> But is it correct to be relying on a keepalive for this?

I don't think this experiment/test indicates that pg_recvlogical's "--endpos" relies on keepalives. It would just print the records up to --endpos and then exit. In the test under discussion, as per confirmation by Hou-San, the BEGIN record received has the same LSN as --endpos, so pg_recvlogical would just output that record and exit, which is what the pg_recvlogical docs describe as well ("If there's a record with LSN exactly equal to lsn, the record will be output").

I think here the test case could be the culprit. In the original commit eb2a6131be [1], where this second use of pg_recvlogical was added, there were no additional inserts ("# Insert some rows after $endpos, which we won't read.") — those were later added by a different commit, 8222a9d9a1 [2]. I am not sure the test added by commit [2] was a good idea. It seems to be working due to the way keepalives are being sent, but otherwise it can fail as per the current design of pg_recvlogical.

[1]: commit eb2a6131beccaad2b39629191508062b70d3a1c6
     Author: Simon Riggs
     Date: Tue Mar 21 14:04:49 2017 +
     Add a pg_recvlogical wrapper to PostgresNode

[2]: commit 8222a9d9a12356349114ec275b01a1a58da2b941
     Author: Noah Misch
     Date: Wed May 13 20:42:09 2020 -0700
     In successful pg_recvlogical, end PGRES_COPY_OUT cleanly.

--
With Regards,
Amit Kapila.
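The exit behavior being discussed can be sketched in miniature. This is purely an illustrative model of the semantics described above — the message tuples, function name, and exact comparison rules are assumptions for the sketch, not pg_recvlogical's actual C code:

```python
# Toy model of pg_recvlogical's --endpos handling as described in this
# thread: 'w' carries a WAL record, 'k' is a keepalive. A record whose LSN
# exactly equals endpos is still output (per the docs quoted above).
def stream_until_endpos(messages, endpos):
    """Consume ('w'|'k', lsn, payload) tuples; return payloads output."""
    output = []
    for kind, lsn, payload in messages:
        if kind == 'w':              # WAL data message
            if lsn > endpos:         # already past the target: stop
                break
            output.append(payload)   # lsn == endpos is still printed
            if lsn == endpos:
                break
        elif kind == 'k':            # keepalive
            if lsn >= endpos:        # nothing more to read before endpos
                break
    return output
```

Under this model the test scenario works either way: a BEGIN record arriving with LSN equal to --endpos is printed and terminates the loop by itself, without needing a trailing keepalive.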
Improved PostgreSQL Mathematics Support.
I have been trying to get a reply or interest in either updating PostgreSQL to support the following, or for there to be a public, free-for-any-use extension put out there that will support the following:

# High Precision Numeric and Elementary Functions Support. #

-Integer (HPZ) Z, or Rational Decimal Q (HPQ) numbers support.

-Recurring Rational Numbers and recurring Irrational Numbers can be appropriately truncated, by a precision value, to obtain an approximating value. The result is a finite Rational value, possibly with integer and/or decimal parts at the same time. These may be positive or negative, standard number line, values.

-Forward and Inverse operations accuracy, withstanding truncation, can be maintained by storing and normalising the expression behind a value (or just including pointers to the value) and displaying the evaluation. This system will uphold any precision.

-A defaulting number of significant figures (precision), in one copy of one field in memory, held there as a filter for all HPZ and HPQ numbers. For example, 20 significant figures as a default, to start with.

-A function that varies the precision filter for every HPZ and HPQ number at once.

-Value assignment to a typed variable by =.

-Base 10 Arithmetic and comparisons support on Base 10 Integer and Rational Decimal numbers: +,-,*,/,%,^,=,!=,<>,>,<,>=,<=,::
These include full finite division and integer-only division, with no remainder. The defaulting ability of numbers data in lower types to automatically be cast up to HPZ or HPQ, where specified and occurring in PostgreSQL code.

-Reified support with broader syntax and operations within PostgreSQL, in all the obvious and less-than-obvious places. Tables and related phenomena, indexing, the Window type, the Record type, direct compatibility with Aggregate and Window Functions, and the RECURSIVE keyword are all parts of a larger subset that may re-interact with HPZ or HPQ.
# -Mathematical and Operational functions support: #

precision(BIGINT input);

cast(TEXT as HPZ) returns HPZ;
cast(TEXT as HPQ) returns HPQ;
cast(HPQ as TEXT) returns TEXT;
cast(HPZ as TEXT) returns TEXT;
cast(HPZ as HPQ) returns HPQ;
cast(HPQ as HPZ) returns HPZ;
cast(HPZ as SMALLINT) returns SMALLINT;
cast(SMALLINT as HPZ) returns HPZ;
cast(HPZ as INTEGER) returns INTEGER;
cast(INTEGER as HPZ) returns HPZ;
cast(HPZ as BIGINT) returns BIGINT;
cast(BIGINT as HPZ) returns HPZ;
cast(HPQ as REAL) returns REAL;
cast(REAL as HPQ) returns HPQ;
cast(DOUBLE PRECISION as HPQ) returns HPQ;
cast(HPQ as DOUBLE PRECISION) returns DOUBLE PRECISION;
cast(HPQ as DECIMAL) returns DECIMAL;
cast(DECIMAL as HPQ) returns HPQ;

sign(HPQ input) returns HPQ;
abs(HPQ input) returns HPQ;
ceil(HPQ input) returns HPQ;
floor(HPQ input) returns HPQ;
round(HPQ input) returns HPZ;
pi() returns HPQ;
e() returns HPQ;
power(HPQ base, HPQ exponent) returns HPQ;
sqrt(HPQ input) returns HPQ;
nroot(HPZ theroot, HPQ input) returns HPQ;
log10(HPQ input) returns HPQ;
loge(HPQ input) returns HPQ;
log2(HPQ input) returns HPQ;
factorial(HPZ input) returns HPZ;
nCr(HPZ objects, HPZ selectionSize) returns HPZ;
nPr(HPZ objects, HPZ selectionSize) returns HPZ;
degrees(HPQ input) returns HPQ;
radians(HPQ input) returns HPQ;
sind(HPQ input) returns HPQ;
cosd(HPQ input) returns HPQ;
tand(HPQ input) returns HPQ;
asind(HPQ input) returns HPQ;
acosd(HPQ input) returns HPQ;
atand(HPQ input) returns HPQ;
sinr(HPQ input) returns HPQ;
cosr(HPQ input) returns HPQ;
tanr(HPQ input) returns HPQ;
asinr(HPQ input) returns HPQ;
acosr(HPQ input) returns HPQ;
atanr(HPQ input) returns HPQ;

## -Informative articles on all these things exist at: ##

Comparison Operators: https://en.wikipedia.org/wiki/Relational_operator
Floor and Ceiling Functions: https://en.wikipedia.org/wiki/Floor_and_ceiling_functions
Arithmetic Operations: https://en.wikipedia.org/wiki/Arithmetic
Integer Division: https://en.wikipedia.org/wiki/Division_(mathematics)#Of_integers
Modulus Operation: https://en.wikipedia.org/wiki/Modulo_operation
Rounding (Commercial Rounding): https://en.wikipedia.org/wiki/Rounding
Factorial Operation: https://en.wikipedia.org/wiki/Factorial
Degrees: https://en.wikipedia.org/wiki/Degree_(angle)
Radians: https://en.wikipedia.org/wiki/Radian
Elementary Functions: https://en.wikipedia.org/wiki/Elementary_function

-Ease of installation support, particularly for Windows and Linux: *.exe, *.msi or *.rpm, *.deb, *.bin installers, with a PostgreSQL standard installation. Installation and activation instructions included.

The following chart could be used to help test trigonometry outputs, under further consideration of the unit circle:
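As a point of reference for the semantics requested above (exact rationals plus a global display-precision filter), a minimal sketch can be written with Python's standard library. All names here (hpq, display, set_precision, PRECISION) are invented for illustration; this is not a proposed implementation of the extension:

```python
from fractions import Fraction
from decimal import Decimal, getcontext

PRECISION = 20  # default significant figures, analogous to the proposed filter


def set_precision(sig_figs):
    """Vary the precision filter for all displayed HPQ values at once."""
    global PRECISION
    PRECISION = sig_figs


def hpq(text):
    """cast(TEXT as HPQ): an exact rational from a decimal string."""
    return Fraction(text)


def display(q):
    """Truncate a recurring rational to PRECISION significant figures
    for display, while the stored Fraction stays exact."""
    getcontext().prec = PRECISION
    return str(Decimal(q.numerator) / Decimal(q.denominator))
```

Because the stored values are exact, forward and inverse operations survive display truncation: hpq('0.1') + hpq('0.2') is exactly hpq('0.3'), which binary floating point cannot guarantee.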
Re: POC: Cleaning up orphaned files using undo logs
On Fri, Sep 17, 2021 at 9:50 PM Dmitry Dolgov <9erthali...@gmail.com> wrote:
>
> > On Tue, Sep 14, 2021 at 10:51:42AM +0200, Antonin Houska wrote:
>
> * What happened with the idea of abandoning discard worker for the sake
> of simplicity? From what I see, limiting everything to foreground undo
> could reduce the core of the patch series to the first four patches
> (forgetting about tests and docs, but I guess it would be enough at
> least for the design review), which is already less overwhelming.

I think the discard worker would be required even if we decide to apply all the undo in the foreground. We also need to forget/remove the undo of committed transactions, which we can't remove immediately after the commit.

--
With Regards,
Amit Kapila.
Re: Logical replication timeout problem
On Fri, Sep 17, 2021 at 8:08 PM Fabrice Chapuis wrote:
>
> the publisher and the subscriber run on the same postgres instance.

Okay, but there is no log corresponding to operations being performed by the publisher. Looking at the current logs, it is not very clear to me what might have caused this. Did you try increasing wal_sender_timeout and wal_receiver_timeout?

--
With Regards,
Amit Kapila.
Re: So, about that cast-to-typmod-minus-one business
+1 for backporting

Tony

On 2021/9/19 01:06, Tom Lane wrote:
> We had left it as an open issue whether or not to risk back-patching
> 5c056b0c2 into stable branches [1]. While updating the v14 release notes,
> I realized that we can't put off that decision any longer, because we
> have to decide now whether to document that as a new behavior in v14.
>
> I'm inclined to back-patch, since nobody has complained about this
> in 14beta3. Thoughts?
>
> regards, tom lane
>
> [1] https://www.postgresql.org/message-id/flat/CABNQVagu3bZGqiTjb31a8D5Od3fUMs7Oh3gmZMQZVHZ%3DuWWWfQ%40mail.gmail.com
Undocumented AT TIME ZONE INTERVAL syntax
In reviewing Paul's application period patch, I noticed some very curious syntax in the test cases. I learned that Paul is equally confused by it, and asked about it in his PGCon 2020 presentation:

# SELECT '2018-03-04' AT TIME ZONE INTERVAL '2' HOUR TO MINUTE;
       timezone
---------------------
 2018-03-04 05:02:00
(1 row)

Searching around, I found several instances of this syntax being used [1][2][3], but with one important clarifying difference: the expected syntax was

# SELECT '2018-03-04' AT TIME ZONE INTERVAL '2:00' HOUR TO MINUTE;
       timezone
---------------------
 2018-03-04 07:00:00
(1 row)

Now I understand that the user probably meant to do this:

# SELECT '2018-03-04' AT TIME ZONE INTERVAL '2' HOUR;
       timezone
---------------------
 2018-03-04 07:00:00
(1 row)

But none of this is in our own documentation. Before I write a patch to add this to the documentation, I'm curious what level of sloppiness we should tolerate in the interval calculation. Should we enforce the time string to actually conform to the format laid out in the X TO Y spec? If we don't require that, is it correct to say that the values will be filled in order from least significance to greatest?

[1] https://www.vertica.com/docs/9.2.x/HTML/Content/Authoring/SQLReferenceManual/DataTypes/Date-Time/TIMESTAMPATTIMEZONE.htm
[2] https://docs.teradata.com/r/kmuOwjp1zEYg98JsB8fu_A/aWY6mGNJ5CYJlSDrvgDQag
[3] https://community.snowflake.com/s/question/0D50Z9AqIaSSAV/is-it-possible-to-add-an-interval-of-5-hours-to-the-session-timezone-
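The "filled from least significance" reading can be made concrete outside SQL. The sketch below is a hypothetical model of how bare values in the string might be matched to an X TO Y field range (it is not PostgreSQL's interval parser); under it, INTERVAL '2' HOUR TO MINUTE comes out as 2 minutes, matching the surprising 05:02 result above:

```python
# Hypothetical model: values parsed from the interval string are assigned
# to the declared fields starting from the least-significant field, so a
# lone '2' in HOUR TO MINUTE lands on MINUTE, not HOUR.
def fill_fields(values, fields):
    """values: ints from the string; fields: most- to least-significant."""
    result = {f: 0 for f in fields}
    for value, field in zip(reversed(values), reversed(fields)):
        result[field] = value
    return result
```

With two values ('2:00'), HOUR gets 2 and MINUTE gets 0, which matches the 07:00 result users expected.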
Re: SQL:2011 application time
> In IBM DB2 you can only have one because application-time periods must
> be named "business_time" (not joking).

I saw that as well, and it made me think that someone at IBM is a fan of Flight Of The Conchords.

> Personally I feel like it's a weird limitation and I wouldn't mind
> supporting more, but my current implementation only allows for one,
> and I'd have to rethink some things to do it differently.

I'm satisfied that it's not something we need to do in the first MVP.

> Yes. Even though the name "SYSTEM_TIME" is technically enough, I'd
> still include a pertype column to make distinguishing system vs
> application periods easier and more obvious.

SYSTEM_TIME seems to allow for DATE values in the start_time and end_time fields, though I cannot imagine how that would ever be practical, unless it were somehow desirable to reject subsequent updates within a 24-hour timeframe. I have seen instances where home-rolled application periods used date values, which had similar problems where certain intermediate updates would simply have to be discarded in favor of the one that was still standing at midnight.

> > 2. The system versioning effort has chosen 'infinity' as their end-time
> > value, whereas you have chosen NULL as that makes sense for an unbounded
> > range. Other databases seem to leverage '-12-31 23:59:59' (SQLServer,
> > IIRC) whereas some others seem to have used '2999-12-31 23:59:59', but those
> > might have been home-rolled temporal implementations. To further add to the
> > confusion, the syntax seems to specify the keyword of MAXVALUE, which
> > further muddies things. The system versioning people went with 'infinity'
> > seemingly because it didn't prescribe an end to the world like SQLServer did,
> > but also because it allowed for a primary key based on (id, endtime), and
> > that's just not possible with NULL endtime values.

> I think it's a little weird that our system-time patch mutates your
> primary key. None of the other RDBMSes do that. I don't think it's
> incompatible (as long as the system time patch knows how to preserve
> the extra period/range data in an application-time temporal key), but
> it feels messy to me.

Per the outline below, I'm proposing an alternate SYSTEM_TIME implementation that would leave the PK as-is.

> I would prefer if system-time and application-time used the same value
> to mean "unbounded". Using null means we can support any type (not
> just types with +-Infinity). And it pairs nicely with range types. If
> the only reason for system-time to use Infinity is the primary key, I
> think it would be better not to mutate the primary key (and store the
> historical records in a separate table as other RDBMSes do).

The two "big wins" of infinity seemed (to me) to be:
1. the ability to add "AND end_time = 'infinity'" as a cheap way to get current rows
2. clauses like "WHERE CURRENT_DATE - 3 BETWEEN start_time AND end_time" would work. Granted, there's very specific new syntax to do that properly, but you know somebody's gonna see the columns and try to do it that way.

> Btw Oracle also uses NULL to mean "unbounded".

Huh, I missed that one. That is good in that it gives some precedence to how you've approached it.

> We presently forbid PKs from including expressions, but my patch lifts
> that exception so it can index a rangetype expression built from the
> period start & end columns. So even if we must include the system-time
> end column in a PK, perhaps it can use a COALESCE expression to store
> Infinity even while using NULL to signify "currently true" from a user
> perspective.

Either way seems viable, but I understand why you want to leverage ranges in this way.

> > 3. I noticed some inconsistency in the results from various "SELECT *
> > FROM portion_of_test" examples. In some, the "valid_at" range is shown
> > but not the columns that make it up, and in some others, the "valid_from"
> > and "valid_to" columns are shown, with no mention of the period. From
> > what I've seen, the period column should be invisible unless invoked,
> > like ctid or xmin.

> In most cases the tests test the same functionality with both PERIODs
> and rangetype columns. For FKs they test all four combinations of
> PERIOD-referencing-PERIOD, PERIOD-referencing-range,
> range-referencing-PERIOD, and range-referencing-range. If valid_at is
> a genuine column, it is included in SELECT *, but not if it is a
> PERIOD.

Ok, I'll have to look back over the test coverage to make sure that I understand the behavior now.

> > 4. The syntax '2018-03-04' AT TIME ZONE INTERVAL '2' HOUR TO MINUTE
> > simply confounded me.

> Me too! I have no idea what that is supposed to mean. But that
> behavior predates my patch. I only had to deal with it because it
> creates a shift-reduce conflict with `FOR PORTION OF valid_at FROM x
> TO y`, where x & y are expressions. I asked about this syntax at my
> PgCon 2020 talk, but I haven't ever received an answer. Perhaps
> someone else knows.
Re: WIP: System Versioned Temporal Table
>
> 1. Much of what I have read about temporal tables seemed to imply or
> almost assume that system temporal tables would be implemented as two
> actual separate tables. Indeed, SQLServer appears to do it that way [1]
> with syntax like
>
> WITH (SYSTEM_VERSIONING = ON (HISTORY_TABLE = dbo.WebsiteUserInfoHistory));
>
> Q 1.1. Was that implementation considered and if so, what made this
> implementation more appealing?

I've been digging some more on this point, and I've reached the conclusion that a separate history table is the better implementation. It would make the act of removing system versioning into little more than a DROP TABLE, plus adjusting the base table to reflect that it is no longer system versioned.

What do you think of this method:

1. The regular table remains unchanged, but a pg_class attribute named "relissystemversioned" would be set to true.
2. I'm unsure if the standard allows dropping a column from a table while it is system versioned, but the purpose behind system versioning makes me believe the answer is a strong "no", and requiring DROP COLUMN to fail on relissystemversioned = 't' seems pretty straightforward.
3. The history table would be given a default name of $FOO_history (space permitting), but could be overridden with the history_table option.
4. The history table would have relkind = 'h'.
5. The history table will only have rows that are not current, so it is created empty.
6. As such, the table is effectively append-only, in a way that vacuum can actually leverage, and likewise the fill factor of such a table should never be less than 100.
7. The history table could be updated only via system-defined triggers (insert, update, delete, alter to add columns), or row migration similar to that found in partitioning. It seems like this would work as the two tables acting as partitions of the same table, but presently we can't have multi-parent partitions.
8. The history table would be indexed the same as the base table, except that all unique indexes would be made non-unique, and an index of pk + start_time + end_time would be added.
9. The primary key of the base table would remain the existing pk vals, and would basically function normally, with triggers to carry forth changes to the history table. The net effect of this is that the end_time value of all rows in the main table would always be the chosen "current" value (infinity, null, -12-31, etc.) and as such might not actually _need_ to be stored.
10. Queries that omit the FOR SYSTEM_TIME clause, as well as ones that use FOR SYSTEM_TIME AS OF CURRENT_TIMESTAMP, would simply use the base table directly with no quals to add.
11. For queries that use FOR SYSTEM_TIME other than FOR SYSTEM_TIME AS OF CURRENT_TIMESTAMP, the query would do a union of the base table and the history table with quals applied to both.
12. It's a fair question whether the history table would be something that could be queried directly. I'm inclined to say no, because that allows for things like SELECT FOR UPDATE, which of course we'd have to reject.
13. If a history table is directly referenceable, then SELECT permission can be granted or revoked as normal, but all insert/update/delete/truncate operations would raise an error.
14. DROP SYSTEM VERSIONING from a table would be quite straightforward: the history table would be dropped along with the triggers that reference it, setting relissystemversioned = 'f' on the base table.

I think this would have some key advantages:

1. MVCC bloat is no worse than it was before.
2. No changes whatsoever to referential integrity.
3. DROP SYSTEM VERSIONING becomes an O(1) operation.

Thoughts? I'm going to be making a similar proposal to the people doing the application time effort, but I'm very much hoping that we can reach some consensus and combine efforts.
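The behavior proposed in points 5-11 above can be modeled in miniature. This is a toy behavioral sketch, not the proposed implementation (which would use SQL triggers and the planner); the class and method names are invented for illustration:

```python
# Toy model of the base-table/history-table split: current rows live in
# the base table; every UPDATE migrates the superseded version into an
# append-only history list (created empty, per point 5), trigger-style.
class VersionedTable:
    def __init__(self):
        self.base = {}      # pk -> current row; no end_time stored (point 9)
        self.history = []   # superseded versions only (point 5)

    def insert(self, pk, data, now):
        self.base[pk] = {'pk': pk, 'data': data, 'start': now}

    def update(self, pk, data, now):
        old = self.base[pk]
        self.history.append({**old, 'end': now})   # row migration (point 7)
        self.base[pk] = {'pk': pk, 'data': data, 'start': now}

    def for_system_time_as_of(self, pk, ts):
        cur = self.base.get(pk)
        if cur and cur['start'] <= ts:
            return cur['data']                     # base table only (point 10)
        for old in self.history:                   # union with quals (point 11)
            if old['pk'] == pk and old['start'] <= ts < old['end']:
                return old['data']
        return None
```

Note how a query "as of now" never touches the history list, which is the main performance argument for the split: current-row queries carry no temporal quals at all.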
Re: So, about that cast-to-typmod-minus-one business
On Sat, 18 Sep 2021, 18:06 Tom Lane wrote:
>
> I'm inclined to back-patch

+1

Regards,
Dean
Re: PG 14 release notes, first draft
On Sat, Sep 18, 2021 at 05:15:39PM -0400, Tom Lane wrote:
> Justin Pryzby writes:
> > I think the release notes for the autovacuum item (which was first reverted
> > and then partially un-reverted) should say something like "Partitioned
> > tables are now included in pg_stat_all_tables":
> > | e1efc5b465 Keep stats up to date for partitioned tables
>
> Hmm. If I'm reading the commit message properly, the actual change there
> is not that, but that analyze count and last analyze time are now tracked
> correctly for partitioned tables. Might be worth mentioning, not sure.

The reverted patch to autoanalyze included partitioned tables in pg_stat_all_tables, and the revert specifically avoided changing that (to avoid a catbump). But last_analyzed and analyze_count were always shown as "0". So the e1 commit addresses that by tracking that information and showing the correct value instead of always 0.

The relevant portion starts here:
https://www.postgresql.org/message-id/flat/202108161700.d4eh6a7n2lki%40alvherre.pgsql#b2e426eb19dbbddee0adf9bb1bcbbcf1

I suggest that this *should* be included in the release notes, since I specifically requested that partitioned tables be included in 2018.

In 20200418050815(dot)GE26953(at)telsasoft(dot)com I wrote:
| This patch includes partitioned tables in pg_stat_*_tables, which is great; I
| complained awhile ago that they were missing [0]. It might be useful if that
| part was split out into a separate 0001 patch (?).
| [0] https://www.postgresql.org/message-id/20180601221428.GU5164%40telsasoft.com

Also, I've patched my analyze script to use that field (same as for nonpartitioned tables) rather than needing to do a subquery involving max(last_analyzed) of the partitions, since parent tables still need to be analyzed manually.

--
Justin
Re: PG 14 release notes, first draft
Justin Pryzby writes:
> I think the release notes for the autovacuum item (which was first reverted
> and then partially un-reverted) should say something like "Partitioned
> tables are now included in pg_stat_all_tables":
> | e1efc5b465 Keep stats up to date for partitioned tables

Hmm. If I'm reading the commit message properly, the actual change there is not that, but that analyze count and last analyze time are now tracked correctly for partitioned tables. Might be worth mentioning, not sure.

> Remove some internal question/marks:
> ACCURATE?
> REMOVE?
> ALTER TABLE ... ALTER COLUMN ... TYPE RESETS STASTISTICS? (was never
> intended to have markup added?)

Did that in the copy-editing I just pushed.

> Also, I'm not sure if this one should be included.
> 9a4c0e36fb Dump ALTER TABLE ... ATTACH PARTITION as a separate ArchiveEntry.
> See: https://www.postgresql.org/message-id/flat/20210830154249.gx26...@telsasoft.com#9ad6fd4c36e13deea1c5f92f5114320e

Probably not worth listing --- the amount of explanation needed seems to outweigh the probability of users caring.

> What about this ?
> ce6b662aae psql: Fix name quoting on extended statistics

Seems way too minor to bother with here.

regards, tom lane
Re: PG 14 release notes, first draft
I think the release notes for the autovacuum item (which was first reverted and then partially un-reverted) should say something like "Partitioned tables are now included in pg_stat_all_tables":
| e1efc5b465 Keep stats up to date for partitioned tables

Remove some internal question/marks:
ACCURATE?
REMOVE?
ALTER TABLE ... ALTER COLUMN ... TYPE RESETS STASTISTICS? (was never intended to have markup added?)

Also, I'm not sure if this one should be included.
9a4c0e36fb Dump ALTER TABLE ... ATTACH PARTITION as a separate ArchiveEntry.
See: https://www.postgresql.org/message-id/flat/20210830154249.gx26...@telsasoft.com#9ad6fd4c36e13deea1c5f92f5114320e

What about this ?
ce6b662aae psql: Fix name quoting on extended statistics

--
Justin
Re: postgres.h included from relcache.h - but removing it breaks pg_upgrade
On 2021-Sep-18, Alexander Korotkov wrote:
> I see now. I think I'm rather favoring splitting visibilitymap.h.

Agreed, this looks sane to me. However, I think the VM_ALL_{VISIBLE,FROZEN} macros should remain in visibilitymap.h, since they depend on the visibilitymap_get_status function (and pg_upgrade doesn't use them).

There's a typo "maros" for "macros" in the new header file.

(Also, why does the copyright line say "portions" if no portion is under another copyright? I think we don't say "portions" when there is only one copyright statement line.)

--
Álvaro Herrera           Valdivia, Chile  —  https://www.EnterpriseDB.com/
Re: Timeout failure in 019_replslot_limit.pl
On 2021-Sep-18, Michael Paquier wrote:
> Could it be possible to rely on a combination of wait events set in WAL
> senders and pg_stat_replication to assume that a WAL sender is in a
> stopped state?

Hmm, sounds like a possibly useful idea to explore, but I would only do so if the other ideas prove fruitless, because it sounds like it'd have more moving parts. Can you please first test if the idea of sending the signal twice is enough? If that doesn't work, let's try Horiguchi-san's idea of using some `ps` flags to find the process.

--
Álvaro Herrera           39°49'30"S 73°17'W  —  https://www.EnterpriseDB.com/
"In the beginning there was UNIX, and UNIX spoke and said: 'Hello world\n'.
It did not say 'Hello New Jersey\n', nor 'Hello USA\n'."
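Whichever detection method wins, the TAP test would end up wrapping it in a retry loop rather than a single check. A generic sketch of that loop (the helper name is invented, and the SQL probe mentioned in the docstring is only an assumption about how Michael's wait-event idea might be queried):

```python
import time


def wait_until(probe, timeout=30.0, interval=0.1):
    """Poll probe() until it returns truthy; raise TimeoutError otherwise.

    In the 019_replslot_limit.pl scenario, probe() would run something
    like: SELECT wait_event FROM pg_stat_activity
          WHERE backend_type = 'walsender'
    and check for the wait event that indicates a stopped WAL sender.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if probe():
            return True
        time.sleep(interval)
    raise TimeoutError("condition not reached within timeout")
```

The moving-parts concern above is exactly this: the loop needs a timeout, an interval, and a probe that is itself race-free, which is why trying the simpler double-signal approach first makes sense.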
Re: postgres.h included from relcache.h - but removing it breaks pg_upgrade
Hi,

On Sat, Sep 18, 2021 at 3:06 AM Andres Freund wrote:
> On 2021-09-18 02:51:09 +0300, Alexander Korotkov wrote:
> > On Tue, Sep 14, 2021 at 6:53 AM Tom Lane wrote:
> > > Without having looked at the details, I think using a forward-declare
> > > to avoid including relcache.h in visibilitymap.h might be a reasonably
> > > non-painful fix.
> >
> > I like that idea, but I didn't find an appropriate existing header for
> > a forward declaration of Relation. relation.h isn't suitable, because
> > it includes primnodes.h. A separate header for just a forward
> > definition of Relation seems too much.
>
> I was just thinking of doing something like the attached.

I see now. I think I'm rather favoring splitting visibilitymap.h.

> > > OTOH, in the long run it might be worth the effort
> > > to split visibilitymap.h to separate useful file-contents knowledge
> > > from backend function declarations.
> >
> > I've drafted a patch splitting visibilitymap_macros.h from
> > visibilitymap.h. What do you think?
>
> I'd name it visibilitymapdefs.h or such, mostly because that's what other
> headers are named like...

Good point. The revised patch is attached.

--
Regards,
Alexander Korotkov

From 798431948da5374ad947fc5f00ef3c87b1bb03ba Mon Sep 17 00:00:00 2001
From: Alexander Korotkov
Date: Sat, 18 Sep 2021 02:45:41 +0300
Subject: [PATCH] Split macros from visibilitymap.h into a separate header

That allows to include just visibilitymapdefs.h from file.c, and in turn
remove include of postgres.h from relcache.h.
---
 src/bin/pg_upgrade/file.c              |  2 +-
 src/include/access/visibilitymap.h     | 16 +
 src/include/access/visibilitymapdefs.h | 31 ++
 src/include/utils/relcache.h           |  1 -
 4 files changed, 33 insertions(+), 17 deletions(-)
 create mode 100644 src/include/access/visibilitymapdefs.h

diff --git a/src/bin/pg_upgrade/file.c b/src/bin/pg_upgrade/file.c
index 9b0cc16e452..1b34ee09fa6 100644
--- a/src/bin/pg_upgrade/file.c
+++ b/src/bin/pg_upgrade/file.c
@@ -19,7 +19,7 @@
 #include
 #endif

-#include "access/visibilitymap.h"
+#include "access/visibilitymapdefs.h"
 #include "common/file_perm.h"
 #include "pg_upgrade.h"
 #include "storage/bufpage.h"
diff --git a/src/include/access/visibilitymap.h b/src/include/access/visibilitymap.h
index 57362c36876..3f546667337 100644
--- a/src/include/access/visibilitymap.h
+++ b/src/include/access/visibilitymap.h
@@ -14,26 +14,12 @@
 #ifndef VISIBILITYMAP_H
 #define VISIBILITYMAP_H

+#include "access/visibilitymapdefs.h"
 #include "access/xlogdefs.h"
 #include "storage/block.h"
 #include "storage/buf.h"
 #include "utils/relcache.h"

-/* Number of bits for one heap page */
-#define BITS_PER_HEAPBLOCK 2
-
-/* Flags for bit map */
-#define VISIBILITYMAP_ALL_VISIBLE 0x01
-#define VISIBILITYMAP_ALL_FROZEN 0x02
-#define VISIBILITYMAP_VALID_BITS 0x03 /* OR of all valid visibilitymap
-                                       * flags bits */
-
-/* Macros for visibilitymap test */
-#define VM_ALL_VISIBLE(r, b, v) \
- ((visibilitymap_get_status((r), (b), (v)) & VISIBILITYMAP_ALL_VISIBLE) != 0)
-#define VM_ALL_FROZEN(r, b, v) \
- ((visibilitymap_get_status((r), (b), (v)) & VISIBILITYMAP_ALL_FROZEN) != 0)
-
 extern bool visibilitymap_clear(Relation rel, BlockNumber heapBlk,
                                 Buffer vmbuf, uint8 flags);
 extern void visibilitymap_pin(Relation rel, BlockNumber heapBlk,
diff --git a/src/include/access/visibilitymapdefs.h b/src/include/access/visibilitymapdefs.h
new file mode 100644
index 000..01d0611bb6b
--- /dev/null
+++ b/src/include/access/visibilitymapdefs.h
@@ -0,0 +1,31 @@
+/*-
+ *
+ * visibilitymapdefs.h
+ *   maros for accessing contents of visibility map pages
+ *
+ *
+ * Portions Copyright (c) 2021, PostgreSQL Global Development Group
+ *
+ * src/include/access/visibilitymapdefs.h
+ *
+ *-
+ */
+#ifndef VISIBILITYMAPDEFS_H
+#define VISIBILITYMAPDEFS_H
+
+/* Number of bits for one heap page */
+#define BITS_PER_HEAPBLOCK 2
+
+/* Flags for bit map */
+#define VISIBILITYMAP_ALL_VISIBLE 0x01
+#define VISIBILITYMAP_ALL_FROZEN 0x02
+#define VISIBILITYMAP_VALID_BITS 0x03 /* OR of all valid visibilitymap
+                                       * flags bits */
+
+/* Macros for visibilitymap test */
+#define VM_ALL_VISIBLE(r, b, v) \
+ ((visibilitymap_get_status((r), (b), (v)) & VISIBILITYMAP_ALL_VISIBLE) != 0)
+#define VM_ALL_FROZEN(r, b, v) \
+ ((visibilitymap_get_status((r), (b), (v)) & VISIBILITYMAP_ALL_FROZEN) != 0)
+
+#endif /* VISIBILITYMAPDEFS_H */
diff --git a/src/include/utils/relcache.h b/src/include/utils/relcache.h
index f772855ac69..d2c17575f65 100644
--- a/src/include/utils/relcache.h
+++ b/src/include/utils/relcache.h
@@ -14,7 +14,6 @@
 #ifndef RELCACHE_H
 #define RELCACHE_H

-#include "postgres.h"
 #include "access/tupdesc.h"
 #include "nodes/bitmapset.h"
--
2.24.3 (Apple Git-128)
Re: So, about that cast-to-typmod-minus-one business
On Sat, Sep 18, 2021 at 10:06 AM Tom Lane wrote:
> We had left it as an open issue whether or not to risk back-patching
> 5c056b0c2 into stable branches [1]. While updating the v14 release notes,
> I realized that we can't put off that decision any longer, because we
> have to decide now whether to document that as a new behavior in v14.
>
> I'm inclined to back-patch, since nobody has complained about this
> in 14beta3. Thoughts?
>
> regards, tom lane
>
> [1] https://www.postgresql.org/message-id/flat/CABNQVagu3bZGqiTjb31a8D5Od3fUMs7Oh3gmZMQZVHZ%3DuWWWfQ%40mail.gmail.com

Hi,

+1 to backporting.

Thanks
Re: Release 14 Schedule
Andrew Dunstan writes:
> The Release Management Team (Michael Paquier, Peter Geoghegan and
> myself) in consultation with the release team proposes the following
> release schedule:
> * PostgreSQL 14 Release Candidate 1 (RC1) will be released on
>   September 23, 2021.
> * In the absence of any critical issues, PostgreSQL 14 will become
>   generally available on September 30, 2021.

We don't yet have a list-of-major-features for the v14 release notes. Anybody care to propose one?

regards, tom lane
So, about that cast-to-typmod-minus-one business
We had left it as an open issue whether or not to risk back-patching 5c056b0c2 into stable branches [1]. While updating the v14 release notes, I realized that we can't put off that decision any longer, because we have to decide now whether to document that as a new behavior in v14. I'm inclined to back-patch, since nobody has complained about this in 14beta3. Thoughts? regards, tom lane [1] https://www.postgresql.org/message-id/flat/CABNQVagu3bZGqiTjb31a8D5Od3fUMs7Oh3gmZMQZVHZ%3DuWWWfQ%40mail.gmail.com
Re: Gather performance analysis
On 9/8/21 8:05 AM, Dilip Kumar wrote: On Tue, Sep 7, 2021 at 8:41 PM Tomas Vondra mailto:tomas.von...@enterprisedb.com>> wrote: Hi, The numbers presented in this thread seem very promising - clearly there's significant potential for improvements. I'll run similar benchmarks too, to get a better understanding of this. Thanks for showing interest. Can you share some basic details about the hardware you used? Particularly the CPU model - I guess this might explain some of the results, e.g. if CPU caches are ~1MB, that'd explain why setting tup_queue_size to 1MB improves things, but 4MB is a bit slower. Similarly, number of cores might explain why 4 workers perform better than 8 or 16 workers. I have attached the output of the lscpu. I think batching the data before updating in the shared memory will win because we are avoiding the frequent cache misses and IMHO the benefit will be more in the machine with more CPU sockets. Now, this is mostly expected, but the consequence is that maybe things like queue size should be tunable/dynamic, not hard-coded? Actually, my intention behind the tuple queue size was to just see the behavior. Do we really have the problem of workers stalling on queue while sending the tuple, the perf report showed some load on WaitLatch on the worker side so I did this experiment. I saw some benefits but it was not really huge. I am not sure whether we want to just increase the tuple queue size or make it tunable, but if we want to support redistribute operators in future sometime then maybe we should make it dynamically growing at runtime, maybe using dsa or dsa + shared files. Thanks. I ran a couple more benchmarks, with different queue sizes (16kB, 64kB, 256kB and 1MB) and according to the results the queue size really makes almost no difference. It might make a difference for some queries, but I wouldn't bother tuning this until we have a plausible example - increasing the queue size is not free either. 
So it was worth checking, but I'd just leave it as 64kB for now. We may revisit this later in a separate patch/thread. As for the patches, I think the proposed changes are sensible, but I wonder what queries might get slower. For example with the batching (updating the counter only once every 4kB), that pretty much transfers data in larger chunks with higher latency. So what if the query needs only a small chunk, like a LIMIT query? Similarly, this might mean the upper parts of the plan have to wait for the data for longer, and thus can't start some async operation (like send them to a FDW, or something like that). I do admit those are theoretical queries, I haven't tried creating such a query. Yeah, I was thinking about such cases; basically, this design can increase the startup cost of the Gather node. I will also try to derive such cases and test them. FWIW I've tried applying both patches at the same time, but there's a conflict in shm_mq_sendv - not a complex one, but I'm not sure what's the correct solution. Can you share a "combined" patch? Actually, both patches are the same; "v1-0001-Optimize-parallel-tuple-send-shm_mq_send_bytes.patch" is the cleaner version of the first patch. For a configurable tuple queue size I did not send a patch, because I used that just for testing purposes and never intended to propose anything. Most of my latest performance data was sent with only "v1-0001-Optimize-parallel-tuple-send-shm_mq_send_bytes.patch" and with the default tuple queue size. But I am attaching both patches in case you want to play around. Ah, silly me. I should have noticed the second patch is just a refined version of the first one. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
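The batching idea discussed above (publishing the bytes-written counter to shared memory only once per ~4kB instead of once per tuple) can be illustrated with a toy model. This is purely an illustration of the technique, not PostgreSQL's shm_mq code; the class, names, and flush threshold are invented for the sketch:

```python
# Toy model of batched counter updates: instead of bumping a shared
# counter for every tuple sent, accumulate locally and publish only
# once the pending bytes cross FLUSH_BYTES (4kB here, as in the patch
# discussion). Fewer shared-memory writes means fewer cache-line
# bounces between sender and receiver.
FLUSH_BYTES = 4096

class BatchedSender:
    def __init__(self):
        self.shared_bytes_written = 0  # stands in for the shared counter
        self.pending = 0               # bytes accumulated locally
        self.flushes = 0               # number of shared-memory updates

    def send(self, nbytes):
        self.pending += nbytes
        if self.pending >= FLUSH_BYTES:
            self.flush()

    def flush(self):
        # The only place the "shared" state is touched.
        self.shared_bytes_written += self.pending
        self.pending = 0
        self.flushes += 1

s = BatchedSender()
for _ in range(1000):   # 1000 tuples of 100 bytes each
    s.send(100)
s.flush()               # final flush when the sender is done
```

With per-tuple updates the shared counter would be written 1000 times; with batching it is written only 25 times for the same 100000 bytes, which is the effect the patch exploits, at the cost of the higher latency for small result sets (e.g. LIMIT queries) noted above.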
Re: [PATCH] Add `verify-system` sslmode to use system CA pool for server cert
On Sat, 18 Sep 2021 at 12:57, Thomas Habets wrote: > > But these are two changes: > 1. Actually verify against a CA > 2. Actually check the CN/altnames > > Anything short of "verify-full" is in my view "not checking". Even with a > private CA this allows for a lot of lateral movement in an org, as if you > have one cert you have them all, for impersonation purposes. > 100% agree. I suspect that many postgres users are not completely aware that by default their ssl connections do not check the CA or CN/altnames. > Changing such a default is a big change. > Agreed. It is going to break existing installs that rely on the current behaviour. There are two defaults to worry about here: sslmode=prefer sslrootcert=~/.postgresql/root.crt Having sslrootcert use the system trust store if ~/.postgresql/root.crt doesn’t exist would seem like a good change. Changing sslmode to default to something else would most likely break a ton of existing installations, and there are plenty of use cases where ssl isn’t used. Trying ssl first and without afterwards probably is still a sensible default. However… I haven’t completely thought this through, but what if the sslmode=prefer logic was: 1. Try ssl first, with both CA and CN checking (i.e. the same as verify-full) 2. Print warnings appropriate to what type of ssl connection can be made 3. If all else fails, try without ssl. In other words, start with verify-full and downgrade gracefully to prefer, but actually tell the user that this has happened. Essentially sslmode=prefer is a type of opportunistic encryption. I’m suggesting making it try stronger levels of ssl opportunistically. Require, verify-ca and verify-full can keep their semantics, or rather, they should all try verify-full first and then downgrade (with warnings logged) to the level they actually enforce. Thanks C
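The opportunistic downgrade ladder proposed above could be modelled roughly as follows. This is a hypothetical sketch of the proposal, not libpq's actual behaviour or API; `try_connect` and the warning text are invented stand-ins:

```python
# Sketch of the proposed sslmode=prefer behaviour: attempt the
# strongest verification first, fall back one level at a time, and
# warn the user whenever the connection ends up weaker than
# verify-full. try_connect(mode) is a hypothetical callback that
# returns True if a connection at that strength succeeds.
LADDER = ["verify-full", "verify-ca", "require", "disable"]

warnings = []

def connect_with_downgrade(try_connect):
    for mode in LADDER:
        if try_connect(mode):
            if mode != LADDER[0]:
                warnings.append(f"SSL downgraded to sslmode={mode}")
            return mode
    raise ConnectionError("could not connect in any mode")

# Simulate a server whose certificate fails CA/CN verification, so
# only modes that skip certificate checks can succeed.
result = connect_with_downgrade(lambda m: m in ("require", "disable"))
```

Here the connection settles on `require` and the user is told about the downgrade once, which is the "tell the user that this has happened" part of the proposal.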
Re: [PATCH] Add `verify-system` sslmode to use system CA pool for server cert
On Sat, 18 Sept 2021 at 00:10, Cameron Murdoch wrote: > I also agree that the proposed patch is not the right way to go as it is > essentially the same as verify-full, and I think that the correct fix would > be to change the default. > But these are two changes: 1. Actually verify against a CA 2. Actually check the CN/altnames Anything short of "verify-full" is in my view "not checking". Even with a private CA this allows for a lot of lateral movement in an org, as if you have one cert you have them all, for impersonation purposes. Changing such a default is a big change. Maybe long term it's worth the short term pain, though. Long term it'd be the config of least surprise, in my opinion. But note that one has to think about all the settings, such that the default is not more checking than "require", which might also be surprising. A magic setting of the file to be "system" sounds good for my use cases, at least. -- typedef struct me_s { char name[] = { "Thomas Habets" }; char email[] = { "tho...@habets.se " }; char kernel[]= { "Linux" }; char *pgpKey[] = { "http://www.habets.pp.se/pubkey.txt" }; char pgp[] = { "9907 8698 8A24 F52F 1C2E 87F6 39A4 9EEA 460A 0169" }; char coolcmd[] = { "echo '. ./_&. ./_'>_;. ./_" }; } me_t;
Re: Timeout failure in 019_replslot_limit.pl
On Fri, Sep 17, 2021 at 08:41:00PM -0700, Noah Misch wrote: > If this fixes things for the OP, I'd like it slightly better than the "ps" > approach. It's less robust, but I like the brevity. > > Another alternative might be to have walreceiver reach walsender via a proxy > Perl script. Then, make that proxy able to accept an instruction to pause > passing data until further notice. However, I like two of your options better > than this one. Could it be possible to rely on a combination of wait events set in WAL senders and pg_stat_replication to assume that a WAL sender is in a stopped state? Off the top of my head, I would think about something like this (perhaps this would need two WAL senders, one stopped and one still running): 1) SIGSTOP WAL sender 1. 2) Check WAL sender 1 is in WalSenderMain. If not, retry 1) after a SIGCONT. 3) Generate some WAL, and look at pg_stat_replication to see whether WAL sender 1 has made any progress, once 2) is done. -- Michael
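The SIGSTOP/SIGCONT mechanics that step 1) depends on can be checked generically. The sketch below is a standalone, Linux-only illustration using a dummy `sleep` child in place of a WAL sender; the actual TAP test would of course target the walsender PID and verify progress via pg_stat_replication instead:

```python
# Demonstrate freezing and resuming a process with SIGSTOP/SIGCONT,
# observing its state via /proc (Linux-specific). A stopped process
# shows state 'T' in /proc/<pid>/stat.
import os
import signal
import subprocess
import time

def proc_state(pid):
    # /proc/<pid>/stat is "pid (comm) state ..."; split after the
    # closing paren so a ')' in the command name cannot confuse us.
    with open(f"/proc/{pid}/stat") as f:
        return f.read().rsplit(")", 1)[1].split()[0]

child = subprocess.Popen(["sleep", "60"])  # stand-in for a WAL sender
try:
    os.kill(child.pid, signal.SIGSTOP)     # step 1): freeze it
    time.sleep(0.2)
    stopped_state = proc_state(child.pid)  # expect 'T' (stopped)
    os.kill(child.pid, signal.SIGCONT)     # the retry path in step 2)
    time.sleep(0.2)
    resumed_state = proc_state(child.pid)  # back to sleeping/running
finally:
    child.kill()
    child.wait()
```

A stopped WAL sender keeps its socket open but processes nothing, which is why step 3) can then generate WAL and confirm via pg_stat_replication that the frozen sender makes no progress while a running one does.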
Re: Teach pg_receivewal to use lz4 compression
On Fri, Sep 17, 2021 at 08:12:42AM +, gkokola...@pm.me wrote: > Yeah, I was considering splitting them over to a separate commit, > then decided against it. Will do so in the future. I have been digging into the issue I saw in the TAP tests when closing a segment, and found the problem. The way you manipulate frameInfo.contentSize by just setting it to WalSegSz when *opening* a segment causes problems on LZ4F_compressEnd(), making the code throw an ERROR_frameSize_wrong. In lz4frame.c, the end of LZ4F_compressEnd() triggers this check and the error: if (cctxPtr->prefs.frameInfo.contentSize) { if (cctxPtr->prefs.frameInfo.contentSize != cctxPtr->totalInSize) return err0r(LZ4F_ERROR_frameSize_wrong); } We don't really care about contentSize as long as a segment is not completed. Rather than filling contentSize every time we write something, we'd better update frameInfo once the segment is completed and closed. That would also take care of the error, as this is not checked if contentSize is 0. It seems to me that we should fill in the information when doing a CLOSE_NORMAL. - if (stream->walmethod->compression() == 0 && + if (stream->walmethod->compression() == COMPRESSION_NONE && stream->walmethod->existsfile(fn)) This one was a more serious issue, as the compression() callback would return an integer for the compression level but v5 compared it to a WalCompressionMethod. In order to take care of this issue, mainly for pg_basebackup, I think that we have to update the compression() callback to compression_method(), and it is cleaner to save the compression method as well as the compression level for the tar data. I am attaching a new patch, on which I have done many tweaks and adjustments while reviewing it. The attached patch fixes the second issue, and I have done nothing about the first issue yet, but that should be simple enough to address as this needs an update of the frame info when closing a completed segment. Could you look at it?
Thanks, -- Michael From 37e3800d279566445864ed82f29e8d650c72d8cd Mon Sep 17 00:00:00 2001 From: Michael Paquier Date: Sat, 18 Sep 2021 15:11:49 +0900 Subject: [PATCH v6] Teach pg_receivewal to use LZ4 compression The program pg_receivewal can use gzip compression to store the received WAL. This commit teaches it to also be able to use LZ4 compression. It is required that the binary is built using the -llz4 flag. It is enabled via the --with-lz4 flag at configure time. Previously, the user had to use the option --compress with a value between [0-9] to denote that gzip compression was required. This specific behaviour has not been maintained. A newly introduced option --compression-method=[LZ4|gzip] can be used to ask for the logs to be compressed. Compression values can be selected only when the compression method is gzip. A compression value of 0 now returns an error. Under the hood there is nothing exceptional to be noted. Tar based archives have not yet been taught to use LZ4 compression. If that is felt useful, then it is easy to add in the future. Tests have been added to verify the creation and correctness of the generated LZ4 files. The latter is achieved by use of the lz4 program, if present in the installation.
--- src/bin/pg_basebackup/Makefile | 1 + src/bin/pg_basebackup/pg_basebackup.c| 7 +- src/bin/pg_basebackup/pg_receivewal.c| 278 +++ src/bin/pg_basebackup/receivelog.c | 2 +- src/bin/pg_basebackup/t/020_pg_receivewal.pl | 81 +- src/bin/pg_basebackup/walmethods.c | 216 -- src/bin/pg_basebackup/walmethods.h | 16 +- doc/src/sgml/ref/pg_receivewal.sgml | 28 +- src/Makefile.global.in | 1 + src/tools/pgindent/typedefs.list | 1 + 10 files changed, 546 insertions(+), 85 deletions(-) diff --git a/src/bin/pg_basebackup/Makefile b/src/bin/pg_basebackup/Makefile index 459d514183..387d728345 100644 --- a/src/bin/pg_basebackup/Makefile +++ b/src/bin/pg_basebackup/Makefile @@ -24,6 +24,7 @@ export TAR # used by the command "gzip" to pass down options, so stick with a different # name. export GZIP_PROGRAM=$(GZIP) +export LZ4 override CPPFLAGS := -I$(libpq_srcdir) $(CPPFLAGS) LDFLAGS_INTERNAL += -L$(top_builddir)/src/fe_utils -lpgfeutils $(libpq_pgport) diff --git a/src/bin/pg_basebackup/pg_basebackup.c b/src/bin/pg_basebackup/pg_basebackup.c index 669aa207a3..18c6a93cec 100644 --- a/src/bin/pg_basebackup/pg_basebackup.c +++ b/src/bin/pg_basebackup/pg_basebackup.c @@ -555,10 +555,13 @@ LogStreamerMain(logstreamer_param *param) stream.replication_slot = replication_slot; if (format == 'p') - stream.walmethod = CreateWalDirectoryMethod(param->xlog, 0, + stream.walmethod = CreateWalDirectoryMethod(param->xlog, +