RE: tablesync copy ignores publication actions
On Tue, Jun 14, 2022 3:36 PM Peter Smith wrote:
>
> PSA v2 of the patch, based on all feedback received.
>
> ~~~
>
> Main differences from v1:
>
> * Rewording and more explanatory text.
>
> * The examples were moved to the "Subscription" [1] page and also
> extended to show some normal replication and row filter examples, from
> [Amit].
>
> * Added some text to CREATE PUBLICATION 'publish' param [2], from
> [Euler][Amit].
>
> * Added some text to CREATE SUBSCRIPTION Notes [3], from [Shi-san].
>
> * Added some text to the "Publication page" [4] to say the 'publish'
> is only for DML operations.
>
> * I changed the note in "Row Filter - Initial Data Synchronization"
> [5] to be a warning because I felt users could be surprised to see
> data exposed by the initial copy, which a DML operation would not
> expose.
>

Thanks for updating the patch. Two comments:

1.
+       it means the copied table t3 contains all rows even when
+       they do not patch the row filter of publication pub3b.

Typo. I think "they do not patch the row filter" should be "they do not
match the row filter", right?

2.
@@ -500,7 +704,6 @@

-

It seems we should remove this change.

Regards,
Shi yu
Re: Typo in ro.po file?
On 14.06.22 05:34, Peter Smith wrote:

> FWIW, I stumbled on this obscure possible typo (?) in src/pl/plperl/po/ro.po:
>
> ~~~
>
> #: plperl.c:788
> msgid "while parsing Perl initialization"
> msgstr "în timpul parsing inițializării Perl"
>
> #: plperl.c:793
> msgid "while running Perl initialization"
> msgstr "în timpul rulării intializării Perl"
>
> ~~~
>
> (Notice the missing 'i' - "inițializării" versus "intializării")

Fixed in translations repository. Thanks.
Re: better page-level checksums
On 13.06.22 20:20, Robert Haas wrote:
> If the user wants 16-bit checksums, the feature we've already got seems
> good enough -- and, as you say, it doesn't use any extra disk space.
> This proposal is just about making people happy if they want a bigger
> checksum.

It's hard to get any definite information about what size of checksum is
"good enough", since after all it depends on what kinds of errors you
expect and what kinds of probabilities you want to accept. But the best
I could gather so far is that a 16-bit CRC is good until about a 16 kB
block size. Which leads to the question whether there is really a lot of
interest in catering to larger block sizes. The recent thread about the
performance impact of different block sizes might renew interest in
this. But unless we really want to encourage playing with the block
sizes (and if my claim above is correct), then a larger checksum size
might not be needed.

> On the topic of which algorithm to use, I'd be inclined to think that
> it is going to be more useful to offer checksums that are 64 bits or
> more, since IMHO 32 is not all that much more than 16, and I still
> think there are going to be alignment issues. Beyond that I don't have
> anything against your specific suggestions, but I'd like to hear what
> other people think.

Again, gathering some vague information ... The benefits of doubling the
checksum size are exponential rather than linear, so there is no
significant benefit of using a 64-bit checksum over a 32-bit one, for
supported block sizes (current max is 32 kB).
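To put rough numbers on the "exponential rather than linear" point (these
figures are not from the thread; they assume a well-mixed n-bit checksum and a
random corruption, so that every checksum value is equally likely):

    P(\text{random corruption goes undetected}) \approx 2^{-n},
    \qquad 2^{-16} \approx 1.5 \times 10^{-5},
    \quad  2^{-32} \approx 2.3 \times 10^{-10},
    \quad  2^{-64} \approx 5.4 \times 10^{-20}

So doubling the checksum width squares the (already small) miss probability
rather than merely halving it.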
Re: Handle infinite recursion in logical replication setup
PSA a test script that demonstrates all the documented steps for setting
up n-way bidirectional replication. These steps are the same as those
documented [1] on the new page "Bidirectional logical replication". This
script works using the current latest v20* patch set.

Each of the sections of 31.11.1 - 31.11.5 (see below) can be run
independently (just edit the script and at the bottom uncomment the part
you want to test):

31.11.1. Setting bidirectional replication between two nodes
31.11.2. Adding a new node when there is no data in any of the nodes
31.11.3. Adding a new node when data is present in the existing nodes
31.11.4. Adding a new node when data is present in the new node
31.11.5. Generic steps for adding a new node to an existing set of nodes

~~

Some sample output is also attached.

--
[1] https://www.postgresql.org/message-id/attachment/134464/v20-0004-Document-bidirectional-logical-replication-steps.patch

Kind Regards,
Peter Smith.
Fujitsu Australia

[Attached sample output (truncated): "Clean up" pg_ctl/rm messages for
data_N1 .. data_N4, followed by the standard initdb output from creating
each of the four instances.]
Re: Add header support to text format and matching feature
On 14.06.22 11:13, Julien Rouhaud wrote:
>> There is no need for the extra comment, and the error code should be
>> ERRCODE_FEATURE_NOT_SUPPORTED.
>
> Is there any rule for what error code should be used? Maybe that's just
> me but I understand "not supported" as "this makes sense, but this is
> currently a limitation that might be lifted later".

I tend to agree with that interpretation. Also, when you consider the
way SQL rules and error codes are set up, errors that are detected
during parse analysis should be a subclass of "syntax error or access
rule violation".
Re: replacing role-level NOINHERIT with a grant-level option
On 13.06.22 20:00, Robert Haas wrote:
> I don't think this creates any more of a conflict than we've already
> got. In fact, I'd go so far as to say it resolves a problem that we
> currently have. As far as I can see, we are stuck with a situation
> where we have to support both the INHERIT behavior and the NOINHERIT
> behavior. Removing either one would be a pretty major compatibility
> break. And even if some people were willing to endorse that, it seems
> clear from previous discussions that there are people who like the
> NOINHERIT behavior and would object to its removal, and other people
> (like me!) who like the INHERIT behavior and would object to removing
> that. If you think it's feasible to get rid of either of these
> behaviors, I'd be interested in hearing your thoughts on that, but to
> me it looks like we are stuck with supporting both. From my point of
> view, the question is how to make the best of that situation.

I think we want to keep both.

> Consider a user who in general prefers the NOINHERIT behavior but also
> wants to use predefined roles. Perhaps user 'peter' is to be granted
> both 'paul' and 'pg_execute_server_programs'. If role 'peter' is set
> to INHERIT, Peter will be sad, because his love for NOINHERIT probably
> means that he doesn't want to exercise Paul's privileges
> automatically. However, he needs to inherit the privileges of
> 'pg_execute_server_programs' or they are of no use to him. Peter
> presumably wants to use COPY TO/FROM program to put data into a table
> owned by 'peter', not a table owned by 'pg_execute_server_programs'.
> If so, being able to SET ROLE to 'pg_execute_server_programs' is of no
> use to him at all, but inheriting the privilege is useful.

That's because our implementation of SET ROLE is bogus. We should have
a SET ROLE that is separate from SET SESSION AUTHORIZATION, where the
current user can keep their current user-ness and additionally enable
(non-inherited) roles.

> I don't think I'm proposing to break anything or go in a totally
> opposite direction from anything, and to be honest I'm kind of
> confused as to why you think that what I'm proposing would have that
> effect. As far as I can see, the changes that I'm proposing are
> upward-compatible and would permit easy migration to new releases via
> either pg_dump or pg_upgrade with no behavior changes at all. Some
> syntax would be a bit different on the new releases and that would
> unlock some new options we don't currently have, but there's no
> behavior that you can get today which you wouldn't be able to get any
> more under this proposal.

I'm mainly concerned that (AAIU), you propose to remove the current
INHERIT/NOINHERIT attribute of roles. I wouldn't like that. If you want
a feature that allows overriding that per-grant, maybe that's okay.
[PATCH] Add sortsupport for range types and btree_gist
Hi all!

The motivation behind this is that incrementally building up a GiST
index for certain input data can create a terrible tree structure.
Furthermore, exclusion constraints are commonly implemented using GiST
indices and for that use case, data is mostly orderable. Sorting the
data before inserting it into the index results in a much better index
structure, leading to significant performance improvements.

Testing was done using the following setup, with about 50 million rows:

    CREATE EXTENSION btree_gist;
    CREATE TABLE t (id uuid, block_range int4range);
    CREATE INDEX ON t USING GIST (id, block_range);
    COPY t FROM '..' DELIMITER ',' CSV HEADER;

using

    SELECT * FROM t WHERE id = '..' AND block_range && '..'

as test query, on an unpatched instance and one with the patch applied.

Some stats for fetching 10,000 random rows using the query above, 100
iterations to get good averages. The benchmarking was done on an
unpatched instance compiled using the exact same options as with the
patch applied. [ Results are noted in an unpatched -> patched fashion. ]

The first set of results is after the initial CREATE TABLE, CREATE INDEX
and a COPY to the table, thereby incrementally building the index.

    Shared Hit Blocks (average):  110.97 -> 78.58
    Shared Read Blocks (average): 58.90 -> 47.42
    Execution Time (average):     1.10 -> 0.83 ms
    I/O Read Time (average):      0.19 -> 0.15 ms

After a REINDEX on the table, the results improve even more:

    Shared Hit Blocks (average):  84.24 -> 8.54
    Shared Read Blocks (average): 49.89 -> 0.74
    Execution Time (average):     0.84 -> 0.065 ms
    I/O Read Time (average):      0.16 -> 0.004 ms

Additionally, the time a REINDEX takes also improves significantly:
672407.584 ms (11:12.408) -> 130670.232 ms (02:10.670)

Most of the sortsupport for btree_gist was implemented by re-using
already existing infrastructure. For the few remaining types (bit, bool,
cash, enum, interval, macaddr8 and time) I manually implemented them
directly in btree_gist. It might make sense to move them into the
backend for uniformity, but I wanted to get other opinions on that
first.

`make check-world` reports no regressions.

Attached below, besides the patch, are also two scripts for
benchmarking. `bench-gist.py` benchmarks the actual patch; example usage
would be e.g. `./bench-gist.py -o results.csv public.table`. This
expects a local instance with no authentication and the default
`postgres` user. The port can be set using the `--port` option.
`plot.py` prints average values (as used above) and creates boxplots for
each statistic from the result files produced with `bench-gist.py`.
Depends on matplotlib and pandas.

Additionally, if needed, the sample dataset used to benchmark this is
available to independently verify the results [1].

Thanks,
Christoph Heiss

---
[1] https://drive.google.com/file/d/1SKRiUYd78_zl7CeD8pLDoggzCCh0wj39

From 22e1b60929c39309e06c2477d8d027e60ad49b9d Mon Sep 17 00:00:00 2001
From: Christoph Heiss
Date: Thu, 9 Jun 2022 17:07:46 +0200
Subject: [PATCH 1/1] Add sortsupport for range types and btree_gist

Incrementally building up a GiST index can result in a sub-optimal index
structure for range types. Sorting the data before inserting it into the
index will result in a much better index. This can provide sizeable
improvements in query execution time, I/O read time and shared block
hits/reads.
Signed-off-by: Christoph Heiss --- contrib/btree_gist/Makefile | 3 +- contrib/btree_gist/btree_bit.c | 19 contrib/btree_gist/btree_bool.c | 22 contrib/btree_gist/btree_cash.c | 22 contrib/btree_gist/btree_enum.c | 19 contrib/btree_gist/btree_gist--1.7--1.8.sql | 105 contrib/btree_gist/btree_gist.control | 2 +- contrib/btree_gist/btree_interval.c | 19 contrib/btree_gist/btree_macaddr8.c | 19 contrib/btree_gist/btree_time.c | 19 src/backend/utils/adt/rangetypes_gist.c | 70 + src/include/catalog/pg_amproc.dat | 3 + src/include/catalog/pg_proc.dat | 3 + 13 files changed, 323 insertions(+), 2 deletions(-) create mode 100644 contrib/btree_gist/btree_gist--1.7--1.8.sql diff --git a/contrib/btree_gist/Makefile b/contrib/btree_gist/Makefile index 48997c75f6..4096de73ea 100644 --- a/contrib/btree_gist/Makefile +++ b/contrib/btree_gist/Makefile @@ -33,7 +33,8 @@ EXTENSION = btree_gist DATA = btree_gist--1.0--1.1.sql \ btree_gist--1.1--1.2.sql btree_gist--1.2.sql btree_gist--1.2--1.3.sql \ btree_gist--1.3--1.4.sql btree_gist--1.4--1.5.sql \ - btree_gist--1.5--1.6.sql btree_gist--1.6--1.7.sql + btree_gist--1.5--1.6.sql btree_gist--1.6--1.7.sql \ + btree_gist--1.7--1.8.sql PGFILEDESC = "btree_gist - B-tree equivalent GiST operator classes" REGRESS = init int2 int4 int8 float4 float8 cash oid timestamp t
Re: [PATCH] Completed unaccent dictionary with many missing characters
Two fixes (bad comment and fixed Latin-ASCII.xml). Michael Paquier wrote on 17.05.2022 09:11: On Thu, May 05, 2022 at 09:44:15PM +0200, Przemysław Sztoch wrote: Tom, I disagree with you because many similar numerical conversions are already taking place, e.g. 1/2, 1/4... This part sounds like a valid argument to me. unaccent.rules does already the conversion of some mathematical signs, and the additions proposed in the patch don't look that weird to me. I agree with Peter and Przemysław that this is reasonable. -- Michael -- Przemysław Sztoch | Mobile +48 509 99 00 66 commit 5ea0f226e1d0d7f0cc7b32fa4bd1bc6ed120c194 Author: Przemyslaw Sztoch Date: Wed May 4 18:28:59 2022 +0200 Update unnaccent rules generator. diff --git a/contrib/unaccent/generate_unaccent_rules.py b/contrib/unaccent/generate_unaccent_rules.py index c405e231b3..d94a2a3bf6 100644 --- a/contrib/unaccent/generate_unaccent_rules.py +++ b/contrib/unaccent/generate_unaccent_rules.py @@ -40,6 +40,7 @@ sys.stdout = codecs.getwriter('utf8')(sys.stdout.buffer) # language knowledge. PLAIN_LETTER_RANGES = ((ord('a'), ord('z')), # Latin lower case (ord('A'), ord('Z')), # Latin upper case + (ord('0'), ord('9')), # Digits (0x03b1, 0x03c9), # GREEK SMALL LETTER ALPHA, GREEK SMALL LETTER OMEGA (0x0391, 0x03a9)) # GREEK CAPITAL LETTER ALPHA, GREEK CAPITAL LETTER OMEGA @@ -139,17 +140,17 @@ def get_plain_letter(codepoint, table): return codepoint # Should not come here -assert(False) +assert False, 'Codepoint U+%0.2X' % codepoint.id def is_ligature(codepoint, table): """Return true for letters combined with letters.""" -return all(is_letter(table[i], table) for i in codepoint.combining_ids) +return all(i in table and is_letter(table[i], table) for i in codepoint.combining_ids) def get_plain_letters(codepoint, table): """Return a list of plain letters from a ligature.""" -assert(is_ligature(codepoint, table)) +assert is_ligature(codepoint, table), 'Codepoint U+%0.2X' % codepoint.id return [get_plain_letter(table[id], table) for id in codepoint.combining_ids] @@ -248,7 +249,7 @@ def main(args): # walk through all the codepoints looking for interesting mappings for codepoint in all: if codepoint.general_category.startswith('L') and \ - len(codepoint.combining_ids) > 1: + len(codepoint.combining_ids) > 0: if is_letter_with_marks(codepoint, table): charactersSet.add((codepoint.id, chr(get_plain_letter(codepoint, table).id))) @@ -257,6 +258,13 @@ def main(args): "".join(chr(combining_codepoint.id) for combining_codepoint in get_plain_letters(codepoint, table +elif codepoint.general_category.startswith('N') and \ + len(codepoint.combining_ids) > 0 and \ + args.noLigaturesExpansion is False and is_ligature(codepoint, table): +charactersSet.add((codepoint.id, + "".join(chr(combining_codepoint.id) + for combining_codepoint + in get_plain_letters(codepoint, table elif is_mark_to_remove(codepoint): charactersSet.add((codepoint.id, None)) diff --git a/contrib/unaccent/unaccent.rules b/contrib/unaccent/unaccent.rules index 3030166ed6..ff484d09d7 100644 --- a/contrib/unaccent/unaccent.rules +++ b/contrib/unaccent/unaccent.rules @@ -1,9 +1,15 @@ ¡ ! 
© (C) +ª a « << - ® (R) ± +/- +² 2 +³ 3 +µ μ +¹ 1 +º o » >> ¼ 1/4 ½ 1/2 @@ -402,6 +408,11 @@ ʦ ts ʪ ls ʫ lz +ʰ h +ʲ j +ʳ r +ʷ w +ʸ y ʹ ' ʺ " ʻ ' @@ -417,6 +428,9 @@ ˖ + ˗ - ˜ ~ +ˡ l +ˢ s +ˣ x ̀ ́ ̂ @@ -536,6 +550,17 @@ ό ο ύ υ ώ ω +ϐ β +ϑ θ +ϒ Υ +ϕ φ +ϖ π +ϰ κ +ϱ ρ +ϲ ς +ϴ Θ +ϵ ε +Ϲ Σ Ё Е ё е ᴀ A @@ -556,6 +581,50 @@ ᴠ V ᴡ W ᴢ Z +ᴬ A +ᴮ B +ᴰ D +ᴱ E +ᴳ G +ᴴ H +ᴵ I +ᴶ J +ᴷ K +ᴸ L +ᴹ M +ᴺ N +ᴼ O +ᴾ P +ᴿ R +ᵀ T +ᵁ U +ᵂ W +ᵃ a +ᵇ b +ᵈ d +ᵉ e +ᵍ g +ᵏ k +ᵐ m +ᵒ o +ᵖ p +ᵗ t +ᵘ u +ᵛ v +ᵝ β +ᵞ γ +ᵟ δ +ᵠ φ +ᵡ χ +ᵢ i +ᵣ r +ᵤ u +ᵥ v +ᵦ β +ᵧ γ +ᵨ ρ +ᵩ φ +ᵪ χ ᵫ ue ᵬ b ᵭ d @@ -592,6 +661,10 @@ ᶓ e ᶖ i ᶙ u +ᶜ c +ᶠ f +ᶻ z +ᶿ θ Ḁ A ḁ a Ḃ B @@ -947,12 +1020,19 @@ Ὦ
Re: Add header support to text format and matching feature
Julien Rouhaud wrote:

> Maybe that's just me but I understand "not supported" as "this makes
> sense, but this is currently a limitation that might be lifted
> later".

Looking at ProcessCopyOptions(), there are quite a few invalid
combinations of options that produce ERRCODE_FEATURE_NOT_SUPPORTED
currently:

- HEADER in binary mode
- FORCE_QUOTE outside of csv
- FORCE_QUOTE outside of COPY TO
- FORCE_NOT_NULL outside of csv
- FORCE_NOT_NULL outside of COPY FROM
- ESCAPE outside of csv
- delimiter appearing in the NULL specification
- csv quote appearing in the NULL specification

FORCE_QUOTE and FORCE_NOT_NULL are options that only make sense in one
direction, so the errors when using these in the wrong direction are
comparable to the "HEADER MATCH outside of COPY FROM" error that we want
to add. In that sense, ERRCODE_FEATURE_NOT_SUPPORTED would be
consistent.

The other errors in the list above are more about the format itself,
with options that make sense only for one format. So the way we're
supposed to understand ERRCODE_FEATURE_NOT_SUPPORTED in these other
cases is that such format does not support such feature, but without
implying that it's a limitation.

Best regards,
--
Daniel Vérité
https://postgresql.verite.pro/
Twitter: @DanielVerite
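For reference, the checks in ProcessCopyOptions() raise these errors with the
usual ereport() pattern, roughly as sketched below (this is only an
illustration of the pattern; the exact condition fields and message wording in
copy.c may differ):

    /* sketch of the existing pattern in ProcessCopyOptions() */
    if (opts_out->binary && opts_out->header_line)
        ereport(ERROR,
                (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
                 errmsg("cannot specify HEADER in BINARY mode")));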
Re: Small TAP improvements
On 2022-06-14 Tu 19:24, Michael Paquier wrote:
> On Tue, Jun 14, 2022 at 05:08:28PM -0400, Tom Lane wrote:
>> Andrew Dunstan writes:
>>> OK, here's a more principled couple of patches. For config_data, if you
>>> give multiple options it gives you back the list of values. If you don't
>>> specify any, in scalar context it just gives you back all of pg_config's
>>> output, but in array context it gives you a map, so you should be able
>>> to say things like:
>>> my %node_config = $node->config_data;
>> Might be overkill, but since you wrote it already, looks OK to me.
>
> +    # exactly one option: hand back the output (minus LF)
> +    return $stdout if (@options == 1);
> +    my @lines = split(/\n/, $stdout);
> +    # more than one option: hand back the list of values;
> +    return @lines if (@options);
> +    # no options, array context: return a map
> +    my @map;
> +    foreach my $line (@lines)
> +    {
> +        my ($k,$v) = split (/ = /,$line,2);
> +        push(@map, $k, $v);
> +    }
>
> This patch is able to handle the case of no option and one option
> specified by the caller of the routine. However, pg_config is able to
> return a set of values when specifying multiple switches, respecting
> the order of the switches, so wouldn't it be better to return a map
> made of ($option, $line)? For example, on a command like `pg_config
> --sysconfdir --localedir`, we would get back:
> (('--sysconfdir', sysconfdir_val), ('--localedir', localedir_val))
>
> If this is not worth the trouble, I think that you'd better die() hard
> if the caller specifies more than two option switches.

Why would we do that? If you want a map don't pass any switches. But as
written you could do:

    my ($incdir, $localedir, $sharedir) =
      $node->config_data(qw(--includedir --localedir --sharedir));

No map needed to get what you want, in fact a map would get in the way.

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com
Re: Small TAP improvements
On 2022-06-14 Tu 19:13, Michael Paquier wrote:
> On Tue, Jun 14, 2022 at 12:20:56PM -0400, Tom Lane wrote:
>> Andrew Dunstan writes:
>>> The second changes the new GUCs TAP test to check against the installed
>>> postgresql.conf.sample rather than the one in the original source
>>> location. There are probably arguments both ways, but if we ever decided
>>> to postprocess the file before installation, this would do the right thing.
>> Seems like a good idea, especially since it also makes the test code
>> shorter and more robust(-looking).
> It seems to me that you did not look at the git history very closely.
> The first version of 003_check_guc.pl did exactly what 0002 is
> proposing to do, see b0a55f4. That's also why config_data() has been
> introduced in the first place. This original logic has been reverted
> once shortly after, as of 52377bb, per a complaint by Christoph Berg
> because this broke some of the assumptions the custom patches of
> Debian relied on:
> https://www.postgresql.org/message-id/ygyw25oxv5men...@msg.df7cb.de

Quite right, I missed that. Still, it now seems to be moot, given what
Christoph said at the bottom of the thread. If I'd seen the thread I
would probably have been inclined to say that if Debian can patch
pg_config they can also patch the test :-)

> And it was also pointed out that we'd better use the version in the
> source tree rather than a logic that depends on finding the path from
> the output of pg_config with an installation tree assumed to exist
> (there should be one for installcheck anyway), as of:
> https://www.postgresql.org/message-id/2023925.1644591...@sss.pgh.pa.us
>
> If the change of 0002 is applied, we will just loop back to the
> original issue with Debian. So I am adding Christoph in CC, as he has
> also mentioned that the patch applied to PG for Debian that
> manipulates the installation paths has been removed, but I may be
> wrong in assuming that it is the case.

Honestly, I don't care all that much. I noticed these issues when
dealing with something for EDB that turned out not to be related to
these things. I can see arguments both ways on this one.

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com
Re: Perform streaming logical transactions by background workers and parallel apply
On Tue, Jun 14, 2022 at 9:07 AM wangw.f...@fujitsu.com wrote:
>
> Attach the new patches.
> Only changed patches 0001, 0004 and added new separate patch 0005.
>

Few questions/comments on 0001
===
1.
In the commit message, I see:
"We also need to allow stream_stop to complete by the apply background
worker to avoid deadlocks because T-1's current stream of changes can
update rows in conflicting order with T-2's next stream of changes."

Thinking about this, won't the T-1 and T-2 deadlock on the publisher
node as well if the above statement is true?

2.
+
+The apply background workers are taken from the pool defined by
+max_logical_replication_workers.
+
+
+The default value is 3. This parameter can only be set in the
+postgresql.conf file or on the server command
+line.
+

Is there a reason to choose this number as 3? Why not 2 similar to
max_sync_workers_per_subscription?

3.
+
+
+   Setting streaming mode to apply could export invalid LSN
+   as finish LSN of failed transaction. Changing the streaming mode and making
+   the same conflict writes the finish LSN of the failed transaction in the
+   server log if required.
+

How will the user identify that this is an invalid LSN value and she
shouldn't use it to SKIP the transaction? Can we change the second
sentence to: "User should change the streaming mode to 'on' if they
would instead wish to see the finish LSN on error. Users can use finish
LSN to SKIP applying the transaction." I think we can give reference to
docs where the SKIP feature is explained.

4.
+ * This file contains routines that are intended to support setting up, using,
+ * and tearing down a ApplyBgworkerState.
+ * Refer to the comments in file header of logical/worker.c to see more
+ * informations about apply background worker.

Typo. /informations/information.

Consider having an empty line between the above two lines.

5.
+ApplyBgworkerState *
+apply_bgworker_find_or_start(TransactionId xid, bool start)
{
...
...
+	if (!TransactionIdIsValid(xid))
+		return NULL;
+
+	/*
+	 * We don't start new background worker if we are not in streaming apply
+	 * mode.
+	 */
+	if (MySubscription->stream != SUBSTREAM_APPLY)
+		return NULL;
+
+	/*
+	 * We don't start new background worker if user has set skiplsn as it's
+	 * possible that user want to skip the streaming transaction. For
+	 * streaming transaction, we need to spill the transaction to disk so that
+	 * we can get the last LSN of the transaction to judge whether to skip
+	 * before starting to apply the change.
+	 */
+	if (start && !XLogRecPtrIsInvalid(MySubscription->skiplsn))
+		return NULL;
+
+	/*
+	 * For streaming transactions that are being applied in apply background
+	 * worker, we cannot decide whether to apply the change for a relation
+	 * that is not in the READY state (see should_apply_changes_for_rel) as we
+	 * won't know remote_final_lsn by that time. So, we don't start new apply
+	 * background worker in this case.
+	 */
+	if (start && !AllTablesyncsReady())
+		return NULL;
...
...
}

Can we move some of these starting checks to a separate function like
canstartapplybgworker()?

--
With Regards,
Amit Kapila.
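To illustrate comment 5, a minimal sketch of what such a helper could look
like, built only from the checks quoted above (the function name and the exact
set of checks are just the reviewer's suggestion, not code from the patch):

    static bool
    canstartapplybgworker(TransactionId xid, bool start)
    {
        if (!TransactionIdIsValid(xid))
            return false;

        /* Only streaming mode 'apply' uses apply background workers. */
        if (MySubscription->stream != SUBSTREAM_APPLY)
            return false;

        /* The user may intend to skip this transaction via skiplsn. */
        if (start && !XLogRecPtrIsInvalid(MySubscription->skiplsn))
            return false;

        /* All tables must be READY (see should_apply_changes_for_rel). */
        if (start && !AllTablesyncsReady())
            return false;

        return true;
    }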
Re: Replica Identity check of partition table on subscriber
On Wed, Jun 15, 2022 at 8:52 AM shiy.f...@fujitsu.com wrote:
>
> On Tue, Jun 14, 2022 8:57 PM Amit Kapila wrote:
> >
> > > v4-0002 looks good btw, except the nitpick about test comment similar
> > > to my earlier comment regarding v5-0001:
> > >
> > > +# Change the column order of table on publisher
> > >
> > > I think it might be better to say something specific to describe the
> > > test intent, like:
> > >
> > > Test that replication into partitioned target table continues to work
> > > correctly when the published table is altered
> > >
> >
> > Okay, changed this and slightly modified the comments and commit message.
> > I am just attaching the HEAD patches for the first two issues.
> >
>
> Thanks for updating the patch.
>
> Attached the new patch set which ran pgindent, and the patches for pg14 and
> pg13. (Only the first two patches of the patch set.)
>

I have pushed the first bug-fix patch today.

--
With Regards,
Amit Kapila.
Re: [PATCH] Compression dictionaries for JSONB
Hi Matthias, > The bulk of the patch > should still be usable, but I think that the way it interfaces with > the CREATE TABLE (column ...) APIs would need reworking to build on > top of the api's of the "pluggable toaster" patches (so, creating > toasters instead of types). I think that would allow for an overall > better user experience and better performance due to decreased need > for fully decompressed type casting. Many thanks for the feedback. The "pluggable TOASTer" patch looks very interesting indeed. I'm currently trying to make heads and tails of it and trying to figure out if it can be used as a base for compression dictionaries, especially for implementing the partial decompression. Hopefully I will be able to contribute to it and to the dependent patch [1] in the upcoming CF, at least as a tester/reviewer. Focusing our efforts on [1] for now seems to be a good strategy. My current impression of your idea is somewhat mixed at this point though. Teodor's goal is to allow creating _extensions_ that implement alternative TOAST strategies, which use alternative compression algorithms and/or use the knowledge of the binary representation of the particular type. For sure, this would be a nice thing to have. However, during the discussion of the "compression dictionaries" RFC the consensus was reached that the community wants to see it as a _built_in_ functionality rather than an extension. Otherwise we could simply add ZSON to /contrib/ as it was originally proposed. So if we are going to keep "compression dictionaries" a built-in functionality, putting artificial constraints on its particular implementation, or adding artificial dependencies of two rather complicated patches, is arguably a controversial idea. Especially considering the fact that it was shown that the feature can be implemented without these dependencies, in a very non-invasive way. These are just my initial thoughts I would like to share though. I may change my mind after diving deeper into a "pluggable TOASTer" patch. I cc:'ed Teodor in case he would like to share his insights on the topic. [1]: https://commitfest.postgresql.org/38/3479/ -- Best regards, Aleksander Alekseev
Re: replacing role-level NOINHERIT with a grant-level option
On Wed, Jun 15, 2022 at 5:23 AM Peter Eisentraut wrote: > > Consider a user who in general prefers the NOINHERIT behavior but also > > wants to use predefined roles. Perhaps user 'peter' is to be granted > > both 'paul' and 'pg_execute_server_programs'. If role 'peter' is set > > to INHERIT, Peter will be sad, because his love for NOINHERIT probably > > means that he doesn't want to exercise Paul's privileges > > automatically. However, he needs to inherit the privileges of > > 'pg_execute_server_programs' or they are of no use to him. Peter > > presumably wants to use COPY TO/FROM program to put data into a table > > owned by 'peter', not a table owned by 'pg_execute_server_programs'. > > If so, being able to SET ROLE to 'pg_execute_server_programs' is of no > > use to him at all, but inheriting the privilege is useful. > > That's because our implementation of SET ROLE is bogus. We should have > a SET ROLE that is separate from SET SESSION AUTHORIZATION, where the > current user can keep their current user-ness and additionally enable > (non-inherited) roles. It would help me to have a better description of what you think the behavior ought to be. I've always thought there was something funny about SET ROLE and SET SESSION AUTHORIZATION, because it seems like they are too similar to each other. But it would surprise me if SET ROLE added additional privileges to my session while leaving the old ones intact, too, much as I'd be surprised if SET work_mem = '8MB' followed by SET work_mem = '1GB' somehow left both values partly in effect at the same time. It feels to me like SET is describing an action that changes the session state, rather than adding to it. > I'm mainly concerned that (AAIU), you propose to remove the current > INHERIT/NOINHERIT attribute of roles. I wouldn't like that. If you > want a feature that allows overriding that per-grant, maybe that's okay. Yeah, I want to remove it and replace it with something more fine-grained. I don't yet understand why that's a problem for anything you want to do. -- Robert Haas EDB: http://www.enterprisedb.com
Re: [PATCH] Add sortsupport for range types and btree_gist
Hi Christoph!

> On 15 Jun 2022, at 15:45, Christoph Heiss wrote:
>
> Sorting the data before inserting it into the index results in a much
> better index structure, leading to significant performance improvements.

Here's my version of the very similar idea [0]. It lacks range types
support. On a quick glance your version lacks support of abbreviated
sort, so I think benchmarks can be pushed even further :)

Let's merge our efforts and create a combined patch?

Please, create a new entry for the patch on Commitfest.

Thank you!

Best regards, Andrey Borodin.

[0] https://commitfest.postgresql.org/37/2824/
Re: CREATE TABLE ( .. STORAGE ..)
Hi hackers, I noticed that cfbot is not entirely happy with the patch, so I rebased it. > I see that COMPRESSION and STORAGE now are handled slightly > differently in the grammar. Maybe we could standardize that a bit > more; so that we have only one `STORAGE [kind]` definition in the > grammar? > > As I'm new to the grammar files; would you know the difference between > `name` and `ColId`, and why you would change from one to the other in > ALTER COLUMN STORAGE? Good point, Matthias. I addressed this in 0002. Does it look better now? -- Best regards, Aleksander Alekseev v3-0001-CREATE-TABLE-.-STORAGE.patch Description: Binary data v3-0002-Handle-SET-STORAGE-and-SET-COMPRESSION-similarly.patch Description: Binary data
Re: support for MERGE
Pushed this. I noticed that the paragraph that described MERGE in the read-committed subsection had been put in the middle of some other paras that describe interactions with other transactions, so I moved it up so that it appears together with all the other paras that describe the behavior of specific commands, which is where I believe it belongs. Thanks! -- Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/ Thou shalt study thy libraries and strive not to reinvent them without cause, that thy code may be short and readable and thy days pleasant and productive. (7th Commandment for C Programmers)
Re: Remove trailing newlines from pg_upgrade's messages
Kyotaro Horiguchi writes: > Also leading newlines and just "\n" bug me when I edit message > catalogues with poedit. I might want a line-spacing function like > pg_log_newline(PG_REPORT) if we need line-breaks in the ends of a > message. Yeah, that is sort of the inverse problem. I think those are there to ensure that the text appears on a fresh line even if the current line has transient status on it. We could get rid of those perhaps if we teach pg_log_v to remember whether it ended the last output with a newline or not, and then put out a leading newline only if necessary, rather than hard-wiring one into the message texts. This might take a little bit of fiddling to make it work, because we'd not want the extra newline when completing an incomplete line by adding status. That would mean that report_status would have to do something special, plus we'd have to be sure that all such cases do go through report_status rather than calling pg_log directly. (I'm fairly sure that the code is sloppy about that today :-(.) It seems probably do-able, though. regards, tom lane
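A minimal sketch of that bookkeeping, assuming all of pg_upgrade's output
funnels through pg_log_v() (the variable name and buffer handling here are
illustrative only, and the special-casing needed when appending status to an
incomplete line is deliberately left out):

    static bool last_output_ended_in_newline = true;

    static void
    pg_log_v(eLogType type, const char *fmt, va_list args)
    {
        char        message[MAX_STRING];
        size_t      len;

        vsnprintf(message, sizeof(message), fmt, args);
        len = strlen(message);

        /* If the previous output left a line unfinished, start a fresh one. */
        if (!last_output_ended_in_newline)
            fputc('\n', stdout);

        fputs(message, stdout);

        /* Remember whether this message completed its line. */
        last_output_ended_in_newline = (len > 0 && message[len - 1] == '\n');
    }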
Re: better page-level checksums
On Wed, Jun 15, 2022 at 4:54 AM Peter Eisentraut wrote:
> It's hard to get any definite information about what size of checksum is
> "good enough", since after all it depends on what kinds of errors you
> expect and what kinds of probabilities you want to accept. But the best
> I could gather so far is that a 16-bit CRC is good until about a 16 kB
> block size.

Not really. There's a lot of misinformation on this topic floating
around on this mailing list, and some of that misinformation is my
fault. I keep learning more about this topic. However, I'm pretty
confident that, on the one hand, there's no hard limit on the size of
the data that can be effectively validated via a CRC, and on the other
hand, CRC isn't a particularly great algorithm, although it does have
certain interesting advantages for certain purposes.

For example, according to
https://en.wikipedia.org/wiki/Mathematics_of_cyclic_redundancy_checks#Error_detection_strength
a CRC is guaranteed to detect all single-bit errors. This property is
easy to achieve: for example, a parity bit has this property. According
to the same source, a CRC is guaranteed to detect two-bit errors only if
the distance between them is less than some limit that gets larger as
the CRC gets wider.

Imagine that you have a CRC-16 of a message 64k+1 bits in length.
Suppose that an error in the first bit changes the result from v to v'.
Can we, by flipping a second bit later in the message, change the final
result from v' back to v? The calculation only has 64k possible answers,
and we have 64k bits we can flip to try to get the desired answer. If
every one of those bit flips produces a different answer, then one of
those answers must be v -- which means detection of two-bit errors is
not guaranteed. If at least two of those bit flips produce the same
answer, then consider the messages produced by those two different bit
flips. They differ from each other by exactly two bits and yet produced
the same CRC, so detection of two-bit errors is still not guaranteed.

On the other hand, it's still highly likely. If a message of length
2^16+1 bits contains two bit errors, one of which is in the first bit,
the chances that the other one is in exactly the right place to cancel
out the first error are about 2^-16. That's not zero, but it's just as
good as our chances of detecting a replacement of the entire message
with some other message chosen completely at random.

I think the reason why discussion of CRCs tends to focus on the types of
bit errors that it can detect is that the algorithm was designed when
people were doing stuff like sending files over a modem. It's easy to
understand how individual bits could get garbled without anybody
noticing, while large-scale corruption would be less likely, but the
risks are not necessarily the same for a PostgreSQL data file. Lower
levels of the stack are probably already using checksums to try to
detect errors at the level of the physical medium. I'm sure some stuff
slips through the cracks, but in practice we also see failure modes
where the filesystem substitutes 8kB of data from an unrelated file, or
where a torn write in combination with unreliable fsync results in half
of the page contents being from an older version of the page. These
kinds of large-scale replacements aren't what CRCs are designed to
detect, and the chances that we will detect them are roughly 1-2^-bits,
whether we use a CRC or something else. Of course, that partly depends
on the algorithm quality.
If an algorithm is more likely to generate some results than others, then its actual error detection rate will not be as good as the number of output bits would suggest. If the result doesn't depend equally on every input bit, then the actual error detection rate will not be as good as the number of output bits would suggest. And CRC-32 is apparently not great by modern standards: https://github.com/rurban/smhasher Compare the results for CRC-32 with, say, Spooky32. Apparently the latter is faster yet produces better output. So maybe we would've been better off if we'd made Spooky32 the default algorithm for backup manifest checksums rather than CRC-32. > The benefits of doubling the checksum size are exponential rather than > linear, so there is no significant benefit of using a 64-bit checksum > over a 32-bit one, for supported block sizes (current max is 32 kB). I'm still unconvinced that the block size is very relevant here. -- Robert Haas EDB: http://www.enterprisedb.com
Re: Is RecoveryConflictInterrupt() entirely safe in a signal handler?
On Wed, Jun 15, 2022 at 1:51 AM Michael Paquier wrote: > Handle is more consistent with the other types of interruptions in the > SIGUSR1 handler, so the name proposed in the patch in not that > confusing to me. And so does Process, in spirit with > ProcessProcSignalBarrier() and ProcessLogMemoryContextInterrupt(). > While on it, is postgres.c the best home for > HandleRecoveryConflictInterrupt()? That's a very generic file, for > starters. Not related to the actual bug, just asking. Yeah, there's existing precedent for this kind of split in, for example, HandleCatchupInterrupt() and ProcessCatchupInterrupt(). I think the idea is that "process" is supposed to sound like the more involved part of the operation, whereas "handle" is supposed to sound like the initial response to the signal. I'm not sure it's the clearest possible naming, but naming things is hard, and this patch is apparently not inventing a new way to do it. -- Robert Haas EDB: http://www.enterprisedb.com
Re: "buffer too small" or "path too long"?
On Wed, Jun 15, 2022 at 2:51 AM Peter Eisentraut wrote: > We have this problem of long file names being silently truncated all > over the source code. Instead of equipping each one of them with a > length check, why don't we get rid of the fixed-size buffers and > allocate dynamically, as in the attached patch. I've always wondered why we rely on MAXPGPATH instead of dynamic allocation. It seems pretty lame. I don't know how much we gain by fixing one place and not all the others, but maybe it would set a trend. -- Robert Haas EDB: http://www.enterprisedb.com
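For what it's worth, the frontend already has the building block for that:
psprintf() from src/common allocates exactly the space the formatted string
needs. A sketch of the pattern change (basedir and name are made-up variables
for illustration only):

    /* today: fixed-size buffer, silently truncates long paths */
    char        filename[MAXPGPATH];

    snprintf(filename, sizeof(filename), "%s/%s", basedir, name);

    /* with dynamic allocation: no arbitrary length limit */
    char       *filename2 = psprintf("%s/%s", basedir, name);

    /* ... use filename2 ..., then pfree(filename2) when done */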
Re: Skipping logical replication transactions on subscriber side
On Tue, Jun 14, 2022 at 3:54 AM Masahiko Sawada wrote: > > AFAICS, we could do that by: > > > > 1. De-supporting platforms that have this problem, or > > 2. Introducing new typalign values, as Noah proposed back on April 2, or > > 3. Somehow forcing values that are sometimes 4-byte aligned and > > sometimes 8-byte aligned to be 8-byte alignment on all platforms > > Introducing new typalign values seems a good idea to me as it's more > future-proof. Will this item be for PG16, right? The main concern > seems that what this test case enforces would be nuisance when > introducing a new system catalog or a new column to the existing > catalog but given we're in post PG15-beta1 it is unlikely to happen in > PG15. I agree that we're not likely to introduce a new typalign value any sooner than v16. There are a couple of things that bother me about that solution. One is that I don't know how many different behaviors exist out there in the wild. If we distinguish the alignment of double from the alignment of int8, is that good enough, or are there other data types whose properties aren't necessarily the same as either of those? The other is that 32-bit systems are already relatively rare and probably will become more rare until they disappear completely. It doesn't seem like a ton of fun to engineer solutions to problems that may go away by themselves with the passage of time. On the other hand, if the alternative is to live with this kind of ugliness for another 5 years, maybe the time it takes to craft a solution is effort well spent. -- Robert Haas EDB: http://www.enterprisedb.com
Re: docs: mention "pg_read_all_stats" in "track_activities" description
On Tue, Jun 07, 2022 at 10:08:21PM +0900, Ian Lawrence Barwick wrote: > A little late to the party, but as an alternative suggestion for the last > part: > > "... and users who either own the session being reported on, or who have > privileges of the role to which the session belongs," > > so the whole sentence would read: > > Note that even when enabled, this information is only visible to superusers, > roles with privileges of the pg_read_all_stats role, and users who either > own > the session being reported on or who have privileges of the role to which > the > session belongs, so it should not represent a security risk. This seems clearer to me. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Re: "buffer too small" or "path too long"?
Robert Haas writes: > On Wed, Jun 15, 2022 at 2:51 AM Peter Eisentraut > wrote: >> We have this problem of long file names being silently truncated all >> over the source code. Instead of equipping each one of them with a >> length check, why don't we get rid of the fixed-size buffers and >> allocate dynamically, as in the attached patch. > I don't know how much we gain by fixing one place and not all the > others, but maybe it would set a trend. Yeah, that was what was bugging me about this proposal. Removing one function's dependency on MAXPGPATH isn't much of a step forward. I note also that the patch leaks quite a lot of memory (a kilobyte or so per pathname, IIRC). That's probably negligible in this particular context, but anyplace that was called more than once per program run would need to be more tidy. regards, tom lane
Re: Using PQexecQuery in pipeline mode produces unexpected Close messages
On 2022-Jun-10, Kyotaro Horiguchi wrote: > (Moved to -hackers) > > At Wed, 8 Jun 2022 17:08:47 +0200, Alvaro Herrera > wrote in > > What that Close message is doing is closing the unnamed portal, which > > is otherwise closed implicitly when the next one is opened. That's how > > single-query mode works: if you run a single portal, it'll be kept open. > > > > I believe that the right fix is to not send that Close message in > > PQsendQuery. > > Agreed. At least Close message in that context is useless and > PQsendQueryGuts doesn't send it. And removes the Close message surely > fixes the issue. So, git archaeology led me to this thread https://postgr.es/m/202106072107.d4i55hdscxqj@alvherre.pgsql which is why we added that message in the first place. I was about to push the attached patch (a merged version of Kyotaro's and mine), but now I'm wondering if that's the right approach. Alternatives: - Have the client not complain if it gets CloseComplete in idle state. (After all, it's a pretty useless message, since we already do nothing with it if we get it in BUSY state.) - Have the server not send CloseComplete at all in the cases where the client is not expecting it. Not sure how this would be implemented. - Other ideas? -- Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/ "That sort of implies that there are Emacs keystrokes which aren't obscure. I've been using it daily for 2 years now and have yet to discover any key sequence which makes any sense."(Paul Thomas) >From d9f0ff57fe8aa7f963a9411741bb1d68082cc31a Mon Sep 17 00:00:00 2001 From: Alvaro Herrera Date: Wed, 15 Jun 2022 19:56:41 +0200 Subject: [PATCH v2] PQsendQuery: Don't send Close in pipeline mode In commit 4efcf47053ea, we modified PQsendQuery to send a Close message when in pipeline mode. But now we discover that that's not a good thing: under certain circumstances, it causes the server to deliver a CloseComplete message at a time when the client is not expecting it. We failed to noticed it because the tests don't have any scenario where the problem is hit. Remove the offending Close, and add a test case that tickles the problematic scenario. 
Co-authored-by: Kyotaro Horiguchi Reported-by: Daniele Varrazzo Discussion: https://postgr.es/m/ca+mi_8bvd0_cw3sumgwpvwdnzxy32itog_16tdyru_1s2gv...@mail.gmail.com --- src/interfaces/libpq/fe-exec.c| 5 -- .../modules/libpq_pipeline/libpq_pipeline.c | 62 +++ .../traces/pipeline_abort.trace | 2 - .../traces/simple_pipeline.trace | 12 4 files changed, 74 insertions(+), 7 deletions(-) diff --git a/src/interfaces/libpq/fe-exec.c b/src/interfaces/libpq/fe-exec.c index 4180683194..e2df3a3480 100644 --- a/src/interfaces/libpq/fe-exec.c +++ b/src/interfaces/libpq/fe-exec.c @@ -1403,11 +1403,6 @@ PQsendQueryInternal(PGconn *conn, const char *query, bool newQuery) pqPutInt(0, 4, conn) < 0 || pqPutMsgEnd(conn) < 0) goto sendFailed; - if (pqPutMsgStart('C', conn) < 0 || - pqPutc('P', conn) < 0 || - pqPuts("", conn) < 0 || - pqPutMsgEnd(conn) < 0) - goto sendFailed; entry->queryclass = PGQUERY_EXTENDED; entry->query = strdup(query); diff --git a/src/test/modules/libpq_pipeline/libpq_pipeline.c b/src/test/modules/libpq_pipeline/libpq_pipeline.c index c27c4e0ada..e24fbfe1cc 100644 --- a/src/test/modules/libpq_pipeline/libpq_pipeline.c +++ b/src/test/modules/libpq_pipeline/libpq_pipeline.c @@ -968,15 +968,29 @@ test_prepared(PGconn *conn) fprintf(stderr, "ok\n"); } +/* Notice processor: print them, and count how many we got */ +static void +notice_processor(void *arg, const char *message) +{ + int *n_notices = (int *) arg; + + (*n_notices)++; + fprintf(stderr, "NOTICE %d: %s", *n_notices, message); +} + static void test_simple_pipeline(PGconn *conn) { PGresult *res = NULL; const char *dummy_params[1] = {"1"}; Oid dummy_param_oids[1] = {INT4OID}; + PQnoticeProcessor oldproc; + int n_notices = 0; fprintf(stderr, "simple pipeline... "); + oldproc = PQsetNoticeProcessor(conn, notice_processor, &n_notices); + /* * Enter pipeline mode and dispatch a set of operations, which we'll then * process the results of as they come in. @@ -1052,6 +1066,54 @@ test_simple_pipeline(PGconn *conn) if (PQpipelineStatus(conn) != PQ_PIPELINE_OFF) pg_fatal("Exiting pipeline mode didn't seem to work"); + if (n_notices > 0) + pg_fatal("unexpected notice"); + + /* Try the same thing with PQsendQuery */ + if (PQenterPipelineMode(conn) != 1) + pg_fatal("failed to enter pipeline mode: %s", PQerrorMessage(conn)); + + if (PQsendQuery(conn, "SELECT 1") != 1) + pg_fatal("failed to send query: %s", PQerrorMessage(conn)); + PQsendFlushRequest(conn); + res = PQgetResult(conn); + if (res == NULL) + pg_fatal("PQgetResult returned null when there's a pipeline item: %s", + PQerrorMessage(conn)); + if (PQresultStatus(res) != PGRE
Re: Using PQexecQuery in pipeline mode produces unexpected Close messages
Alvaro Herrera writes: > So, git archaeology led me to this thread > https://postgr.es/m/202106072107.d4i55hdscxqj@alvherre.pgsql > which is why we added that message in the first place. Um. Good thing you looked. I doubt we want to revert that change now. > Alternatives: > - Have the client not complain if it gets CloseComplete in idle state. > (After all, it's a pretty useless message, since we already do nothing > with it if we get it in BUSY state.) ISTM the actual problem here is that we're reverting to IDLE state too soon. I didn't try to trace down exactly where that's happening, but I notice that in the non-pipeline case we don't go to IDLE till we've seen 'Z' (Sync). Something in the pipeline logic must be jumping the gun on that state transition. regards, tom lane
Re: pg_upgrade (12->14) fails on aggregate
Justin Pryzby writes: > But Petr has a point - pg_upgrade should aspire to catch errors in --check, > rather than starting and then leaving a mess behind for the user to clean up Agreed; pg_upgrade has historically tried to find problems similar to this. However, it's not just aggregates that are at risk. People might also have built user-defined plain functions, or operators, atop these functions. How far do we want to go in looking? As for the query, I think it could be simplified quite a bit by relying on regprocedure literals, that is something like WHERE ... a.aggtransfn IN ('array_append(anyarray,anyelement)'::regprocedure, 'array_prepend(anyelement,anyarray)'::regprocedure, ...) Not sure if it's necessary to stick explicit "pg_catalog." schema qualifications into this --- IIRC pg_upgrade runs with restrictive search_path, so that this would be safe as-is. Also, I think you need to check aggfinalfn too. Also, I'd be inclined to reject system-provided objects by checking for OID >= 16384 rather than hard-wiring assumptions about things being in pg_catalog or not. regards, tom lane
Re: better page-level checksums
On Tue, Jun 14, 2022 at 10:30 PM Peter Geoghegan wrote: > Basically I think that this is giving up rather a lot. For example, > isn't it possible that we'd have corruption that could be a bug in > either the checksum code, or in recovery? > > I'd feel a lot better about it if there was some sense of both the > costs and the benefits. I think that, if and when we get TDE, debuggability is likely to be a huge issue. Something will go wrong for someone at some point, and when it does, what they'll have is a supposedly-encrypted page that cannot be decrypted, and it will be totally unclear what has gone wrong. Did the page get corrupted on disk by a random bit flip? Is there a bug in the algorithm? Torn page? As things stand today, when a page gets corrupted, a human being can look at the page and make an educated guess about what has gone wrong and whether PostgreSQL or some other system is to blame, and if it's PostgreSQL, perhaps have some ideas as to where to look for the bug. If the pages are encrypted, that's a lot harder. I think what will happen, depending on the encryption mode, is probably that either (a) the page will decrypt to complete garbage or (b) the page will fail some kind of verification and you won't be able to decrypt it at all. Either way, you won't be able to infer anything about what caused the problem. All you'll know is that something is wrong. That sucks - a lot - and I don't have a lot of good ideas as to what can be done about it. The idea that an encrypted page is unintelligible and that small changes to either the encrypted or unencrypted data should result in large changes to the other is intrinsic to the nature of encryption. It's more or less un-debuggable by design. With extended checksums, I don't think the issues are anywhere near as bad. I'm not deeply opposed to setting a page-level flag but I expect nominal benefits. A human being looking at the page isn't going to have a ton of trouble figuring out whether or not the extended checksum is present unless the page is horribly, horribly garbled, and even if that happens, will debugging that problem really be any worse than debugging a horribly, horribly garbled page today? I don't think so. I likewise expect that pg_filedump could use heuristics to figure out what's going on just by looking at the page, even if no external information is available. You are probably right when you say that there's no need to be so parsimonious with pd_flags space as all that, but I believe that if we did decide to set no bit in pd_flags, whoever maintains pg_filedump these days would not have huge difficulty inventing a suitable heuristic. A page with an extended checksum is basically still an intelligible page, and we shouldn't understate the value of that. -- Robert Haas EDB: http://www.enterprisedb.com
Re: better page-level checksums
On Wed, Jun 15, 2022 at 1:27 PM Robert Haas wrote: > I think what will happen, depending on > the encryption mode, is probably that either (a) the page will decrypt > to complete garbage or (b) the page will fail some kind of > verification and you won't be able to decrypt it at all. Either way, > you won't be able to infer anything about what caused the problem. All > you'll know is that something is wrong. That sucks - a lot - and I > don't have a lot of good ideas as to what can be done about it. The > idea that an encrypted page is unintelligible and that small changes > to either the encrypted or unencrypted data should result in large > changes to the other is intrinsic to the nature of encryption. It's > more or less un-debuggable by design. It's pretty clear that there must be a lot of truth to that. But that doesn't mean that there aren't meaningful gradations beyond that. I think that it's worth doing the following exercise (humor me): Why wouldn't it be okay to just encrypt the tuple space and the line pointer array, leaving both the page header and page special area unencrypted? What kind of user would find that trade-off to be unacceptable, and why? What's the nuance of it? For all I know you're right (about encrypting the whole page, metadata and all). I just want to know why that is. I understand that this whole area is one where in general we may have to live with a certain amount of uncertainty about what really matters. > With extended checksums, I don't think the issues are anywhere near as > bad. I'm not deeply opposed to setting a page-level flag but I expect > nominal benefits. I also expect only a small benefit. But that isn't a particularly important factor in my mind. Let's suppose that it turns out to be significantly more useful than we originally expected, for whatever reason. Assuming all that, what else can be said about it now? Isn't it now *relatively* likely that including that status bit metadata will be *extremely* valuable, and not merely somewhat more valuable? I guess it doesn't matter much now (since you have all but conceded that using a bit for this makes sense), but FWIW that's the main reason why I almost took it for granted that we'd need to use a status bit (or bits) for this. -- Peter Geoghegan
Re: "buffer too small" or "path too long"?
On Wed, Jun 15, 2022 at 02:02:03PM -0400, Tom Lane wrote: > Yeah, that was what was bugging me about this proposal. Removing > one function's dependency on MAXPGPATH isn't much of a step forward. This comes down to out-of-memory vs path length at the end. Changing only the paths of make_outputdirs() without touching all the paths in check.c and the one in function.c does not sound good to me, as this increases the risk of failing pg_upgrade in the middle, and that's what we should avoid, as said upthread. > I note also that the patch leaks quite a lot of memory (a kilobyte or > so per pathname, IIRC). That's probably negligible in this particular > context, but anyplace that was called more than once per program run > would need to be more tidy. Surely. -- Michael signature.asc Description: PGP signature
Re: Small TAP improvements
On Wed, Jun 15, 2022 at 07:59:10AM -0400, Andrew Dunstan wrote: > Why would we do that? If you want a map don't pass any switches. But as > written you could do: > > my ($incdir, $localedir, $sharedir) = $node->config_data(qw(--includedir > --localedir --sharedir)); > > No map needed to get what you want, in fact a map would get in the > way. Nice, I didn't know you could do that. That's a pattern worth mentioning in the perldoc part as an example, in my opinion. -- Michael signature.asc Description: PGP signature
Modest proposal to extend TableAM API for controlling cluster commands
Hackers, While developing various Table Access Methods, I have wanted a callback for determining if CLUSTER (and VACUUM FULL) should be run against a table backed by a given TAM. The current API contains a callback for doing the guts of the cluster, but by that time, it's a bit too late to cleanly back out. For single relation cluster commands, raising an error from that callback is probably not too bad. For multi-relation cluster commands, that aborts the clustering of other yet to be processed relations, which doesn't seem acceptable. I tried fixing this with a ProcessUtility_hook, but that fires before the multi-relation cluster command has compiled the list of relations to cluster, meaning the ProcessUtility_hook doesn't have access to the necessary information. (It can be hacked to compile the list of relations itself, but that duplicates both code and effort, with the usual risks that the code will get out of sync.) For my personal development, I have declared a new hook, bool (*relation_supports_cluster) (Relation rel). It gets called from commands/cluster.c in both the single-relation and multi-relation code paths, with warning or debug log messages output for relations that decline to be clustered, respectively. Before posting a patch, do people think this sounds useful? Would you like the hook function signature to differ in some way? Is a simple "yes this relation may be clustered" vs. "no this relation may not be clustered" interface overly simplistic? — Mark Dilger EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
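To make the shape of the proposal concrete, here is a minimal sketch of the hook described above. This is illustrative only: relation_supports_cluster is the callback being proposed, not an existing member of the table AM API, and "mytam" is a hypothetical access method.

/* Sketch only: the proposed callback, implemented by a hypothetical TAM. */
#include "postgres.h"

#include "access/tableam.h"
#include "utils/rel.h"

/*
 * Return false to tell CLUSTER / VACUUM FULL to skip relations backed by
 * this access method, true to let them proceed as usual.
 */
static bool
mytam_relation_supports_cluster(Relation rel)
{
	/*
	 * A TAM whose storage is always compacted, or for which "clustering"
	 * has no defined meaning, could simply decline here.
	 */
	return false;
}

Under the proposal, commands/cluster.c would consult this before doing any rewrite work, logging a WARNING in the single-relation path and a DEBUG message in the multi-relation path, as described above.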
Re: tablesync copy ignores publication actions
On Wed, Jun 15, 2022 at 5:05 PM shiy.f...@fujitsu.com wrote: > ... > Thanks for updating the patch. Two comments: > > 1. > + it means the copied table t3 contains all rows even > when > + they do not patch the row filter of publication > pub3b. > > Typo. I think "they do not patch the row filter" should be "they do not match > the row filter", right? > > 2. > @@ -500,7 +704,6 @@ > > > > - > > > > > It seems we should remove this change. > Thank you for your review comments. Those reported mistakes are fixed in the attached patch v3. -- Kind Regards, Peter Smith. Fujitsu Australia v3-0001-PGDOCS-tablesync-ignores-publish-operation.patch Description: Binary data
Re:
On Thu, Jun 9, 2022 at 11:50 PM Tom Lane wrote: > > Peter Eisentraut writes: > > Initially, that chapter did not document any system views. > > Maybe we could make the system views a separate chapter? +1 -- Kind Regards, Peter Smith. Fujitsu Australia
Re: Modest proposal to extend TableAM API for controlling cluster commands
Hi, On 2022-06-15 17:21:56 -0700, Mark Dilger wrote: > While developing various Table Access Methods, I have wanted a callback for > determining if CLUSTER (and VACUUM FULL) should be run against a table > backed by a given TAM. The current API contains a callback for doing the > guts of the cluster, but by that time, it's a bit too late to cleanly back > out. For single relation cluster commands, raising an error from that > callback is probably not too bad. For multi-relation cluster commands, that > aborts the clustering of other yet to be processed relations, which doesn't > seem acceptable. Why not? What else do you want to do in that case? Silently ignoring non-clusterable tables doesn't seem right either. What's the use-case for swallowing the error? > I tried fixing this with a ProcessUtility_hook, but that > fires before the multi-relation cluster command has compiled the list of > relations to cluster, meaning the ProcessUtility_hook doesn't have access to > the necessary information. (It can be hacked to compile the list of > relations itself, but that duplicates both code and effort, with the usual > risks that the code will get out of sync.) > > For my personal development, I have declared a new hook, bool > (*relation_supports_cluster) (Relation rel). It gets called from > commands/cluster.c in both the single-relation and multi-relation code > paths, with warning or debug log messages output for relations that decline > to be clustered, respectively. Do you actually need to dynamically decide whether CLUSTER is supported? Otherwise we could just make the existing cluster callback optional and error out if a table is clustered that doesn't have the callback. > Before posting a patch, do people think this sounds useful? Would you like > the hook function signature to differ in some way? Is a simple "yes this > relation may be clustered" vs. "no this relation may not be clustered" > interface overly simplistic? It seems overly complicated, if anything ;) Greetings, Andres Freund
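For comparison, the alternative suggested here (make the existing cluster callback optional) might look roughly like the following check early in commands/cluster.c. This is a hedged sketch, not current core code; the function name and error wording are invented.

/* Hypothetical check, assuming relation_copy_for_cluster were made optional. */
#include "postgres.h"

#include "access/tableam.h"
#include "utils/rel.h"

static void
check_can_cluster(Relation rel)
{
	if (rel->rd_tableam->relation_copy_for_cluster == NULL)
		ereport(ERROR,
				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
				 errmsg("cannot cluster relation \"%s\"",
						RelationGetRelationName(rel)),
				 errdetail("The table access method does not implement the required callback.")));
}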
Extending USING [heap | mytam | yourtam] grammar and behavior
Hackers, I have extended the grammar to allow "USING NOT method [, ...]" to exclude one or more TAMs in a CREATE TABLE statement. This may sound like a weird thing to do, but it is surprisingly useful when developing new Table Access Methods, particularly when you are developing two or more, not just one. To explain: Developing a new TAM takes an awful lot of testing, and much of it is duplicative of the existing core regression test suite. Leveraging the existing tests saves an awful lot of test development. When developing just one TAM, leveraging the existing tests isn't too hard. Without much work*, you can set default_table_access_method=mytam for the duration of the check-world. You'll get a few test failures this way. Some will be in tests that probe the catalogs to verify that /heap/ is stored there, and instead /mytam/ is found. Others will be tests that are sensitive to the number of rows that fit per page, etc. But a surprising number of tests just pass, at least after you get the TAM itself debugged. When developing two or more TAMs, this falls apart. Some tests may be worth fixing up (perhaps with alternate output files) for "mytam", but not for "columnar_tam". That might be because the test is checking fundamentally row-store-ish properties of the table, which has no applicability to your column-store-ish TAM. In that case, "USING NOT columnar_tam" fixes the test failure when columnar is the default, without preventing the test from testing "mytam" when it happens to be the default. Once you have enough TAMs developed and deployed, this USING NOT business becomes useful in production. You might have different defaults on different servers, or for different customers, etc., and for a given piece of DDL that you want to release you only want to say which TAMs not to use, not to nail down which TAM must be used. Thoughts? I'll hold off posting a patch until the general idea is debated. [*] It takes some extra work to get the TAP tests to play along. — Mark Dilger EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Modest proposal to extend TableAM API for controlling cluster commands
> On Jun 15, 2022, at 6:01 PM, Andres Freund wrote: > > Hi, > > On 2022-06-15 17:21:56 -0700, Mark Dilger wrote: >> While developing various Table Access Methods, I have wanted a callback for >> determining if CLUSTER (and VACUUM FULL) should be run against a table >> backed by a given TAM. The current API contains a callback for doing the >> guts of the cluster, but by that time, it's a bit too late to cleanly back >> out. For single relation cluster commands, raising an error from that >> callback is probably not too bad. For multi-relation cluster commands, that >> aborts the clustering of other yet to be processed relations, which doesn't >> seem acceptable. > > Why not? What else do you want to do in that case? Silently ignoring > non-clusterable tables doesn't seem right either. What's the use-case for > swallowing the error? Imagine you develop a TAM for which the concept of "clustering" doesn't have any defined meaning. Perhaps you've arranged the data in a way that has no similarity to heap, and now somebody runs a CLUSTER command (with no arguments.) It's reasonable that they want all their heap tables to get the usual cluster_rel treatment, and that the existence of a table using your exotic TAM shouldn't interfere with that. Then what? You are forced to copy all the data from your OldHeap (badly named) to the NewHeap (also badly named), or to raise an error. That doesn't seem ok. >> I tried fixing this with a ProcessUtility_hook, but that >> fires before the multi-relation cluster command has compiled the list of >> relations to cluster, meaning the ProcessUtility_hook doesn't have access to >> the necessary information. (It can be hacked to compile the list of >> relations itself, but that duplicates both code and effort, with the usual >> risks that the code will get out of sync.) >> >> For my personal development, I have declared a new hook, bool >> (*relation_supports_cluster) (Relation rel). It gets called from >> commands/cluster.c in both the single-relation and multi-relation code >> paths, with warning or debug log messages output for relations that decline >> to be clustered, respectively. > > Do you actually need to dynamically decide whether CLUSTER is supported? > Otherwise we could just make the existing cluster callback optional and error > out if a table is clustered that doesn't have the callback. Same as above, I don't know why erroring would be the right thing to do. As a comparison, consider that we don't attempt to cluster a partitioned table, but rather just silently skip it. Imagine if, when we introduced the concept of partitioned tables, we made unqualified CLUSTER commands always fail when they encountered a partitioned table. >> Before posting a patch, do people think this sounds useful? Would you like >> the hook function signature to differ in some way? Is a simple "yes this >> relation may be clustered" vs. "no this relation may not be clustered" >> interface overly simplistic? > > It seems overly complicated, if anything ;) For the TAMs I've developed thus far, I don't need the (Relation rel) parameter, and could just have easily used (void). But that seems to fence in what other TAM authors could do in future. — Mark Dilger EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Using PQexecQuery in pipeline mode produces unexpected Close messages
At Wed, 15 Jun 2022 14:56:42 -0400, Tom Lane wrote in > Alvaro Herrera writes: > > So, git archaeology led me to this thread > > https://postgr.es/m/202106072107.d4i55hdscxqj@alvherre.pgsql > > which is why we added that message in the first place. > > Um. Good thing you looked. I doubt we want to revert that change now. > > > Alternatives: > > - Have the client not complain if it gets CloseComplete in idle state. > > (After all, it's a pretty useless message, since we already do nothing > > with it if we get it in BUSY state.) > > ISTM the actual problem here is that we're reverting to IDLE state too > soon. I didn't try to trace down exactly where that's happening, but Yes. I had noticed that fact, but I also thought that, compared with non-pipelined PQsendQuery, the three messages looked extra. Thus I concluded (at the time) that removing Close is enough here. > I notice that in the non-pipeline case we don't go to IDLE till we've > seen 'Z' (Sync). Something in the pipeline logic must be jumping the > gun on that state transition. PQgetResult() resets the state to IDLE when not in pipeline mode. fe-exec.c:2171 > if (conn->pipelineStatus != PQ_PIPELINE_OFF) > { > /* >* We're about to send the results of the > current query. Set >* us idle now, and ... >*/ > conn->asyncStatus = PGASYNC_IDLE; And actually that code lets the connection state enter IDLE before CloseComplete. In the test case I posted, the following happens. PQsendQuery(conn, "SELECT 1;"); PQsendFlushRequest(conn); PQgetResult(conn); // state enters IDLE, reads down to PQgetResult(conn); // reads PQpipelineSync(conn); // sync too late The pipeline feature seems intended to allow PQgetResult to be called before PQpipelineSync, and also seems to allow calling PQpipelineSync() after PQgetResult(). I haven't come up with a valid *fix* for this flow. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
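For readers following along, here is a hedged, self-contained sketch of the client-side call sequence being discussed (connection parameters come from the environment; error handling is trimmed). It is intended only to reproduce the ordering described above, where the results are consumed with PQgetResult() before PQpipelineSync() is sent.

#include <stdio.h>
#include "libpq-fe.h"

int
main(void)
{
	PGconn	   *conn = PQconnectdb("");		/* uses PG* environment variables */
	PGresult   *res;

	if (PQstatus(conn) != CONNECTION_OK)
	{
		fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));
		return 1;
	}

	PQenterPipelineMode(conn);

	PQsendQuery(conn, "SELECT 1;");		/* in pipeline mode this also sends a Close */
	PQsendFlushRequest(conn);

	res = PQgetResult(conn);			/* the SELECT's tuples; libpq goes idle here */
	PQclear(res);
	res = PQgetResult(conn);			/* NULL: end of this query's results */

	PQpipelineSync(conn);				/* Sync is sent only now, after going idle */
	while ((res = PQgetResult(conn)) != NULL)
		PQclear(res);					/* expect PGRES_PIPELINE_SYNC here */

	PQexitPipelineMode(conn);
	PQfinish(conn);
	return 0;
}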
Re: Using PQexecQuery in pipeline mode produces unexpected Close messages
At Thu, 16 Jun 2022 10:34:22 +0900 (JST), Kyotaro Horiguchi wrote in > At Wed, 15 Jun 2022 14:56:42 -0400, Tom Lane wrote in > > Alvaro Herrera writes: > > > So, git archaeology led me to this thread > > > https://postgr.es/m/202106072107.d4i55hdscxqj@alvherre.pgsql > > > which is why we added that message in the first place. > > > > Um. Good thing you looked. I doubt we want to revert that change now. > > > > > Alternatives: > > > - Have the client not complain if it gets CloseComplete in idle state. > > > (After all, it's a pretty useless message, since we already do nothing > > > with it if we get it in BUSY state.) > > > > ISTM the actual problem here is that we're reverting to IDLE state too > > soon. I didn't try to trace down exactly where that's happening, but > > Yes. I once visited that fact but also I thought that in the > comparison with non-pipelined PQsendQuery, the three messages look > extra. Thus I concluded (at the time) that removing Close is enough > here. > > > I notice that in the non-pipeline case we don't go to IDLE till we've > > seen 'Z' (Sync). Something in the pipeline logic must be jumping the > > gun on that state transition. > - PQgetResult() resets the state to IDLE when not in pipeline mode. D... the "not" should not be there. + PQgetResult() resets the state to IDLE while in pipeline mode. > fe-exec.c:2171 > > > if (conn->pipelineStatus != PQ_PIPELINE_OFF) > > { > > /* > > * We're about to send the results of the > > current query. Set > > * us idle now, and ... > > */ > > conn->asyncStatus = PGASYNC_IDLE; > > And actually that code let the connection state enter to IDLE before > CloseComplete. In the test case I posted, the following happens. > > PQsendQuery(conn, "SELECT 1;"); > PQsendFlushRequest(conn); > PQgetResult(conn); // state enters IDLE, reads down to > > PQgetResult(conn); // reads > PQpipelineSync(conn); // sync too late > > Pipeline feature seems intending to allow PQgetResult called before > PQpipelineSync. And also seems allowing to call QPpipelineSync() after > PQgetResult(). > > I haven't come up with a valid *fix* of this flow.. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: Modest proposal to extend TableAM API for controlling cluster commands
Hi, On 2022-06-15 18:24:45 -0700, Mark Dilger wrote: > > On Jun 15, 2022, at 6:01 PM, Andres Freund wrote: > > On 2022-06-15 17:21:56 -0700, Mark Dilger wrote: > >> While developing various Table Access Methods, I have wanted a callback for > >> determining if CLUSTER (and VACUUM FULL) should be run against a table > >> backed by a given TAM. The current API contains a callback for doing the > >> guts of the cluster, but by that time, it's a bit too late to cleanly back > >> out. For single relation cluster commands, raising an error from that > >> callback is probably not too bad. For multi-relation cluster commands, > >> that > >> aborts the clustering of other yet to be processed relations, which doesn't > >> seem acceptable. > > > > Why not? What else do you want to do in that case? Silently ignoring > > non-clusterable tables doesn't seem right either. What's the use-case for > > swallowing the error? > > Imagine you develop a TAM for which the concept of "clustering" doesn't have > any defined meaning. Perhaps you've arranged the data in a way that has no > similarity to heap, and now somebody runs a CLUSTER command (with no > arguments.) I think nothing would happen in this case - only pre-clustered tables get clustered in an argumentless CLUSTER. What am I missing? Greetings, Andres Freund
Re: Modest proposal to extend TableAM API for controlling cluster commands
> On Jun 15, 2022, at 6:55 PM, Andres Freund wrote: > > I think nothing would happen in this case - only pre-clustered tables get > clustered in an argumentless CLUSTER. What am I missing? The "VACUUM FULL" synonym of "CLUSTER" doesn't depend on whether the target is pre-clustered, and both will run against the table if the user has run an ALTER TABLE..CLUSTER ON. Now, we could try to catch that latter command with a utility hook, but since the VACUUM FULL is still problematic, it seems cleaner to just add the callback I am proposing. — Mark Dilger EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Modest proposal to extend TableAM API for controlling cluster commands
Hi, On 2022-06-15 19:07:50 -0700, Mark Dilger wrote: > > On Jun 15, 2022, at 6:55 PM, Andres Freund wrote: > > > > I think nothing would happen in this case - only pre-clustered tables get > > clustered in an argumentless CLUSTER. What am I missing? > > The "VACUUM FULL" synonym of "CLUSTER" doesn't depend on whether the target > is pre-clustered VACUUM FULL isn't a synonym of CLUSTER. While a good bit of the implementation is shared, VACUUM FULL doesn't order the table contents. I see no reason why an AM shouldn't support VACUUM FULL? > , and both will run against the table if the user has run an ALTER > TABLE..CLUSTER ON. If a user does that for a table that doesn't support clustering, well, I don't see what's gained by not erroring out. Greetings, Andres Freund
Re: Modest proposal to extend TableAM API for controlling cluster commands
> On Jun 15, 2022, at 7:14 PM, Andres Freund wrote: > > Hi, > > On 2022-06-15 19:07:50 -0700, Mark Dilger wrote: >>> On Jun 15, 2022, at 6:55 PM, Andres Freund wrote: >>> >>> I think nothing would happen in this case - only pre-clustered tables get >>> clustered in an argumentless CLUSTER. What am I missing? >> >> The "VACUUM FULL" synonym of "CLUSTER" doesn't depend on whether the target >> is pre-clustered > > VACUUM FULL isn't a synonym of CLUSTER. While a good bit of the implementation > is shared, VACUUM FULL doesn't order the table contents. I see no reason why > an AM shouldn't support VACUUM FULL? It's effectively a synonym which determines whether the "bool use_sort" parameter of the table AM's relation_copy_for_cluster will be set. Heap-AM plays along and sorts or not based on that. But it's up to the TAM what it wants to do with that boolean, if in fact it does anything at all based on that. A TAM could decide to do the exact opposite of what Heap-AM does and instead sort on VACUUM FULL but not sort on CLUSTER, or perhaps perform a randomized shuffle, or <insert some other behavior here>. From the point-of-view of a TAM implementor, VACUUM FULL and CLUSTER are synonyms. Or am I missing something? >> , and both will run against the table if the user has run an ALTER >> TABLE..CLUSTER ON. > > If a user does that for a table that doesn't support clustering, well, I don't > see what's gained by not erroring out. Perhaps they want to give the TAM information about which index to use for sorting, on those occasions when the TAM's logic dictates that sorting is appropriate, but not in response to a cluster command. — Mark Dilger EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Modest proposal to extend TableAM API for controlling cluster commands
> On Jun 15, 2022, at 7:21 PM, Mark Dilger wrote: > >> If a user does that for a table that doesn't support clustering, well, I >> don't >> see what's gained by not erroring out. > > Perhaps they want to give the TAM information about which index to use for > sorting, on those occasions when the TAM's logic dictates that sorting is > appropriate, but not in response to a cluster command. I should admit that this is a bit hack-ish, but TAM authors haven't been left a lot of options here. Index AMs allow for custom storage parameters, but Table AMs don't, so getting information to the TAM about how to behave takes more than a little sleight of hand. Simon's proposal from a while back (don't have the link just now) to allow TAMs to define custom storage parameters would go some distance here. — Mark Dilger EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Modest proposal to extend TableAM API for controlling cluster commands
Hi, On 2022-06-15 19:21:42 -0700, Mark Dilger wrote: > > On Jun 15, 2022, at 7:14 PM, Andres Freund wrote: > > On 2022-06-15 19:07:50 -0700, Mark Dilger wrote: > >>> On Jun 15, 2022, at 6:55 PM, Andres Freund wrote: > >>> > >>> I think nothing would happen in this case - only pre-clustered tables get > >>> clustered in an argumentless CLUSTER. What am I missing? > >> > >> The "VACUUM FULL" synonym of "CLUSTER" doesn't depend on whether the target > >> is pre-clustered > > > > VACUUM FULL isn't a synonym of CLUSTER. While a good bit of the > > implementation > > is shared, VACUUM FULL doesn't order the table contents. I see now reason > > why > > an AM shouldn't support VACUUM FULL? > > It's effectively a synonym which determines whether the "bool use_sort" > parameter of the table AM's relation_copy_for_cluster will be set. Heap-AM > plays along and sorts or not based on that. Hardly a synonym if it behaves differently? > But it's up to the TAM what it wants to do with that boolean, if in fact it > does anything at all based on that. A TAM could decide to do the exact > opposite of what Heap-AM does and instead sort on VACUUM FULL but not sort > on CLUSTER, or perhaps perform a randomized shuffle, or behavior here>. That's bogus. Yes, an AM can do stupid stuff in a callback. But so what, that's possible with all extension APIs. > From the point-of-view of a TAM implementor, VACUUM FULL and CLUSTER are > synonyms. Or am I missing something? The callback gets passed use_sort. So just implement it use_sort = false and error out if use_sort = true? > >> , and both will run against the table if the user has run an ALTER > >> TABLE..CLUSTER ON. > > > > If a user does that for a table that doesn't support clustering, well, I > > don't > > see what's gained by not erroring out. > > Perhaps they want to give the TAM information about which index to use for > sorting, on those occasions when the TAM's logic dictates that sorting is > appropriate, but not in response to a cluster command. I have little sympathy to randomly misusing catalog contents and then complaining that those catalog contents have an effect. Greetings, Andres Freund
Re: fix stats_fetch_consistency value in postgresql.conf.sample
On Sat, Jun 11, 2022 at 09:41:37AM -0500, Justin Pryzby wrote: > Note that this gives: > > guc.c:7573:9: warning: ‘dst’ may be used uninitialized in this function > [-Wmaybe-uninitialized] > > with gcc version 9.2.1 20191008 (Ubuntu 9.2.1-9ubuntu2) > > I wonder whether you'd consider renaming pg_normalize_config_value() to > pg_pretty_config_value() or similar. I have looked at the patch, and I am not convinced that we need a function that does a integer -> integer-with-unit conversion for the purpose of this test. One thing is that it can be unstable with the unit in the conversion where values are close to a given threshold (aka for cases like 2048kB/2MB). On top of that, this overlaps with the existing system function in charge of converting values with bytes as size unit, while this stuff handles more unit types and all GUC types. I think that there could be some room in doing the opposite conversion, feeding the value from postgresql.conf.sample to something and compare it directly with boot_val. That's solvable at SQL level, still a system function may be more user-friendly. Extending the tests to check after the values is something worth doing, but I think that I would limit the checks to the parameters that do not have units for now, until we figure out which interface would be more adapted for doing the normalization of the parameter values. -#syslog_facility = 'LOCAL0' +#syslog_facility = 'local0' Those changes should not be necessary in postgresql.conf.sample. The test should be in charge of applying the lower() conversion, in the same way as guc.c does internally, and that's a mode supported by the parameter parsing. Using an upper-case value in the sample file is actually meaningful sometimes (for example, syslog can use upper-case strings to refer to LOCAL0~7). -- Michael signature.asc Description: PGP signature
Re: Using PQexecQuery in pipeline mode produces unexpected Close messages
At Thu, 16 Jun 2022 10:34:22 +0900 (JST), Kyotaro Horiguchi wrote in > PQgetResult() resets the state to IDLE while in pipeline mode. > > fe-exec.c:2171 > > > if (conn->pipelineStatus != PQ_PIPELINE_OFF) > > { > > /* > > * We're about to send the results of the > > current query. Set > > * us idle now, and ... > > */ > > conn->asyncStatus = PGASYNC_IDLE; > > And actually that code let the connection state enter to IDLE before > CloseComplete. In the test case I posted, the following happens. > > PQsendQuery(conn, "SELECT 1;"); > PQsendFlushRequest(conn); > PQgetResult(conn); // state enters IDLE, reads down to > > PQgetResult(conn); // reads > PQpipelineSync(conn); // sync too late > > Pipeline feature seems intending to allow PQgetResult called before > PQpipelineSync. And also seems allowing to call QPpipelineSync() after > PQgetResult(). > > I haven't come up with a valid *fix* of this flow.. The attached is a crude patch to separate the state for PIPELINE-IDLE from PGASYNC_IDLE. I haven't found a better way.. regards. -- Kyotaro Horiguchi NTT Open Source Software Center diff --git a/src/test/modules/libpq_pipeline/libpq_pipeline.c b/src/test/modules/libpq_pipeline/libpq_pipeline.c index 0ff563f59a..261ccc3ed4 100644 --- a/src/test/modules/libpq_pipeline/libpq_pipeline.c +++ b/src/test/modules/libpq_pipeline/libpq_pipeline.c @@ -968,15 +968,27 @@ test_prepared(PGconn *conn) fprintf(stderr, "ok\n"); } +static int n_notice; +static void +notice_processor(void *arg, const char *message) +{ + n_notice++; + fprintf(stderr, "%s", message); +} + static void test_simple_pipeline(PGconn *conn) { PGresult *res = NULL; const char *dummy_params[1] = {"1"}; Oid dummy_param_oids[1] = {INT4OID}; - + PQnoticeProcessor oldproc; + fprintf(stderr, "simple pipeline... "); + n_notice = 0; + oldproc = PQsetNoticeProcessor(conn, notice_processor, NULL); + /* * Enter pipeline mode and dispatch a set of operations, which we'll then * process the results of as they come in. 
@@ -1052,6 +1064,51 @@ test_simple_pipeline(PGconn *conn) if (PQpipelineStatus(conn) != PQ_PIPELINE_OFF) pg_fatal("Exiting pipeline mode didn't seem to work"); + if (n_notice > 0) + pg_fatal("unexpected notice"); + + /* Try the same thing with PQsendQuery */ + if (PQenterPipelineMode(conn) != 1) + pg_fatal("failed to enter pipeline mode: %s", PQerrorMessage(conn)); + + if (PQsendQuery(conn, "SELECT 1;") != 1) + pg_fatal("failed to send query: %s", PQerrorMessage(conn)); + PQsendFlushRequest(conn); + + res = PQgetResult(conn); + if (res == NULL) + pg_fatal("PQgetResult returned null when there's a pipeline item: %s", + PQerrorMessage(conn)); + if (PQresultStatus(res) != PGRES_TUPLES_OK) + pg_fatal("Unexpected result code %s from first pipeline item", + PQresStatus(PQresultStatus(res))); + PQclear(res); + + res = PQgetResult(conn); + if (res != NULL) + pg_fatal("expected NULL result"); + + if (PQpipelineSync(conn) != 1) + pg_fatal("pipeline sync failed: %s", PQerrorMessage(conn)); + res = PQgetResult(conn); + if (res == NULL) + pg_fatal("PQgetResult returned null when there's a pipeline item: %s", + PQerrorMessage(conn)); + if (PQresultStatus(res) != PGRES_PIPELINE_SYNC) + pg_fatal("Unexpected result code %s instead of PGRES_PIPELINE_SYNC, error: %s", + PQresStatus(PQresultStatus(res)), PQerrorMessage(conn)); + PQclear(res); + res =NULL; + + if (PQexitPipelineMode(conn) != 1) + pg_fatal("attempt to exit pipeline mode failed when it should've succeeded: %s", + PQerrorMessage(conn)); + + if (n_notice > 0) + pg_fatal("unexpected notice"); + + PQsetNoticeProcessor(conn, oldproc, NULL); + fprintf(stderr, "ok\n"); } diff --git a/src/test/modules/libpq_pipeline/traces/simple_pipeline.trace b/src/test/modules/libpq_pipeline/traces/simple_pipeline.trace index 5c94749bc1..1612c371c0 100644 --- a/src/test/modules/libpq_pipeline/traces/simple_pipeline.trace +++ b/src/test/modules/libpq_pipeline/traces/simple_pipeline.trace @@ -9,4 +9,18 @@ B 33 RowDescription 1 "?column?" 0 4 -1 0 B 11 DataRow 1 1 '1' B 13 CommandComplete "SELECT 1" B 5 ReadyForQuery I +F 17 Parse "" "SELECT 1;" 0 +F 12 Bind "" "" 0 0 0 +F 6 Describe P "" +F 9 Execute "" 0 +F 6 Close P "" +F 4 Flush +B 4 ParseComplete +B 4 BindComplete +B 33 RowDescription 1 "?column?" 0 4 -1 0 +B 11 DataRow 1 1 '1' +B 13 CommandComplete "SELECT 1" +F 4 Sync +B 4 CloseComplete +B 5 ReadyForQuery I F 4 Terminate diff --git a/src/interfaces/libpq/fe-exec.c b/src/interfaces/libpq/fe-exec.c index 919cf5741d..a811d7 100644 --- a/src/interfaces/libpq/fe-exec.c +++ b/src/interfaces/libpq/fe-exec.c @@ -1380,7 +1380,7 @@ pqAppendCmdQueueEntry(PGconn *conn, PGcmdQueueEntry *entry) * itself co
Re: Modest proposal to extend TableAM API for controlling cluster commands
> On Jun 15, 2022, at 7:30 PM, Andres Freund wrote: > >> It's effectively a synonym which determines whether the "bool use_sort" >> parameter of the table AM's relation_copy_for_cluster will be set. Heap-AM >> plays along and sorts or not based on that. > > Hardly a synonym if it behaves differently? I don't think this point is really worth arguing. We don't have to call it a synonym, and the rest of the discussion remains the same. >> But it's up to the TAM what it wants to do with that boolean, if in fact it >> does anything at all based on that. A TAM could decide to do the exact >> opposite of what Heap-AM does and instead sort on VACUUM FULL but not sort >> on CLUSTER, or perhaps perform a randomized shuffle, or <insert some other behavior here>. > > That's bogus. Yes, an AM can do stupid stuff in a callback. But so what, > that's possible with all extension APIs. I don't think it's "stupid stuff". A major motivation, perhaps the only useful motivation, for implementing a TAM is to get a non-trivial performance boost (relative to heap) for some target workload, almost certainly at the expense of worse performance for another workload. To optimize any particular workload sufficiently to make it worth the bother, you've pretty much got to do something meaningfully different than what heap does. >> From the point-of-view of a TAM implementor, VACUUM FULL and CLUSTER are >> synonyms. Or am I missing something? > > The callback gets passed use_sort. So just implement it use_sort = false and > error out if use_sort = true? I'm not going to say that your idea is unreasonable for a TAM that you might choose to implement, but I don't see why that should be required of all TAMs anybody might ever implement. The callback that gets use_sort is called from copy_table_data(). By that time, it's too late to avoid the /* * Open the relations we need. */ NewHeap = table_open(OIDNewHeap, AccessExclusiveLock); OldHeap = table_open(OIDOldHeap, AccessExclusiveLock); code that has already happened in cluster.c's copy_table_data() function, and unless I raise an error, after returning from my TAM's callback, the cluster code will replace the old table with the new one. I'm left no choices but to copy my data over, lose my data, or abort the command. None of those are OK options for me. I'm open to different solutions. If a simple callback like relation_supports_cluster(Relation rel) is too simplistic, and more parameters with more context information is wanted, then fine, let's do that. But I can't really complete my work with the interface as it stands now. — Mark Dilger EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: warn if GUC set to an invalid shared library
On Wed, Mar 23, 2022 at 03:02:23PM -0400, Tom Lane wrote: > I agree with Maciek's concerns about these HINTs being emitted > inappropriately, but I also have a stylistic gripe: they're only > halfway hints. Given that we fix things so they only print when they > should, the complaint about the server not starting is not a hint, > it's a fact, which per style guidelines means it should be errdetail. > So I think this ought to look more like > > WARNING: could not access file "xyz" > DETAIL: The server will fail to start with this setting. > HINT: If the server is shut down, it will be necessary to manually edit the > postgresql.auto.conf file to allow it to start again. > > I adjusted the wording a bit too --- YMMV, but I think my text is clearer. It seems to me that there is no objection to the proposed patch, but an update is required. Justin? -- Michael signature.asc Description: PGP signature
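As a concrete illustration of the message structure quoted above, an ereport() call of that shape would look something like the sketch below. The surrounding function and its placement are hypothetical; only the WARNING/DETAIL/HINT wording is taken from the quoted proposal.

/* Sketch only: emitting the proposed WARNING/DETAIL/HINT trio. */
#include "postgres.h"

static void
warn_unusable_library_setting(const char *filename)
{
	ereport(WARNING,
			(errmsg("could not access file \"%s\"", filename),
			 errdetail("The server will fail to start with this setting."),
			 errhint("If the server is shut down, it will be necessary to manually edit "
					 "the postgresql.auto.conf file to allow it to start again.")));
}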
Re: Modest proposal to extend TableAM API for controlling cluster commands
Hi, On 2022-06-15 20:10:30 -0700, Mark Dilger wrote: > > On Jun 15, 2022, at 7:30 PM, Andres Freund wrote: > >> But it's up to the TAM what it wants to do with that boolean, if in fact it > >> does anything at all based on that. A TAM could decide to do the exact > >> opposite of what Heap-AM does and instead sort on VACUUM FULL but not sort > >> on CLUSTER, or perhaps perform a randomized shuffle, or >> behavior here>. > > > > That's bogus. Yes, an AM can do stupid stuff in a callback. But so what, > > that's possible with all extension APIs. > > I don't think it's "stupid stuff". A major motivation, perhaps the only > useful motivation, for implementing a TAM is to get a non-trivial > performance boost (relative to heap) for some target workload, almost > certainly at the expense of worse performance for another workload. To > optimize any particular workload sufficiently to make it worth the bother, > you've pretty much got to do something meaningfully different than what heap > does. Sure. I just don't see what that has to do with doing something widely differing in VACUUM FULL. Which AM can't support that? I guess there's some where implementing the full visibility semantics isn't feasible, but that's afaics OK. > >> From the point-of-view of a TAM implementor, VACUUM FULL and CLUSTER are > >> synonyms. Or am I missing something? > > > > The callback gets passed use_sort. So just implement it use_sort = false and > > error out if use_sort = true? > > I'm not going to say that your idea is unreasonable for a TAM that you might > choose to implement, but I don't see why that should be required of all TAMs > anybody might ever implement. > The callback that gets use_sort is called from copy_table_data(). By that > time, it's too late to avoid the > > /* > * Open the relations we need. > */ > NewHeap = table_open(OIDNewHeap, AccessExclusiveLock); > OldHeap = table_open(OIDOldHeap, AccessExclusiveLock); > > code that has already happened in cluster.c's copy_table_data() function, > and unless I raise an error, after returning from my TAM's callback, the > cluster code will replace the old table with the new one. I'm left no > choices but to copy my data over, loose my data, or abort the command. None > of those are OK options for me. I think you need to do a bit more explaining of what you're actually trying to achieve here. You're just saying "I don't want to", which doesn't really help me to understand the set of useful options. > I'm open to different solutions. If a simple callback like > relation_supports_cluster(Relation rel) is too simplistic, and more > parameters with more context information is wanted, then fine, let's do > that. FWIW, I want to go *simpler* if anything, not more complicated. I.e. make the relation_copy_for_cluster optional. I still think it's a terrible idea to silently ignore tables in CLUSTER or VACUUM FULL. > But I can't really complete my work with the interface as it stands > now. Since you've not described that work to a meaningful degree... Greetings, Andres Freund
Re: [Proposal] pg_rewind integration into core
On Wed, Mar 23, 2022 at 05:13:47PM +0530, RKN Sai Krishna wrote: > Considering the scenarios where primary is ahead of sync standbys, upon > promotion of a standby, pg_rewind is needed on the old primary if it has to > be up as a standby. Similarly in the scenarios where async standbys(in > physical replication context) go ahead of sync standbys, and upon promotion > of a standby, there is need for pg_rewind to be performed on the async > standbys which are ahead of sync standby being promoted. > With these scenarios under consideration, integrating pg_rewind into > postgres core might be a better option IMO. We could optionally choose to > have pg_rewind dry run performed during the standby startup and based on > the need, perform the rewind and have the standby in sync with the primary. pg_rewind is already part of the core code as a binary tool, but what you mean is to integrate a portion of it in the backend code, as part of the startup sequence (with the node to rewind using primary_conninfo for the source?). One thing that we would need to be careful about is that no assumptions a rewind relies on are messed up in any way at the step where the rewind begins. One such thing is that the standby has completed crash recovery correctly, so you would need to complicate the startup sequence somewhat more, which is already a complicated and sensitive piece of logic, with more internal dependencies between each piece. I am not really convinced that we need to add more technical debt in this area, particularly now that pg_rewind is able to enforce recovery on the target node first so that it has a clean state when the rewind begins, so the assumptions around crash recovery and rewind have a clear boundary. -- Michael signature.asc Description: PGP signature
Re: warn if GUC set to an invalid shared library
I've started to think that we should really WARN whenever a (set of) GUC is set in a manner that the server will fail to start - not just for shared libraries. In particular, I think the server should also warn if it's going to fail to start like this: 2022-06-15 22:48:34.279 CDT postmaster[20782] FATAL: WAL streaming (max_wal_senders > 0) requires wal_level "replica" or "logical" -- Justin
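To illustrate the idea, the condition behind the quoted FATAL could in principle be re-checked wherever settings are (re)loaded and reported as a WARNING instead. The sketch below is hypothetical (placement and function name invented); it only reuses the existing wal_level/max_wal_senders variables and the wording of the startup error.

/* Hypothetical sketch of a non-fatal cross-check on the same condition. */
#include "postgres.h"

#include "access/xlog.h"			/* wal_level, WAL_LEVEL_MINIMAL */
#include "replication/walsender.h"	/* max_wal_senders */

static void
warn_if_startup_would_fail(void)
{
	if (max_wal_senders > 0 && wal_level == WAL_LEVEL_MINIMAL)
		ereport(WARNING,
				(errmsg("WAL streaming (max_wal_senders > 0) requires wal_level \"replica\" or \"logical\""),
				 errdetail("The server will fail to start with the current settings.")));
}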
Re: Modest proposal to extend TableAM API for controlling cluster commands
On Wed, Jun 15, 2022 at 8:18 PM Andres Freund wrote: > > If a simple callback like > > relation_supports_cluster(Relation rel) is too simplistic > Seems like it should be called: relation_supports_compaction[_by_removal_of_interspersed_dead_tuples] Basically, if the user tells the table to make itself smaller on disk by removing dead tuples, should we support the case where the Table AM says: "Sorry, I cannot do that"? If yes, then naming the table explicitly should elicit an error. Having the table chosen implicitly should provoke a warning. For ALTER TABLE CLUSTER there should be an error: which makes the implicit CLUSTER command a non-factor. However, given that should the table structure change it is imperative that the Table AM be capable of producing a new physical relation with the correct data, which will have been compacted as a side-effect, it seems like, explicit or implicit, expecting any Table AM to do that when faced with Vacuum Full is reasonable. Which leaves deciding how to allow a table with a given TAM to prevent itself from being added to the CLUSTER roster. And decide whether an opt-out feature for implicit VACUUM FULL is something we should offer as well. I'm doubtful that a TAM that is pluggable into the MVCC and WAL architecture of PostgreSQL could avoid this basic contract between the system and its users. David J.
Re: Extending USING [heap | mytam | yourtam] grammar and behavior
On Wed, Jun 15, 2022 at 06:16:21PM -0700, Mark Dilger wrote: > When developing two or more TAMs, this falls apart. Some tests may > be worth fixing up (perhaps with alternate output files) for > "mytam", but not for "columnar_tam". That might be because the test > is checking fundamentally row-store-ish properties of the table, > which has no applicability to your column-store-ish TAM. In that > case, "USING NOT columnar_tam" fixes the test failure when columnar > is the default, without preventing the test from testing "mytam" > when it happens to be the default. I think that it is very important for the in-core test suite to remain transparent in terms of options used for table AMs (or compression), and this has improved a lot over the last years with options like HIDE_TABLEAM and HIDE_TOAST_COMPRESSION. Tests could actually use more ORDER BY clauses to ensure ordering of the results, as long as the tests don't want to stress a specific planning path. However, your problem is basically that you develop multiple AMs, but you want to have regression tests that do checks across more than one table AM at the same time. Am I getting that right? Why is a grammar extension necessary for what looks like a test structure problem when there are interdependencies across multiple AMs developed? > Once you have enough TAMs developed and deployed, this USING NOT > business becomes useful in production. You might have different > defaults on different servers, or for different customers, etc., and > for a given piece of DDL that you want to release you only want to > say which TAMs not to use, not to nail down which TAM must be used. I am not sure I see why this would be something users would actually use in prod. That means picking something other than what the server thinks is the best default AM because somebody does not want to trust the default, while generating an error if the default AM is named in the USING NOT clause. On top of that, default_table_access_method is user-settable. -- Michael signature.asc Description: PGP signature
Re: Skipping schema changes in publication
On Tue, Jun 14, 2022 at 9:10 AM houzj.f...@fujitsu.com wrote: > > On Wednesday, June 8, 2022 7:04 PM Amit Kapila > wrote: > > > > On Fri, Jun 3, 2022 at 3:37 PM vignesh C wrote: > > > > > > Thanks for the comments, the attached v8 patch has the changes for the > > same. > > > > > > > AFAICS, the summary of this proposal is that we want to support > > exclude of certain objects from publication with two kinds of > > variants. The first variant is to add support to exclude specific > > tables from ALL TABLES PUBLICATION. Without this feature, users need > > to manually add all tables for a database even when she wants to avoid > > only a handful of tables from the database say because they contain > > sensitive information or are not required. We have seen that other > > database like MySQL also provides similar feature [1] (See > > REPLICATE_WILD_IGNORE_TABLE). The proposed syntax for this is as > > follows: > > > > CREATE PUBLICATION pub1 FOR ALL TABLES EXCEPT TABLE t1,t2; > > or > > ALTER PUBLICATION pub1 ADD ALL TABLES EXCEPT TABLE t1,t2; > > > > This will allow us to publish all the tables in the current database > > except t1 and t2. Now, I see that pg_dump has a similar option > > provided by switch --exclude-table but that allows tables matching > > patterns which is not the case here. I am not sure if we need a > > similar variant here. > > > > Then users will be allowed to reset the publication by: > > ALTER PUBLICATION pub1 RESET; > > > > This will reset the publication to the default state which includes > > resetting the publication parameters, setting the ALL TABLES flag to > > false, and dropping the relations and schemas that are associated with > > the publication. I don't know if we want to go further with allowing > > to RESET specific parameters and if so which parameters and what would > > its syntax be? > > > > The second variant is to add support to exclude certain columns of a > > table while publishing a particular table. Currently, users need to > > list all required columns' names even if they don't want to hide most > > of the columns in the table (for example Create Publication pub For > > Table t1 (c1, c2)). Consider user doesn't want to publish the 'salary' > > or other sensitive information of executives/employees but would like > > to publish all other columns. I feel in such cases it will be a lot of > > work for the user especially when the table has many columns. I see > > that Oracle has a similar feature [2]. I think without this it will be > > difficult for users to use this feature in some cases. The patch for > > this is not proposed but I would imagine syntax for it to be something > > like "Create Publication pub For Table t1 Except (c3)" and similar > > variants for Alter Publication. > > I think the feature to exclude certain columns of a table would be useful. > > In some production scenarios, we usually do not want to replicate > sensitive fields(column) in the table. Although we already can achieve > this by specify all replicated columns in the list[1], but that seems a > hard work when the table has hundreds of columns. > > [1] > CREATE TABLE test(a int, b int, c int,..., sensitive text); > CRAETE PUBLICATION pub FOR TABLE test(a,b,c,...); > > In addition, it's not easy to maintain the column list like above. Because > we sometimes need to add new fields or delete fields due to business > needs. Every time we add a column(or delete a column in column list), we > need to update the column list. 
> > If we support Except: > CREATE PUBLICATION pub FOR TABLE test EXCEPT (sensitive); > > We don't need to update the column list in most cases. > Right, this is a valid point and I think it makes sense for me to support such a feature for column list and also to exclude a particular table(s) from the ALL TABLES publication. Peter E., Euler, and others, do you have any objections to supporting the above-mentioned two cases? -- With Regards, Amit Kapila.
Re: bogus: logical replication rows/cols combinations
On Mon, Jun 13, 2022 at 8:54 AM Amit Kapila wrote: > > I would like to close the Open item listed corresponding to this > thread [1] as the fix for the reported issue is committed > (fd0b9dcebd). Do let me know if you or others think otherwise? > Seeing no objections, I have closed this item. -- With Regards, Amit Kapila.
Re: Extending USING [heap | mytam | yourtam] grammar and behavior
On Wed, Jun 15, 2022 at 8:51 PM Michael Paquier wrote: > On top of that > default_table_access_method is user-settable. > > FWIW this proposal acknowledges that and basically leverages it to the hilt, turning it into something like search_path. I strongly dislike the idea of any workflow that depends on a GUC in this manner. The fact that it is user-settable is, IMO, a flaw, not a feature, at least as far as production settings are concerned. It is a novel API for PostgreSQL to rely upon setting a GUC then attaching "unless" configurations to individual objects to ignore it. And what would be chosen (ultimately fallback is heap?), or whether it would simply error, is presently, as you say, undefined. In production this general behavior becomes useful only under the condition that among the various named access methods some of them don't even exist on the server in question, but that a fallback option would be acceptable in that case. But that suggests extending "USING" to accept multiple names, not inventing a "NOT USING". That all said, I can understand that testing presents its own special needs. But testing is probably where GUCs shine. So why not implement this capability as a GUC that is set just before the table is created instead of extending the grammar for it? Add it to "developer options" and call it a day. Dump/Restore no longer has to care about it, and its value once the table exists is basically zero anyway. David J.
Re: [PoC] Improve dead tuple storage for lazy vacuum
On Wed, May 25, 2022 at 11:48 AM Masahiko Sawada wrote: > > On Tue, May 10, 2022 at 6:58 PM John Naylor > wrote: > > > > On Tue, May 10, 2022 at 8:52 AM Masahiko Sawada > > wrote: > > > > > > Overall, radix tree implementations have good numbers. Once we got an > > > agreement on moving in this direction, I'll start a new thread for > > > that and move the implementation further; there are many things to do > > > and discuss: deletion, API design, SIMD support, more tests etc. > > > > +1 > > > > Thanks! > > I've attached an updated version patch. It is still WIP but I've > implemented deletion and improved test cases and comments. I've attached an updated version patch that changes the configure script. I'm still studying how to support AVX2 on msvc build. Also, added more regression tests. The integration with lazy vacuum and parallel vacuum is missing for now. In order to support parallel vacuum, we need to have the radix tree support to be created on DSA. Added this item to the next CF. Regards, -- Masahiko Sawada EDB: https://www.enterprisedb.com/ diff --git a/config/c-compiler.m4 b/config/c-compiler.m4 index d3562d6fee..a56d6e89da 100644 --- a/config/c-compiler.m4 +++ b/config/c-compiler.m4 @@ -676,3 +676,27 @@ if test x"$Ac_cachevar" = x"yes"; then fi undefine([Ac_cachevar])dnl ])# PGAC_ARMV8_CRC32C_INTRINSICS + +# PGAC_AVX2_INTRINSICS +# +# Check if the compiler supports the Intel AVX2 instructinos. +# +# If the intrinsics are supported, sets pgac_avx2_intrinsics, and CFLAGS_AVX2. +AC_DEFUN([PGAC_AVX2_INTRINSICS], +[define([Ac_cachevar], [AS_TR_SH([pgac_cv_avx2_intrinsics_$1])])dnl +AC_CACHE_CHECK([for _mm256_set_1_epi8 _mm256_cmpeq_epi8 _mm256_movemask_epi8 CFLAGS=$1], [Ac_cachevar], +[pgac_save_CFLAGS=$CFLAGS +CFLAGS="$pgac_save_CFLAGS $1" +AC_LINK_IFELSE([AC_LANG_PROGRAM([#include ], + [__m256i vec = _mm256_set1_epi8(0); + __m256i cmp = _mm256_cmpeq_epi8(vec, vec); + return _mm256_movemask_epi8(cmp) > 0;])], + [Ac_cachevar=yes], + [Ac_cachevar=no]) +CFLAGS="$pgac_save_CFLAGS"]) +if test x"$Ac_cachevar" = x"yes"; then + CFLAGS_AVX2="$1" + pgac_avx2_intrinsics=yes +fi +undefine([Ac_cachevar])dnl +])# PGAC_AVX2_INTRINSICS diff --git a/configure b/configure index 7dec6b7bf9..6ebc15a8c1 100755 --- a/configure +++ b/configure @@ -645,6 +645,7 @@ XGETTEXT MSGMERGE MSGFMT_FLAGS MSGFMT +CFLAGS_AVX2 PG_CRC32C_OBJS CFLAGS_ARMV8_CRC32C CFLAGS_SSE42 @@ -18829,6 +18830,82 @@ $as_echo "slicing-by-8" >&6; } fi +# Check for Intel AVX2 intrinsics. +{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for _mm256i CFLAGS=" >&5 +$as_echo_n "checking for _mm256i CFLAGS=... " >&6; } +if ${pgac_cv_avx2_intrinsics_+:} false; then : + $as_echo_n "(cached) " >&6 +else + pgac_save_CFLAGS=$CFLAGS +CFLAGS="$pgac_save_CFLAGS " +cat confdefs.h - <<_ACEOF >conftest.$ac_ext +/* end confdefs.h. 
*/ +#include +int +main () +{ +__m256i vec = _mm256_set1_epi8(0); + __m256i cmp = _mm256_cmpeq_epi8(vec, vec); + return _mm256_movemask_epi8(cmp) > 0; + ; + return 0; +} +_ACEOF +if ac_fn_c_try_link "$LINENO"; then : + pgac_cv_avx2_intrinsics_=yes +else + pgac_cv_avx2_intrinsics_=no +fi +rm -f core conftest.err conftest.$ac_objext \ +conftest$ac_exeext conftest.$ac_ext +CFLAGS="$pgac_save_CFLAGS" +fi +{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $pgac_cv_avx2_intrinsics_" >&5 +$as_echo "$pgac_cv_avx2_intrinsics_" >&6; } +if test x"$pgac_cv_avx2_intrinsics_" = x"yes"; then + CFLAGS_AVX2="" + pgac_avx2_intrinsics=yes +fi + +if test x"pgac_avx2_intrinsics" != x"yes"; then + { $as_echo "$as_me:${as_lineno-$LINENO}: checking for _mm256i CFLAGS=-mavx2" >&5 +$as_echo_n "checking for _mm256i CFLAGS=-mavx2... " >&6; } +if ${pgac_cv_avx2_intrinsics__mavx2+:} false; then : + $as_echo_n "(cached) " >&6 +else + pgac_save_CFLAGS=$CFLAGS +CFLAGS="$pgac_save_CFLAGS -mavx2" +cat confdefs.h - <<_ACEOF >conftest.$ac_ext +/* end confdefs.h. */ +#include +int +main () +{ +__m256i vec = _mm256_set1_epi8(0); + __m256i cmp = _mm256_cmpeq_epi8(vec, vec); + return _mm256_movemask_epi8(cmp) > 0; + ; + return 0; +} +_ACEOF +if ac_fn_c_try_link "$LINENO"; then : + pgac_cv_avx2_intrinsics__mavx2=yes +else + pgac_cv_avx2_intrinsics__mavx2=no +fi +rm -f core conftest.err conftest.$ac_objext \ +conftest$ac_exeext conftest.$ac_ext +CFLAGS="$pgac_save_CFLAGS" +fi +{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $pgac_cv_avx2_intrinsics__mavx2" >&5 +$as_echo "$pgac_cv_avx2_intrinsics__mavx2" >&6; } +if test x"$pgac_cv_avx2_intrinsics__mavx2" = x"yes"; then + CFLAGS_AVX2="-mavx2" + pgac_avx2_intrinsics=yes +fi + +fi + # Select semaphore implementation type. if test "$PORTNAME" != "win32"; then diff --git a/configure.ac b/configure.ac index d093fb88dd..6b6d095306 100644 --- a/configure.ac +++ b/configure.ac @@ -2300,6 +2300,12 @@ else fi AC_SUBST(PG_CRC32C_OBJS) +# Check for Intel AVX2 intrinsics. +PGAC_AVX2_INTRINSICS([]) +if test x"pgac_avx2_intrinsics" != x"yes"; then + PGAC
RE: Replica Identity check of partition table on subscriber
On Wed, Jun 15, 2022 8:30 PM Amit Kapila wrote: > > I have pushed the first bug-fix patch today. > Thanks. Attached the remaining patches which are rebased. Regards, Shi yu v9-0002-fix-memory-leak-about-attrmap.patch Description: v9-0002-fix-memory-leak-about-attrmap.patch v9-0001-Check-partition-table-replica-identity-on-subscri.patch Description: v9-0001-Check-partition-table-replica-identity-on-subscri.patch
Re: Extending USING [heap | mytam | yourtam] grammar and behavior
> On Jun 15, 2022, at 8:51 PM, Michael Paquier wrote: > > However, > your problem is basically that you develop multiple AMs, but you want > to have regression tests that do checks across more than one table AM > at the same time. It is true that I test multiple table AMs at the same time, but that's a somewhat different concern. > Am I getting that right? Not exactly. > Why is a grammar > extension necessary for what looks like a test structure problem when > there are interdependencies across multiple AMs developped? Ok, I didn't want to get into my exact process, because it involves other changes that I don't expect -hackers to want. But basically what I do is: ./configure --with-default-tam=chicago && make && make check-world That fails for a few tests, and I manually change the create table statements in tests that are not chicago-compatible to "using not chicago". Then ./configure --with-default-tam=detroit && make && make check-world That fails for some other set of tests, but note that the tests with "using not chicago" are still using detroit in this second run. That wouldn't be true if I'd fixed up the tests in the first run "using heap". Then I can also add my own tests which might make some chicago backed tables plus some detroit backed tables and see how they interact. But that's superfluous to the issue of just trying to leverage the existing tests as much as I can without having to reinvent tests to cover "chicago", and then reinvent again to cover "detroit", and so forth. If you develop enough TAMs in parallel, and go with the "using heap" solution, you eventually have zero coverage for any of the TAMs, because you'll eventually be "using heap" in all the tables of all the tests. — Mark Dilger EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [PROPOSAL] Detecting plan changes with plan_id in pg_stat_activity
Hi, On Wed, Jun 15, 2022 at 06:45:38PM +, Imseih (AWS), Sami wrote: > Adding a plan_id to pg_stat_activity allows users > to determine if a plan for a particular statement > has changed and if the new plan is performing better > or worse for a particular statement. > [...] > Attached is a POC patch that computes the plan_id > and presents the top-level plan_id in pg_stat_activity. AFAICS you're proposing to add an identifier for a specific plan, but no way to know what that plan was? How are users supposed to use the information if they know something changed but don't know what changed exactly? > - In the POC, the compute_query_id GUC determines if a > plan_id is to be computed. Should this be a separate GUC? Probably, as computing it will likely be quite expensive. Some benchmark on various workloads would be needed here. I only had a quick look at the patch, but I see that you have some code to avoid storing the query text multiple times with different planid. How does it work exactly, and does it ensure that the query text is only removed once the last entry that uses it is removed? It seems that you identify a specific query text by queryid, but that seems wrong as collision can (easily?) happen in different databases. The real identifier of a query text should be (dbid, queryid). Note that this problem already exists, as the query texts are now stored per (userid, dbid, queryid, istoplevel). Maybe this part could be split in a different commit as it could already be useful without a planid.
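To make the keying point concrete, here is a sketch of what is meant: stored query texts would need at least the database OID alongside the query ID, while the statistics entries themselves are keyed more finely. These struct names are illustrative only, not code from the patch under review.

/* Illustrative only: possible hash keys for shared query-text storage. */
#include "postgres.h"

/* key for one stored (deduplicated) query text */
typedef struct QueryTextKey
{
	Oid			dbid;		/* database the statement ran in */
	uint64		queryid;	/* query fingerprint within that database */
} QueryTextKey;

/* per-statement statistics entries remain keyed more finely */
typedef struct StatEntryKey
{
	Oid			userid;
	Oid			dbid;
	uint64		queryid;
	bool		toplevel;
} StatEntryKey;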
Re: Modest proposal to extend TableAM API for controlling cluster commands
> On Jun 15, 2022, at 8:18 PM, Andres Freund wrote: > > Hi, > > On 2022-06-15 20:10:30 -0700, Mark Dilger wrote: >>> On Jun 15, 2022, at 7:30 PM, Andres Freund wrote: But it's up to the TAM what it wants to do with that boolean, if in fact it does anything at all based on that. A TAM could decide to do the exact opposite of what Heap-AM does and instead sort on VACUUM FULL but not sort on CLUSTER, or perhaps perform a randomized shuffle, or <insert some other behavior here>. >>> >>> That's bogus. Yes, an AM can do stupid stuff in a callback. But so what, >>> that's possible with all extension APIs. >> >> I don't think it's "stupid stuff". A major motivation, perhaps the only >> useful motivation, for implementing a TAM is to get a non-trivial >> performance boost (relative to heap) for some target workload, almost >> certainly at the expense of worse performance for another workload. To >> optimize any particular workload sufficiently to make it worth the bother, >> you've pretty much got to do something meaningfully different than what heap >> does. > > Sure. I just don't see what that has to do with doing something widely > differing in VACUUM FULL. Which AM can't support that? I guess there's some > where implementing the full visibility semantics isn't feasible, but that's > afaics OK. The problem isn't that the TAM can't do it. Any TAM can probably copy its contents verbatim from the OldHeap to the NewHeap. But it's really stupid to have to do that if you're not going to change anything along the way, and I think if you divorce your thinking from how heap-AM works sufficiently, you might start to see how other TAMs might have nothing useful to do at this step. So there's a strong motivation to not be forced into a "move all the data uselessly" step. I don't really want to discuss the proprietary details of any TAMs I'm developing, so I'll use some made up examples. Imagine you have decided to trade off the need to vacuum against the cost of storing 64bit xids. That doesn't mean that compaction isn't maybe still a good thing, but you don't need to vacuum for anti-wraparound purposes anymore. Now imagine you've also decided to trade off insert speed for select speed, and you sort the entire table on every insert, and to keep indexes happy, you keep an "externally visible TID" to "internal actual location" mapping in a separate fork. Let's say you also don't support UPDATE or DELETE at all. Those immediately draw an error, "not implemented by this tam". At this point, you have a fully sorted and completely compacted table at all times. It's simply an invariant of the TAM. So, now what exactly is VACUUM FULL or CLUSTER supposed to do? Seems like the answer is "diddly squat", and yet you seem to propose requiring the TAM to do it. I don't like that. From the point-of-view of a TAM implementor, VACUUM FULL and CLUSTER are synonyms. Or am I missing something? >>> >>> The callback gets passed use_sort. So just implement it use_sort = false and >>> error out if use_sort = true? >> >> I'm not going to say that your idea is unreasonable for a TAM that you might >> choose to implement, but I don't see why that should be required of all TAMs >> anybody might ever implement. > >> The callback that gets use_sort is called from copy_table_data(). By that >> time, it's too late to avoid the >> >>/* >> * Open the relations we need.
>> */ >>NewHeap = table_open(OIDNewHeap, AccessExclusiveLock); >>OldHeap = table_open(OIDOldHeap, AccessExclusiveLock); >> >> code that has already happened in cluster.c's copy_table_data() function, >> and unless I raise an error, after returning from my TAM's callback, the >> cluster code will replace the old table with the new one. I'm left no >> choices but to copy my data over, lose my data, or abort the command. None >> of those are OK options for me. > > I think you need to do a bit more explaining of what you're actually trying to > achieve here. You're just saying "I don't want to", which doesn't really help > me to understand the set of useful options. I'm trying to opt out of cluster/vacfull. >> I'm open to different solutions. If a simple callback like >> relation_supports_cluster(Relation rel) is too simplistic, and more >> parameters with more context information is wanted, then fine, let's do >> that. > > FWIW, I want to go *simpler* if anything, not more complicated. I.e. make the > relation_copy_for_cluster optional. > > I still think it's a terrible idea to silently ignore tables in CLUSTER or > VACUUM FULL. I'm not entirely against you on that, but it makes me cringe that we impose design decisions like that on any and all future TAMs. It seems better to me to let the TAM author decide to emit an error, warning, notice, or whatever, as they see fit. >> But I can't really complete my work with the interface as it stands >> now. > > Since you've not described that work t
Re: [PROPOSAL] Detecting plan changes with plan_id in pg_stat_activity
On 15/6/2022 21:45, Imseih (AWS), Sami wrote: Adding a plan_id to pg_stat_activity allows users to determine if a plan for a particular statement has changed and if the new plan is performing better or worse for a particular statement. There are several ways the plan_id in pg_stat_activity In general, your idea is quite useful. But, as discussed earlier [1], extensions could implement many useful things if they could add some custom data into a plan. Maybe implement your feature with a private list of nodes in the plan structure instead of a single-purpose plan_id field? [1] https://www.postgresql.org/message-id/flat/e0de3423-4bba-1e69-c55a-f76bf18dbd74%40postgrespro.ru -- regards, Andrey Lepikhov Postgres Professional
Re: Extending USING [heap | mytam | yourtam] grammar and behavior
> On Jun 15, 2022, at 8:51 PM, Michael Paquier wrote: > > I am not sure to see why this would be something users would actually > use in prod. That means to pick up something else than what the > server thinks is the best default AM but where somebody does not want > to trust the default, while generating an error if specifying the > default AM in the USING NOT clause. Sorry for the lack of clarity. I do not suggest raising an error. If you say "USING NOT foo", and foo is the default table access method, then you get the same behavior as a "USING heap" would have gotten you, otherwise, you get the same behavior as not providing any USING clause at all. In future, we might want to create a list of fallback tams rather than just hardcoding "heap" as the one and only fallback, but I haven't run into an actual need for that. If you're wondering what "USING NOT heap" falls back to, I think that could error, or it could just use heap anyway. Whatever. That's why I'm still soliciting for comments at this phase rather than posting a patch. > On top of that > default_table_access_method is user-settable. Yeah, but specifying a "USING foo" clause is also open to any user, so I don't see why this matters. "USING NOT foo" is just shorthand for checking the current default_table_access_method, and then either appending a "USING heap" clause or appending no clause. Since the user can do this anyway, what's the security implication in some syntactic sugar? — Mark Dilger EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
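To make the described resolution rule concrete, a minimal sketch using the made-up access method names from this thread (none of this syntax exists today; it is only the behavior described above):

    SET default_table_access_method = 'chicago';
    CREATE TABLE t1 (id int) USING NOT chicago;   -- same as writing "USING heap"

    SET default_table_access_method = 'detroit';
    CREATE TABLE t2 (id int) USING NOT chicago;   -- same as writing no USING clause,
                                                  -- so t2 is created with detroit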
Re: Replica Identity check of partition table on subscriber
Hi, On Thu, Jun 16, 2022 at 2:07 PM shiy.f...@fujitsu.com wrote: > On Wed, Jun 15, 2022 8:30 PM Amit Kapila wrote: > > I have pushed the first bug-fix patch today. > > Attached the remaining patches which are rebased. Thanks. Comments on v9-0001: + * Don't throw any error here just mark the relation entry as not updatable, + * as replica identity is only for updates and deletes but inserts can be + * replicated even without it. I know you're simply copying the old comment, but I think we can rewrite it to be slightly more useful: We just mark the relation entry as not updatable here if the local replica identity is found to be insufficient and leave it to check_relation_updatable() to throw the actual error if needed. + /* Check that replica identity matches. */ + logicalrep_rel_mark_updatable(entry); Maybe the comment (there are 2 instances) should say: Set if the table's replica identity is enough to apply update/delete. Finally, +# Alter REPLICA IDENTITY on subscriber. +# No REPLICA IDENTITY in the partitioned table on subscriber, but what we check +# is the partition, so it works fine. For consistency with other recently added comments, I'd suggest the following wording: Test that replication works correctly as long as the leaf partition has the necessary REPLICA IDENTITY, even though the actual target partitioned table does not. On v9-0002: + /* cleanup the invalid attrmap */ It seems that "invalid" here really means no-longer-useful, so we should use that phrase as a nearby comment does: Release the no-longer-useful attrmap, if any. -- Thanks, Amit Langote EDB: http://www.enterprisedb.com
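As a minimal sketch of the scenario that last test comment describes (object names here are illustrative, not taken from the patch's tests), the subscriber side could look like this:

    -- the partitioned parent keeps the default replica identity, but the leaf
    -- partition, which is the actual apply target, has a usable one
    CREATE TABLE tab (a int NOT NULL, b text) PARTITION BY LIST (a);
    CREATE TABLE tab_1 PARTITION OF tab FOR VALUES IN (1);

    CREATE UNIQUE INDEX tab_1_a_idx ON tab_1 (a);
    ALTER TABLE tab_1 REPLICA IDENTITY USING INDEX tab_1_a_idx;

UPDATE/DELETE replication then succeeds because the replica identity check runs against the leaf partition rather than the partitioned table.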
Re: Bump MIN_WINNT to 0x0600 (Vista) as minimal runtime in 16~
On Thu, Jun 09, 2022 at 02:47:36PM +0900, Michael Paquier wrote: > On Thu, Jun 09, 2022 at 04:55:45PM +1200, Thomas Munro wrote: >> I think we can drop mention of Itanium (RIP): the ancient versions of >> Windows that could run on that arch are desupported with your patch. >> It might be more relevant to say that we can't yet run on ARM, and >> Windows 11 is untested by us, but let's fix those problems instead of >> documenting them :-) > > Okay to remove the Itanium part for me. install-windows.sgml has one extra spot mentioning Windows 7 and Server 2008 that can be simplified on top of that. >> There are more mentions of older Windows releases near the compiler >> stuff but perhaps your MSVC version vacuuming work will take care of >> those. > > Yes, I have a few changes like the one in main.c for _M_AMD64. Are > you referring to something else? Actually, this can go with the bump of MIN_WINNT as it uses one of the IsWindows*OrGreater() macros as a runtime check. And there are two more places in pg_ctl.c that can be similarly cleaned up. It is possible that I have missed some spots, of course. -- Michael From 2d51b7dd04e44ace0294531878676ff89e465c0b Mon Sep 17 00:00:00 2001 From: Michael Paquier Date: Thu, 16 Jun 2022 15:11:53 +0900 Subject: [PATCH v3] Bump WIN_MINNT to 0x0A00 everywhere (Windows 10~) --- src/include/port/win32.h | 11 +++ src/backend/main/main.c | 17 - src/backend/utils/adt/pg_locale.c | 4 ++-- src/bin/pg_ctl/pg_ctl.c | 26 ++ doc/src/sgml/install-windows.sgml | 9 ++--- doc/src/sgml/installation.sgml| 9 - 6 files changed, 13 insertions(+), 63 deletions(-) diff --git a/src/include/port/win32.h b/src/include/port/win32.h index c6213c77c3..fe0829cedc 100644 --- a/src/include/port/win32.h +++ b/src/include/port/win32.h @@ -11,15 +11,10 @@ /* * Make sure _WIN32_WINNT has the minimum required value. - * Leave a higher value in place. When building with at least Visual - * Studio 2015 the minimum requirement is Windows Vista (0x0600) to - * get support for GetLocaleInfoEx() with locales. For everything else - * the minimum version is Windows XP (0x0501). + * Leave a higher value in place. The minimum requirement is Windows 10. */ -#if defined(_MSC_VER) && _MSC_VER >= 1900 -#define MIN_WINNT 0x0600 -#else -#define MIN_WINNT 0x0501 +#if defined(_MSC_VER) +#define MIN_WINNT 0x0A00 #endif #if defined(_WIN32_WINNT) && _WIN32_WINNT < MIN_WINNT diff --git a/src/backend/main/main.c b/src/backend/main/main.c index c43a527d3f..dd82722ee3 100644 --- a/src/backend/main/main.c +++ b/src/backend/main/main.c @@ -290,23 +290,6 @@ startup_hacks(const char *progname) _CrtSetReportFile(_CRT_ASSERT, _CRTDBG_FILE_STDERR); _CrtSetReportMode(_CRT_WARN, _CRTDBG_MODE_FILE | _CRTDBG_MODE_DEBUG); _CrtSetReportFile(_CRT_WARN, _CRTDBG_FILE_STDERR); - -#if defined(_M_AMD64) && _MSC_VER == 1800 - - /*-- - * Avoid crashing in certain floating-point operations if we were - * compiled for x64 with MS Visual Studio 2013 and are running on - * Windows prior to 7/2008R2 SP1 on an AVX2-capable CPU. 
- * - * Ref: https://connect.microsoft.com/VisualStudio/feedback/details/811093/visual-studio-2013-rtm-c-x64-code-generation-bug-for-avx2-instructions - *-- - */ - if (!IsWindows7SP1OrGreater()) - { - _set_FMA3_enable(0); - } -#endif /* defined(_M_AMD64) && _MSC_VER == 1800 */ - } #endif /* WIN32 */ diff --git a/src/backend/utils/adt/pg_locale.c b/src/backend/utils/adt/pg_locale.c index a0490a7522..4c39841b99 100644 --- a/src/backend/utils/adt/pg_locale.c +++ b/src/backend/utils/adt/pg_locale.c @@ -1729,7 +1729,7 @@ get_collation_actual_version(char collprovider, const char *collcollate) else ereport(ERROR, (errmsg("could not load locale \"%s\"", collcollate))); -#elif defined(WIN32) && _WIN32_WINNT >= 0x0600 +#elif defined(WIN32) /* * If we are targeting Windows Vista and above, we can ask for a name * given a collation name (earlier versions required a location code @@ -1757,7 +1757,7 @@ get_collation_actual_version(char collprovider, const char *collcollate) collcollate, GetLastError(; } - collversion = psprintf("%d.%d,%d.%d", + collversion = psprintf("%ld.%ld,%ld.%ld", (version.dwNLSVersion >> 8) & 0x, version.dwNLSVersion & 0xFF, (version.dwDefinedVersion >> 8) & 0x, diff --git a/src/bin/pg_ctl/pg_ctl.c b/src/bin/pg_ctl/pg_ctl.c index dd78e5bc66..ef58883a5c 100644 --- a/src/bin/pg_ctl/pg_ctl.c +++ b/src/bin/pg_ctl/pg_ctl.c @@ -1896,17 +1896,8 @@ CreateRestrictedProcess(char *cmd, PROCESS_INFORMATION *processInfo, bool as_ser /* Verify that we found all functions */ if (_IsProcessInJob == NULL || _CreateJobObject == NULL || _SetInformationJobObject == NULL || _AssignProcessToJobObject == NULL || _QueryInformationJobObject == NULL) { - /* - * IsProcessIn
Re: Modest proposal to extend TableAM API for controlling cluster commands
> On Jun 15, 2022, at 8:50 PM, David G. Johnston > wrote: > > On Wed, Jun 15, 2022 at 8:18 PM Andres Freund wrote: > > If a simple callback like > > relation_supports_cluster(Relation rel) is too simplistic > > Seems like it should be called: > relation_supports_compaction[_by_removal_of_interspersed_dead_tuples] Ok. > Basically, if the user tells the table to make itself smaller on disk by > removing dead tuples, should we support the case where the Table AM says: > "Sorry, I cannot do that"? I submit that's the only sane thing to do if the table AM already guarantees that the table will always be fully compacted. There is no justification for forcing the table contents to be copied without benefit. > If yes, then naming the table explicitly should elicit an error. Having the > table chosen implicitly should provoke a warning. For ALTER TABLE CLUSTER > there should be an error: which makes the implicit CLUSTER command a > non-factor. I'm basically fine with how you would design the TAM, but I'm going to argue again that the core project should not dictate these decisions. The TAM's relation_supports_compaction() function can return true/false, or raise an error. If raising an error is the right action, the TAM can do that. If the core code makes that decision, the TAM can't override, and that paints TAM authors into a corner. > However, given that should the table structure change it is imperative that > the Table AM be capable of producing a new physical relation with the correct > data, which will have been compacted as a side-effect, it seems like, > explicit or implicit, expecting any Table AM to do that when faced with > Vacuum Full is reasonable. Which leaves deciding how to allow a table with a > given TAM to prevent itself from being added to the CLUSTER roster. And > decide whether an opt-out feature for implicit VACUUM FULL is something we > should offer as well. > > I'm doubtful that a TAM that is pluggable into the MVCC and WAL architecture > of PostgreSQL could avoid this basic contract between the system and its > users. How about a TAM that implements a write-once, read-many logic. You get one multi-insert, and forever after you can't modify it (other than to drop the table, or perhaps to truncate it). That's a completely made-up-on-the-spot example, but it's not entirely without merit. You could avoid a lot of locking overhead when using such a table, since you'd know a priori that nobody else is modifying it. It could also be implemented with a smaller tuple header, since a lot of the header bytes in heap tuples are dedicated to tracking updates. You wouldn't need a per-row inserting transaction-Id either, since you could just store one per table, knowing that all the rows were inserted in the same transaction. In what sense does this made-up TAM conflict with mvcc and wal? It doesn't have all the features of heap, but that's not the same thing as violating mvcc or breaking wal. — Mark Dilger EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
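To make the thought experiment concrete, a sketch of how such a hypothetical write-once/read-many AM (call it "worm"; the name and every behavior shown here are invented purely for illustration) might look from SQL:

    CREATE TABLE archive_2021 (id int, payload text) USING worm;

    -- the single permitted multi-insert; the AM keeps the table sorted and compacted
    INSERT INTO archive_2021
        SELECT g, md5(g::text) FROM generate_series(1, 100000) g;

    UPDATE archive_2021 SET payload = 'x' WHERE id = 1;  -- ERROR: not implemented by this tam
    DELETE FROM archive_2021 WHERE id = 2;               -- ERROR: not implemented by this tam

    -- already compacted and sorted by construction, so under the proposal the AM
    -- would rather opt out of this than copy itself verbatim into a new relfilenode
    VACUUM FULL archive_2021;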
Re: Prevent writes on large objects in read-only transactions
On Thu, 2 Jun 2022 07:43:06 +0900 Michael Paquier wrote: > On Wed, Jun 01, 2022 at 10:15:17AM -0400, Tom Lane wrote: > > It's always appropriate to consider backwards compatibility, and we > > frequently don't back-patch a change because of worries about that. > > However, if someone complains because we start rejecting this as of > > v15 or v16, I don't think they have good grounds for that. It's just > > obviously wrong ... unless someone can come up with a plausible > > definition of read-only-ness that excludes large objects. I don't > > say that that's impossible, but it sure seems like it'd be contorted > > reasoning. They're definitely inside-the-database entities. > > FWIW, I find the removal of error paths to authorize new behaviors > easy to think about in terms of compatibility, because nobody is going > to complain about that as long as it works as intended. The opposite, > aka enforcing an error in a code path is much harder to reason about. > Anyway, if I am outnumbered on this one that's fine by me :) I attached the updated patch. Per discussions above, I undo the change so that it prevents large object writes in read-only transactions again. > There are a couple of things in the original patch that may require to > be adjusted though: > 1) Should we complain in lo_open() when using the write mode for a > read-only transaction? My answer to that would be yes. I fixed to raise the error in lo_open() when using the write mode. > 2) We still publish two non-fmgr-callable routines, lo_read() and > lo_write(). Pe4rhaps we'd better make them static to be-fsstubs.c or > put the same restriction to the write routine as its SQL flavor? I am not sure if we should use PreventCommandIfReadOnly in lo_write() because there are other public functions that write to catalogs but there are not the similar restrictions in such functions. I think it is caller's responsibility to prevent to use such public functions in read-only context. I also fixed to raise the error in each of lo_truncate and lo_truncate64 per Michael's comment in the other post. Regards, Yugo Nagata -- Yugo NAGATA diff --git a/src/backend/libpq/be-fsstubs.c b/src/backend/libpq/be-fsstubs.c index 5804532881..19c0a0c5de 100644 --- a/src/backend/libpq/be-fsstubs.c +++ b/src/backend/libpq/be-fsstubs.c @@ -93,6 +93,9 @@ be_lo_open(PG_FUNCTION_ARGS) elog(DEBUG4, "lo_open(%u,%d)", lobjId, mode); #endif + if (mode & INV_WRITE) + PreventCommandIfReadOnly("lo_open() in write mode"); + /* * Allocate a large object descriptor first. This will also create * 'fscxt' if this is the first LO opened in this transaction. @@ -179,6 +182,8 @@ lo_write(int fd, const char *buf, int len) int status; LargeObjectDesc *lobj; + PreventCommandIfReadOnly("lo_write"); + if (fd < 0 || fd >= cookies_size || cookies[fd] == NULL) ereport(ERROR, (errcode(ERRCODE_UNDEFINED_OBJECT), @@ -245,6 +250,8 @@ be_lo_creat(PG_FUNCTION_ARGS) { Oid lobjId; + PreventCommandIfReadOnly("lo_creat()"); + lo_cleanup_needed = true; lobjId = inv_create(InvalidOid); @@ -256,6 +263,8 @@ be_lo_create(PG_FUNCTION_ARGS) { Oid lobjId = PG_GETARG_OID(0); + PreventCommandIfReadOnly("lo_create()"); + lo_cleanup_needed = true; lobjId = inv_create(lobjId); @@ -306,6 +315,8 @@ be_lo_unlink(PG_FUNCTION_ARGS) { Oid lobjId = PG_GETARG_OID(0); + PreventCommandIfReadOnly("lo_unlink()"); + /* * Must be owner of the large object. 
It would be cleaner to check this * in inv_drop(), but we want to throw the error before not after closing @@ -368,6 +379,8 @@ be_lowrite(PG_FUNCTION_ARGS) int bytestowrite; int totalwritten; + PreventCommandIfReadOnly("lowrite()"); + bytestowrite = VARSIZE_ANY_EXHDR(wbuf); totalwritten = lo_write(fd, VARDATA_ANY(wbuf), bytestowrite); PG_RETURN_INT32(totalwritten); @@ -413,6 +426,8 @@ lo_import_internal(text *filename, Oid lobjOid) LargeObjectDesc *lobj; Oid oid; + PreventCommandIfReadOnly("lo_import()"); + /* * open the file to be read in */ @@ -561,6 +576,8 @@ be_lo_truncate(PG_FUNCTION_ARGS) int32 fd = PG_GETARG_INT32(0); int32 len = PG_GETARG_INT32(1); + PreventCommandIfReadOnly("lo_truncate()"); + lo_truncate_internal(fd, len); PG_RETURN_INT32(0); } @@ -571,6 +588,8 @@ be_lo_truncate64(PG_FUNCTION_ARGS) int32 fd = PG_GETARG_INT32(0); int64 len = PG_GETARG_INT64(1); + PreventCommandIfReadOnly("lo_truncate64()"); + lo_truncate_internal(fd, len); PG_RETURN_INT32(0); } @@ -815,6 +834,8 @@ be_lo_from_bytea(PG_FUNCTION_ARGS) LargeObjectDesc *loDesc; int written PG_USED_FOR_ASSERTS_ONLY; + PreventCommandIfReadOnly("lo_from_bytea()"); + lo_cleanup_needed = true; loOid = inv_create(loOid); loDesc = inv_open(loOid, INV_WRITE, CurrentMemoryContext); @@ -837,6 +858,8 @@ be_lo_put(PG_FUNCTION_ARGS) LargeObjectDesc *loDesc; int written PG_USED_FOR_ASSERTS_ONLY; + PreventCommandIfReadOnly("lo_put()"); + lo_cleanup_needed = t
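For reference, the user-visible effect of the restrictions added above would be roughly the following (a sketch; the exact error wording comes from PreventCommandIfReadOnly, and the OID below is just a placeholder for an existing large object):

    BEGIN READ ONLY;
    SELECT lo_create(0);   -- with the patch: ERROR:  cannot execute lo_create() in a read-only transaction
    ROLLBACK;

    BEGIN READ ONLY;
    SELECT lo_get(16400);  -- reads are still allowed
    ROLLBACK;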
Re: Replica Identity check of partition table on subscriber
On Thu, Jun 16, 2022 at 11:43 AM Amit Langote wrote: > > On Thu, Jun 16, 2022 at 2:07 PM shiy.f...@fujitsu.com > wrote: > > On Wed, Jun 15, 2022 8:30 PM Amit Kapila wrote: > > > I have pushed the first bug-fix patch today. > > > > Attached the remaining patches which are rebased. > > Thanks. > > Comments on v9-0001: > > + * Don't throw any error here just mark the relation entry as not updatable, > + * as replica identity is only for updates and deletes but inserts can be > + * replicated even without it. > > I know you're simply copying the old comment, but I think we can > rewrite it to be slightly more useful: > > We just mark the relation entry as not updatable here if the local > replica identity is found to be insufficient and leave it to > check_relation_updatable() to throw the actual error if needed. > I am fine with improving this comment but it would be better if in some way we keep the following part of the comment: "as replica identity is only for updates and deletes but inserts can be replicated even without it." as that makes it more clear why it is okay to just mark the entry as not updatable. One idea could be: "We just mark the relation entry as not updatable here if the local replica identity is found to be insufficient and leave it to check_relation_updatable() to throw the actual error if needed. This is because replica identity is only for updates and deletes but inserts can be replicated even without it.". Feel free to suggest if you have any better ideas? -- With Regards, Amit Kapila.