Re: O(n^2) system calls in RemoveOldXlogFiles()
On Tue, Jan 12, 2021 at 07:15:21PM +1300, Thomas Munro wrote: > Hah, I even knew that, apparently, but forgot. Adding Michael who > wrote a patch. It'd be nice to fix this, at least in 14. Yeah, this rings a bell. I never went back to it, even though the thing looks rather clean at a quick glance (not tested), but I may be able to spend some cycles on that. I don't think that's critical enough for a backpatch, so doing something only on HEAD is fine by me. What's your take? -- Michael signature.asc Description: PGP signature
RE: Disable WAL logging to speed up data loading
From: Osumi, Takamichi/大墨 昂道 > Oops, sorry for this careless mistake. > Fixed. And again, regression test produces no failure. Now correct. (Remains ready for committer.) Regards Takayuki Tsunakawa
Re: Fix a typo in SearchCatCache function comment
On Tue, Jan 12, 2021 at 12:17:20PM +0530, Bharath Rupireddy wrote: > While reviewing the patch in [1], I found that the SearchCatCache function > uses SearchCatCacheInternal as its function name in the comment. It > looks like a typo, attaching a very small patch to correct it. Good catch. I'll go fix it tomorrow if nobody objects. -- Michael signature.asc Description: PGP signature
Re: Reduce the number of special cases to build contrib modules on windows
On Wed, Dec 30, 2020 at 10:07:29AM +1300, David Rowley wrote: > -#ifdef LOWER_NODE > +/* > + * Below we ignore the fact that LOWER_NODE is defined when compiling with > + * MSVC. The reason for this is that earlier versions of the MSVC build > + * scripts failed to define LOWER_NODE. More recent versions of the MSVC > + * build scripts parse makefiles, which results in LOWER_NODE now being > + * defined. We check for _MSC_VER here so as not to break pg_upgrade when > + * upgrading from MSVC versions where LOWER_NODE was not defined. > + */ > +#if defined(LOWER_NODE) && !defined(_MSC_VER) > #include > #define TOLOWER(x) tolower((unsigned char) (x)) > #else While on it, do you think that it would be more readable if we removed LOWER_NODE completely and used only a check based on _MSC_VER for those two files in ltree? This could also be handled as a separate change. > + foreach my $line (split /\n/, $mf) > + { > + if ($line =~ /^[A-Za-z0-9_]*\.o:\s(.*)/) > + { > + foreach my $file (split /\s+/, $1) > + { > + foreach my $proj (@projects) > + { > + > $proj->AddFileConditional("$subdir/$n/$file"); > + } > + } > + } > + } Looking closer at this change, I don't think that this is completely correct, and it could become a trap. This is adding quite a bit of complexity to take care of contrib_extrasource getting empty, and it actually overlaps with the handling of OBJS done in AddDir(), no? -- Michael signature.asc Description: PGP signature
RE: Disable WAL logging to speed up data loading
On Tuesday, January 12, 2021 12:52 PM Takayuki/綱川 貴之 wrote: > From: Osumi, Takamichi/大墨 昂道 > > I updated the patch following this discussion, and fixed the > > documentation as well. > > > + (rmid == RM_GIST_ID && info == RM_GIST_ID) || > + (rmid == RM_GENERIC_ID))) > > info is wrong for GiST, and one pair of parenthesis is unnecessary. The above > would be: > > + (rmid == RM_GIST_ID && info == > XLOG_GIST_ASSIGN_LSN) || > + rmid == RM_GENERIC_ID)) Oops, sorry for this careless mistake. Fixed. And again, regression test produces no failure. Best Regards, Takamichi Osumi disable_WAL_logging_v07.patch Description: disable_WAL_logging_v07.patch
Fix a typo in SearchCatCache function comment
Hi, While reviewing the patch in [1], I found that the SearchCatCache function uses SearchCatCacheInternal as its function name in the comment. It looks like a typo, attaching a very small patch to correct it. Thoughts? [1] - https://www.postgresql.org/message-id/CAD21AoAwxbd-zXXUAeJ2FBRHr%2B%3DbfMUHoN7xJuXcwu1sFi1-sQ%40mail.gmail.com With Regards, Bharath Rupireddy. EnterpriseDB: http://www.enterprisedb.com v1-0001-Use-correct-function-name-in-SearchCatCache-comme.patch Description: Binary data
RE: Add Nullif case for eval_const_expressions_mutator
> > I think this patch should be about a tenth the size. Try modeling it > > on the T_SubscriptingRef-etc case, ie, use ece_generic_processing and > > then ece_evaluate_expr to cover the generic cases. OpExpr is common > > enough to deserve specially optimized code, but NullIf isn't, so shorter > is better. > > Thanks for the review. > > Attaching the v2 patch, which followed the suggestion to use > ece_generic_processing and ece_evaluate_expr to simplify the code. > > Please have a check. Sorry, I found the code could still be simplified further. Attaching the new patch. Best regards, houzj v2_1-0001-add-nullif-case-for-eval_const_expressions.patch Description: v2_1-0001-add-nullif-case-for-eval_const_expressions.patch
Re: Logical Replication - behavior of ALTER PUBLICATION .. DROP TABLE and ALTER SUBSCRIPTION .. REFRESH PUBLICATION
On Tue, Jan 12, 2021 at 11:39 AM Bharath Rupireddy wrote: > > On Tue, Jan 12, 2021 at 9:05 AM Amit Kapila wrote: > > > > On Mon, Jan 11, 2021 at 6:51 PM Bharath Rupireddy > > wrote: > > > > > > Hi, > > > > > > While providing thoughts on the design in [1], I found a strange > > > behaviour with the $subject. The use case is shown below as a sequence > > > of steps that need to be run on publisher and subscriber to arrive at > > > the strange behaviour. In step 5, the table is dropped from the > > > publication and in step 6, the refresh publication is run on the > > > subscriber, from here onwards, the expectation is that no further > > > inserts into the publisher table have to be replicated on to the > > > subscriber, but the opposite happens i.e. the inserts are still > > > replicated to the subscriber. ISTM as a bug. Let me know if I'm > > > missing anything. > > > > > > > Did you try to investigate what's going on? Can you please check what > > is the behavior if, after step-5, you restart the subscriber and > > separately try creating a new subscription (maybe on a different > > server) for that publication after step-5 and see if that allows the > > relation to be replicated? AFAIU, in AlterSubscription_refresh, we > > remove such dropped rels and stop their corresponding apply workers > > which should stop the further replication of such relations but that > > doesn't seem to be happening in your case. > > Here's my analysis: > 1) in the publisher, alter publication drop table successfully > removes(PublicationDropTables) the table from the catalogue > pg_publication_rel > 2) in the subscriber, alter subscription refresh publication > successfully removes the table from the catalogue pg_subscription_rel > (AlterSubscription_refresh->RemoveSubscriptionRel) > so far so good > Here, it should register the worker to stop on commit, and then on commit it should call AtEOXact_ApplyLauncher to stop the apply worker. 
Once the apply worker is stopped, the corresponding WALSender will also be stopped. Something here is not happening as per the expected behavior. > 3) after the insertion into the table in the publisher (remember that > it's dropped from the publication in (1)), the walsender process is > unable to detect that the table has been dropped from the publication > i.e. it doesn't look at the pg_publication_rel catalogue or some > other, but it only does the is_publishable_relation() check which returns > true in pgoutput_change(). Maybe the walsender should look at the > catalogue pg_publication_rel in is_publishable_relation()? > We must be somewhere checking pg_publication_rel before sending the decoded change because otherwise, we would have sent the changes for tables which are not even part of this publication. I think you can try to create a separate table that is not part of the publication under test and see how the changes for that are filtered. -- With Regards, Amit Kapila.
Re: O(n^2) system calls in RemoveOldXlogFiles()
On Tue, Jan 12, 2021 at 6:55 PM Andres Freund wrote: > I found this before as well: > https://postgr.es/m/CAB7nPqTB3VcKSSrW2Qj59tYYR2H4+n=5pzbdwou+x9iqvnm...@mail.gmail.com Hah, I even knew that, apparently, but forgot. Adding Michael who wrote a patch. It'd be nice to fix this, at least in 14.
Re: Logical Replication - behavior of ALTER PUBLICATION .. DROP TABLE and ALTER SUBSCRIPTION .. REFRESH PUBLICATION
On Tue, Jan 12, 2021 at 9:05 AM Amit Kapila wrote: > > On Mon, Jan 11, 2021 at 6:51 PM Bharath Rupireddy > wrote: > > > > Hi, > > > > While providing thoughts on the design in [1], I found a strange > > behaviour with the $subject. The use case is shown below as a sequence > > of steps that need to be run on publisher and subscriber to arrive at > > the strange behaviour. In step 5, the table is dropped from the > > publication and in step 6, the refresh publication is run on the > > subscriber, from here onwards, the expectation is that no further > > inserts into the publisher table have to be replicated on to the > > subscriber, but the opposite happens i.e. the inserts are still > > replicated to the subscriber. ISTM as a bug. Let me know if I'm > > missing anything. > > > > Did you try to investigate what's going on? Can you please check what > is the behavior if, after step-5, you restart the subscriber and > separately try creating a new subscription (maybe on a different > server) for that publication after step-5 and see if that allows the > relation to be replicated? AFAIU, in AlterSubscription_refresh, we > remove such dropped rels and stop their corresponding apply workers > which should stop the further replication of such relations but that > doesn't seem to be happening in your case. Here's my analysis: 1) in the publisher, alter publication drop table successfully removes (PublicationDropTables) the table from the catalogue pg_publication_rel 2) in the subscriber, alter subscription refresh publication successfully removes the table from the catalogue pg_subscription_rel (AlterSubscription_refresh->RemoveSubscriptionRel) so far so good 3) after the insertion into the table in the publisher (remember that it's dropped from the publication in (1)), the walsender process is unable to detect that the table has been dropped from the publication i.e.
it doesn't look at the pg_publication_rel catalogue or some other, but it only does the is_publishable_relation() check which returns true in pgoutput_change(). Maybe the walsender should look at the catalogue pg_publication_rel in is_publishable_relation()? With Regards, Bharath Rupireddy. EnterpriseDB: http://www.enterprisedb.com
Re: O(n^2) system calls in RemoveOldXlogFiles()
Hi, On 2021-01-11 16:35:56 +1300, Thomas Munro wrote: > I noticed that RemoveXlogFile() has this code: > > /* > * Before deleting the file, see if it can be recycled as a future log > * segment. Only recycle normal files, pg_standby for example can > create > * symbolic links pointing to a separate archive directory. > */ > if (wal_recycle && > endlogSegNo <= recycleSegNo && > lstat(path, ) == 0 && S_ISREG(statbuf.st_mode) && > InstallXLogFileSegment(, path, >true, > recycleSegNo, true)) > { > ereport(DEBUG2, > (errmsg("recycled write-ahead log file > \"%s\"", > segname))); > CheckpointStats.ckpt_segs_recycled++; > /* Needn't recheck that slot on future iterations */ > endlogSegNo++; > } > > I didn't check the migration history of this code but it seems that > endlogSegNo doesn't currently have the right scoping to achieve the > goal of that last comment, so checkpoints finish up repeatedly searching > for the next free slot, starting at the low end each time, like so: > > stat("pg_wal/0001004F", {st_mode=S_IFREG|0600, > st_size=16777216, ...}) = 0 > ... > stat("pg_wal/00010073", 0x7fff98b9e060) = -1 ENOENT > (No such file or directory) > > stat("pg_wal/0001004F", {st_mode=S_IFREG|0600, > st_size=16777216, ...}) = 0 > ... > stat("pg_wal/00010074", 0x7fff98b9e060) = -1 ENOENT > (No such file or directory) > > ... and so on until we've recycled all our recyclable segments. Ouch. I found this before as well: https://postgr.es/m/CAB7nPqTB3VcKSSrW2Qj59tYYR2H4+n=5pzbdwou+x9iqvnm...@mail.gmail.com I did put a hastily rebased version of that commit in my aio branch during development: https://github.com/anarazel/postgres/commit/b3cc8adacf7860add8cc62ec373ac955d9d12992 Greetings, Andres Freund
"has_column_privilege()" issue with attnums and non-existent columns
Greetings Consider a table set up as follows: CREATE TABLE foo (id INT, val TEXT); ALTER TABLE foo DROP COLUMN val; When specifying the name of a dropped or non-existent column, the function "has_column_privilege()" works as expected: postgres=# SELECT has_column_privilege('foo', 'val', 'SELECT'); ERROR: column "val" of relation "foo" does not exist postgres=# SELECT has_column_privilege('foo', 'bar', 'SELECT'); ERROR: column "bar" of relation "foo" does not exist However when specifying a dropped or non-existent attnum, it returns "TRUE", which is unexpected: postgres=# SELECT has_column_privilege('foo', 2::int2, 'SELECT'); has_column_privilege -- t (1 row) postgres=# SELECT has_column_privilege('foo', 999::int2, 'SELECT'); has_column_privilege -- t (1 row) The comment on the relevant code section in "src/backend/utils/adt/acl.c" (related to "column_privilege_check()") indicates that NULL is the intended return value in these cases: Likewise, the variants that take an integer attnum return NULL (rather than throwing an error) if there is no such pg_attribute entry. All variants return NULL if an attisdropped column is selected. The unexpected "TRUE" value is a result of "column_privilege_check()" returning TRUE if the user has table-level privileges. This returns a valid result with the function variants where the column name is specified, as the calling function will have already performed a check of the column through its call to "convert_column_name()". However when the attnum is specified, the status of the column never gets checked. The attached patch resolves this by having "column_privilege_check()" callers indicate whether they have checked the column. If this is the case, and if the user has table-level privileges, "column_privilege_check()" can return as before with no further lookups needed. If the user has table-level privileges but the column was not checked (i.e. attnum provided), the column lookup is performed.
The regression tests did contain checks for dropped/non-existent attnums, however one group was acting on a table where the user had no table-level privilege, giving a fall-through to the column-level check; the other group contained a check which was just plain wrong. This issue appears to have been present since "has_column_privilege()" was introduced in PostgreSQL 8.4 (commit 7449427a1e6a099bc7e76164cb99a01d5e87237b), so falls into the "obscure, extreme corner-case" category; OTOH it seems worth backpatching to supported versions. The second patch adds a bunch of missing static prototypes to "acl.c", on general principles. I'll add this to the next commitfest. Regards Ian Barwick -- EnterpriseDB: https://www.enterprisedb.com commit 7e22f2d245888c41b66ae214c226b4a101f27a89 Author: Ian Barwick Date: Tue Jan 12 13:40:31 2021 +0900 Fix has_column_privilege() with attnums and non-existent columns The existence of a column should be confirmed even if the user has the table-level privilege, otherwise the function will happily report the user has privilege on a dropped or non-existent column if an invalid attnum is provided. diff --git a/src/backend/utils/adt/acl.c b/src/backend/utils/adt/acl.c index c7f029e218..be5649b7ac 100644 --- a/src/backend/utils/adt/acl.c +++ b/src/backend/utils/adt/acl.c @@ -2447,7 +2447,7 @@ has_any_column_privilege_id_id(PG_FUNCTION_ARGS) */ static int column_privilege_check(Oid tableoid, AttrNumber attnum, - Oid roleid, AclMode mode) + Oid roleid, AclMode mode, bool column_checked) { AclResult aclresult; HeapTuple attTuple; @@ -2472,7 +2472,11 @@ column_privilege_check(Oid tableoid, AttrNumber attnum, aclresult = pg_class_aclcheck(tableoid, roleid, mode); - if (aclresult == ACLCHECK_OK) + /* + * Table-level privilege is present and the column has been checked by the + * caller.
+ */ + if (aclresult == ACLCHECK_OK && column_checked == true) return true; /* @@ -2493,6 +2497,16 @@ column_privilege_check(Oid tableoid, AttrNumber attnum, } ReleaseSysCache(attTuple); + /* + * If the table level privilege exists, and the existence of the column has + * been confirmed, we can skip the per-column check. + */ + if (aclresult == ACLCHECK_OK) + return true; + + /* + * No table privilege, so try per-column privileges. + */ aclresult = pg_attribute_aclcheck(tableoid, attnum, roleid, mode); return (aclresult == ACLCHECK_OK); @@ -2521,7 +2535,7 @@ has_column_privilege_name_name_name(PG_FUNCTION_ARGS) colattnum = convert_column_name(tableoid, column); mode = convert_column_priv_string(priv_type_text); - privresult = column_privilege_check(tableoid, colattnum, roleid, mode); + privresult = column_privilege_check(tableoid, colattnum, roleid, mode, true); if (privresult < 0) PG_RETURN_NULL(); PG_RETURN_BOOL(privresult); @@ -2548,7 +2562,8
Re: Logical Replication - behavior of ALTER PUBLICATION .. DROP TABLE and ALTER SUBSCRIPTION .. REFRESH PUBLICATION
On Tue, 12 Jan 2021 at 13:39, Amit Kapila wrote: > On Tue, Jan 12, 2021 at 9:58 AM japin wrote: >> >> >> On Tue, 12 Jan 2021 at 11:37, Amit Kapila wrote: >> > On Mon, Jan 11, 2021 at 6:51 PM Bharath Rupireddy >> > wrote: >> >> >> >> Hi, >> >> >> >> While providing thoughts on the design in [1], I found a strange >> >> behaviour with the $subject. The use case is shown below as a sequence >> >> of steps that need to be run on publisher and subscriber to arrive at >> >> the strange behaviour. In step 5, the table is dropped from the >> >> publication and in step 6, the refresh publication is run on the >> >> subscriber, from here onwards, the expectation is that no further >> >> inserts into the publisher table have to be replicated on to the >> >> subscriber, but the opposite happens i.e. the inserts are still >> >> replicated to the subscriber. ISTM as a bug. Let me know if I'm >> >> missing anything. >> >> >> > >> > Did you try to investigate what's going on? Can you please check what >> > is the behavior if, after step-5, you restart the subscriber and >> > separately try creating a new subscription (maybe on a different >> > server) for that publication after step-5 and see if that allows the >> > relation to be replicated? AFAIU, in AlterSubscription_refresh, we >> > remove such dropped rels and stop their corresponding apply workers >> > which should stop the further replication of such relations but that >> > doesn't seem to be happening in your case. >> >> If we restart the subscriber after step-5, it will not replicate the records. >> >> As I said in [1], if we don't insert new data in step-5, it will not >> replicate the records. >> > > Hmm, but in Bharath's test, it is replicating the Inserts in step-7 > and step-9 as well. Are you seeing something different? > Yes; however, if we don't insert in step-5, the Inserts in step-7 and step-9 will not replicate. -- Regards, Japin Li. ChengDu WenWu Information Technology Co.,Ltd.
Re: libpq compression
On 11.01.2021 20:38, Tomas Vondra wrote: On 1/11/21 2:53 PM, Konstantin Knizhnik wrote: ... New version of libpq compression patch is attached. It can also be found at g...@github.com:postgrespro/libpq_compression.git Seems it bit-rotted already, so here's a slightly fixed version. 1) Fixes the MSVC makefile. The list of files is sorted alphabetically, so I've added the file at the end. 2) Fixes duplicate OID. It's a good practice to assign OIDs from the end of the range, to prevent collisions during development. Thank you Other than that, I wonder what's the easiest way to run all tests with compression enabled. ISTM it'd be nice to add a pg_regress option forcing a particular compression algorithm to be used, or something similar. I'd like a convenient way to pass this through valgrind, for example. Or how do we get this tested on a buildfarm? I run regression tests with the PG_COMPRESSION environment variable set to "true". Do we need some other way (like pg_regress options) to run tests with compression enabled? I'm not convinced it's very user-friendly to not have a psql option enabling compression. It's true it can be enabled in a connection string, but I doubt many people will notice that. The sgml docs need a bit more love / formatting. The lines in libpq.sgml are far too long, and there are no tags whatsoever. Presumably zlib/zstd should be marked as , and so on. regards Thank you, I will fix it.
Re: Logical Replication - behavior of ALTER PUBLICATION .. DROP TABLE and ALTER SUBSCRIPTION .. REFRESH PUBLICATION
On Tue, Jan 12, 2021 at 9:58 AM japin wrote: > > > On Tue, 12 Jan 2021 at 11:37, Amit Kapila wrote: > > On Mon, Jan 11, 2021 at 6:51 PM Bharath Rupireddy > > wrote: > >> > >> Hi, > >> > >> While providing thoughts on the design in [1], I found a strange > >> behaviour with the $subject. The use case is shown below as a sequence > >> of steps that need to be run on publisher and subscriber to arrive at > >> the strange behaviour. In step 5, the table is dropped from the > >> publication and in step 6, the refresh publication is run on the > >> subscriber, from here onwards, the expectation is that no further > >> inserts into the publisher table have to be replicated on to the > >> subscriber, but the opposite happens i.e. the inserts are still > >> replicated to the subscriber. ISTM as a bug. Let me know if I'm > >> missing anything. > >> > > > > Did you try to investigate what's going on? Can you please check what > > is the behavior if, after step-5, you restart the subscriber and > > separately try creating a new subscription (maybe on a different > > server) for that publication after step-5 and see if that allows the > > relation to be replicated? AFAIU, in AlterSubscription_refresh, we > > remove such dropped rels and stop their corresponding apply workers > > which should stop the further replication of such relations but that > > doesn't seem to be happening in your case. > > If we restart the subscriber after step-5, it will not replicate the records. > > As I said in [1], if we don't insert new data in step-5, it will not > replicate the records. > Hmm, but in Bharath's test, it is replicating the Inserts in step-7 and step-9 as well. Are you seeing something different? > In both cases, AlterSubscription_refresh() calls RemoveSubscriptionRel() > and logicalrep_worker_stop_at_commit(). However, if we insert data in > step-5, it doesn't work as expected. Any thoughts?
> I think the data inserted in step-5 might be visible because we have stopped the apply process after that, but it is not clear why the data inserted in steps 7 and 9 is getting replicated. I think that, for some reason, the apply worker is not getting stopped even after the Refresh Publication statement is finished, due to which the data is being replicated after that as well; after a restart the apply worker won't be restarted, so the data replication doesn't happen. -- With Regards, Amit Kapila.
Re: Deleting older versions in unique indexes to avoid page splits
On Sun, Jan 10, 2021 at 4:06 PM Peter Geoghegan wrote: > Attached is v13, which has this tweak, and other miscellaneous cleanup > based on review from both Victor and Heikki. I consider this version > of the patch to be committable. I intend to commit something close to > it in the next week, probably no later than Thursday. I still haven't > got to the bottom of the shellsort question raised by Heikki. I intend > to do further performance validation before committing the patch. I benchmarked the patch with one array and without the shellsort specialization using two patches on top of v13, both of which are attached. This benchmark was similar to other low cardinality index benchmarks I've run in the past (with indexes named fiver, tenner, score, plus a pgbench_accounts INCLUDE unique index instead of the regular primary key). I used pgbench scale 500, for 30 minutes, no rate limit. One run with 16 clients, another with 32 clients. Original v13: patch.r1c32.bench.out: "tps = 32709.772257 (including connections establishing)" "latency average = 0.974 ms" "latency stddev = 0.514 ms" patch.r1c16.bench.out: "tps = 34670.929998 (including connections establishing)" "latency average = 0.461 ms" "latency stddev = 0.314 ms" v13 + attached simplifying patches: patch.r1c32.bench.out: "tps = 31848.632770 (including connections establishing)" "latency average = 1.000 ms" "latency stddev = 0.548 ms" patch.r1c16.bench.out: "tps = 33864.099333 (including connections establishing)" "latency average = 0.472 ms" "latency stddev = 0.399 ms" Clearly the optimization work still has some value, since we're looking at about a 2% - 3% increase in throughput here. This seems like it makes the cost of retaining the optimizations acceptable. The difference is much less visible with a rate-limit, which is rather more realistic. 
To some extent the sort is hot here because the patched version of Postgres updates tuples as fast as it can, and must therefore delete from the index as fast as it can. The sort itself was consistently near the top consumer of CPU cycles according to "perf top", even if it didn't get as bad as what I saw in earlier versions of the patch months ago. There are actually two sorts involved here (not just the heapam.c shellsort). We need to sort the deltids array twice -- once in heapam.c, and a second time (to restore the original leaf-page-wise order) in nbtpage.c, using qsort(). I'm pretty sure that the latter sort also matters, though it matters less than the heapam.c shellsort. I'm going to proceed with committing the original version of the patch -- I feel that this settles it. The code would be a bit tidier without two arrays or the shellsort, but it really doesn't make that much difference. Whereas the benefit is quite visible, and will be something that all varieties of index tuple deletion see a performance benefit from (not just bottom-up deletion). -- Peter Geoghegan 0006-Experiment-One-array-again.patch Description: Binary data 0007-Experiment-Remove-shellsort-just-use-qsort.patch Description: Binary data
RE: Add Nullif case for eval_const_expressions_mutator
> > I notice that there is no NullIf case in eval_const_expression. > > Since NullIf is similar to OpExpr and is easy to implement, I tried adding > > this case in eval_const_expressions_mutator. > > I think this patch should be about a tenth the size. Try modeling it on > the T_SubscriptingRef-etc case, ie, use ece_generic_processing and then > ece_evaluate_expr to cover the generic cases. OpExpr is common enough to > deserve specially optimized code, but NullIf isn't, so shorter is better. Thanks for the review. Attaching the v2 patch, which followed the suggestion to use ece_generic_processing and ece_evaluate_expr to simplify the code. Please have a check. Best regards, houzj v2-0001-add-nullif-case-for-eval_const_expressions.patch Description: v2-0001-add-nullif-case-for-eval_const_expressions.patch
Re: Logical Replication - behavior of ALTER PUBLICATION .. DROP TABLE and ALTER SUBSCRIPTION .. REFRESH PUBLICATION
On Tue, 12 Jan 2021 at 11:37, Amit Kapila wrote: > On Mon, Jan 11, 2021 at 6:51 PM Bharath Rupireddy > wrote: >> >> Hi, >> >> While providing thoughts on the design in [1], I found a strange >> behaviour with the $subject. The use case is shown below as a sequence >> of steps that need to be run on publisher and subscriber to arrive at >> the strange behaviour. In step 5, the table is dropped from the >> publication and in step 6, the refresh publication is run on the >> subscriber, from here onwards, the expectation is that no further >> inserts into the publisher table have to be replicated on to the >> subscriber, but the opposite happens i.e. the inserts are still >> replicated to the subscriber. ISTM as a bug. Let me know if I'm >> missing anything. >> > > Did you try to investigate what's going on? Can you please check what > is the behavior if, after step-5, you restart the subscriber and > separately try creating a new subscription (maybe on a different > server) for that publication after step-5 and see if that allows the > relation to be replicated? AFAIU, in AlterSubscription_refresh, we > remove such dropped rels and stop their corresponding apply workers > which should stop the further replication of such relations but that > doesn't seem to be happening in your case. If we restart the subscriber after step-5, it will not replicate the records. As I said in [1], if we don't insert new data in step-5, it will not replicate the records. In both cases, AlterSubscription_refresh() calls RemoveSubscriptionRel() and logicalrep_worker_stop_at_commit(). However, if we insert data in step-5, it doesn't work as expected. Any thoughts? [1] https://www.postgresql.org/message-id/a7a618fb-f87c-439c-90a3-93cf9e734...@hotmail.com -- Regards, Japin Li. ChengDu WenWu Information Technology Co.,Ltd.
Re: pg_upgrade test for binary compatibility of core data types
On Mon, Jan 11, 2021 at 03:28:08PM +0100, Peter Eisentraut wrote: > On 2020-12-27 20:07, Justin Pryzby wrote: > > On Wed, Dec 16, 2020 at 11:22:23AM -0600, Justin Pryzby wrote: > > > On Sun, Dec 06, 2020 at 12:02:48PM -0600, Justin Pryzby wrote: > > > > I meant to notice if the binary format is accidentally changed again, > > > > which was > > > > what happened here: > > > > 7c15cef86 Base information_schema.sql_identifier domain on name, not > > > > varchar. > > > > > > > > I added a table to the regression tests so it's processed by pg_upgrade > > > > tests, > > > > run like: > > > > > > > > | time make -C src/bin/pg_upgrade check oldsrc=`pwd`/11 > > > > oldbindir=`pwd`/11/tmp_install/usr/local/pgsql/bin > > > > > > Per cfbot, this avoids testing ::xml (support for which may not be > > > enabled) > > > And also now tests oid types. > > > > > > I think the per-version hacks should be grouped by logical change, rather > > > than > > > by version. Which I've started doing here. > > > > rebased on 6df7a9698bb036610c1e8c6d375e1be38cb26d5f > > I think these patches could use some in-place documentation of what they are > trying to achieve and how they do it. The required information is spread > over a lengthy thread. No one wants to read that. Add commit messages to > the patches. The 0001 patch fixes pg_upgrade/test.sh, which was dysfunctional. Portions of the first patch were independently handled by commits 52202bb39, fa744697c, 091866724. So this is rebased on those. I guess updating this script should be part of a beta-checklist somewhere, since nobody will want to backpatch changes for testing older releases. 0002 allows detecting the information_schema problem that was introduced at: 7c15cef86 Base information_schema.sql_identifier domain on name, not varchar.
+-- Create a table with different data types, to exercise binary compatibility +-- during pg_upgrade test If binary compatibility is changed, I expect this will error, crash, or at least return wrong data, and thereby fail tests. -- Justin On Sun, Dec 06, 2020 at 12:02:48PM -0600, Justin Pryzby wrote: > I checked that if I cherry-pick 0002 to v11, and comment out > old_11_check_for_sql_identifier_data_type_usage(), then pg_upgrade/test.sh > detects the original problem: > pg_dump: error: Error message from server: ERROR: invalid memory alloc > request size 18446744073709551613 > > I understand the buildfarm has its own cross-version-upgrade test, which I > think would catch this on its own. > > These all seem to complicate use of pg_upgrade/test.sh, so 0001 is needed to > allow testing upgrade from older releases. > > e78900afd217fa3eaa77c51e23a94c1466af421c Create by default sql/ and expected/ > for output directory in pg_regress > 40b132c1afbb4b1494aa8e48cc35ec98d2b90777 In the pg_upgrade test suite, don't > write to src/test/regress. > fc49e24fa69a15efacd5b8958115ed9c43c48f9a Make WAL segment size configurable > at initdb time.
> c37b3d08ca6873f9d4eaf24c72a90a550970cbb8 Allow group access on PGDATA > da9b580d89903fee871cf54845ffa2b26bda2e11 Refactor dir/file permissions >From b3f829ab0fd880962d43eac0222bdaab2b8070f4 Mon Sep 17 00:00:00 2001 From: Justin Pryzby Date: Sat, 5 Dec 2020 22:31:19 -0600 Subject: [PATCH v4 1/3] WIP: pg_upgrade/test.sh: changes needed to allow testing upgrade to v14dev from v9.5-v13 --- src/bin/pg_upgrade/test.sh | 93 +++--- 1 file changed, 86 insertions(+), 7 deletions(-) diff --git a/src/bin/pg_upgrade/test.sh b/src/bin/pg_upgrade/test.sh index ca923ba01b..b36fca4233 100644 --- a/src/bin/pg_upgrade/test.sh +++ b/src/bin/pg_upgrade/test.sh @@ -177,18 +177,97 @@ if "$MAKE" -C "$oldsrc" installcheck-parallel; then esac fix_sql="$fix_sql DROP FUNCTION IF EXISTS - public.oldstyle_length(integer, text); -- last in 9.6 + public.oldstyle_length(integer, text);" # last in 9.6 -- commit 5ded4bd21 + fix_sql="$fix_sql DROP FUNCTION IF EXISTS - public.putenv(text); -- last in v13 - DROP OPERATOR IF EXISTS -- last in v13 - public.#@# (pg_catalog.int8, NONE), - public.#%# (pg_catalog.int8, NONE), - public.!=- (pg_catalog.int8, NONE), + public.putenv(text);" # last in v13 + # last in v13 commit 76f412ab3 + # public.!=- This one is only needed for v11+ ?? + # Note, until v10, operators could only be dropped one at a time + fix_sql="$fix_sql + DROP OPERATOR IF EXISTS + public.#@# (pg_catalog.int8, NONE);" + fix_sql="$fix_sql + DROP OPERATOR IF EXISTS + public.#%# (pg_catalog.int8, NONE);" + fix_sql="$fix_sql + DROP OPERATOR IF EXISTS + public.!=- (pg_catalog.int8, NONE);" + fix_sql="$fix_sql + DROP OPERATOR IF EXISTS public.#@%# (pg_catalog.int8, NONE);" + + # commit 068503c76511cdb0080bab689662a20e86b9c845 + case $oldpgversion in + 10) +fix_sql="$fix_sql + DROP TRANSFORM FOR integer LANGUAGE sql CASCADE;" +;; + esac + + # commit
Re: [POC] Fast COPY FROM command for the table with foreign partitions
On 1/11/21 11:16 PM, Tomas Vondra wrote: Hi Andrey, Unfortunately, this no longer applies :-( I tried to apply this on top of c532d15ddd (from 2020/12/30) but even that has non-trivial conflicts. Can you send a rebased version? regards Applied on 044aa9e70e. -- regards, Andrey Lepikhov Postgres Professional >From f8e0cd305c691108313c2365cc4576e4d5e0bd38 Mon Sep 17 00:00:00 2001 From: Andrey Lepikhov Date: Tue, 12 Jan 2021 08:54:45 +0500 Subject: [PATCH 2/2] Fast COPY FROM into the foreign or sharded table. This feature enables bulk COPY into a foreign table when multi-inserts are possible and the foreign table has a non-zero number of columns. The FDW API was extended with the following routines: * BeginForeignCopy * EndForeignCopy * ExecForeignCopy BeginForeignCopy and EndForeignCopy initialize and free the CopyState of the bulk COPY. The ExecForeignCopy routine sends a 'COPY ... FROM STDIN' command to the foreign server, iteratively sends tuples through the CopyTo() machinery, and finally sends EOF on this connection. The code that constructed the list of columns for a given foreign relation in the deparseAnalyzeSql() routine was separated out into deparseRelColumnList(); it is reused in deparseCopyFromSql(). Added TAP tests for the specific corner cases of the COPY FROM STDIN operation. By analogy with CopyFrom(), the CopyState structure was extended with a data_dest_cb callback. It is used to send the text representation of a tuple to a custom destination. The PgFdwModifyState structure was extended with a cstate field, needed to avoid repeated initialization of the CopyState. Also for this reason the CopyTo() routine was split into the set of routines CopyToStart()/ CopyTo()/CopyToFinish(). The CopyInsertMethod enum was removed; this logic is now implemented via the ri_usesMultiInsert field of the ResultRelInfo structure. 
Discussion: https://www.postgresql.org/message-id/flat/3d0909dc-3691-a576-208a-90986e55489f%40postgrespro.ru Authors: Andrey Lepikhov, Ashutosh Bapat, Amit Langote --- contrib/postgres_fdw/deparse.c| 60 ++-- .../postgres_fdw/expected/postgres_fdw.out| 46 ++- contrib/postgres_fdw/postgres_fdw.c | 130 ++ contrib/postgres_fdw/postgres_fdw.h | 1 + contrib/postgres_fdw/sql/postgres_fdw.sql | 45 ++ doc/src/sgml/fdwhandler.sgml | 73 ++ src/backend/commands/copy.c | 4 +- src/backend/commands/copyfrom.c | 126 ++--- src/backend/commands/copyto.c | 84 --- src/backend/executor/execMain.c | 8 +- src/backend/executor/execPartition.c | 27 +++- src/include/commands/copy.h | 8 +- src/include/foreign/fdwapi.h | 15 ++ 13 files changed, 533 insertions(+), 94 deletions(-) diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c index 3cf7b4eb1e..b1ca479a65 100644 --- a/contrib/postgres_fdw/deparse.c +++ b/contrib/postgres_fdw/deparse.c @@ -184,6 +184,8 @@ static void appendAggOrderBy(List *orderList, List *targetList, static void appendFunctionName(Oid funcid, deparse_expr_cxt *context); static Node *deparseSortGroupClause(Index ref, List *tlist, bool force_colno, deparse_expr_cxt *context); +static List *deparseRelColumnList(StringInfo buf, Relation rel, + bool enclose_in_parens); /* * Helper functions @@ -1763,6 +1765,20 @@ deparseInsertSql(StringInfo buf, RangeTblEntry *rte, withCheckOptionList, returningList, retrieved_attrs); } +/* + * Deparse COPY FROM into given buf. + * We need to use list of parameters at each query. 
+ */ +void +deparseCopyFromSql(StringInfo buf, Relation rel) +{ + appendStringInfoString(buf, "COPY "); + deparseRelation(buf, rel); + (void) deparseRelColumnList(buf, rel, true); + + appendStringInfoString(buf, " FROM STDIN "); +} + /* * deparse remote UPDATE statement * @@ -2066,6 +2082,30 @@ deparseAnalyzeSizeSql(StringInfo buf, Relation rel) */ void deparseAnalyzeSql(StringInfo buf, Relation rel, List **retrieved_attrs) +{ + appendStringInfoString(buf, "SELECT "); + *retrieved_attrs = deparseRelColumnList(buf, rel, false); + + /* Don't generate bad syntax for zero-column relation. */ + if (list_length(*retrieved_attrs) == 0) + appendStringInfoString(buf, "NULL"); + + /* + * Construct FROM clause + */ + appendStringInfoString(buf, " FROM "); + deparseRelation(buf, rel); +} + +/* + * Construct the list of columns of given foreign relation in the order they + * appear in the tuple descriptor of the relation. Ignore any dropped columns. + * Use column names on the foreign server instead of local names. + * + * Optionally enclose the list in parantheses. + */ +static List * +deparseRelColumnList(StringInfo buf, Relation rel, bool enclose_in_parens) { Oid relid = RelationGetRelid(rel); TupleDesc tupdesc = RelationGetDescr(rel); @@ -2074,10 +2114,8 @@ deparseAnalyzeSql(StringInfo buf, Relation rel, List
Re: logical replication worker accesses catalogs in error context callback
On Mon, Jan 11, 2021 at 4:46 PM Bharath Rupireddy wrote: > > On Mon, Jan 11, 2021 at 10:56 AM Masahiko Sawada > wrote: > > Agreed. Attached the updated patch. > > Thanks for the updated patch. Looks like the comment crosses the 80 > char limit, I have corrected it. And also changed the variable name > from remotetypeid to remotetypid, so that logicalrep_typmap_gettypname > will not cross the 80 char limit. And also added a commit message. > Attaching v3 patch, please have a look. Both make check and make > check-world passes. Thanks! The change looks good to me. > > > > I quickly searched in places where error callbacks are being used, I > > > think we need a similar kind of fix for conversion_error_callback in > > > postgres_fdw.c, because get_attname and get_rel_name are being used > > > which do catalogue lookups. ISTM that all the other error callbacks > > > are good in the sense that they are not doing sys catalogue lookups. > > > > Indeed. If we need to disallow the catalog lookup during executing > > error callbacks we might want to have an assertion checking that in > > SearchCatCacheInternal(), in addition to Assert(IsTransactionState()). > > I tried to add(as attached in > v1-0001-Avoid-Sys-Cache-Lookups-in-Error-Callbacks.patch) the > Assert(!error_context_stack); in SearchCatCacheInternal, initdb itself > fails [1]. That means, we are doing a bunch of catalogue access from > error context callbacks. Given this, I'm not quite sure whether we can > have such an assertion in SearchCatCacheInternal. I think checking !error_context_stack is not a correct check if we're executing an error context callback since it's a stack to store callbacks. It can be non-NULL by setting an error callback, see setup_parser_errposition_callback() for instance. Probably we need to check if (recursion_depth > 0) and elevel. Attached a patch for that as an example. 
> > Although unrelated to what we are discussing here - when I looked at > SearchCatCacheInternal, I found that the function SearchCatCache has > SearchCatCacheInternal in the function comment, I think we should > correct it. Thoughts? If okay, I will post a separate patch for this > minor comment fix. Perhaps you mean this? /* * SearchCatCacheInternal * * This call searches a system cache for a tuple, opening the relation * if necessary (on the first access to a particular cache). * * The result is NULL if not found, or a pointer to a HeapTuple in * the cache. The caller must not modify the tuple, and must call * ReleaseCatCache() when done with it. * * The search key values should be expressed as Datums of the key columns' * datatype(s). (Pass zeroes for any unused parameters.) As a special * exception, the passed-in key for a NAME column can be just a C string; * the caller need not go to the trouble of converting it to a fully * null-padded NAME. */ HeapTuple SearchCatCache(CatCache *cache, Looking at commit 141fd1b66 it intentionally changed to SearchCatCacheInternal from SearchCatCache but it seems to me that it's a typo. Regards, -- Masahiko Sawada EnterpriseDB: https://www.enterprisedb.com/ avoid_sys_cache_lookup_in_error_callback_v2.patch Description: Binary data
RE: Disable WAL logging to speed up data loading
From: Osumi, Takamichi/大墨 昂道 > I updated the patch following this discussion, and fixed the documentation as > well. + (rmid == RM_GIST_ID && info == RM_GIST_ID) || + (rmid == RM_GENERIC_ID))) info is wrong for GiST, and one pair of parentheses is unnecessary. The above should be: + (rmid == RM_GIST_ID && info == XLOG_GIST_ASSIGN_LSN) || + rmid == RM_GENERIC_ID)) Regards Takayuki Tsunakawa
Re: [POC] Fast COPY FROM command for the table with foreign partitions
On 1/11/21 4:59 PM, Tang, Haiying wrote: Hi Andrey, I had a general look at this extension feature, and I think it's beneficial for some application scenarios of PostgreSQL. So I ran 7 performance test cases on your patch (v13). The results are really good. As you can see below, we get a 7-10x improvement with this patch. PSA test_copy_from.sql for my test case details (I didn't attach my data file since it's too big). Below are the test results. 'Test No' corresponds to the number (0, 1, ..., 6) in the attached test_copy_from.sql. %reg = (Patched-Unpatched)/Unpatched; times are in milliseconds.
|Test No|Test Case|Patched(ms)|Unpatched(ms)|%reg|
|---|---|---|---|---|
|0|COPY FROM insertion into the partitioned table (partition is foreign table)|102483.223|1083300.907|-91%|
|1|COPY FROM insertion into the partitioned table (partition is foreign partition)|104779.893|1207320.287|-91%|
|2|COPY FROM insertion into the foreign table (without partition)|100268.730|1077309.158|-91%|
|3|COPY FROM insertion into the partitioned table (part of foreign partitions)|104110.620|1134781.855|-91%|
|4|COPY FROM insertion into the partitioned table with constraint (part of foreign partition)|136356.201|1238539.603|-89%|
|5|COPY FROM insertion into the foreign table with constraint (without partition)|136818.262|1189921.742|-89%|
|6|\copy insertion into the partitioned table with constraint|140368.072|1242689.924|-89%|
If there is any question on my tests, please feel free to ask. Best regards, Tang Thank you for this work. Some time ago I suggested an additional optimization [1] which can further speed up COPY by 2-4 times. Maybe you can perform the benchmark for this solution too? [1] https://www.postgresql.org/message-id/da7ed3f5-b596-2549-3710-4cc2a602ec17%40postgrespro.ru -- regards, Andrey Lepikhov Postgres Professional
Re: Movement of restart_lsn position movement of logical replication slots is very slow
Hi Amit, Thanks for the response. Can you please let me know what pg_current_wal_lsn returns? Is this position the LSN of the next log record to be created, or is it the LSN of the last log record already created and inserted in the log? The documentation says it returns the current WAL write location. Regards Shailesh On Thu, 24 Dec, 2020, 7:43 pm Amit Kapila, wrote: > On Thu, Dec 24, 2020 at 7:30 PM Jammie wrote: > > > > Sorry, I don't have the debug setup handy. However, the SQL commands now > work to move the restart_lsn of the slots in standalone code from > psql. > > > > A few followup questions. > > > > What is catalog_xmin in pg_replication_slots? And how does it play a > role in moving the restart_lsn of the slot? > > > > I am just checking the possibility that a special transaction can cause a > private slot to go stale. > > > > Yeah, it is possible if some old transaction is active in the > system. The restart_lsn is the LSN required by the oldest transaction. But it is > strange that it affects only one of the slots. > > -- > With Regards, > Amit Kapila. >
Re: Key management with tests
Hi Stephen, On Tue, Jan 12, 2021 at 10:47 AM Stephen Frost wrote: > > This is an interesting question but ultimately I don't think we should > be looking at this from the perspective of allowing arbitrary changes to > the page format. The challenge is that much of the page format, today, > is defined by a C struct and changing the way that works would require a > great deal of code to be modified and turn this into a massive effort, > assuming we wish to have the same compiled binary able to work with both > unencrypted and encrypted clusters, which I do believe is a requirement. > > The thought that I had was to, instead, try to figure out if we could > fudge some space by, say, putting a 128-bit 'hole' at the end of the > page and just move pd_special back, effectively making the page seem > 'smaller' to all of the code that uses it, except for the code that > knows how to do the decryption. I ran into some trouble with that but > haven't quite sorted out what happened yet. Other ideas would be to put > it before pd_special, or maybe somewhere else, but a lot depends on the > code's expectations. > > I agree that we should not make too many changes to affect the use of unencrypted clusters. But as a personal opinion only, I don't think it's a good idea to add some "implicit" tricks. To provide an inspiration, can we add a flag to mark whether the page format has been changed: --- a/src/include/storage/bufpage.h +++ b/src/include/storage/bufpage.h @@ -181,8 +185,9 @@ typedef PageHeaderData *PageHeader; #define PD_PAGE_FULL 0x0002 /* not enough free space for new tuple? */ #define PD_ALL_VISIBLE 0x0004 /* all tuples on page are visible to * everyone */ +#define PD_PAGE_ENCRYPTED 0x0008 /* Is page encrypted? */ -#define PD_VALID_FLAG_BITS 0x0007 /* OR of all valid pd_flags bits */ +#define PD_VALID_FLAG_BITS 0x000F /* OR of all valid pd_flags bits */ /* * Page layout version number 0 is for pre-7.3 Postgres releases. 
@@ -389,6 +394,13 @@ PageValidateSpecialPointer(Page page) #define PageClearAllVisible(page) \ (((PageHeader) (page))->pd_flags &= ~PD_ALL_VISIBLE) +#define PageIsEncrypted(page) \ + (((PageHeader) (page))->pd_flags & PD_PAGE_ENCRYPTED) +#define PageSetEncrypted(page) \ + (((PageHeader) (page))->pd_flags |= PD_PAGE_ENCRYPTED) +#define PageClearEncrypted(page) \ + (((PageHeader) (page))->pd_flags &= ~PD_PAGE_ENCRYPTED) + #define PageIsPrunable(page, oldestxmin) \ ( \ AssertMacro(TransactionIdIsNormal(oldestxmin)), \ In this way, I think it has little effect on the unencrypted cluster, and we can also modify the page format as we wish. Of course, it's also possible that I didn't understand your design correctly, or there's something wrong with my idea. :D -- There is no royal road to learning. HighGo Software Co.
Re: Logical Replication - behavior of ALTER PUBLICATION .. DROP TABLE and ALTER SUBSCRIPTION .. REFRESH PUBLICATION
On Mon, Jan 11, 2021 at 6:51 PM Bharath Rupireddy wrote: > > Hi, > > While providing thoughts on the design in [1], I found a strange > behaviour with the $subject. The use case is shown below as a sequence > of steps that need to be run on publisher and subscriber to arrive at > the strange behaviour. In step 5, the table is dropped from the > publication and in step 6, the refresh publication is run on the > subscriber, from here onwards, the expectation is that no further > inserts into the publisher table have to be replicated on to the > subscriber, but the opposite happens i.e. the inserts are still > replicated to the subscriber. ISTM as a bug. Let me know if I'm > missing anything. > Did you try to investigate what's going on? Can you please check what is the behavior if, after step-5, you restart the subscriber and separately try creating a new subscription (maybe on a different server) for that publication after step-5 and see if that allows the relation to be replicated? AFAIU, in AlterSubscription_refresh, we remove such dropped rels and stop their corresponding apply workers which should stop the further replication of such relations but that doesn't seem to be happening in your case. -- With Regards, Amit Kapila.
Re: [Patch] Optimize dropping of relation buffers using dlist
On Fri, Jan 8, 2021 at 7:03 AM Kyotaro Horiguchi wrote: > > At Thu, 7 Jan 2021 09:25:22 +, "k.jami...@fujitsu.com" > wrote in: > > > Thanks for the detailed tests. NBuffers/32 seems like an appropriate > > > value for the threshold based on these results. I would like to > > > slightly modify part of the commit message in the first patch as below > > > [1], otherwise, I am fine with the changes. Unless you or anyone else > > > has any more comments, I am planning to push the 0001 and 0002 > > > sometime next week. > > > > > > [1] > > > "The recovery path of DropRelFileNodeBuffers() is optimized so that > > > scanning of the whole buffer pool can be avoided when the number of > > > blocks to be truncated in a relation is below a certain threshold. For > > > such cases, we find the buffers by doing lookups in BufMapping table. > > > This improves the performance by more than 100 times in many cases > > > when several small tables (tested with 1000 relations) are truncated > > > and where the server is configured with a large value of shared > > > buffers (greater than 100GB)." > > > > Thank you for taking a look at the results of the tests. And it's also > > consistent with the results from Tang too. > > The commit message LGTM. > > +1. > I have pushed the 0001. -- With Regards, Amit Kapila.
RE: Disable WAL logging to speed up data loading
Hi On Mon, Jan 11, 2021 9:14 AM Tsunakawa, Takayuki wrote: > From: Masahiko Sawada > > I think it's better to have index AM (and perhaps table AM) control it > > instead of filtering in XLogInsert(). Because otherwise third-party > > access methods that use LSN like gist indexes cannot write WAL records > > at all when wal_level = none even if they want. > > Hm, third-party extensions use RM_GENERIC_ID, so it should probably be > allowed in XLogInsert() as well instead of allowing control in some other way. > It's not unnatural that WAL records for in-core AMs are filtered in > XLogInsert(). I updated the patch following this discussion, and fixed the documentation as well. No failures were found during make check-world. Best Regards, Takamichi Osumi disable_WAL_logging_v06.patch Description: disable_WAL_logging_v06.patch
RE: POC: postgres_fdw insert batching
From: Tomas Vondra > Attached is a v6 of this patch, rebased to current master and with some minor > improvements (mostly comments and renaming the "end" struct field to > "values_end" which I think is more descriptive). Thank you very much. In fact, my initial patches used values_end, and I changed it to len considering that it may be used for bulk UPDATE and DELETE in the future. But I agree that values_end makes its role easier to understand, too. Regards Takayuki Tsunakawa
Re: pg_dump, ATTACH, and independently restorable child partitions
Justin Pryzby writes: > [ v5-0001-pg_dump-output-separate-object-for-ALTER-TABLE.AT.patch ] Pushed with mostly cosmetic edits. One thing I changed that isn't cosmetic is that I set the ArchiveEntry's owner to be the owner of the child table. Although we aren't going to do any sort of ALTER OWNER on this, it's still important that the owner be marked as someone who has the right permissions to issue the ALTER. The default case is that the original user will issue the ATTACH, which basically only works if you run the restore as superuser. It looks to me like you copied this decision from the INDEX ATTACH code, which is just as broken --- I'm going to go fix/backpatch that momentarily. Another thing that bothers me still is that it's not real clear that this code plays nicely with selective dumps, because it's not doing anything to set the dobj.dump field in a non-default way (which in turn means that the dobj.dump test in dumpTableAttach can never fire). It seems like it might be possible to emit a TABLE ATTACH object even though one or both of the referenced tables didn't get dumped. In some desultory testing I couldn't get that to actually happen, but maybe I just didn't push on it in the right way. I'd be happier about this if we set the flags with something along the lines of attachinfo->dobj.dump = attachinfo->parentTbl->dobj.dump & attachinfo->partitionTbl->dobj.dump; regards, tom lane
Re: Moving other hex functions to /common
On Mon, Jan 11, 2021 at 11:27:30AM -0500, Bruce Momjian wrote: > Sure, I realize the elog/pg_log is odd, but the alternatives seem worse. I guess that it depends on the use cases. If there is no need to worry about shared libraries, elog/pg_log would do just fine. > You can take ownership of my hex patch and just add to it. I know you > already did the hex length part, and have other ideas of what you want. OK, thanks. I have been looking at it, and tweaked the patch as per the attached. That's basically what you did, on the following points:
- Use size_t for the arguments, uint64 as the return type.
- Leave ECPG out of the equation.
- Use pg_log()/elog() to report failures in src/common/, rather than error codes.
- Renamed the arguments of encode.c to src, srclen, dst, dstlen.
The only two things that were not present are the set of checks for overflows, and the adjustments for varlena.c. The first point makes the code of encode.c safer, as previously the code would issue a FATAL *after* writing out-of-bound data. Now it issues an ERROR before any overwrite is done, and I have added some assertions as an extra safety net. For the second, I think that it makes the allocation pattern easier to follow, similarly to checksum manifests. Thoughts? 
-- Michael From 3e5c9fa42ff981933bc6eeede92335ae89e94e21 Mon Sep 17 00:00:00 2001 From: Michael Paquier Date: Tue, 12 Jan 2021 11:21:42 +0900 Subject: [PATCH v5] Refactor hex code --- src/include/common/hex.h | 25 +++ src/include/common/hex_decode.h | 16 -- src/include/utils/builtins.h | 3 - src/backend/replication/backup_manifest.c | 28 ++-- src/backend/utils/adt/encode.c| 96 ++- src/backend/utils/adt/varlena.c | 16 +- src/common/Makefile | 2 +- src/common/hex.c | 196 ++ src/common/hex_decode.c | 106 src/tools/msvc/Mkvcbuild.pm | 2 +- 10 files changed, 308 insertions(+), 182 deletions(-) create mode 100644 src/include/common/hex.h delete mode 100644 src/include/common/hex_decode.h create mode 100644 src/common/hex.c delete mode 100644 src/common/hex_decode.c diff --git a/src/include/common/hex.h b/src/include/common/hex.h new file mode 100644 index 00..3c3c956bb6 --- /dev/null +++ b/src/include/common/hex.h @@ -0,0 +1,25 @@ +/* + * + * hex.h + * Encoding and decoding routines for hex strings. 
+ * + * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group + * Portions Copyright (c) 1994, Regents of the University of California + * + * IDENTIFICATION + * src/include/common/hex.h + * + * + */ + +#ifndef COMMON_HEX_H +#define COMMON_HEX_H + +extern uint64 pg_hex_decode(const char *src, size_t srclen, + char *dst, size_t dstlen); +extern uint64 pg_hex_encode(const char *src, size_t srclen, + char *dst, size_t dstlen); +extern uint64 pg_hex_enc_len(size_t srclen); +extern uint64 pg_hex_dec_len(size_t srclen); + +#endif /* COMMON_HEX_H */ diff --git a/src/include/common/hex_decode.h b/src/include/common/hex_decode.h deleted file mode 100644 index 29ab248458..00 --- a/src/include/common/hex_decode.h +++ /dev/null @@ -1,16 +0,0 @@ -/* - * hex_decode.h - * hex decoding - * - * Portions Copyright (c) 1996-2021, PostgreSQL Global Development Group - * Portions Copyright (c) 1994, Regents of the University of California - * - * src/include/common/hex_decode.h - */ -#ifndef COMMON_HEX_DECODE_H -#define COMMON_HEX_DECODE_H - -extern uint64 hex_decode(const char *src, size_t len, char *dst); - - -#endif /* COMMON_HEX_DECODE_H */ diff --git a/src/include/utils/builtins.h b/src/include/utils/builtins.h index 6f739a8822..27d2f2ffb3 100644 --- a/src/include/utils/builtins.h +++ b/src/include/utils/builtins.h @@ -31,9 +31,6 @@ extern void domain_check(Datum value, bool isnull, Oid domainType, extern int errdatatype(Oid datatypeOid); extern int errdomainconstraint(Oid datatypeOid, const char *conname); -/* encode.c */ -extern uint64 hex_encode(const char *src, size_t len, char *dst); - /* int.c */ extern int2vector *buildint2vector(const int16 *int2s, int n); diff --git a/src/backend/replication/backup_manifest.c b/src/backend/replication/backup_manifest.c index 8af94610b3..0df5186828 100644 --- a/src/backend/replication/backup_manifest.c +++ b/src/backend/replication/backup_manifest.c @@ -13,11 +13,11 @@ #include "postgres.h" #include "access/timeline.h" 
+#include "common/hex.h" #include "libpq/libpq.h" #include "libpq/pqformat.h" #include "mb/pg_wchar.h" #include "replication/backup_manifest.h" -#include "utils/builtins.h" #include "utils/json.h" static void AppendStringToManifest(backup_manifest_info *manifest, char *s); @@ -150,10 +150,12 @@
Re: Key management with tests
On Tue, Jan 12, 2021 at 09:32:54AM +0900, Masahiko Sawada wrote: > On Tue, Jan 12, 2021 at 3:23 AM Stephen Frost wrote: > > Right, or ensure that the actual IV used is distinct (such as by using > > another bit in the IV to distinguish logged-vs-unlogged), but it seems > > saner to just use a different key, ultimately. > > Agreed. > > I think we also need to consider how to make sure nonce is unique when > making a page dirty by updating hint bits. Hint bit update changes the > page contents but doesn't change the page lsn if we already write a > full page write. In the PoC patch, I logged a dummy WAL record > (XLOG_NOOP) just to move the page lsn forward, but since this is > required even when changing the page is not the first time since the > last checkpoint we might end up logging too many dummy WAL records. This says: https://wiki.postgresql.org/wiki/Transparent_Data_Encryption#Other_requirements wal_log_hints will be enabled automatically in encryption mode. Does that help? -- Bruce Momjian https://momjian.us EnterpriseDB https://enterprisedb.com The usefulness of a cup is in its emptiness, Bruce Lee
Re: [PATCH] Feature improvement for CLOSE, FETCH, MOVE tab completion
On Tue, Jan 12, 2021 at 10:00 AM Masahiko Sawada wrote: > > On Mon, Jan 11, 2021 at 11:00 PM Peter Eisentraut > wrote: > > > > On 2021-01-05 10:56, Masahiko Sawada wrote: > > > BTW according to the documentation, the options of DECLARE statement > > > (BINARY, INSENSITIVE, SCROLL, and NO SCROLL) are order-sensitive. > > > > > > DECLARE name [ BINARY ] [ INSENSITIVE ] [ [ NO ] SCROLL ] > > > CURSOR [ { WITH | WITHOUT } HOLD ] FOR query > > > > > > But I realized that these options are actually order-insensitive. For > > > instance, we can declare a cursor like: > > > > > > =# declare abc scroll binary cursor for select * from pg_class; > > > DECLARE CURSOR > > > > > > The both parser code and documentation has been unchanged from 2003. > > > Is it a documentation bug? > > > > According to the SQL standard, the ordering of the cursor properties is > > fixed. Even if the PostgreSQL parser offers more flexibility, I think > > we should continue to encourage writing the clauses in the standard order. > > Thanks for your comment. Agreed. > > So regarding the tab completion for DECLARE statement, perhaps it > would be better to follow the documentation? IMO yes because it's less confusing to make the document and tab-completion consistent. Regards, -- Fujii Masao
Re: POC: postgres_fdw insert batching
Hi, Attached is a v6 of this patch, rebased to current master and with some minor improvements (mostly comments and renaming the "end" struct field to "values_end" which I think is more descriptive). The one thing that keeps bugging me is convert_prep_stmt_params - it does the right thing, but the code is somewhat confusing. AFAICS the discussions about making this use COPY and/or libpq pipelining (neither of which is committed yet) ended with the conclusion that those changes are somewhat independent, and that it's worth getting this committed in the current form. Barring objections, I'll push this within the next couple days. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company >From 1e4a99c6d4a5221dadc9e7a9922bdd9e3ebe1310 Mon Sep 17 00:00:00 2001 From: Tomas Vondra Date: Tue, 12 Jan 2021 01:36:01 +0100 Subject: [PATCH] Add bulk insert for foreign tables --- contrib/postgres_fdw/deparse.c| 43 ++- .../postgres_fdw/expected/postgres_fdw.out| 116 ++- contrib/postgres_fdw/option.c | 14 + contrib/postgres_fdw/postgres_fdw.c | 291 ++ contrib/postgres_fdw/postgres_fdw.h | 5 +- contrib/postgres_fdw/sql/postgres_fdw.sql | 91 ++ doc/src/sgml/fdwhandler.sgml | 89 +- doc/src/sgml/postgres-fdw.sgml| 13 + src/backend/executor/execPartition.c | 11 + src/backend/executor/nodeModifyTable.c| 161 ++ src/backend/nodes/list.c | 15 + src/include/executor/execPartition.h | 1 + src/include/foreign/fdwapi.h | 10 + src/include/nodes/execnodes.h | 6 + src/include/nodes/pg_list.h | 15 + 15 files changed, 815 insertions(+), 66 deletions(-) diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c index 3cf7b4eb1e..2d38ab25cb 100644 --- a/contrib/postgres_fdw/deparse.c +++ b/contrib/postgres_fdw/deparse.c @@ -1711,7 +1711,7 @@ deparseInsertSql(StringInfo buf, RangeTblEntry *rte, Index rtindex, Relation rel, List *targetAttrs, bool doNothing, List *withCheckOptionList, List *returningList, - List **retrieved_attrs) + List 
**retrieved_attrs, int *values_end_len) { AttrNumber pindex; bool first; @@ -1754,6 +1754,7 @@ deparseInsertSql(StringInfo buf, RangeTblEntry *rte, } else appendStringInfoString(buf, " DEFAULT VALUES"); + *values_end_len = buf->len; if (doNothing) appendStringInfoString(buf, " ON CONFLICT DO NOTHING"); @@ -1763,6 +1764,46 @@ deparseInsertSql(StringInfo buf, RangeTblEntry *rte, withCheckOptionList, returningList, retrieved_attrs); } +/* + * rebuild remote INSERT statement + * + */ +void +rebuildInsertSql(StringInfo buf, char *orig_query, + int values_end_len, int num_cols, + int num_rows) +{ + int i, j; + int pindex; + bool first; + + /* Copy up to the end of the first record from the original query */ + appendBinaryStringInfo(buf, orig_query, values_end_len); + + /* Add records to VALUES clause */ + pindex = num_cols + 1; + for (i = 0; i < num_rows; i++) + { + appendStringInfoString(buf, ", ("); + + first = true; + for (j = 0; j < num_cols; j++) + { + if (!first) +appendStringInfoString(buf, ", "); + first = false; + + appendStringInfo(buf, "$%d", pindex); + pindex++; + } + + appendStringInfoChar(buf, ')'); + } + + /* Copy stuff after VALUES clause from the original query */ + appendStringInfoString(buf, orig_query + values_end_len); +} + /* * deparse remote UPDATE statement * diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out index c11092f8cc..96bad17ded 100644 --- a/contrib/postgres_fdw/expected/postgres_fdw.out +++ b/contrib/postgres_fdw/expected/postgres_fdw.out @@ -8911,7 +8911,7 @@ DO $d$ END; $d$; ERROR: invalid option "password" -HINT: Valid options in this context are: service, passfile, channel_binding, connect_timeout, dbname, host, hostaddr, port, options, application_name, keepalives, keepalives_idle, keepalives_interval, keepalives_count, tcp_user_timeout, sslmode, sslcompression, sslcert, sslkey, sslrootcert, sslcrl, requirepeer, ssl_min_protocol_version, ssl_max_protocol_version, gssencmode, 
krbsrvname, gsslib, target_session_attrs, use_remote_estimate, fdw_startup_cost, fdw_tuple_cost, extensions, updatable, fetch_size +HINT: Valid options in this context are: service, passfile, channel_binding, connect_timeout, dbname, host, hostaddr, port, options, application_name, keepalives, keepalives_idle, keepalives_interval, keepalives_count, tcp_user_timeout, sslmode, sslcompression, sslcert, sslkey, sslrootcert, sslcrl, requirepeer, ssl_min_protocol_version, ssl_max_protocol_version, gssencmode, krbsrvname, gsslib, target_session_attrs, use_remote_estimate, fdw_startup_cost, fdw_tuple_cost, extensions,
Re: list of extended statistics on psql
Hi Tomas, On 2021/01/09 9:01, Tomas Vondra wrote: On 1/8/21 1:14 AM, Tomas Vondra wrote: On 1/8/21 12:52 AM, Tatsuro Yamada wrote: On 2021/01/08 0:56, Tomas Vondra wrote: On 1/7/21 3:47 PM, Alvaro Herrera wrote: On 2021-Jan-07, Tomas Vondra wrote: On 1/7/21 1:46 AM, Tatsuro Yamada wrote: I overlooked the check for MCV in the query-building logic because I created the patch as a new feature for PG14. I'm not sure whether we should back-patch it or not. However, I'll add the check in the next patch because it is useful if you decide to back-patch to PG10, 11, 12, and 13. BTW perhaps a quick look at the other \d commands would show if there are precedents. I didn't have time for that. Yes, we do promise that new psql works with older servers. Yeah, makes sense. That means we need to add the check for 12 / MCV. Ah, I got it. I fixed the patch to work with older servers by adding version checks, and I tested the \dX command on older servers (PG10 - 13). The results look fine. 0001: Added the check code to handle pre-PG12, which has neither MCV nor pg_statistic_ext_data. 0002: This patch is the same as the previous patch (not changed). Please find the attached files. OK, thanks. I'll take a look and probably push tomorrow. FWIW I plan to squash the patches into a single commit. Attached is a patch I plan to commit - 0001 is the last submitted version with a couple of minor tweaks, mostly in docs/comments, and a small rework of the branching to be more like the other functions in describe.c. Thanks for revising the patch. I reviewed 0001, and the branching and comments look good to me. However, I added an alias name in processSQLNamePattern() in the patch: s/"stxname"/"es.stxname"/ While working on that, I realized that 'defined' might be a bit ambiguous; I initially thought it means 'NOT NULL' (which it does not). I propose to change it to 'requested' instead. Tatsuro, do you agree, or do you think 'defined' is better? 
Regarding the status of extended stats, I think the following: - "defined": it only shows that the extended stats are defined. We can't know whether they need ANALYZE or not. I agree this name was ambiguous. Therefore we should replace it with a more suitable name. - "requested": it shows that the extended stats need something. Of course, we know it needs to ANALYZE because we can create the patch. However, I feel there is a little ambiguity for DBAs. To solve this, it would be better to write an explanation of the status in the document. For example, == The column of the kind of extended stats (e.g. Ndistinct) shows some statuses. "requested" means that it needs to gather data by ANALYZE. "built" means ANALYZE was finished, and the planner can use it. NULL means that it doesn't exist. == What do you think? :-D Thanks, Tatsuro Yamada >From 3d2f4ef2ecba9fd7987df665237add6fc4ec03c1 Mon Sep 17 00:00:00 2001 From: Tatsuro Yamada Date: Thu, 7 Jan 2021 14:28:20 +0900 Subject: [PATCH 1/2] psql \dX: list extended statistics objects The new command lists extended statistics objects, possibly with their sizes. All past releases with extended statistics are supported. Author: Tatsuro Yamada Reviewed-by: Julien Rouhaud, Alvaro Herrera, Tomas Vondra Discussion: https://postgr.es/m/c027a541-5856-75a5-0868-341301e1624b%40nttcom.co.jp_1 --- doc/src/sgml/ref/psql-ref.sgml | 14 +++ src/bin/psql/command.c | 3 + src/bin/psql/describe.c | 150 src/bin/psql/describe.h | 3 + src/bin/psql/help.c | 1 + src/bin/psql/tab-complete.c | 4 +- src/test/regress/expected/stats_ext.out | 94 +++ src/test/regress/sql/stats_ext.sql | 31 + 8 files changed, 299 insertions(+), 1 deletion(-) diff --git a/doc/src/sgml/ref/psql-ref.sgml b/doc/src/sgml/ref/psql-ref.sgml index 221a967bfe..d01acc92b8 100644 --- a/doc/src/sgml/ref/psql-ref.sgml +++ b/doc/src/sgml/ref/psql-ref.sgml @@ -1918,6 +1918,20 @@ testdb= + + +\dX [ pattern ] + + +Lists extended statistics. 
+If pattern +is specified, only those extended statistics whose names match the +pattern are listed. +If + is appended to the command name, each extended +statistics is listed with its size. + + + \dy[+] [ pattern ] diff --git a/src/bin/psql/command.c b/src/bin/psql/command.c index 303e7c3ad8..c5ebc1c3f4 100644 --- a/src/bin/psql/command.c +++ b/src/bin/psql/command.c @@ -928,6 +928,9 @@ exec_command_d(PsqlScanState scan_state, bool active_branch, const char *cmd) else success = listExtensions(pattern); break; +
Re: A failure of standby to follow timeline switch
On Sat, Jan 9, 2021 at 5:08 AM Alvaro Herrera wrote: > > Masao-san: Are you intending to act as committer for these? Since the > bug is mine I can look into it, but since you already did all the > reviewing work, I'm good with you giving it the final push. Thanks! I'm planning to push the patch. > 0001 looks good to me; let's get that one committed quickly so that we > can focus on the interesting stuff. While the implementation of > find_in_log is quite dumb (not this patch's fault), it seems sufficient > to deal with small log files. We can improve the implementation later, > if needed, but we have to get the API right on the first try. > > 0003: The fix looks good to me. I verified that the test fails without > the fix, and it passes with the fix. Yes. > The test added in 0002 is a bit optimistic regarding timing, as well as > potentially slow; it loops 1000 times and sleeps 100 milliseconds each > time. In a very slow server (valgrind or clobber_cache animals) this > could not be sufficient time, while on fast servers it may end up > waiting longer than needed. Maybe we can do something like this: On second thought, I think that the regression test should be in 004_timeline_switch.pl instead of 001_stream_rep.pl because it is a test of timeline switching. Also, I think it's better to test the timeline switch by checking whether some data is successfully replicated, as the existing regression test for timeline switch in 004_timeline_switch.pl does, instead of looking for a specific message in the log file. I attached the POC patch. Thoughts? Regards, -- Fujii Masao v5-0001-Move-TAP-log-searching-feature-to-common-modules.patch Description: Binary data
Re: libpq compression
On Mon, Jan 11, 2021 at 04:53:51PM +0300, Konstantin Knizhnik wrote: > On 09.01.2021 23:31, Justin Pryzby wrote: > > I suggest that there should be an enum of algorithms, which is constant > > across > > all servers. They would be unconditionally included and not #ifdef > > depending > > on compilation options. > > I do not think that it is possible (even right now, it is possible to build > Postgres without zlib support). > Also if new compression algorithms are added, then in any case we have to > somehow handle the situation where > an old client is connected to a new server and vice versa. I mean an enum of all compression algorithms supported in the master branch, starting with ZLIB = 1. I think this applies to libpq compression (which requires client and server to handle a common compression algorithm), and pg_dump (output of which needs to be read by pg_restore), but maybe not TOAST, the patch for which supports extensions, with dynamically allocated OIDs. > > In config.sgml, it says that libpq_compression defaults to on (on the server > > side), but in libpq.sgml it says that it defaults to off (on the client > > side). > > Is that what's intended ? I would've thought the defaults would match, or > > that > > the server would enforce a default more conservative than the client's (the > > DBA > > should probably have to explicitly enable compression, and need to "opt-in" > > rather than "opt-out"). > > Yes, it is intended behavior: the libpq_compression GUC allows prohibiting > compression requests from clients if, for some reason (security, unwanted CPU > consumption), compression is not desired. > But by default the server should support compression if it is requested by the > client. It's not clear to me that that's true. It may be what's convenient for you, especially during development, but that doesn't mean it's safe or efficient or what's generally desirable to everyone. 
> But the client should not request compression by default: it makes sense only for > queries returning large result sets or transferring a lot of data (like > COPY). I think you're making assumptions about everyone's use of the tools, and it's better if the DBA makes that determination. The clients aren't generally under the admin's control, and if they don't request compression, then it's not used. If they request compression, then the DBA still has control over whether it's allowed. We agree that it should be disabled by default, but I suggest that it's most flexible if the client makes the request and the server decides. By default the server should deny/ignore the request, with the client gracefully falling back to no compression. Compression would have little effect on most queries, especially at default level=1. > > Maybe instead of a boolean, this should be a list of permitted compression > > algorithms. This allows the admin to set a "policy" rather than using the > > server's hard-coded preferences. This could be important to disable an > > algorithm at run-time if there's a vulnerability found, or performance > > problem, > > or buggy client, or for diagnostics, or performance testing. Actually, I > > think > > it may be important to allow the admin to disable not just an algorithm, but > > also its options. It should be possible for the server to allow "zstd" > > compression but not "zstd --long", or zstd:99 or arbitrarily large window > > sizes. This seems similar to SSL cipher strings. I think we'd want to be > > able > > to allow (or prohibit) at least alg:* (with options) or alg (with default > > options). > > Sorry, maybe you are not looking at the latest version of the patch? > Right now the "compression" parameter accepts not only boolean values but also a > list of suggested algorithms with optional compression level, like > "zstd:1,zlib" You're talking about the client compression param. 
I'm suggesting that the server should allow fine-grained control of what compression algs are permitted at *runtime*. This would allow distributions to compile with all compression libraries enabled at configure time, and still allow a DBA to disable one without recompiling. > > Your patch documents "libpq_compression=auto" but that doesn't work: > > WARNING: none of specified algirthms auto is supported by client > > I guess you mean "any" which is what's implemented. > > I suggest to just remove that. > That was an inconsistency with the documentation. > It seems to me that I have already fixed it. "auto" was renamed to "any", > > I think maybe your patch should include a way to trivially change the > > client's > > compile-time default: > > "if (!conn->compression) conn->compression = DefaultCompression" > > It can be done using the PG_COMPRESSION environment variable. > Do we need some other mechanism for it? That's possible with an environment variable or connection string. I'm proposing to make it simple to change its default when recompiling, just like this one: src/interfaces/libpq/fe-connect.c:#define DefaultHost "localhost" Think
Re: [PATCH] Feature improvement for CLOSE, FETCH, MOVE tab completion
On Mon, Jan 11, 2021 at 11:00 PM Peter Eisentraut wrote: > > On 2021-01-05 10:56, Masahiko Sawada wrote: > > BTW according to the documentation, the options of the DECLARE statement > > (BINARY, INSENSITIVE, SCROLL, and NO SCROLL) are order-sensitive. > > > > DECLARE name [ BINARY ] [ INSENSITIVE ] [ [ NO ] SCROLL ] > > CURSOR [ { WITH | WITHOUT } HOLD ] FOR query > > > > But I realized that these options are actually order-insensitive. For > > instance, we can declare a cursor like: > > > > =# declare abc scroll binary cursor for select * from pg_class; > > DECLARE CURSOR > > > > Both the parser code and the documentation have been unchanged since > > 2003. Is it a documentation bug? > > According to the SQL standard, the ordering of the cursor properties is > fixed. Even if the PostgreSQL parser offers more flexibility, I think > we should continue to encourage writing the clauses in the standard order. Thanks for your comment. Agreed. So regarding the tab completion for the DECLARE statement, perhaps it would be better to follow the documentation? In the current proposed patch, we complete it with the options in any order. Regards, -- Masahiko Sawada EnterpriseDB: https://www.enterprisedb.com/
Re: Huge memory consumption on partitioned table with FKs
Hi Amit-san, > Thanks for checking. Indeed, it should have been added to the January > commit-fest. I've added it to the March one: > > https://commitfest.postgresql.org/32/2930/ Thank you for your quick response! -- Keisuke Kuroda NTT Software Innovation Center keisuke.kuroda.3...@gmail.com
Re: Key management with tests
On Tue, Jan 12, 2021 at 3:23 AM Stephen Frost wrote: > > Greetings, > > * Bruce Momjian (br...@momjian.us) wrote: > > On Mon, Jan 11, 2021 at 12:54:49PM -0500, Stephen Frost wrote: > > > Although, another approach and one that I've discussed a bit with Bruce, > > > is to have more keys- such as a key for temporary files, and perhaps > > > even a key for logged relations and a different one for unlogged.. Or > > > > Yes, we have to make sure the nonce (computed as LSN/pageno) is never > > reused, so if we have several LSN usage "spaces", they need different > > data keys. > > Right, or ensure that the actual IV used is distinct (such as by using > another bit in the IV to distinguish logged-vs-unlogged), but it seems > saner to just use a different key, ultimately. Agreed. I think we also need to consider how to make sure the nonce is unique when making a page dirty by updating hint bits. A hint bit update changes the page contents but doesn't change the page LSN if a full-page write has already been emitted. In the PoC patch, I logged a dummy WAL record (XLOG_NOOP) just to move the page LSN forward, but since this is required even when the change is not the first change to the page since the last checkpoint, we might end up logging too many dummy WAL records. Regards, -- Masahiko Sawada EnterpriseDB: https://www.enterprisedb.com/
Outdated replication protocol error?
Commit 5ee2197767 (about 4 years old) introduced the error: "IDENTIFY_SYSTEM has not been run before START_REPLICATION" But it seems like running START_REPLICATION first works (at least in the simple case). We should either: 1. Document that IDENTIFY_SYSTEM must always be run before START_REPLICATION, and always issue a WARNING if that's not done (an ERROR might break existing applications); or 2. If the commit is out of date and no longer needed, or if it's easy enough to fix, just remove the error (and Assert a valid ThisTimeLineID). Regards, Jeff Davis
Re: Implement for window functions
I started to look through this patch, and the first thing I'm wondering about is why bother with a new pg_proc column, ie why not just apply the behavior to any window function? The fact that the SQL committee restricts this syntax to a few window functions is just their usual design tic of creating one-off syntax that could be much more general. We've not hesitated to generalize in similar situations in the past. The main thing I can see against that position is that it's not very clear what to do if the window function has more than one window-ized argument --- or at least, the only plausible interpretation would be to ignore rows in which any of those arguments is null, which this implementation is incapable of doing (since we don't know exactly which arguments the function will try to treat as window-ized). However, having a pronulltreatment column isn't helping that situation at all: somebody could mark a multi-input window function as ignoring nulls, and we'd silently do the wrong thing in any case where those inputs weren't nulls at exactly the same rows. My thought, therefore, is to drop pg_proc.pronulltreatment and instead enforce an implementation restriction that when IGNORE NULLS is specified, WinGetFuncArgInPartition and WinGetFuncArgInFrame throw an error if asked about any argument position other than the first one. As long as only the first argument is window-ized, the implementation you have here will act correctly. If anybody ever finds that annoying, they can figure out how to relax the restriction at that time. The need for a TREAT NULLS option to CREATE FUNCTION would thereby also go away, which is good because I don't think this patch has fully implemented that (notably, I don't see any pg_dump changes). As far as the actual implementation goes: * The undocumented relationship between "relpos" (which used to be constant and now isn't) and "target" and "step" makes my head hurt. 
I'm sure this could be redesigned to be simpler, or if not, at least it should be commented a lot more thoroughly. * I'm quite concerned about performance; it looks like this will degrade to O(N^2) in practical situations, which isn't going to satisfy anyone. I think we need to track how many nulls we've already seen so that we aren't re-visiting earlier rows over and over. That should make it possible to un-disable the set_mark optimization, which is something that's independently catastrophic for performance. While I've not stopped to design this fully, maybe we could keep state along the lines of "there are j rows with null values of the window-ized argument before row k of the partition." Updating that by dead reckoning as we navigate would be enough to fix the O(N^2) problem for typical scenarios. I think. regards, tom lane
Re: COPY FREEZE and setting PD_ALL_VISIBLE/visibility map bits
On 1/11/21 10:00 PM, Anastasia Lubennikova wrote: On 11.01.2021 01:35, Tomas Vondra wrote: Hi, I started looking at this patch again, hoping to get it committed in this CF, but I think there's a regression in handling TOAST tables (compared to the v3 patch submitted by Pavan in February 2019). The test: I'm running a very simple test (see test.sql): 1) start a transaction 2) create a table with a text column 3) copy freeze data into it 4) use pg_visibility to see how many blocks are all_visible both in the main table and its TOAST table For the v3 patch (applied on top of 278584b526 and s/hi_options/ti_options) I get this: pages NOT all_visible -- main 637 0 toast 50001 3 There was some discussion about relcache invalidations causing a couple of TOAST pages not to be marked as all_visible, etc. However, for this patch on master I get this pages NOT all_visible -- main 637 0 toast 50001 50001 So no pages in TOAST are marked as all_visible. I haven't investigated what's causing this, but IMO that needs fixing to make this patch RFC. Attached is the test script I'm using, and a v5 of the patch - rebased on current master, with some minor tweaks to comments etc. Thank you for attaching the test script. I reproduced the problem. This regression occurs because TOAST internally uses heap_insert(). You have asked upthread about adding this optimization to heap_insert(). I wrote a quick fix, see the attached patch 0002. The TOAST test passes now, but I haven't tested performance or any other use-cases yet. I'm going to test it properly in a couple of days and share results. Thanks. I think it's important to make this work for TOAST tables - it often stores most of the data, and it was working in v3 of the patch. I haven't looked into the details, but if it's really just due to TOAST using heap_insert, I'd say it just confirms the importance of tweaking heap_insert too. With this change a lot of new code is repeated in heap_insert() and heap_multi_insert(). 
I think it's fine, because these functions already have a lot in common. Understood. IMHO a bit of redundancy is not a big issue, but I haven't looked at the code yet. Let's get it working first, then we can decide if some refactoring is appropriate. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Deleting older versions in unique indexes to avoid page splits
Mon, Jan 11, 2021 at 22:10, Peter Geoghegan: > I imagine that this happened because you have track_io_timing=on in > postgresql.conf. Doesn't the same failure take place with the current > master branch? > > I have my own way of running the locally installed Postgres when I > want "make installcheck" to pass that specifically accounts for this > (and a few other similar things). Oh, right, I hadn't thought of that. Thanks for pointing it out. Now everything looks good! -- Victor Yegorov
Re: Deleting older versions in unique indexes to avoid page splits
On Mon, Jan 11, 2021 at 12:19 PM Victor Yegorov wrote: > (see attached diff). It doesn't look like the fault of this patch, though. > > I suppose you plan to send another revision before committing this. > Therefore I didn't perform any tests here, will wait for the next version. I imagine that this happened because you have track_io_timing=on in postgresql.conf. Doesn't the same failure take place with the current master branch? I have my own way of running the locally installed Postgres when I want "make installcheck" to pass that specifically accounts for this (and a few other similar things). -- Peter Geoghegan
Re: COPY FREEZE and setting PD_ALL_VISIBLE/visibility map bits
On 11.01.2021 01:35, Tomas Vondra wrote: Hi, I started looking at this patch again, hoping to get it committed in this CF, but I think there's a regression in handling TOAST tables (compared to the v3 patch submitted by Pavan in February 2019). The test: I'm running a very simple test (see test.sql): 1) start a transaction 2) create a table with a text column 3) copy freeze data into it 4) use pg_visibility to see how many blocks are all_visible both in the main table and its TOAST table For the v3 patch (applied on top of 278584b526 and s/hi_options/ti_options) I get this: pages NOT all_visible -- main 637 0 toast 50001 3 There was some discussion about relcache invalidations causing a couple of TOAST pages not to be marked as all_visible, etc. However, for this patch on master I get this pages NOT all_visible -- main 637 0 toast 50001 50001 So no pages in TOAST are marked as all_visible. I haven't investigated what's causing this, but IMO that needs fixing to make this patch RFC. Attached is the test script I'm using, and a v5 of the patch - rebased on current master, with some minor tweaks to comments etc. Thank you for attaching the test script. I reproduced the problem. This regression occurs because TOAST internally uses heap_insert(). You have asked upthread about adding this optimization to heap_insert(). I wrote a quick fix, see the attached patch 0002. The TOAST test passes now, but I haven't tested performance or any other use-cases yet. I'm going to test it properly in a couple of days and share results. With this change a lot of new code is repeated in heap_insert() and heap_multi_insert(). I think it's fine, because these functions already have a lot in common. 
-- Anastasia Lubennikova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company >From 6ccaa4f526b38fbdd2f3a38ac3bd51e96fb140b6 Mon Sep 17 00:00:00 2001 From: Tomas Vondra Date: Sun, 10 Jan 2021 20:30:29 +0100 Subject: [PATCH] Set PD_ALL_VISIBLE and visibility map bits in COPY FREEZE Make sure COPY FREEZE marks the pages as PD_ALL_VISIBLE and updates the visibility map. Until now it only marked individual tuples as frozen, but page-level flags were not updated. This is a fairly old patch, and multiple people worked on it. The first version was written by Jeff Janes, and then reworked by Pavan Deolasee and Anastasia Lubennikova. Author: Pavan Deolasee, Anastasia Lubennikova, Jeff Janes Reviewed-by: Kuntal Ghosh, Jeff Janes, Tomas Vondra, Masahiko Sawada, Andres Freund, Ibrar Ahmed, Robert Haas, Tatsuro Ishii Discussion: https://postgr.es/m/caboikdn-ptgv0mzntrk2q8otfuuajqaymgmkdu1dckftuxv...@mail.gmail.com Discussion: https://postgr.es/m/CAMkU%3D1w3osJJ2FneELhhNRLxfZitDgp9FPHee08NT2FQFmz_pQ%40mail.gmail.com --- .../pg_visibility/expected/pg_visibility.out | 64 +++ contrib/pg_visibility/sql/pg_visibility.sql | 77 +++ src/backend/access/heap/heapam.c | 76 -- src/backend/access/heap/hio.c | 17 src/include/access/heapam_xlog.h | 3 + 5 files changed, 229 insertions(+), 8 deletions(-) diff --git a/contrib/pg_visibility/expected/pg_visibility.out b/contrib/pg_visibility/expected/pg_visibility.out index ca4b6e186b..0017e3415c 100644 --- a/contrib/pg_visibility/expected/pg_visibility.out +++ b/contrib/pg_visibility/expected/pg_visibility.out @@ -179,6 +179,69 @@ select pg_truncate_visibility_map('test_partition'); (1 row) +-- test copy freeze +create table copyfreeze (a int, b char(1500)); +-- load all rows via COPY FREEZE and ensure that all pages are set all-visible +-- and all-frozen. 
+begin; +truncate copyfreeze; +copy copyfreeze from stdin freeze; +commit; +select * from pg_visibility_map('copyfreeze'); + blkno | all_visible | all_frozen +---+-+ + 0 | t | t + 1 | t | t + 2 | t | t +(3 rows) + +select * from pg_check_frozen('copyfreeze'); + t_ctid + +(0 rows) + +-- load half the rows via regular COPY and rest via COPY FREEZE. The pages +-- which are touched by regular COPY must not be set all-visible/all-frozen. On +-- the other hand, pages allocated by COPY FREEZE should be marked +-- all-frozen/all-visible. +begin; +truncate copyfreeze; +copy copyfreeze from stdin; +copy copyfreeze from stdin freeze; +commit; +select * from pg_visibility_map('copyfreeze'); + blkno | all_visible | all_frozen +---+-+ + 0 | f | f + 1 | f | f + 2 | t | t +(3 rows) + +select * from pg_check_frozen('copyfreeze'); + t_ctid + +(0 rows) + +-- Try a mix of regular COPY and COPY FREEZE. +begin; +truncate
Re: Key management with tests
Greetings, * Bruce Momjian (br...@momjian.us) wrote: > On Mon, Jan 11, 2021 at 02:19:22PM -0500, Stephen Frost wrote: > > outputs from the GCM encryption and is what provides the integrity / > > authentication of the encrypted data to be able to detect if it's been > > modified. Unfortunately, while the page checksum will continue to be > > used and available for checking against disk corruption, it's not > > sufficient. Hence, ideally, we'd find a spot to stick the 128-bit tag > > on each page. > > Agreed. Would checksums be of any value with GCM? The value would be to allow testing of the database integrity, to the extent allowed by the checksum, to be done without having access to the encryption keys, and because there's not much else we'd be using those bits for if we didn't. > > Given that, clearly, it's not possible to go from an unencrypted cluster > > to an encrypted cluster without rewriting the entire cluster, we aren't > > bound to maintain the on-disk page format, we should be able to > > accommodate including the tag somewhere. Unfortunately, it doesn't seem > > quite as trivial as I'd hoped since there are parts of the code which > > make assumptions about the page beyond perhaps what they should be, but > > I'm still hopeful that it won't be *too* hard to do. > > OK, thanks. Are there other page improvements we should make when we > are requiring a page rewrite? This is an interesting question but ultimately I don't think we should be looking at this from the perspective of allowing arbitrary changes to the page format. The challenge is that much of the page format, today, is defined by a C struct and changing the way that works would require a great deal of code to be modified and turn this into a massive effort, assuming we wish to have the same compiled binary able to work with both unencrypted and encrypted clusters, which I do believe is a requirement. 
The thought that I had was to, instead, try to figure out if we could fudge some space by, say, putting a 128-bit 'hole' at the end of the page and just move pd_special back, effectively making the page seem 'smaller' to all of the code that uses it, except for the code that knows how to do the decryption. I ran into some trouble with that but haven't quite sorted out what happened yet. Other ideas would be to put it before pd_special, or maybe somewhere else, but a lot depends on the code's expectations. Thanks, Stephen signature.asc Description: PGP signature
Re: Proposal: Global Index
On Mon, Jan 11, 2021 at 12:05:43PM -0800, Peter Geoghegan wrote: > On Mon, Jan 11, 2021 at 11:25 AM Bruce Momjian wrote: > > Once you layer on all the places a global index will be worse than just > > creating a single large table, or a partitioned table with an index per > > child, there might not be much usefulness left. A POC patch might tell > > us that, and might allow us to mark it as "not wanted". > > I'm confused. Of course it's true to some degree that having a global > index "defeats the purpose" of having a partitioned table. But only to > a degree. And for some users it will make the difference between using > partitioning and not using partitioning -- they simply won't be able > to tolerate not having it available (e.g. because of a requirement for > a unique constraint that does not cover the partitioning key). Yes, that is a good point. For those cases, I think we need to look at the code complexity/overhead of supporting that feature. There are going to be a few cases it is a win, but will the code complexity be worth it? -- Bruce Momjian https://momjian.us EnterpriseDB https://enterprisedb.com The usefulness of a cup is in its emptiness, Bruce Lee
Re: Key management with tests
On Mon, Jan 11, 2021 at 02:19:22PM -0500, Stephen Frost wrote: > Greetings, > > * Bruce Momjian (br...@momjian.us) wrote: > > On Mon, Jan 11, 2021 at 01:23:27PM -0500, Stephen Frost wrote: > > > Yes, and it avoids the issue of using a single key for too much, which > > > is also a concern. The remaining larger issues are to figure out a > > > place to put the tag for each page, and the relatively simple matter of > > > programming a mechanism to cache the keys we're commonly using (current > > > key for encryption, recently used keys for decryption) since we'll > > > eventually get to a point of having written out more data than we are > > > going to keep keys in memory for. > > > > I thought the LSN range would be stored with the keys, so there is no > > need to tag the LSN on each page. > > Yes, LSN range would be stored with the keys in some fashion (maybe just > the start of a particular LSN range would be in the filename of the key > for that range...). The 'tag' that I'm referring to there is one of the Oh, that tag, yes, we need to add that to each page. I thought you meant an LSN-range-key tag. > outputs from the GCM encryption and is what provides the integrity / > authentication of the encrypted data to be able to detect if it's been > modified. Unfortunately, while the page checksum will continue to be > used and available for checking against disk corruption, it's not > sufficient. Hence, ideally, we'd find a spot to stick the 128-bit tag > on each page. Agreed. Would checksums be of any value with GCM? > Given that, clearly, it's not possible to go from an unencrypted cluster > to an encrypted cluster without rewriting the entire cluster, we aren't > bound to maintain the on-disk page format, we should be able to > accommodate including the tag somewhere. 
Unfortunately, it doesn't seem > quite as trivial as I'd hoped since there are parts of the code which > make assumptions about the page beyond perhaps what they should be, but > I'm still hopeful that it won't be *too* hard to do. OK, thanks. Are there other page improvements we should make when we are requiring a page rewrite? -- Bruce Momjian https://momjian.us EnterpriseDB https://enterprisedb.com The usefulness of a cup is in its emptiness, Bruce Lee
Re: Deleting older versions in unique indexes to avoid page splits
Mon, 11 Jan 2021 at 01:07, Peter Geoghegan : > > Attached is v13, which has this tweak, and other miscellaneous cleanup > based on review from both Victor and Heikki. I consider this version > of the patch to be committable. I intend to commit something close to > it in the next week, probably no later than Thursday. I still haven't > got to the bottom of the shellsort question raised by Heikki. I intend > to do further performance validation before committing the patch. I > will look into the shellsort thing again as part of this final > performance validation work -- perhaps I can get rid of the > specialized shellsort implementation entirely, simplifying the state > structs added to tableam.h. (As I said before, it seems best to > address this last of all to avoid making performance validation even > more complicated.) > I've checked this version quickly. It applies and compiles without issues. `make check` and `make check-world` reported no issue. But `make installcheck-world` failed on: … test explain ... FAILED 22 ms test event_trigger... ok 178 ms test fast_default ... ok 262 ms test stats... ok 586 ms 1 of 202 tests failed. (see attached diff). It doesn't look like the fault of this patch, though. I suppose you plan to send another revision before committing this. Therefore I didn't perform any further tests here and will wait for the next version. -- Victor Yegorov 20210111-v13-regression.diffs Description: Binary data
Re: Proposal: Global Index
On Mon, Jan 11, 2021 at 11:25 AM Bruce Momjian wrote: > Once you layer on all the places a global index will be worse than just > creating a single large table, or a partitioned table with an index per > child, there might not be much usefulness left. A POC patch might tell > us that, and might allow us to mark it as "not wanted". I'm confused. Of course it's true to some degree that having a global index "defeats the purpose" of having a partitioned table. But only to a degree. And for some users it will make the difference between using partitioning and not using partitioning -- they simply won't be able to tolerate not having it available (e.g. because of a requirement for a unique constraint that does not cover the partitioning key). -- Peter Geoghegan
Re: Proposal: Global Index
On Mon, Jan 11, 2021 at 11:01:20AM -0800, Peter Geoghegan wrote: > However, it probably would be okay if a global index feature performed > poorly in scenarios where partitions get lots of UPDATEs that produce > lots of index bloat and cause lots of LP_DEAD line pointers to > accumulate in heap pages. It is probably reasonable to just expect > users to not do that if they want to get acceptable performance while > using a global index. Especially since it probably is not so bad if > the index bloat situation gets out of hand for just one of the > partitions (say the most recent one) every once in a while. You at > least don't have the same crazy I/O multiplier effect that you > described. Once you layer on all the places a global index will be worse than just creating a single large table, or a partitioned table with an index per child, there might not be much usefulness left. A POC patch might tell us that, and might allow us to mark it as "not wanted". -- Bruce Momjian https://momjian.us EnterpriseDB https://enterprisedb.com The usefulness of a cup is in its emptiness, Bruce Lee
Re: Proposal: Global Index
On Mon, Jan 11, 2021 at 01:37:02PM -0500, Robert Haas wrote: > However, there is a VACUUM amplification effect to worry about here ... > That's not necessarily a death sentence for every use case, but it's > going to be pretty bad for tables that are big and heavily updated. Yeah, I had not really gotten that far in my thinking, but someone is going to need to create a POC and then we need to test it to see if it offers a reasonably valuable feature. -- Bruce Momjian https://momjian.us EnterpriseDB https://enterprisedb.com The usefulness of a cup is in its emptiness, Bruce Lee
Re: Key management with tests
Greetings, * Bruce Momjian (br...@momjian.us) wrote: > On Mon, Jan 11, 2021 at 01:23:27PM -0500, Stephen Frost wrote: > > Yes, and it avoids the issue of using a single key for too much, which > > is also a concern. The remaining larger issues are to figure out a > > place to put the tag for each page, and the relatively simple matter of > > programming a mechanism to cache the keys we're commonly using (current > > key for encryption, recently used keys for decryption) since we'll > > eventually get to a point of having written out more data than we are > > going to keep keys in memory for. > > I thought the LSN range would be stored with the keys, so there is no > need to tag the LSN on each page. Yes, LSN range would be stored with the keys in some fashion (maybe just the start of a particular LSN range would be in the filename of the key for that range...). The 'tag' that I'm referring to there is one of the outputs from the GCM encryption and is what provides the integrity / authentication of the encrypted data to be able to detect if it's been modified. Unfortunately, while the page checksum will continue to be used and available for checking against disk corruption, it's not sufficient. Hence, ideally, we'd find a spot to stick the 128-bit tag on each page. Given that, clearly, it's not possible to go from an unencrypted cluster to an encrypted cluster without rewriting the entire cluster, we aren't bound to maintain the on-disk page format, we should be able to accommodate including the tag somewhere. Unfortunately, it doesn't seem quite as trivial as I'd hoped since there are parts of the code which make assumptions about the page beyond perhaps what they should be, but I'm still hopeful that it won't be *too* hard to do. Thanks, Stephen signature.asc Description: PGP signature
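A back-of-envelope sketch of that space accounting in C; the trailer struct and its placement at the end of the block are purely illustrative assumptions, not a proposed on-disk format -- where the tag actually lives is exactly the open question above:

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>

#define BLCKSZ 8192           /* default PostgreSQL block size */
#define GCM_TAG_LEN 16        /* 128-bit AES-GCM authentication tag */

/* Hypothetical per-page trailer holding the GCM tag. */
typedef struct PageAuthTrailer
{
    uint8_t     tag[GCM_TAG_LEN];
} PageAuthTrailer;

/* Bytes left for the page header, line pointers, and tuples once
 * the tag is reserved at the end of each block. */
static size_t
page_payload_bytes(void)
{
    return BLCKSZ - sizeof(PageAuthTrailer);
}
```

The point is simply that the cost is a fixed 16 bytes per 8kB page; the hard part, as noted, is code that assumes the current page layout.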
Re: Key management with tests
On Mon, Jan 11, 2021 at 01:23:27PM -0500, Stephen Frost wrote: > Greetings, > > * Bruce Momjian (br...@momjian.us) wrote: > > On Mon, Jan 11, 2021 at 12:54:49PM -0500, Stephen Frost wrote: > > > Although, another approach and one that I've discussed a bit with Bruce, > > > is to have more keys- such as a key for temporary files, and perhaps > > > even a key for logged relations and a different for unlogged.. Or > > > > Yes, we have to make sure the nonce (computed as LSN/pageno) is never > > reused, so if we have several LSN usage "spaces", they need different > > data keys. > > Right, or ensure that the actual IV used is distinct (such as by using > another bit in the IV to distinguish logged-vs-unlogged), but it seems > saner to just use a different key, ultimately. Yes, we have eight unused bits in the Nonce right now. > > > perhaps sets of keys for each which automatically are rotating every X > > > number of GB based on the LSN... Which is a big part of why key > > > management is such an important part of this effort. > > > > Yes, this would avoid the need to failover to a standby for data key > > rotation. > > Yes, and it avoids the issue of using a single key for too much, which > is also a concern. The remaining larger issues are to figure out a > place to put the tag for each page, and the relatively simple matter of > programming a mechanism to cache the keys we're commonly using (current > key for encryption, recently used keys for decryption) since we'll > eventually get to a point of having written out more data than we are > going to keep keys in memory for. I thought the LSN range would be stored with the keys, so there is no need to tag the LSN on each page. -- Bruce Momjian https://momjian.us EnterpriseDB https://enterprisedb.com The usefulness of a cup is in its emptiness, Bruce Lee
Re: Proposal: Global Index
On Mon, Jan 11, 2021 at 10:37 AM Robert Haas wrote: > I actually think the idea of lazily deleting the index entries is > pretty good, but it won't work if the way the global index is > implemented is by adding a tableoid column. Perhaps there is an opportunity to apply some of the infrastructure that Masahiko Sawada has been working on, that makes VACUUM more incremental in certain specific scenarios: https://postgr.es/m/cad21aod0ske11fmw4jd4renawbmcw1wasvnwpjvw3tvqpoq...@mail.gmail.com I think that VACUUM can be taught to skip the ambulkdelete() step for indexes in many common scenarios. Global indexes might be one place in which that's almost essential. > However, there is a VACUUM amplification effect to worry about here > which Wenjing seems not to be considering. > That's not necessarily a death sentence for every use case, but it's > going to be pretty bad for tables that are big and heavily updated. The main way in which index vacuuming is currently a death sentence for this design (as you put it) is that it's an all-or-nothing thing. Presumably you'll need to VACUUM the entire global index for each partition that receives even one UPDATE. That seems pretty extreme, and probably not acceptable. In a way it's not really a new problem, but the fact remains: it makes global indexes much less valuable. However, it probably would be okay if a global index feature performed poorly in scenarios where partitions get lots of UPDATEs that produce lots of index bloat and cause lots of LP_DEAD line pointers to accumulate in heap pages. It is probably reasonable to just expect users to not do that if they want to get acceptable performance while using a global index. Especially since it probably is not so bad if the index bloat situation gets out of hand for just one of the partitions (say the most recent one) every once in a while. You at least don't have the same crazy I/O multiplier effect that you described. -- Peter Geoghegan
Re: Proposal: Global Index
On Mon, Jan 11, 2021 at 12:46 PM Bruce Momjian wrote: > > For 1) The DETACH old child table can be finished immediately, global index > > can be kept valid after DETACH is completed, and the cleanup of garbage > > data in global index can be deferred to VACUUM. > > This is similar to the global index optimization done by Oracle12c. > > For 2) ATTACH new empty child table can also be completed immediately. > > If this is the case, many of the advantages of partitioned tables will be > > retained, while the advantages of global indexes will be gained. > > Yes, we can keep the index rows for the deleted partition and clean them > up later, but what is the advantage of partitioning then? Just heap > deletion quickly? Is that enough of a value? I actually think the idea of lazily deleting the index entries is pretty good, but it won't work if the way the global index is implemented is by adding a tableoid column. Because then, I might detach a partition and later reattach it and the old index entries are still there but the table contents might have changed. Worse yet, the table might be dropped and the table OID reused for a completely unrelated table with completely unrelated contents, which could then be attached as a new partition. One of the big selling points of global indexes is that they allow you to enforce uniqueness on a column unrelated to the partitioning column. Another is that you can look up a value by doing a single index scan on the global index rather than an index scan per partition. Those things are no less valuable for performing index deletion lazily. However, there is a VACUUM amplification effect to worry about here which Wenjing seems not to be considering. Suppose I have a table which is not partitioned and it is 1TB in size with an index that is 128GB in size. To vacuum the table, I need to do 1TB + 128GB of I/O. Now, suppose I now partition the table into 1024 partitions each with its own local index. 
Each partition is 1GB in size and the index on each partition is 128MB in size. To vacuum an individual partition requires 1GB + 128MB of I/O, so to vacuum all the partitions requires the same amount of total I/O as before. But, now suppose that I have a single global index instead of a local index per partition. First, how big will that index be? It will not be 128GB, but somewhat bigger, because it needs extra space for every indexed tuple. Let's say 140GB. Furthermore, it will need to be vacuumed whenever any child is vacuumed, because it contains some index entries from every child. So the total I/O to vacuum all partitions is now 1GB * 1024 + 140GB * 1024 = 141TB, which is a heck of a lot worse than the 1.125TB I required with the unpartitioned table or the locally partitioned table. That's not necessarily a death sentence for every use case, but it's going to be pretty bad for tables that are big and heavily updated. -- Robert Haas EDB: http://www.enterprisedb.com
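Redoing that arithmetic in code (all figures in MB, matching the example's assumptions of 1024 partitions, 1GB heaps, 128MB local indexes, and a 140GB global index):

```c
#include <assert.h>
#include <stdint.h>

/* Total vacuum I/O, in MB, with one local index per partition:
 * each partition vacuums its own heap and its own small index. */
static uint64_t
vacuum_io_local_mb(uint64_t nparts, uint64_t heap_mb, uint64_t index_mb)
{
    return nparts * (heap_mb + index_mb);
}

/* Total vacuum I/O, in MB, with a single global index: every
 * per-partition vacuum must also scan the whole global index. */
static uint64_t
vacuum_io_global_mb(uint64_t nparts, uint64_t heap_mb, uint64_t gindex_mb)
{
    return nparts * (heap_mb + gindex_mb);
}
```

The formulas are identical in shape; the amplification comes entirely from the per-scan index size jumping from 128MB to 140GB.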
Re: Proposal: Global Index
Bruce Momjian writes: > On Mon, Jan 11, 2021 at 07:40:18PM +0800, 曾文旌 wrote: >> This is indeed a typical scenario for a partitioned table. >> there are two basic operations >> 1) Monthly DETACH old child table >> 2) Monthly ATTACH new child table >> >> For 1) The DETACH old child table can be finished immediately, global index >> can be kept valid after DETACH is completed, and the cleanup of garbage data >> in global index can be deferred to VACUUM. > Yes, we can keep the index rows for the deleted partition and clean them > up later, but what is the advantage of partitioning then? Just heap > deletion quickly? Is that enough of a value? More to the point, you still have a massive index cleanup operation to do. Deferred or not, that's going to take a lot of cycles, and it will leave you with a bloated global index. I find it hard to believe that this approach will seem like an improvement over doing partitioning the way we do it now. regards, tom lane
Re: Key management with tests
Greetings, * Bruce Momjian (br...@momjian.us) wrote: > On Mon, Jan 11, 2021 at 12:54:49PM -0500, Stephen Frost wrote: > > Although, another approach and one that I've discussed a bit with Bruce, > > is to have more keys- such as a key for temporary files, and perhaps > > even a key for logged relations and a different for unlogged.. Or > > Yes, we have to make sure the nonce (computed as LSN/pageno) is never > reused, so if we have several LSN usage "spaces", they need different > data keys. Right, or ensure that the actual IV used is distinct (such as by using another bit in the IV to distinguish logged-vs-unlogged), but it seems saner to just use a different key, ultimately. > > perhaps sets of keys for each which automatically are rotating every X > > number of GB based on the LSN... Which is a big part of why key > > management is such an important part of this effort. > > Yes, this would avoid the need to failover to a standby for data key > rotation. Yes, and it avoids the issue of using a single key for too much, which is also a concern. The remaining larger issues are to figure out a place to put the tag for each page, and the relatively simple matter of programming a mechanism to cache the keys we're commonly using (current key for encryption, recently used keys for decryption) since we'll eventually get to a point of having written out more data than we are going to keep keys in memory for. Thanks, Stephen signature.asc Description: PGP signature
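A minimal sketch of what "another bit in the IV" could look like, assuming a 16-byte IV built from the 64-bit LSN plus the 32-bit block number; the exact layout, the byte ordering, and the choice of the top bit are assumptions for illustration only, not the patch's actual format:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Build a 16-byte IV from (LSN, block number), spending one spare
 * high bit of the LSN field to separate the logged and unlogged LSN
 * "spaces" so the two can never produce the same IV. */
static void
build_iv(uint8_t iv[16], uint64_t lsn, uint32_t blkno, int is_unlogged)
{
    memset(iv, 0, 16);
    if (is_unlogged)
        lsn |= UINT64_C(1) << 63;   /* spare bit distinguishes the space */
    memcpy(iv, &lsn, sizeof(lsn));  /* native byte order, for the sketch */
    memcpy(iv + 8, &blkno, sizeof(blkno));
}
```

As the discussion says, a separate key per space is ultimately saner, but the bit approach is what keeps a single key safe if it ever covers both.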
Re: [POC] Fast COPY FROM command for the table with foreign partitions
Hi Andrey, Unfortunately, this no longer applies :-( I tried to apply this on top of c532d15ddd (from 2020/12/30) but even that has non-trivial conflicts. Can you send a rebased version? regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Key management with tests
On Mon, Jan 11, 2021 at 12:54:49PM -0500, Stephen Frost wrote: > Although, another approach and one that I've discussed a bit with Bruce, > is to have more keys- such as a key for temporary files, and perhaps > even a key for logged relations and a different for unlogged.. Or Yes, we have to make sure the nonce (computed as LSN/pageno) is never reused, so if we have several LSN usage "spaces", they need different data keys. > perhaps sets of keys for each which automatically are rotating every X > number of GB based on the LSN... Which is a big part of why key > management is such an important part of this effort. Yes, this would avoid the need to failover to a standby for data key rotation. -- Bruce Momjian https://momjian.us EnterpriseDB https://enterprisedb.com The usefulness of a cup is in its emptiness, Bruce Lee
Re: Key management with tests
Greetings, * Bruce Momjian (br...@momjian.us) wrote: > On Mon, Jan 11, 2021 at 08:12:00PM +0900, Masahiko Sawada wrote: > > Looking at the patch, it supports three algorithms but only > > PG_CIPHER_AES_KWP is used in the core for now: > > > > +/* > > + * Supported symmetric encryption algorithm. These identifiers are passed > > + * to pg_cipher_ctx_create() function, and then actual encryption > > + * implementations need to initialize their context of the given encryption > > + * algorithm. > > + */ > > +#define PG_CIPHER_AES_GCM 0 > > +#define PG_CIPHER_AES_KW 1 > > +#define PG_CIPHER_AES_KWP 2 > > +#define PG_MAX_CIPHER_ID 3 > > > > Are we in the process of experimenting which algorithms are better? If > > we support one algorithm that is actually used in the core, we would > > reduce the tests as well. > > I think we are only using KWP (Key Wrap with Padding) because that is > for wrapping keys: > > > https://csrc.nist.gov/CSRC/media/Projects/Cryptographic-Algorithm-Validation-Program/documents/mac/KWVS.pdf Yes. > I am not sure about KW. I think we are using GCM for the WAP/heap/index > pages. Stephen would know more. KW was more-or-less 'for free' and there were tests for it, which is why it was included. Yes, GCM would be for WAL/heap/index pages, it wouldn't be appropriate to use KW or KWP for that. Using KW/KWP for the key wrapping also makes the API simpler- and therefore easier for other implementations to be written which provide the same API. > > FWIW, I've written a PoC patch for buffer encryption to make sure the > > kms patch would be workable with other components using the encryption > > key managed by kmgr. > > Wow, it is a small patch --- nice. I agree that the actual encryption patch, for just the main heap/index, won't be too bad. The larger part will be dealing with all of the temporary files we create that have user data in them... 
I've been contemplating a way to try and make that part of the patch smaller though and hopefully that will bear fruit and we can avoid having to change a lot of, eg, reorderbuffer.c and pgstat.c. There's a few places where we need to be sure to be updating the LSN for both logged and unlogged relations properly, including dealing with things like the magic GIST "GistBuildLSN" fake-LSN too, and we will absolutely need to have a bit used in the IV to distinguish if it's a real LSN or an unlogged LSN. Although, another approach and one that I've discussed a bit with Bruce, is to have more keys- such as a key for temporary files, and perhaps even a key for logged relations and a different for unlogged.. Or perhaps sets of keys for each which automatically are rotating every X number of GB based on the LSN... Which is a big part of why key management is such an important part of this effort. Thanks, Stephen signature.asc Description: PGP signature
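The "rotating every X number of GB based on the LSN" idea reduces to a simple division: the data key to use is determined by which LSN span a page's LSN falls into. This sketch assumes an arbitrary 16GB rotation interval, which is an illustrative number only:

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative rotation interval: a new data key every 16GB of WAL. */
#define KEY_ROTATION_BYTES (UINT64_C(16) * 1024 * 1024 * 1024)

/* The key "generation" for a page is just its LSN divided by the
 * rotation interval; the key store would map generation -> key. */
static uint64_t
key_generation_for_lsn(uint64_t lsn)
{
    return lsn / KEY_ROTATION_BYTES;
}
```

This is also why only the start LSN of each range needs to be stored with (or encoded in the filename of) each key, as discussed earlier in the thread.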
Re: Proposal: Global Index
On Mon, Jan 11, 2021 at 07:40:18PM +0800, 曾文旌 wrote: > >> In addition you mentioned: "It is still unclear if these use-cases justify > >> the architectural changes needed to enable global indexes." > >> Please also describe the problems you see, I will confirm each specific > >> issue one by one. > > > > One example is date partitioning. People frequently need to store > > only the most recent data. For instance doing a monthly partitioning > > and dropping the oldest partition every month once you hit the wanted > > retention is very efficient for that use case, as it should be almost > > instant (provided that you can acquire the necessary locks > > immediately). But if you have a global index, you basically lose the > > advantage of partitioning as it'll require heavy changes on that > > index. > If the global index removes all the major benefits of partitioned tables, > then it is not worth having it. > > This is indeed a typical scenario for a partitioned table. > there are two basic operations > 1) Monthly DETACH old child table > 2) Monthly ATTACH new child table > > For 1) The DETACH old child table can be finished immediately, global index > can be kept valid after DETACH is completed, and the cleanup of garbage data > in global index can be deferred to VACUUM. > This is similar to the global index optimization done by Oracle12c. > For 2) ATTACH new empty child table can also be completed immediately. > If this is the case, many of the advantages of partitioned tables will be > retained, while the advantages of global indexes will be gained. Yes, we can keep the index rows for the deleted partition and clean them up later, but what is the advantage of partitioning then? Just heap deletion quickly? Is that enough of a value? -- Bruce Momjian https://momjian.us EnterpriseDB https://enterprisedb.com The usefulness of a cup is in its emptiness, Bruce Lee
Re: libpq compression
On 1/11/21 2:53 PM, Konstantin Knizhnik wrote: > > ... > > New version of libpq compression patch is attached. > It can be also be found at g...@github.com:postgrespro/libpq_compression.git > Seems it bit-rotted already, so here's a slightly fixed version. 1) Fixes the MSVC makefile. The list of files is sorted alphabetically, so I've added the file at the end. 2) Fixes duplicate OID. It's a good practice to assign OIDs from the end of the range, to prevent collisions during development. Other than that, I wonder what's the easiest way to run all tests with compression enabled. ISTM it'd be nice to add pg_regress option forcing a particular compression algorithm to be used, or something similar. I'd like a convenient way to pass this through a valgrind, for example. Or how do we get this tested on a buildfarm? I'm not convinced it's very user-friendly to not have a psql option enabling compression. It's true it can be enabled in a connection string, but I doubt many people will notice that. The sgml docs need a bit more love / formatting. The lines in libpq.sgml are far too long, and there are no tags whatsoever. Presumably zlib/zstd should be marked as , and so on. 
regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company >From 14690d8a877244fbc839f97274ccf7d84e971e71 Mon Sep 17 00:00:00 2001 From: Tomas Vondra Date: Mon, 11 Jan 2021 18:01:11 +0100 Subject: [PATCH] v29 --- configure | 81 +++ configure.ac | 21 + .../postgres_fdw/expected/postgres_fdw.out| 2 +- doc/src/sgml/config.sgml | 16 + doc/src/sgml/libpq.sgml | 25 + doc/src/sgml/protocol.sgml| 93 +++ src/Makefile.global.in| 1 + src/backend/Makefile | 8 + src/backend/catalog/system_views.sql | 9 + src/backend/libpq/pqcomm.c| 247 +-- src/backend/postmaster/pgstat.c | 30 + src/backend/postmaster/postmaster.c | 10 + src/backend/utils/adt/pgstatfuncs.c | 50 +- src/backend/utils/misc/guc.c | 10 + src/common/Makefile | 3 +- src/common/zpq_stream.c | 684 ++ src/include/catalog/pg_proc.dat | 18 +- src/include/common/zpq_stream.h | 94 +++ src/include/libpq/libpq-be.h | 3 + src/include/libpq/libpq.h | 1 + src/include/libpq/pqcomm.h| 1 + src/include/pg_config.h.in| 3 + src/include/pgstat.h | 7 + src/interfaces/libpq/Makefile | 14 + src/interfaces/libpq/fe-connect.c | 92 ++- src/interfaces/libpq/fe-exec.c| 4 +- src/interfaces/libpq/fe-misc.c| 55 +- src/interfaces/libpq/fe-protocol3.c | 165 - src/interfaces/libpq/libpq-int.h | 21 + src/test/regress/expected/rules.out | 14 +- src/tools/msvc/Mkvcbuild.pm | 2 +- 31 files changed, 1701 insertions(+), 83 deletions(-) create mode 100644 src/common/zpq_stream.c create mode 100644 src/include/common/zpq_stream.h diff --git a/configure b/configure index b917a2a1c9..feb9420106 100755 --- a/configure +++ b/configure @@ -699,6 +699,7 @@ LD LDFLAGS_SL LDFLAGS_EX with_zlib +with_zstd with_system_tzdata with_libxslt XML2_LIBS @@ -866,6 +867,7 @@ with_libxml with_libxslt with_system_tzdata with_zlib +with_zstd with_gnu_ld enable_largefile ' @@ -8571,6 +8573,85 @@ fi +# +# ZStd +# + + + +# Check whether --with-zstd was given. 
+if test "${with_zstd+set}" = set; then : + withval=$with_zstd; + case $withval in +yes) + ;; +no) + : + ;; +*) + as_fn_error $? "no argument expected for --with-zstd option" "$LINENO" 5 + ;; + esac + +else + with_zstd=no + +fi + + + + +if test "$with_zstd" = yes ; then + { $as_echo "$as_me:${as_lineno-$LINENO}: checking for ZSTD_compress in -lzstd" >&5 +$as_echo_n "checking for ZSTD_compress in -lzstd... " >&6; } +if ${ac_cv_lib_zstd_ZSTD_compress+:} false; then : + $as_echo_n "(cached) " >&6 +else + ac_check_lib_save_LIBS=$LIBS +LIBS="-lzstd $LIBS" +cat confdefs.h - <<_ACEOF >conftest.$ac_ext +/* end confdefs.h. */ + +/* Override any GCC internal prototype to avoid an error. + Use char because int might match the return type of a GCC + builtin and then its argument prototype would still apply. */ +#ifdef __cplusplus +extern "C" +#endif +char ZSTD_compress (); +int +main () +{ +return ZSTD_compress (); + ; + return 0; +} +_ACEOF +if ac_fn_c_try_link "$LINENO"; then : + ac_cv_lib_zstd_ZSTD_compress=yes +else + ac_cv_lib_zstd_ZSTD_compress=no +fi +rm -f core conftest.err conftest.$ac_objext \ +conftest$ac_exeext conftest.$ac_ext
Re: Key management with tests
On Mon, Jan 11, 2021 at 08:12:00PM +0900, Masahiko Sawada wrote: > On Sun, Jan 10, 2021 at 11:51 PM Bruce Momjian wrote: > > OK, here they are with numeric prefixes. It was actually tricky to > > figure out how to create a squashed format-patch based on another branch. > > Thank you for attaching the patches. It passes all cfbot tests, great. Yeah, I saw that. :-) I had to learn a lot about how to create squashed format-patches on non-master branches. I have now automated it so it will be easy going forward. > Looking at the patch, it supports three algorithms but only > PG_CIPHER_AES_KWP is used in the core for now: > > +/* > + * Supported symmetric encryption algorithm. These identifiers are passed > + * to pg_cipher_ctx_create() function, and then actual encryption > + * implementations need to initialize their context of the given encryption > + * algorithm. > + */ > +#define PG_CIPHER_AES_GCM 0 > +#define PG_CIPHER_AES_KW 1 > +#define PG_CIPHER_AES_KWP 2 > +#define PG_MAX_CIPHER_ID 3 > > Are we in the process of experimenting which algorithms are better? If > we support one algorithm that is actually used in the core, we would > reduce the tests as well. I think we are only using KWP (Key Wrap with Padding) because that is for wrapping keys: https://csrc.nist.gov/CSRC/media/Projects/Cryptographic-Algorithm-Validation-Program/documents/mac/KWVS.pdf I am not sure about KW. I think we are using GCM for the WAL/heap/index pages. Stephen would know more. > FWIW, I've written a PoC patch for buffer encryption to make sure the > kms patch would be workable with other components using the encryption > key managed by kmgr. Wow, it is a small patch --- nice. -- Bruce Momjian https://momjian.us EnterpriseDB https://enterprisedb.com The usefulness of a cup is in its emptiness, Bruce Lee
Re: Moving other hex functions to /common
On Mon, Jan 11, 2021 at 04:45:14PM +0900, Michael Paquier wrote: > On Wed, Jan 06, 2021 at 08:58:23AM -0500, Bruce Momjian wrote: > > Fine. Do you want to add the overflow to the patch I posted, for all > > encoding types? > > Yeah. It looks that it would be good to be consistent as well for > escape case, so as it is possible to add a dstlen argument to struct > pg_encoding for the encoding and decoding routines. I would also Sure. > prefer the option to remove the argument "data" from the encode and > decode length routines for the hex routines part of src/common/, even > if it forces the creation of two small wrappers in encode.c to call > the routines of src/common/. Would you prefer if I send a patch by Agreed. Having an argument that does nothing is odd. > myself? Please note that anything I'd send would use directly elog() > and pg_log() instead of returning status codes for the src/common/ > routines, and of course not touch ECPG, as that's the approach you are > favoring. Sure, I realize the elog/pg_log is odd, but the alternatives seem worse. You can take ownership of my hex patch and just add to it. I know you already did the hex length part, and have other ideas of what you want. My key management patch needs the hex encoding in /common, so it will wait for the application of your patch. -- Bruce Momjian https://momjian.us EnterpriseDB https://enterprisedb.com The usefulness of a cup is in its emptiness, Bruce Lee
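A dstlen-aware signature of the kind being discussed might look roughly like this; the function name and the return convention are invented for illustration and are not the actual src/common/ API:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hex-encode srclen bytes into dst, refusing to overflow: returns the
 * number of bytes written, or -1 if dstlen is too small.  This is the
 * overflow-checked shape the dstlen argument enables. */
static int64_t
hex_encode_safe(const char *src, size_t srclen, char *dst, size_t dstlen)
{
    static const char hextbl[] = "0123456789abcdef";
    size_t      i;

    if (dstlen < srclen * 2)
        return -1;              /* caller's buffer is too small */
    for (i = 0; i < srclen; i++)
    {
        uint8_t     b = (uint8_t) src[i];

        dst[2 * i] = hextbl[b >> 4];
        dst[2 * i + 1] = hextbl[b & 0x0f];
    }
    return (int64_t) (srclen * 2);
}
```

Whether the failure path returns a status code (as here) or calls elog()/pg_log() directly is exactly the design point debated above.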
Re: Add Nullif case for eval_const_expressions_mutator
"Hou, Zhijie" writes: > I notice that there are no Nullif case in eval_const_expression. > Since Nullif is similar to Opexpr and is easy to implement, > I try add this case in eval_const_expressions_mutator. I think this patch should be about a tenth the size. Try modeling it on the T_SubscriptingRef-etc case, ie, use ece_generic_processing and then ece_evaluate_expr to cover the generic cases. OpExpr is common enough to deserve specially optimized code, but NullIf isn't, so shorter is better. regards, tom lane
Re: popcount
On Mon, Jan 11, 2021 at 03:50:54PM +0100, Peter Eisentraut wrote: > On 2020-12-30 17:41, David Fetter wrote: > > > The input may have more than 2 billion bits set to 1. The biggest possible > > > result should be 8 billion for bytea (1 GB with all bits set to 1). > > > So shouldn't this function return an int8? > > It does now, and thanks for looking at this. > > The documentation still reflects the previous int4 return type (in two > different spellings, too). Thanks for looking this over! Please find attached the next version with corrected documentation. Best, David. -- David Fetter http://fetter.org/ Phone: +1 415 235 3778 Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate >From f0f8e0639a4d08db7d454d5181aa9d98d31264a3 Mon Sep 17 00:00:00 2001 From: David Fetter Date: Wed, 30 Dec 2020 02:51:46 -0800 Subject: [PATCH v3] popcount To: hackers MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="2.29.2" This is a multi-part message in MIME format. --2.29.2 Content-Type: text/plain; charset=UTF-8; format=fixed Content-Transfer-Encoding: 8bit Now it's accessible to SQL for the BIT VARYING and BYTEA types. diff --git doc/src/sgml/func.sgml doc/src/sgml/func.sgml index 02a37658ad..1d86d610dd 100644 --- doc/src/sgml/func.sgml +++ doc/src/sgml/func.sgml @@ -4069,6 +4069,23 @@ SELECT format('Testing %3$s, %2$s, %s', 'one', 'two', 'three'); + + + + popcount + +popcount ( bytes bytea ) +bigint + + +Counts the number of bits set in a binary string. + + +popcount('\xdeadbeef'::bytea) +24 + + + @@ -4830,6 +4847,24 @@ SELECT format('Testing %3$s, %2$s, %s', 'one', 'two', 'three'); 101010001010101010 + + + + + popcount + +popcount ( bits bit ) +bigint + + +Counts the bits set in a bit string. 
+
+popcount(B'101010101010101010')
+9
+
+
diff --git src/include/catalog/pg_proc.dat src/include/catalog/pg_proc.dat
index d7b55f57ea..2d53e0d46d 100644
--- src/include/catalog/pg_proc.dat
+++ src/include/catalog/pg_proc.dat
@@ -1446,6 +1446,9 @@
 { oid => '752', descr => 'substitute portion of string',
   proname => 'overlay', prorettype => 'bytea',
   proargtypes => 'bytea bytea int4', prosrc => 'byteaoverlay_no_len' },
+{ oid => '8436', descr => 'count set bits',
+  proname => 'popcount', prorettype => 'int8', proargtypes => 'bytea',
+  prosrc => 'byteapopcount'},
 { oid => '725',
   proname => 'dist_pl', prorettype => 'float8', proargtypes => 'point line',
@@ -3865,6 +3868,9 @@
 { oid => '3033', descr => 'set bit',
   proname => 'set_bit', prorettype => 'bit', proargtypes => 'bit int4 int4',
   prosrc => 'bitsetbit' },
+{ oid => '8435', descr => 'count set bits',
+  proname => 'popcount', prorettype => 'int8', proargtypes => 'bit',
+  prosrc => 'bitpopcount'},
 # for macaddr type support
 { oid => '436', descr => 'I/O',
diff --git src/backend/utils/adt/varbit.c src/backend/utils/adt/varbit.c
index 2235866244..a6a44f3f4f 100644
--- src/backend/utils/adt/varbit.c
+++ src/backend/utils/adt/varbit.c
@@ -36,6 +36,7 @@
 #include "libpq/pqformat.h"
 #include "nodes/nodeFuncs.h"
 #include "nodes/supportnodes.h"
+#include "port/pg_bitutils.h"
 #include "utils/array.h"
 #include "utils/builtins.h"
 #include "utils/varbit.h"
@@ -1878,3 +1879,29 @@ bitgetbit(PG_FUNCTION_ARGS)
 	else
 		PG_RETURN_INT32(0);
 }
+
+/*
+ * bitpopcount
+ *
+ * Returns the number of bits set in a bit array.
+ */
+Datum
+bitpopcount(PG_FUNCTION_ARGS)
+{
+	VarBit	   *arg1 = PG_GETARG_VARBIT_P(0);
+	bits8	   *p;
+	int			len;
+	int64		popcount;
+
+	/*
+	 * ceil(VARBITLEN(arg1) / BITS_PER_BYTE), done this way to minimize
+	 * branches and instructions.
+	 */
+	len = (VARBITLEN(arg1) + BITS_PER_BYTE - 1) / BITS_PER_BYTE;
+	p = VARBITS(arg1);
+
+	popcount = pg_popcount((const char *) p, len);
+
+	PG_RETURN_INT64(popcount);
+}
diff --git src/backend/utils/adt/varlena.c src/backend/utils/adt/varlena.c
index 4838bfb4ff..da3ba769c4 100644
--- src/backend/utils/adt/varlena.c
+++ src/backend/utils/adt/varlena.c
@@ -3433,6 +3433,22 @@ bytea_overlay(bytea *t1, bytea *t2, int sp, int sl)
 	return result;
 }
 
+/*
+ * popcount
+ */
+Datum
+byteapopcount(PG_FUNCTION_ARGS)
+{
+	bytea	   *t1 = PG_GETARG_BYTEA_PP(0);
+	int			len;
+	int64		result;
+
+	len = VARSIZE_ANY_EXHDR(t1);
+	result = pg_popcount(VARDATA_ANY(t1), len);
+
+	PG_RETURN_INT64(result);
+}
+
 /*
  * byteapos -
  *	Return the position of the specified substring.
diff --git src/test/regress/expected/bit.out src/test/regress/expected/bit.out
index a7f95b846d..228f992ced 100644
--- src/test/regress/expected/bit.out
+++ src/test/regress/expected/bit.out
@@ -710,6 +710,19 @@ SELECT overlay(B'0101011100' placing '001' from 20);
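The two helpers in the patch above lean on pg_popcount() and a branch-free ceiling division; both are easy to sanity-check in isolation with a stand-alone sketch (naive_popcount and varbit_byte_len are names made up here, not part of the patch):

```c
#include <assert.h>
#include <stddef.h>

#define BITS_PER_BYTE 8

/* Naive per-byte popcount, enough to verify the documented examples.
 * PostgreSQL's pg_popcount() uses lookup tables or the hardware POPCNT
 * instruction instead; this is only an illustration. */
static long
naive_popcount(const unsigned char *buf, size_t len)
{
	long		count = 0;

	for (size_t i = 0; i < len; i++)
		for (unsigned char b = buf[i]; b != 0; b &= (unsigned char) (b - 1))
			count++;			/* Kernighan's trick clears the lowest set bit */
	return count;
}

/* Branch-free ceiling division: the number of bytes needed to hold
 * nbits bits, i.e. ceil(nbits / BITS_PER_BYTE). */
static int
varbit_byte_len(int nbits)
{
	return (nbits + BITS_PER_BYTE - 1) / BITS_PER_BYTE;
}
```

This reproduces the documented example: the four bytes of '\xdeadbeef' contain 6 + 5 + 6 + 7 = 24 set bits.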
Re: pg_upgrade test for binary compatibility of core data types
On Mon, Jan 11, 2021 at 03:28:08PM +0100, Peter Eisentraut wrote: > On 2020-12-27 20:07, Justin Pryzby wrote: > > rebased on 6df7a9698bb036610c1e8c6d375e1be38cb26d5f > > I think these patches could use some in-place documentation of what they are > trying to achieve and how they do it. The required information is spread > over a lengthy thread. No one wants to read that. Add commit messages to > the patches. Oh, I see that now, and agree that you need to explain each item with a comment. pg_upgrade is doing some odd things, so documenting everything it does is a big win. -- Bruce Momjian https://momjian.us EnterpriseDB https://enterprisedb.com The usefulness of a cup is in its emptiness, Bruce Lee
Re: Multiple hosts in connection string failed to failover in non-hot standby mode
Hubert Zhang writes:
> As for patch 0004, finding ':' after "could not connect to" may fail when
> the error message looks like "could not connect to host "localhost" (::1),
> port 12345: Connection refused", where p2 will point to "::1" instead of
> ": Connection refused". But since it's only used for a test case, we don't
> need to filter the error message precisely.

Excellent point, and I think that could happen on a Windows installation. We can make it look for ": " instead of just ':', and that'll reduce the odds of trouble.

regards, tom lane
Re: popcount
On 2020-12-30 17:41, David Fetter wrote: The input may have more than 2 billion bits set to 1. The biggest possible result should be 8 billion for bytea (1 GB with all bits set to 1). So shouldn't this function return an int8? It does now, and thanks for looking at this. The documentation still reflects the previous int4 return type (in two different spellings, too).
Re: Multiple hosts in connection string failed to failover in non-hot standby mode
Hi Tom,

I agree with getting a detailed error message for each failed host, as in your patch 0001.

As for patch 0004, finding ':' after "could not connect to" may fail when the error message looks like "could not connect to host "localhost" (::1), port 12345: Connection refused", where p2 will point to "::1" instead of ": Connection refused". But since it's only used for a test case, we don't need to filter the error message precisely.

```
ecpg_filter_stderr(const char *resultfile, const char *tmpfile)
{
	..
	char *p1 = strstr(linebuf.data, "could not connect to ");
	if (p1)
	{
		char *p2 = strchr(p1, ':');
		if (p2)
			memmove(p1 + 17, p2, strlen(p2) + 1);
	}
}
```

Thanks,
Hubert

From: Tom Lane Sent: Monday, January 11, 2021 10:56 AM To: Hubert Zhang Cc: tsunakawa.ta...@fujitsu.com ; pgsql-hack...@postgresql.org Subject: Re: Multiple hosts in connection string failed to failover in non-hot standby mode

I wrote:
> I feel pretty good about 0001: it might be committable as-is. 0002 is
> probably subject to bikeshedding, plus it has a problem in the ECPG tests.
> Two of the error messages are now unstable because they expose
> chosen-at-random socket paths:
> ...
> I don't have any non-hacky ideas what to do about that. The extra detail
> seems useful to end users, but we don't have any infrastructure that
> would allow filtering it out in the ECPG tests.

So far the only solution that comes to mind is to introduce some infrastructure to do that filtering. 0001-0003 below are unchanged, 0004 patches up the ecpg test framework with a rather ad-hoc filtering function. I'd feel worse about this if there weren't already a very ad-hoc filtering function there ;-) This set passes check-world for me; we'll soon see what the cfbot thinks.

regards, tom lane
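Tom's suggestion upthread of matching ": " rather than a bare ':' can be sketched stand-alone (this is an approximation with made-up names; the real ecpg_filter_stderr() edits a PQExpBuffer line in the test driver):

```c
#include <assert.h>
#include <string.h>

/* Collapse "could not connect to <details>: <reason>" down to
 * "could not connect to: <reason>", so that chosen-at-random socket
 * paths do not destabilize the expected ECPG test output.  Searching
 * for ": " instead of ':' avoids latching onto the colons inside an
 * IPv6 address such as "(::1)". */
static void
filter_connect_error(char *line)
{
	char	   *p1 = strstr(line, "could not connect to ");

	if (p1)
	{
		char	   *p2 = strstr(p1, ": ");	/* not strchr(p1, ':') */

		if (p2)
			memmove(p1 + strlen("could not connect to"), p2, strlen(p2) + 1);
	}
}
```

With the strchr() version, the buffer below would be cut at "(::1)"; with ": " the IPv6 colons are skipped and only the reason survives.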
Re: multi-install PostgresNode
On 2020-12-20 18:09, Andrew Dunstan wrote:
On 12/19/20 11:19 AM, Andrew Dunstan wrote:
This turns out to be remarkably short, with the use of a little eval magic. Given the attached, this test program works just fine:

#!/bin/perl
use PostgresNodePath;

$ENV{PG_REGRESS} = '/home/andrew/pgl/vpath.12/src/test/regress/pg_regress';

my $node = get_new_node('/home/andrew/pgl/inst.12.5711', 'blurfl');
print $node->info;
print $node->connstr(), "\n";
$node->init();

This time with a typo removed.

What is the proposed destination of this file? Is it meant to be part of a patch? Is it meant to be installed? Is it meant for the buildfarm code?
Re: pg_upgrade test for binary compatibility of core data types
On 2020-12-27 20:07, Justin Pryzby wrote: On Wed, Dec 16, 2020 at 11:22:23AM -0600, Justin Pryzby wrote: On Sun, Dec 06, 2020 at 12:02:48PM -0600, Justin Pryzby wrote: I meant to notice if the binary format is accidentally changed again, which was what happened here: 7c15cef86 Base information_schema.sql_identifier domain on name, not varchar. I added a table to the regression tests so it's processed by pg_upgrade tests, run like: | time make -C src/bin/pg_upgrade check oldsrc=`pwd`/11 oldbindir=`pwd`/11/tmp_install/usr/local/pgsql/bin Per cfbot, this avoids testing ::xml (support for which may not be enabled) And also now tests oid types. I think the per-version hacks should be grouped by logical change, rather than by version. Which I've started doing here. rebased on 6df7a9698bb036610c1e8c6d375e1be38cb26d5f I think these patches could use some in-place documentation of what they are trying to achieve and how they do it. The required information is spread over a lengthy thread. No one wants to read that. Add commit messages to the patches.
Re: [PATCH] Feature improvement for CLOSE, FETCH, MOVE tab completion
On 2021-01-05 10:56, Masahiko Sawada wrote:
BTW, according to the documentation, the options of the DECLARE statement (BINARY, INSENSITIVE, SCROLL, and NO SCROLL) are order-sensitive.

DECLARE name [ BINARY ] [ INSENSITIVE ] [ [ NO ] SCROLL ] CURSOR [ { WITH | WITHOUT } HOLD ] FOR query

But I realized that these options are actually order-insensitive. For instance, we can declare a cursor like:

=# declare abc scroll binary cursor for select * from pg_class;
DECLARE CURSOR

Both the parser code and the documentation have been unchanged since 2003. Is it a documentation bug?

According to the SQL standard, the ordering of the cursor properties is fixed. Even if the PostgreSQL parser offers more flexibility, I think we should continue to encourage writing the clauses in the standard order.
Re: libpq compression
On 09.01.2021 23:31, Justin Pryzby wrote:
On Thu, Dec 17, 2020 at 05:54:28PM +0300, Konstantin Knizhnik wrote:
I am maintaining this code in the g...@github.com:postgrespro/libpq_compression.git repository. I will be pleased if anybody who wants to suggest bug fixes/improvements to libpq compression creates pull requests: it will be much easier for me to merge them.

Thanks for working on this. I have a patch for zstd compression in pg_dump so I looked at your patch. I'm attaching some language fixes.

Thank you very much. I applied your patch on top of the pull request of Daniil Zakhlystov, who has implemented support for using different compressors in each direction.

Frankly speaking, I am still very skeptical about too much flexibility in compression configuration:
- toggling compression on the fly
- using different compression algorithms in both directions

According to Daniil's results there is only a 30% difference in compression ratio between zstd:1 and zstd:19. But by making it possible to specify an arbitrary compression level, we give the user a simple tool to attack the server (causing CPU/memory exhaustion).

+zstd_create(int level, zpq_tx_func tx_func, zpq_rx_func rx_func, void *arg, char* rx_data, size_t rx_data_size)
+zlib_create(int level, zpq_tx_func tx_func, zpq_rx_func rx_func, void *arg, char* rx_data, size_t rx_data_size)
+build_compressors_list(PGconn *conn, char** client_compressors, bool build_descriptors)

Are you able to run pgindent to fix all the places where "*" is before the space? (And similar style issues.)

Also done by Daniil.

There are several compression patches in the commitfest; I'm not sure how much they need to be coordinated, but we should certainly coordinate the list of compression methods available at compile time. Maybe there should be a central structure for this, or maybe just the ID numbers of compressors should be a common enum/define.
In my patch, I have:

+struct compressLibs {
+	const CompressionAlgorithm alg;
+	const char *name;		/* Name in -Z alg= */
+	const char *suffix;		/* file extension */
+	const int defaultlevel;	/* Default compression level */
+};

Maybe we'd also want to store the "magic number" of each compression library. Maybe there'd also be a common parsing of compression options. You're supporting a syntax like zlib,zstd:5, but zstd also supports long-range, checksum, and rsyncable modes (rsyncable is relevant to pg_dump, but not to libpq).

There are at least three places in Postgres where compression is used right now:
1. TOAST and extended-attribute compression
2. pg_basebackup compression
3. pg_dump compression

And there are also patches for:
4. page compression
5. protocol compression

It is awful that all five of these places use compression in their own way. It seems to me that compression (as well as all other system-dependent stuff such as socket I/O, file I/O, synchronization primitives, ...) should be extracted to a SAL (system-abstraction layer) where it can be used by the backend, frontend, and utilities alike, including external utilities like pg_probackup, pgbouncer, Odyssey, ... Unfortunately, such refactoring requires so much effort that it only has a chance of being committed if the work is coordinated by one of the core committers.

I think your patch has an issue here.
You have this:

src/interfaces/libpq/fe-connect.c
+	pqGetc(&resp, conn);
+	index = resp;
+	if (index == (char) -1)
+	{
+		appendPQExpBuffer(&conn->errorMessage,
+						  libpq_gettext(
+							  "server is not supported requested compression algorithms %s\n"),
+						  conn->compression);
+		goto error_return;
+	}
+	Assert(!conn->zstream);
+	conn->zstream = zpq_create(conn->compressors[index].impl,
+							   conn->compressors[index].level,
+							   (zpq_tx_func) pqsecure_write, (zpq_rx_func) pqsecure_read, conn,
+							   &conn->inBuffer[conn->inCursor], conn->inEnd - conn->inCursor);

This takes the "index" returned by the server and then accesses conn->compressors[index] without first checking if the index is out of range, so a malicious server could (at least) crash the client by returning index=666.

Thank you for pointing out this problem. I will add the check, although I think that the problem of a malicious server is less critical than that of a malicious client. Also, such a "byzantine" server may return wrong data, which in many cases is more fatal than a crash of the client. I suggest that there should be an enum of algorithms, which is constant across all
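The missing range check described above can be sketched in isolation (the names here are illustrative, not the actual libpq fields or functions):

```c
#include <assert.h>

#define COMPRESSION_ERROR (-1)

/* Validate a compression-algorithm index received from the server before
 * using it to address a client-side array.  "n_compressors" stands in for
 * however many algorithms the client actually offered.  Without the range
 * check, a byzantine server returning e.g. 666 would index past the end
 * of conn->compressors[]. */
static int
validate_compressor_index(int index, int n_compressors)
{
	if (index == -1)
		return COMPRESSION_ERROR;	/* server supports none of our algorithms */
	if (index < 0 || index >= n_compressors)
		return COMPRESSION_ERROR;	/* out of range: malicious or broken server */
	return index;
}
```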
Re: Added schema level support for publication.
On Mon, Jan 11, 2021 at 5:28 PM Bharath Rupireddy wrote:
>
> On Mon, Jan 11, 2021 at 5:25 PM Amit Kapila wrote:
> >
> > On Mon, Jan 11, 2021 at 4:29 PM Li Japin wrote:
> > >
> > > On Jan 11, 2021, at 5:06 PM, Bharath Rupireddy wrote:
> > >
> > > On Mon, Jan 11, 2021 at 1:29 PM japin wrote:
> > >
> > > Sorry, I misunderstood. After the test (ce6a71fa530), I found that if we
> > > do not insert data between step (5) and (6), it will not ship the new
> > > records; however, if we insert data between step (5) and (6), it will
> > > ship the new records.
> > > ..
> > > It might be a bug.
> >
> > Can you check pg_publication_rel and pg_subscription_rel? Also, this
> > is not related to the feature proposed in this thread, so it is better
> > to start a new thread to conclude whether this is a bug or not.
>
> Thanks Amit, sure I will verify and start a new thread.

I started a new thread [1] for this, please have a look.

[1] - https://www.postgresql.org/message-id/CALj2ACV%2B0UFpcZs5czYgBpujM9p0Hg1qdOZai_43OU7bqHU_xw%40mail.gmail.com

With Regards, Bharath Rupireddy. EnterpriseDB: http://www.enterprisedb.com
Logical Replication - behavior of ALTER PUBLICATION .. DROP TABLE and ALTER SUBSCRIPTION .. REFRESH PUBLICATION
Hi, While providing thoughts on the design in [1], I found a strange behaviour with the $subject. The use case is shown below as a sequence of steps that need to be run on publisher and subscriber to arrive at the strange behaviour. In step 5, the table is dropped from the publication and in step 6, the refresh publication is run on the subscriber, from here onwards, the expectation is that no further inserts into the publisher table have to be replicated on to the subscriber, but the opposite happens i.e. the inserts are still replicated to the subscriber. ISTM as a bug. Let me know if I'm missing anything. Thoughts? step 1) on the publisher: DROP TABLE t1; DROP PUBLICATION mypub1; CREATE TABLE t1 (a int); INSERT INTO t1 VALUES (1); CREATE PUBLICATION mypub1 FOR TABLE t1; postgres=# SELECT r1.*, r2.relname, r3.* FROM pg_publication_rel r1, pg_class r2, pg_publication r3 WHERE r1.prrelid = r2.oid AND r1.prpubid = r3.oid; oid | prpubid | prrelid | relname | oid | pubname | pubowner | puballtables | pubinsert | pubupdate | pubdelete | pubtruncate | pubviaroot ---+-+-+-+---+-+--+--+---+---+---+-+ 16462 | 16461 | 16458 | t1 | 16461 | mypub1 | 10 | f | t | t | t | t | f (1 row) step 2) on the subscriber: DROP TABLE t1; DROP SUBSCRIPTION mysub1; CREATE TABLE t1 (a int); CREATE SUBSCRIPTION mysub1 CONNECTION 'host=localhost dbname=postgres user=bharath port=5432' PUBLICATION mypub1; postgres=# SELECT r1.*, r2.relname, r3.* FROM pg_subscription_rel r1, pg_class r2, pg_subscription r3 WHERE r1.srrelid = r2.oid AND r1.srsubid = r3.oid; srsubid | srrelid | srsubstate | srsublsn | relname | oid | subdbid | subname | subowner | subenabled | subbinary | substream | subconninfo | subslotname | subsynccommit | subpublications -+-++--+-+---+-+-+--++---+---+- --+-+---+- 16446 | 16443 | i | | t1 | 16446 | 12872 | mysub1 | 10 | t | f | f | host=localhost dbnam e=postgres user=bharath port=5432 | mysub1 | off | {mypub1} (1 row) postgres=# SELECT * FROM t1; a --- 1 (1 row) step 3) on 
the publisher: INSERT INTO t1 VALUES (2); step 4) on the subscriber: postgres=# SELECT * FROM t1; a --- 1 2 (2 rows) step 5) on the publisher: ALTER PUBLICATION mypub1 DROP TABLE t1; postgres=# SELECT r1.*, r2.relname, r3.* FROM pg_publication_rel r1, pg_class r2, pg_publication r3 WHERE r1.prrelid = r2.oid AND r1.prpubid = r3.oid; oid | prpubid | prrelid | relname | oid | pubname | pubowner | puballtables | pubinsert | pubupdate | pubdelete | pubtruncate | pubviaroot -+-+-+-+-+-+--+--+---+---+---+-+ (0 rows) INSERT INTO t1 VALUES (3); step 6) on the subscriber: postgres=# SELECT * FROM t1; a --- 1 2 3 (3 rows) ALTER SUBSCRIPTION mysub1 REFRESH PUBLICATION; postgres=# SELECT r1.*, r2.relname, r3.* FROM pg_subscription_rel r1, pg_class r2, pg_subscription r3 WHERE r1.srrelid = r2.oid AND r1.srsubid = r3.oid; srsubid | srrelid | srsubstate | srsublsn | relname | oid | subdbid | subname | subowner | subenabled | subbinary | substream | subconninfo | subslotn ame | subsynccommit | subpublications -+-++--+-+-+-+-+--++---+---+-+- +---+- (0 rows) step 7) on the publisher: INSERT INTO t1 VALUES (4); step 8) on the subscriber: postgres=# SELECT * FROM t1; a --- 1 2 3 4 (4 rows) step 9) on the publisher: INSERT INTO t1 SELECT * FROM generate_series(5,100); step 10) on the subscriber: postgres=# SELECT count(*) FROM t1; count --- 100 (1 row) [1] - https://www.postgresql.org/message-id/CAA4eK1L5TejNHNctyPB3GVuEriRQw6xxU32iMyv%3Dh4tCJKkLew%40mail.gmail.com With Regards, Bharath Rupireddy. EnterpriseDB: http://www.enterprisedb.com
Re: adding partitioned tables to publications
Thanks for your reply. The patch is exactly what I want. My English name is Mark Zhao, which should be the current email name.

Thanks,
Mark Zhao
Re: [Bug Fix] Logical replication on partition table is very slow and CPU is 99%
On Mon, Jan 4, 2021 at 7:07 PM 赵锐 <875941...@qq.com> wrote:
>
> This patch adds the missed decrement to resolve the problem.
>
> Previous discussion is here:
> https://www.postgresql.org/message-id/flat/CA+HiwqH=Y85vRK3mOdjEkqFK+E=ST=eQiHdpj43L=_ejmoo...@mail.gmail.com
> And I believe patch #83fd453 introduced this problem.

Thanks for the report and fix. The fix looks straightforward and correct to me. Please add the patch to the next CF so that it's not forgotten, but since this is a bug fix it need not wait for the CF. Also, please report the results of your experiment with the patch applied, so we know this bug's contribution to the slowdown.

-- Best Wishes, Ashutosh Bapat
Re: Proposal: Global Index
> On Jan 7, 2021, at 23:04, Robert Haas wrote:
>
> On Thu, Jan 7, 2021 at 4:44 AM 曾文旌 wrote:
>> I've been following this topic for a long time. It's been a year since the last response.
>> It was clear that our customers wanted this feature as well, and a large number of them mentioned it.
>>
>> So, I wish for the whole feature to mature as soon as possible.
>> I summarized the scheme mentioned in the email and completed the POC patch (based on PG_13).
>
> You need to summarize the basic design choices you've made here. Like, what did you do about the fact that TIDs have to be 6 bytes?

These are the basic choices, and most of them come from discussions in previous emails.

1. Definition of the global index

Obviously, we need to expand the index address info (CTID) to include child-table info in GlobalIndexTuple.

1.1 As mentioned in the previous email, Bruce suggested having the OID instead of the relfilenode, as the relfilenode can be duplicated across tablespaces. I agree with that.

1.2 And Heikki pointed me to including heap-specific information using the INCLUDE keyword, so that heap information is stored with each index node as data. So, in the POC stage, I chose to use the INCLUDE keyword to include the tableoid in the global index. This adds 4 bytes to each IndexTuple. If a single partitioned table does not exceed 65535 child tables, perhaps two bytes for tracking which child table the data belongs to would be sufficient.

2. Maintenance of the global index by partitioned-table DML

The DML of each child table of the partitioned table needs to maintain the global index on the partitioned table.

3. Global index scan

Planner: processes the predicate on the parent partition, generating paths and plans for the global index. The cost model of the global index may need to be considered; we need the global index or a local index to be selected in their respective advantageous scenarios.
Executor: the index scan gets an indextup, gets the tableoid from it, and verifies the visibility of the data in the child table. If a child table is DETACHed, the index items of that table are ignored during the index scan until VACUUM finishes cleaning up the global index.

4. VACUUM of the partitioned table maintains the global index

Old data in the global index also needs to be cleaned up, and vacuum is suitable for this. While vacuuming its own indexes, each child table also vacuums the global index on the partitioned table.

5. Other

The global index indexes all of the child tables, which makes the global index large, with many levels. Following this technical route, partitioned global indexes are a further target.

This is my basic idea for implementing the global index. Looking forward to your feedback. Thanks!

Wenjing

> --
> Robert Haas
> EDB: http://www.enterprisedb.com

smime.p7s Description: S/MIME cryptographic signature
RE: [POC] Fast COPY FROM command for the table with foreign partitions
Hi Andrey,

I had a general look at this extension feature, and I think it's beneficial for some application scenarios of PostgreSQL. So I ran 7 performance test cases on your patch (v13). The results are really good: as you can see below, we get a 7-10x improvement with this patch. PSA test_copy_from.sql for my test case details (I didn't attach my data file since it's too big).

Below are the test results. 'Test No' corresponds to the number (0, 1, ..., 6) in the attached test_copy_from.sql. %reg = (Patched - Unpatched) / Unpatched; times are in milliseconds.

|Test No| Test Case |Patched(ms) | Unpatched(ms) |%reg |
|---|-|-|---|---|
|0 |COPY FROM insertion into the partitioned table (partition is a foreign table)| 102483.223 | 1083300.907 | -91% |
|1 |COPY FROM insertion into the partitioned table (partition is a foreign partition)| 104779.893 | 1207320.287 | -91% |
|2 |COPY FROM insertion into the foreign table (without partition) | 100268.730 | 1077309.158 | -91% |
|3 |COPY FROM insertion into the partitioned table (part of foreign partitions) | 104110.620 | 1134781.855 | -91% |
|4 |COPY FROM insertion into the partitioned table with constraint (part of foreign partition)| 136356.201 | 1238539.603 | -89% |
|5 |COPY FROM insertion into the foreign table with constraint (without partition)| 136818.262 | 1189921.742 | -89% |
|6 |\copy insertion into the partitioned table with constraint | 140368.072 | 1242689.924 | -89% |

If there are any questions about my tests, please feel free to ask.

Best Regards,
Tang

test_copy_from.sql Description: test_copy_from.sql
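As a quick cross-check of the table above, the %reg column can be recomputed from the raw timings (a throwaway sketch, not part of the patch; reg_percent is a made-up name):

```c
#include <assert.h>

/* Regression percentage as defined above:
 * (patched - unpatched) / unpatched, expressed in percent and rounded
 * to the nearest integer (half away from zero, without <math.h>). */
static int
reg_percent(double patched, double unpatched)
{
	double		pct = (patched - unpatched) / unpatched * 100.0;

	return (int) (pct >= 0 ? pct + 0.5 : pct - 0.5);
}
```

Plugging in the test-0 row gives -91%, matching the table.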
Re: Added schema level support for publication.
On Mon, Jan 11, 2021 at 5:25 PM Amit Kapila wrote: > > On Mon, Jan 11, 2021 at 4:29 PM Li Japin wrote: > > > > > > On Jan 11, 2021, at 5:06 PM, Bharath Rupireddy > > wrote: > > > > On Mon, Jan 11, 2021 at 1:29 PM japin wrote: > > > > Sorry, I misunderstood. After the test (ce6a71fa530). I found that if we do > > not insert data > > between step (5) and (6), it will not ship the new records, however, if we > > insert > > data between step (5) and (6), it will ship the new records. > > > > > .. > > It might be a bug. > > > > Can you check pg_publication_rel and pg_subscription_rel? Also, this > is not related to the feature proposed in this thread, so it is better > to start a new thread to conclude whether this is a bug or not. Thanks Amit, sure I will verify and start a new thread. With Regards, Bharath Rupireddy. EnterpriseDB: http://www.enterprisedb.com
Re: Added schema level support for publication.
On Mon, Jan 11, 2021 at 4:29 PM Li Japin wrote: > > > On Jan 11, 2021, at 5:06 PM, Bharath Rupireddy > wrote: > > On Mon, Jan 11, 2021 at 1:29 PM japin wrote: > > Sorry, I misunderstood. After the test (ce6a71fa530). I found that if we do > not insert data > between step (5) and (6), it will not ship the new records, however, if we > insert > data between step (5) and (6), it will ship the new records. > > .. > It might be a bug. > Can you check pg_publication_rel and pg_subscription_rel? Also, this is not related to the feature proposed in this thread, so it is better to start a new thread to conclude whether this is a bug or not. -- With Regards, Amit Kapila.
Re: adding partitioned tables to publications
On Wed, Dec 30, 2020 at 8:03 PM 赵锐 <875941...@qq.com> wrote:
>
> The first file of Amit's patch not only rearranges the code, but also fixes a hidden bug.
> To make it easy to see, I attach another patch.
> "RelationIdGetRelation" will increase a ref on owner->relrefarr; without "RelationClose", the owner->relrefarr will enlarge and re-hash.
> When the capacity of owner->relrefarr is over 10 million, enlarging and re-hashing takes several hours. And what's worse, increasing the ref will also take minutes, as the hash collision resolution is based on looking up an array in order.
> When we want to publish 10 billion rows under one partitioned table, it takes several days to increase refs, enlarge, and re-hash, and CPU is always at 99%.
> After applying my patch, 10 billion rows are published in 10 minutes.

It is a clear relation descriptor leak. The proposed fix seems correct to me. The patch wasn't applying to HEAD, so I have prepared separate patches for HEAD and 13. There are minor modifications in the patch; for example, I have used RelationIsValid before closing the relation. I have not added any test because I see that there is already a test in src/test/subscription/t/013_partition. Kindly let me know your English name so that I can give you credit as a co-author.

-- With Regards, Amit Kapila.

v1-0001-Fix-relation-descriptor-leak.HEAD.patch Description: Binary data
v1-0001-Fix-relation-descriptor-leak.13.patch Description: Binary data
Re: Proposal: Global Index
> On Jan 8, 2021, at 16:26, Julien Rouhaud wrote:
>
> On Fri, Jan 8, 2021 at 4:02 PM 曾文旌 wrote:
>>
>>> On Jan 7, 2021, at 22:16, Bruce Momjian wrote:
>>>
>>> On Thu, Jan 7, 2021 at 05:44:01PM +0800, 曾文旌 wrote:
>>>> I've been following this topic for a long time. It's been a year since the last response. It was clear that our customers wanted this feature as well, and a large number of them mentioned it. So, I wish for the whole feature to mature as soon as possible. I summarized the scheme mentioned in the email and completed the POC patch (based on PG_13).
>>>
>>> I think you need to address the items mentioned in this blog, and the email link it mentions:
>>>
>>> https://momjian.us/main/blogs/pgblog/2020.html#July_1_2020
>>
>> Thank you for your reply.
>> I read your blog and it helped me a lot.
>>
>> The blog mentions a specific problem: "A large global index might also reintroduce problems that prompted the creation of partitioning in the first place."
>> I don't quite understand; could you give some specific information?
>>
>> In addition you mentioned: "It is still unclear if these use-cases justify the architectural changes needed to enable global indexes."
>> Please also describe the problems you see; I will confirm each specific issue one by one.
>
> One example is date partitioning. People frequently need to store only the most recent data. For instance, doing monthly partitioning and dropping the oldest partition every month once you hit the wanted retention is very efficient for that use case, as it should be almost instant (provided that you can acquire the necessary locks immediately). But if you have a global index, you basically lose the advantage of partitioning, as it'll require heavy changes on that index.

If the global index removes all the major benefits of partitioned tables, then it is not worth having. This is indeed a typical scenario for a partitioned table.
There are two basic operations:
1) Monthly DETACH of the old child table
2) Monthly ATTACH of a new child table

For 1), the DETACH of the old child table can finish immediately; the global index can be kept valid after the DETACH completes, and the cleanup of garbage data in the global index can be deferred to VACUUM. This is similar to the global index optimization done by Oracle 12c.

For 2), ATTACHing a new empty child table can also complete immediately.

If this is the case, many of the advantages of partitioned tables will be retained, while the advantages of global indexes will be gained.

smime.p7s Description: S/MIME cryptographic signature
Re: Key management with tests
On Sun, Jan 10, 2021 at 11:51 PM Bruce Momjian wrote:
>
> On Sun, Jan 10, 2021 at 06:04:12PM +1300, Thomas Munro wrote:
> > On Sun, Jan 10, 2021 at 3:45 PM Bruce Momjian wrote:
> > > Does anyone know why the cfbot applied the patch listed second first here?
> > >
> > > http://cfbot.cputube.org/patch_31_2925.log
> > >
> > > Specifically, it applied hex..key.diff.gz before hex.diff.gz. I assumed it would apply attachments in the order they appear in the email.
> >
> > It sorts the filenames (in this case after the decompressing step removes the .gz endings). That works pretty well for the patches that "git format-patch" spits out, but it's a bit hit and miss with cases like yours.
>
> OK, here they are with numeric prefixes. It was actually tricky to figure out how to create a squashed format-patch based on another branch.

Thank you for attaching the patches. They pass all cfbot tests, great.

Looking at the patch, it supports three algorithms but only PG_CIPHER_AES_KWP is used in the core for now:

+/*
+ * Supported symmetric encryption algorithm. These identifiers are passed
+ * to the pg_cipher_ctx_create() function, and then actual encryption
+ * implementations need to initialize their context for the given encryption
+ * algorithm.
+ */
+#define PG_CIPHER_AES_GCM 0
+#define PG_CIPHER_AES_KW 1
+#define PG_CIPHER_AES_KWP 2
+#define PG_MAX_CIPHER_ID 3

Are we in the process of experimenting with which algorithms are better? If we supported only the one algorithm that is actually used in the core, we could reduce the tests as well.

FWIW, I've written a PoC patch for buffer encryption to make sure the kms patch would be workable with other components using the encryption key managed by kmgr. Overall it's good. While the buffer encryption patch is still PoC quality and there are some problems regarding nonce generation we need to deal with, it can easily use the relation key managed by the kmgr to encrypt/decrypt buffers.
Regards, -- Masahiko Sawada EnterpriseDB: https://www.enterprisedb.com/ 0003-Poc-buffer-encryption.patch Description: Binary data
Re: Added schema level support for publication.
On Jan 11, 2021, at 5:06 PM, Bharath Rupireddy <bharath.rupireddyforpostg...@gmail.com> wrote:
On Mon, Jan 11, 2021 at 1:29 PM japin <japi...@hotmail.com> wrote:

Say a user has created a publication for a schema with hundreds of tables in it; at some point later, can he stop replicating a single table or some tables from that schema? There is no provision for this currently. The documentation [1] says we can ALTER PUBLICATION testpub DROP TABLE t1; which removes the table from the list of published tables, but it looks like it requires ALTER SUBSCRIPTION testsub REFRESH PUBLICATION; for the changes to become effective on the subscriber.

I have done some testing for this case:
1) created a publication for table t1; see \d+ t1, the associated publication is visible in the output
2) created a subscription on the subscriber; the initially available data from the publisher for table t1 is received
3) insert into table t1 on the publisher
4) the data inserted in (3) is received in the subscriber table t1
5) alter the publication to drop the table t1 on the publisher; see \d+ t1, there will not be any associated publication in the output
6) execute alter subscription refresh publication on the subscriber, with the expectation that it should not receive data from the publisher for table t1, since it was dropped from the publication in (5)
7) insert into table t1 on the publisher
8) the data newly inserted in (7) on the publisher is still received into the table t1 on the subscriber

IIUC the behaviour of ALTER PUBLICATION ... DROP TABLE from the docs, the above use case looks like a bug to me. If I'm wrong, can someone correct me?

Yes, if we modify the publication, we should refresh the subscription on each subscriber. It looks strange to me, especially for partitioned tables [1]. Thoughts? Can we track the difference between publication and subscription, and auto-refresh the subscription on the subscriber?
[1] https://www.postgresql.org/message-id/flat/1D6DCFD2-0F44-4A18-BF67-17C2697B1631%40hotmail.com

As Amit stated in your thread [1], DDLs like the creation of new tables or partitions, schema changes, etc. on the publisher cannot be replicated automatically by the logical replication framework to the subscriber. Users have to perform those DDLs on the subscribers themselves.

Yeah, DDL is not supported now. On the publisher, the partitions are added to the publication automatically. However, even if we create the partitions on the subscriber, it will not sync the new partitions, because, as for a normal table, we must execute ALTER SUBSCRIPTION my_test REFRESH PUBLICATION; I would prefer that a new partition be added to the subscription automatically when we create it, if the partitioned table is already in the publication.

If your point is to at least issue the ALTER SUBSCRIPTION testsub REFRESH PUBLICATION; whenever the publication is altered, i.e. tables are added or dropped, IMO we cannot do this, because running this command on the subscriber only makes sense after the user runs the same DDLs (which were run on the publisher) on the subscriber as well. To illustrate this: 1) create a new table or partition on the publisher and add it to the publication; note that the same table has not yet been created on the subscriber 2) imagine the publisher issuing an auto-refresh command to all the subscribers; there is no point in that, because the new table or partition has not yet been created on all the subscribers. So, IMO, we cannot have an auto-refresh mechanism until we have the feature to replicate DDL changes to all the subscribers.

Thanks for the clarification. What I stated in my earlier mail [1] is that even though we drop the table from the publication on the publisher and run a refresh publication on the subscriber, the data is still being replicated from the publisher to the subscriber table.
I just wanted to know whether this is the expected behaviour, or what exactly ALTER PUBLICATION mypub DROP TABLE mytable; means for the user running it.

[1] https://www.postgresql.org/message-id/CALj2ACWAxO3vSToT0o5nXL%3Drz5cNx90zaV-at%3DcvM14Tag4%3DcQ%40mail.gmail.com

Sorry, I misunderstood. After testing on ce6a71fa530, I found that if we do not insert data between steps (5) and (6), it will not ship the new records; however, if we do insert data between steps (5) and (6), it will ship the new records.

(1) created publications for tables t1, t2:

postgres[8765]=# CREATE TABLE t1 (a int);
CREATE TABLE
postgres[8765]=# CREATE TABLE t2 (a int);
CREATE TABLE
postgres[8765]=# INSERT INTO t1 VALUES (1);
INSERT 0 1
postgres[8765]=# INSERT INTO t2 VALUES (1);
INSERT 0 1
postgres[8765]=# CREATE PUBLICATION mypub1 FOR TABLE t1;
CREATE PUBLICATION
postgres[8765]=# CREATE PUBLICATION mypub2 FOR TABLE t2;
CREATE PUBLICATION

(2) created subscriptions on the subscriber:

postgres[9812]=# CREATE TABLE t1 (a int);
CREATE TABLE
postgres[9812]=# CREATE TABLE t2 (a int);
CREATE TABLE
postgres[9812]=# CREATE SUBSCRIPTION mysub1 CONNECTION
Re: Single transaction in the tablesync worker?
On Mon, Jan 11, 2021 at 3:53 PM Ajin Cherian wrote:
>
> On Thu, Jan 7, 2021 at 3:20 PM Amit Kapila wrote:
> >
> > > BTW, I have analyzed whether we need any modifications to pg_dump/restore for this patch, as it changes the state of one of the fields in the system table, and concluded that we don't need any change. For subscriptions, we don't dump any of the information from pg_subscription_rel; rather, we just dump subscriptions with the connect option as false, which means users need to enable the subscription and refresh the publication after restore. I have checked this in the code and tested it as well. The related information is present in the pg_dump doc page [1], see from "When dumping logical replication subscriptions".
> >
> > I have further analyzed that we don't need to do anything w.r.t. pg_upgrade either, because it uses pg_dump/pg_dumpall to dump the schema info of the old cluster and then restore it into the new cluster. And we know that pg_dump ignores the info in pg_subscription_rel, so we don't need to change anything, as our changes are specific to the state of one of the columns in pg_subscription_rel. I have not tested this, but we should test it by having some relations in a not-ready state and then allowing the old cluster (<=PG13) to be upgraded to the new one (PG14), both with and without this patch, and seeing if there is any change in behavior.
>
> I have tested this scenario: I stopped a server running PG 13 when subscription table sync was in progress.

Thanks for the test. This confirms my analysis and we don't need any change in pg_dump or pg_upgrade for this patch.

--
With Regards,
Amit Kapila.
Re: Single transaction in the tablesync worker?
On Thu, Jan 7, 2021 at 3:20 PM Amit Kapila wrote:
>
> > BTW, I have analyzed whether we need any modifications to pg_dump/restore for this patch, as it changes the state of one of the fields in the system table, and concluded that we don't need any change. For subscriptions, we don't dump any of the information from pg_subscription_rel; rather, we just dump subscriptions with the connect option as false, which means users need to enable the subscription and refresh the publication after restore. I have checked this in the code and tested it as well. The related information is present in the pg_dump doc page [1], see from "When dumping logical replication subscriptions".
>
> I have further analyzed that we don't need to do anything w.r.t. pg_upgrade either, because it uses pg_dump/pg_dumpall to dump the schema info of the old cluster and then restore it into the new cluster. And we know that pg_dump ignores the info in pg_subscription_rel, so we don't need to change anything, as our changes are specific to the state of one of the columns in pg_subscription_rel. I have not tested this, but we should test it by having some relations in a not-ready state and then allowing the old cluster (<=PG13) to be upgraded to the new one (PG14), both with and without this patch, and seeing if there is any change in behavior.

I have tested this scenario: I stopped a server running PG 13 when subscription table sync was in progress.
One of the tables in pg_subscription_rel was still in 'd' state (DATASYNC):

postgres=# select * from pg_subscription_rel;
 srsubid | srrelid | srsubstate |  srsublsn
---------+---------+------------+------------
   16424 |   16384 | d          |
   16424 |   16390 | r          | 0/247A63D8
   16424 |   16395 | r          | 0/247A6410
   16424 |   16387 | r          | 0/247A6448
(4 rows)

I then initiated the pg_upgrade to PG 14, both with the patch and without the patch. I see that the subscription exists but is not enabled:

postgres=# select * from pg_subscription;
  oid  | subdbid | subname | subowner | subenabled | subbinary | substream |                subconninfo                | subslotname | subsynccommit | subpublications
-------+---------+---------+----------+------------+-----------+-----------+-------------------------------------------+-------------+---------------+-----------------
 16407 |   16401 | tap_sub |       10 | f          | f         | f         | host=localhost port=6972 dbname=postgres  | tap_sub     | off           | {tap_pub}
(1 row)

and looking at pg_subscription_rel:

postgres=# select * from pg_subscription_rel;
 srsubid | srrelid | srsubstate | srsublsn
---------+---------+------------+----------
(0 rows)

As can be seen, none of the data in pg_subscription_rel has been copied over. The same behaviour is seen with the patch and without the patch.

regards,
Ajin Cherian
Fujitsu Australia
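Given that pg_dump restores the subscription disabled and with an empty pg_subscription_rel, the recovery sequence implied above can be sketched as follows on the upgraded subscriber (based on the pg_dump documentation quoted earlier in this thread, not on anything this patch changes):

```sql
-- Re-enable the restored subscription and rebuild its relation list
ALTER SUBSCRIPTION tap_sub ENABLE;
ALTER SUBSCRIPTION tap_sub REFRESH PUBLICATION;

-- Watch the per-table sync states come back
-- (i = initialize, d = data copy, s = synchronized, r = ready)
SELECT srrelid::regclass, srsubstate FROM pg_subscription_rel;
```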
Re: [HACKERS] Custom compression methods
On Mon, Jan 11, 2021 at 12:21 PM Justin Pryzby wrote:
>
> On Mon, Jan 11, 2021 at 12:11:54PM +0530, Dilip Kumar wrote:
> > On Mon, Jan 11, 2021 at 11:00 AM Dilip Kumar wrote:
> > > On Sun, Jan 10, 2021 at 10:59 PM Justin Pryzby wrote:
> > > >
> > > > On Mon, Jan 04, 2021 at 04:57:16PM +0530, Dilip Kumar wrote:
> > > > > On Mon, Jan 4, 2021 at 6:52 AM Justin Pryzby wrote:
> > > > > > And fails pg_upgrade check, apparently losing track of the compression (?)
> > > > > >
> > > > > > CREATE TABLE public.cmdata2 (
> > > > > > -    f1 text COMPRESSION lz4
> > > > > > +    f1 text
> > > > > > );
> > > > >
> > > > > I did not get this? pg_upgrade check is passing for me.
> > > >
> > > > I realized that this was failing in your v16 patch sent Dec 25. It's passing on current patches because they do "DROP TABLE cmdata2", but that's only masking the error.
> >
> > I tested pg_upgrade specifically, by removing all the DROP TABLE and materialized-view statements, and it is passing. I don't see the reason why it should fail. I mean, after the upgrade, why is COMPRESSION lz4 missing?
>
> How did you test it?
>
> I'm not completely clear how this is intended to work... has it been tested before? According to the comments, in binary upgrade mode, there's an ALTER which is supposed to SET COMPRESSION, but that's evidently not happening.

I am able to reproduce this issue: if I run pg_dump in binary-upgrade mode, then I can see it (./pg_dump --binary-upgrade -d Postgres). Yes, you are right that fixing this requires an ALTER ... SET COMPRESSION step.

> > > > I found that's the AM's OID in the old cluster:
> > > >
> > > > regression=# SELECT * FROM pg_am WHERE oid=36447;
> > > >   oid  | amname |  amhandler  | amtype
> > > > -------+--------+-------------+--------
> > > >  36447 | pglz2  | pglzhandler | c
> > > >
> > > > But in the new cluster, the OID has changed. Since that's written into table data, I think you have to ensure that the compression OIDs are preserved on upgrade:
> > > >
> > > >  16755 | pglz2  | pglzhandler | c
> > >
> > > Yeah, basically we are storing the AM OID in the compressed data, so the OID must be preserved. I will look into this and fix it.
> >
> > On further analysis: if we are dumping and restoring, then we will compress the data again while inserting it, so why would we need the old OID? I mean, in the new cluster we are inserting the data again, so it will be compressed again and will store the new OID. Am I missing something here?
>
> I'm referring to pg_upgrade, which uses pg_dump but does *not* re-insert data; rather, it recreates the catalogs only and then links to the old tables (either with copy, link, or clone). Test with make -C src/bin/pg_upgrade (which is included in make check-world).

Got this as well. I will fix these two issues and post the updated patch by tomorrow. Thanks for your findings.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
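For what it's worth, the fix discussed above would presumably mean that in --binary-upgrade mode pg_dump emits something along these lines for the failing table (a sketch of the intended dump output, not what the patch currently produces):

```sql
CREATE TABLE public.cmdata2 (
    f1 text
);

-- Restore the non-default column compression explicitly, instead of
-- relying on the inline COMPRESSION clause that the dump currently loses
ALTER TABLE public.cmdata2
    ALTER COLUMN f1 SET COMPRESSION lz4;
```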
Re: Added schema level support for publication.
On Mon, Jan 11, 2021 at 1:29 PM japin wrote:
>
> > > Say a user has created a publication for a schema with hundreds of tables in it. At some point later, can he stop replicating a single table or some tables from that schema?
> >
> > There is no provision for this currently.
>
> > The documentation [1] says we can run ALTER PUBLICATION testpub DROP TABLE t1;, which removes the table from the list of published tables, but it looks like it requires ALTER SUBSCRIPTION testsub REFRESH PUBLICATION; for the change to become effective on the subscriber. I have done some testing for this case:
> > 1) created a publication for table t1; \d+ t1 shows the associated publication in the output
> > 2) created a subscription on the subscriber; the initially available data from the publisher for table t1 is received
> > 3) inserted into table t1 on the publisher
> > 4) the data inserted in (3) is received in the subscriber table t1
> > 5) altered the publication to drop table t1 on the publisher; \d+ t1 no longer shows any associated publication in the output
> > 6) executed ALTER SUBSCRIPTION ... REFRESH PUBLICATION on the subscriber, with the expectation that it should not receive data from the publisher for table t1, since it was dropped from the publication in (5)
> > 7) inserted into table t1 on the publisher
> > 8) the newly inserted data from (7) is still received into table t1 on the subscriber
> >
> > IIUC the documented behaviour of ALTER PUBLICATION ... DROP TABLE and the above use case, this looks like a bug to me. If I'm wrong, can someone correct me?
>
> Yes, if we modify the publication, we should refresh the subscription on each subscriber. It looks strange to me, especially for partitioned tables [1].
>
> > Thoughts?
>
> Can we track the difference between the publication and the subscription, and auto-refresh the subscription on the subscriber?
> [1] https://www.postgresql.org/message-id/flat/1D6DCFD2-0F44-4A18-BF67-17C2697B1631%40hotmail.com

As Amit stated in your thread [1], DDLs such as the creation of new tables or partitions, schema changes, etc. on the publisher cannot be replicated automatically by the logical replication framework to the subscriber. Users have to perform those DDLs on the subscribers themselves.

If your point is to at least issue the ALTER SUBSCRIPTION testsub REFRESH PUBLICATION; from the publisher whenever the publication is altered, i.e. tables are added or dropped, IMO we cannot do this, because running this command on the subscriber only makes sense after the user has run the same DDLs (which were run on the publisher) on the subscriber as well. To illustrate this: 1) create a new table or partition on the publisher and add it to the publication, noting that the same table has not yet been created on the subscriber; 2) imagine the publisher issuing an auto-refresh command to all the subscribers: there is no point in that, because the new table or partition has not yet been created on all the subscribers. So, IMO, we cannot have an auto-refresh mechanism until we have a feature to replicate DDL changes to all the subscribers.

What I stated in my earlier mail [1] is that even though we drop the table from the publication on the publisher and run a refresh publication on the subscriber, the data is still being replicated from the publisher to the subscriber table. I just wanted to know whether this is the expected behaviour, or what exactly ALTER PUBLICATION mypub DROP TABLE mytable; means for the user running it.

[1] https://www.postgresql.org/message-id/CALj2ACWAxO3vSToT0o5nXL%3Drz5cNx90zaV-at%3DcvM14Tag4%3DcQ%40mail.gmail.com

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com
Re: Improper use about DatumGetInt32
On Sat, Jan 09, 2021 at 01:41:39PM +0100, Peter Eisentraut wrote:
> If we had a way to do such testing then we wouldn't need these markers. But AFAICT, we don't.

Not sure I am following your point here. Taking your patch, it is possible to trigger the <= 1.8 version of get_raw_page() just with something like the following:

create extension pageinspect version "1.8";
select get_raw_page('pg_class', 0);

There are no in-core regression tests that check the compatibility of extensions with older versions, but it is technically possible to do so.

--
Michael
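A regression-style check along those lines could install the older script, exercise the pre-1.9 interface, and then update (a sketch; it assumes the 1.8 extension script is installed alongside the current one):

```sql
CREATE EXTENSION pageinspect VERSION '1.8';
-- Old form of get_raw_page(); returns one page as bytea
SELECT octet_length(get_raw_page('pg_class', 0));
-- Then move to the default version and re-test the new interface
ALTER EXTENSION pageinspect UPDATE;
```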
Re: Added schema level support for publication.
On Mon, 11 Jan 2021 at 14:15, Bharath Rupireddy wrote:
> On Sun, Jan 10, 2021 at 11:21 PM vignesh C wrote:
>> On Sat, Jan 9, 2021 at 8:08 PM Bharath Rupireddy wrote:
>> > I think this feature can be useful in case a user has a lot of tables to publish inside a schema. Having said that, I wonder if this feature mandates users to create the same schema with the same permissions/authorizations manually on the subscriber, because logical replication doesn't propagate any DDLs, and so neither the schema nor schema changes. Or is it that the list of tables from the publisher can go into a different schema on the subscriber?
>>
>> DDLs will not be propagated to the subscriber. Users have to create the schema & tables on the subscriber. No change in permissions/authorizations handling; it will be the same as the existing behavior for relations.
>
> Looks like the existing behaviour already requires users to create the schema on the subscriber when publishing the tables from that schema. Otherwise, an error is thrown on the subscriber [1].
>
> [1] on publisher:
> CREATE SCHEMA myschema;
> CREATE TABLE myschema.t1(a1 int, b1 int);
> INSERT INTO myschema.t1 SELECT i, i+10 FROM generate_series(1,10) i;
> CREATE PUBLICATION testpub FOR TABLE myschema.t1;
>
> on subscriber:
> postgres=# CREATE SUBSCRIPTION testsub CONNECTION 'host=localhost dbname=postgres user=bharath port=5432' PUBLICATION testpub;
> ERROR: schema "myschema" does not exist
> CREATE SCHEMA myschema;
> CREATE TABLE myschema.t1(a1 int, b1 int);
> postgres=# CREATE SUBSCRIPTION testsub CONNECTION 'host=localhost dbname=postgres user=bharath port=5432' PUBLICATION testpub;
> NOTICE: created replication slot "testsub" on publisher
> CREATE SUBSCRIPTION
>
>> > Since the schema can have other objects such as data types, functions, operators, I'm sure with your feature non-table objects will be skipped.
>>
>> Yes, only table data will be sent to subscribers; non-table objects will be skipped.
>
> Looks like the existing CREATE PUBLICATION FOR ALL TABLES, which is for all the tables in the database, does this, i.e. skips non-table objects, temporary tables, foreign tables and so on. So your feature can also behave the same way, but within the scope of the given schema(s).
>
>> > As Amit pointed out earlier regarding the behaviour when a schema is dropped, I think we should also consider when a schema is altered, say renamed; maybe we should change that in the publication too.
>>
>> I agree that when a schema is renamed, the new name should be reflected in the publication.
>
> I think it's not only about making sure that the publisher side has the newly altered schema; the subscriber needs those alters too. Having said that, since these alters come under DDL changes, and in logical replication we don't publish schema changes to the subscriber, we may not need anything extra for informing the subscriber of schema alters from the publisher. The users might have to do the same schema alter on the subscriber, and then an ALTER SUBSCRIPTION testsub REFRESH PUBLICATION; should work for them? If this understanding is correct, then we should document it.
>
>> > In general, what happens if we have some temporary tables or foreign tables inside the schema; will they be allowed to send data to subscribers?
>>
>> Temporary tables & foreign tables will not be added to the publications.
>
> Yes, the existing logical replication framework doesn't allow replication of temporary, unlogged, or foreign tables, or of other non-table relations such as materialized views, indexes, etc. [1]. The CREATE PUBLICATION statement either fails in check_publication_add_relation or before that.
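The difference in behavior between the two forms discussed in this thread can be seen with a quick sketch (table names are illustrative):

```sql
CREATE TABLE normal_tab1 (a int);
CREATE TEMP TABLE temp_tab2 (a int);
CREATE TABLE normal_tab3 (a int);

-- Fails at temp_tab2 today, publishing nothing; the proposal in this
-- thread would instead warn and still publish normal_tab1 and normal_tab3
CREATE PUBLICATION testpub FOR TABLE normal_tab1, temp_tab2, normal_tab3;

-- By contrast, FOR ALL TABLES silently skips the restricted relations
CREATE PUBLICATION testpub2 FOR ALL TABLES;
```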
>
> CREATE PUBLICATION testpub FOR TABLE tab1 throwing an error if the single table tab1 is any of the above restricted tables seems fine. But if there's a list of tables, as in CREATE PUBLICATION testpub FOR TABLE normal_tab1, temp_tab2, normal_tab3, foreign_tab4, unlogged_tab5, normal_tab6, normal_tab7, ...; this query fails on the first encounter of a restricted table, say at temp_tab2. Whereas CREATE PUBLICATION testpub FOR ALL TABLES; would skip the restricted tables and continue to add the accepted tables into the publication within the database.
>
> IMHO, if there's a list of tables specified with FOR TABLE, then instead of throwing an error for any restricted table, we can issue a warning and continue with the addition of the accepted tables into the publication. If done, this behaviour will be in sync with FOR ALL TABLES.
>
> Thoughts? If okay, I can work on a patch.
>
> Related to error messages: when a foreign table is specified in a CREATE PUBLICATION statement, "ERROR: "f1" is not a table" is thrown [1]. How about the error message "ERROR: foreign table "f1" cannot be replicated"? In general, it