Re: Dynamic gathering the values for seq_page_cost/xxx_cost
On Sat, Sep 26, 2020 at 8:17 AM Andy Fan wrote:
>
> As for the testing with cache considered, I found how to estimate cache hit
> ratio is hard or how to control a hit ratio to test is hard. Recently I am
> thinking of a method where we can get page_reads and shared_buffer_hit
> counts from pg_kernel, plus the real IO (without the file system cache hits)
> at the OS level (just as iotop/pidstat do). Then we can know the
> shared_buffer hit ratio and file system cache hit ratio (assuming it will be
> stable after a long run), and then do a testing. However this would be
> another branch of manual work and I still have not got it done until now.

FWIW the pg_stat_kcache [1] extension accumulates per (database, user,
queryid) physical reads and writes, so you can easily compute a
shared_buffers / IO cache / disk hit ratio.

[1] https://github.com/powa-team/pg_stat_kcache
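For illustration, once the per-query counters are in hand, the ratios are
simple arithmetic. Below is a minimal C sketch; the counter values are
hypothetical stand-ins for numbers read from pg_stat_statements
(shared_blks_hit, shared_blks_read) joined with pg_stat_kcache physical
reads, not an actual API:

	#include <stdio.h>

	int
	main(void)
	{
		/* Hypothetical counters for one queryid, in 8kB blocks. */
		double	shared_blks_hit = 90000;	/* served from shared_buffers */
		double	shared_blks_read = 10000;	/* requested from the OS */
		double	disk_blks_read = 2000;		/* pg_stat_kcache physical reads */

		double	total = shared_blks_hit + shared_blks_read;
		double	shared_ratio = shared_blks_hit / total;
		/* blocks the OS served from its page cache rather than disk */
		double	os_cache_ratio = (shared_blks_read - disk_blks_read) / total;
		double	disk_ratio = disk_blks_read / total;

		printf("shared_buffers: %.1f%%, OS cache: %.1f%%, disk: %.1f%%\n",
			   shared_ratio * 100, os_cache_ratio * 100, disk_ratio * 100);
		return 0;
	}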
Re: Parallel INSERT (INTO ... SELECT ...)
On Sat, Sep 26, 2020 at 11:00 AM Bharath Rupireddy wrote:
>
> On Fri, Sep 25, 2020 at 9:23 PM Greg Nancarrow wrote:
> >
> > On Fri, Sep 25, 2020 at 10:17 PM Amit Kapila wrote:
> > >
> >
> > Again, there's a fundamental difference in the Parallel Insert case.
> > Right at the top of ExecutePlan it calls EnterParallelMode().
> > For ParallelCopy(), there is no such problem. EnterParallelMode() is
> > only called just before ParallelCopyMain() is called. So it can easily
> > acquire the xid before this, because then parallel mode is not set.
> >
> > As it turns out, I think I have solved the commandId issue (and almost
> > the xid issue) by realising that both the xid and cid are ALREADY
> > being included as part of the serialized transaction state in the
> > Parallel DSM. So actually I don't believe that there is any need for
> > separately passing them in the DSM, and having to use those
> > AssignForWorker() functions in the worker code - not even in the
> > Parallel Copy case (? - need to check).
> >
>
> Thanks Greg for the detailed points.
>
> I further checked on full txn id and command id. Yes, these are
> getting passed to workers via InitializeParallelDSM() ->
> SerializeTransactionState(). I tried to summarize what we need to do
> in case of parallel inserts in general, i.e. parallel COPY, parallel
> inserts in INSERT INTO and parallel inserts in CTAS.
>
> In the leader:
>     GetCurrentFullTransactionId()
>     GetCurrentCommandId(true)
>     EnterParallelMode();
>     InitializeParallelDSM() --> calls SerializeTransactionState()
>     (both full txn id and command id are serialized into the parallel DSM)
>

This won't be true for the Parallel Insert patch, as explained by Greg as
well, because we enter parallel mode well before we assign the xid.

--
With Regards,
Amit Kapila.
Re: VACUUM PARALLEL option vs. max_parallel_maintenance_workers
On Tue, Sep 22, 2020 at 12:50 PM David Rowley wrote:
>
> On Mon, 21 Sep 2020 at 19:15, Peter Eisentraut wrote:
> >
> > On 2020-09-21 05:48, Amit Kapila wrote:
> > > What according to you should be the behavior here and how will it be
> > > better than current?
> >
> > I think if I write VACUUM (PARALLEL 5), it should use up to 5 workers
> > (up to the number of indexes), even if max_parallel_maintenance_workers
> > is 2.
>
> It would be good if we were consistent with these parallel options.
> Right now max_parallel_workers_per_gather will restrict the
> parallel_workers reloption. I'd say
> max_parallel_workers_per_gather is similar to
> max_parallel_maintenance_workers here, and the PARALLEL vacuum option
> is like the parallel_workers reloption.
>
> If we want VACUUM's parallel option to work the same way as that, then
> max_parallel_maintenance_workers should restrict whatever is mentioned
> in VACUUM PARALLEL.
>
> Or perhaps this is slightly different, as the user is explicitly asking
> for this in the command, but you could likely say the same about ALTER
> TABLE SET (parallel_workers = N); too.
>

This is exactly my feeling too. But how about changing the documentation a
bit, as proposed above [1], to make it precise?

[1] - https://www.postgresql.org/message-id/CAA4eK1LQWXS_4RwLo%2BWT7jusGnBkUvXO73xQOCsydWLYBpLBEg%40mail.gmail.com

--
With Regards,
Amit Kapila.
Re: Parallel INSERT (INTO ... SELECT ...)
On Fri, Sep 25, 2020 at 9:23 PM Greg Nancarrow wrote:
>
> On Fri, Sep 25, 2020 at 10:17 PM Amit Kapila wrote:
> >
>
> Again, there's a fundamental difference in the Parallel Insert case.
> Right at the top of ExecutePlan it calls EnterParallelMode().
> For ParallelCopy(), there is no such problem. EnterParallelMode() is
> only called just before ParallelCopyMain() is called. So it can easily
> acquire the xid before this, because then parallel mode is not set.
>
> As it turns out, I think I have solved the commandId issue (and almost
> the xid issue) by realising that both the xid and cid are ALREADY
> being included as part of the serialized transaction state in the
> Parallel DSM. So actually I don't believe that there is any need for
> separately passing them in the DSM, and having to use those
> AssignForWorker() functions in the worker code - not even in the
> Parallel Copy case (? - need to check).
>

Thanks Greg for the detailed points.

I further checked on full txn id and command id. Yes, these are getting
passed to workers via InitializeParallelDSM() ->
SerializeTransactionState(). I tried to summarize what we need to do in
case of parallel inserts in general, i.e. parallel COPY, parallel inserts
in INSERT INTO and parallel inserts in CTAS.

In the leader:
    GetCurrentFullTransactionId()
    GetCurrentCommandId(true)
    EnterParallelMode();
    InitializeParallelDSM() --> calls SerializeTransactionState()
    (both full txn id and command id are serialized into the parallel DSM)

In the workers:
    ParallelWorkerMain() --> calls StartParallelWorkerTransaction()
    (both full txn id and command id are restored into the workers'
    CurrentTransactionState->fullTransactionId and currentCommandId)

If the parallel workers are meant for insertions, then we need to set
currentCommandIdUsed = true;. Maybe we can lift the assert in
GetCurrentCommandId(); if we don't want to touch that function, we can
have a new function GetCurrentCommandIdInWorker() whose functionality
will be the same as GetCurrentCommandId() but without the
Assert(!IsParallelWorker());.

Am I missing something?

If the above points are true, we might have to update the parallel copy
patch set, test the use cases and post separately in the parallel copy
thread in the coming days.

Thoughts?

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com
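To make the leader-side sequence above concrete, here is a minimal C
sketch, loosely modeled on how the executor drives the parallel
infrastructure in execParallel.c. The entrypoint name "ParallelInsertMain"
and the worker count are illustrative assumptions, not code from any
posted patch:

	#include "postgres.h"
	#include "access/parallel.h"
	#include "access/xact.h"

	/* Hypothetical leader-side setup for a parallel-insert operation. */
	static void
	parallel_insert_setup(void)
	{
		ParallelContext *pcxt;

		/* Acquire xid and cid while still outside parallel mode. */
		(void) GetCurrentFullTransactionId();
		(void) GetCurrentCommandId(true);

		EnterParallelMode();

		/* "ParallelInsertMain" is an assumed worker entrypoint name. */
		pcxt = CreateParallelContext("postgres", "ParallelInsertMain", 2);

		/*
		 * InitializeParallelDSM() serializes transaction state, including
		 * the full transaction id and current command id, into the DSM;
		 * workers restore it via StartParallelWorkerTransaction().
		 */
		InitializeParallelDSM(pcxt);

		LaunchParallelWorkers(pcxt);

		/* ... perform the insert, then tear everything down ... */
		WaitForParallelWorkersToFinish(pcxt);
		DestroyParallelContext(pcxt);
		ExitParallelMode();
	}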
Re: Parallel INSERT (INTO ... SELECT ...)
On Fri, Sep 25, 2020 at 2:31 PM Bharath Rupireddy wrote:
>
> On Tue, Sep 22, 2020 at 10:26 AM Greg Nancarrow wrote:
> >
> > For cases where it can't be allowed (e.g. INSERT into a table with
> > foreign keys, or INSERT INTO ... SELECT ... ON CONFLICT ... DO UPDATE
> > ...") it at least allows parallelism of the SELECT part.
> >
>
> Thanks Greg for the patch.
>
> 2. What happens if the target table has triggers (before statement,
> after statement, before row, after row) that are parallel unsafe?
>

In such a case, the parallel insert shouldn't be selected. However, we
should still be able to execute the Select part in parallel.

> 3. Will each worker be doing single row insertions or multi inserts?
> If single row insertions, will the buffer lock contentions be more?
>

I don't think the purpose of this patch is to change the basic flow of
how Insert works, and I am also not sure if it is worth the effort. I
have answered this earlier in a bit more detail [1].

[1] - https://www.postgresql.org/message-id/CAA4eK1Ks8Sqs29VHPS6koNj5E9YQdkGCzgGsSrQMeUbQfe28yg%40mail.gmail.com

--
With Regards,
Amit Kapila.
Re: Parallel INSERT (INTO ... SELECT ...)
On Fri, Sep 25, 2020 at 9:23 PM Greg Nancarrow wrote:
>
> On Fri, Sep 25, 2020 at 10:17 PM Amit Kapila wrote:
>
> As it turns out, I think I have solved the commandId issue (and almost
> the xid issue) by realising that both the xid and cid are ALREADY
> being included as part of the serialized transaction state in the
> Parallel DSM. So actually I don't believe that there is any need for
> separately passing them in the DSM, and having to use those
> AssignForWorker() functions in the worker code - not even in the
> Parallel Copy case (? - need to check). GetCurrentCommandId(true) and
> GetFullTransactionId() need to be called prior to Parallel DSM
> initialization, so they are included in the serialized transaction
> state.
>
> I just needed to add a function to set currentCommandIdUsed=true in
> the worker initialization (for the INSERT case) and make a small tweak
> to the Assert in GetCurrentCommandId() to ensure that
> currentCommandIdUsed, in a parallel worker, never gets set to true
> when it is false. This is in line with the comment in that function,
> because we know that "currentCommandId was already true at the start
> of the parallel operation". With this in place, I don't need to change
> any of the original calls to GetCurrentCommandId(), so this addresses
> the issue raised by Andres.
>
> I am not sure yet how to get past the issue of parallel mode being
> set at the top of ExecutePlan(). With that in place, it doesn't allow
> an xid to be acquired for the leader without removing/changing that
> parallel-mode check in GetNewTransactionId().
>

I think now there is no fundamental problem in allocating the xid in the
leader and then sharing it with the workers, who can use it to perform
the insert. So we can probably tweak that check so that it is true only
for parallel workers.

--
With Regards,
Amit Kapila.
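For reference, the check in question sits at the top of
GetNewTransactionId() in varsup.c. A minimal sketch of the suggested
tweak, i.e. forbidding xid assignment only in workers rather than
anywhere in parallel mode, follows; the second error message is an
assumed wording, not committed code:

	/*
	 * Current check: refuses xid assignment anywhere in parallel mode,
	 * including in the leader before workers are launched.
	 */
	if (IsInParallelMode())
		elog(ERROR, "cannot assign TransactionIds during a parallel operation");

	/*
	 * Sketch of the tweak: only workers are forbidden from assigning an
	 * xid; the leader may assign one and share it via the DSM.
	 */
	if (IsParallelWorker())
		elog(ERROR, "cannot assign TransactionIds in a parallel worker");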
Re: a potential size overflow issue
David Zhang writes:
> "InitBufTable" is the function used to initialize the buffer lookup
> table for the buffer manager. With memory sizes increasing nowadays,
> there is a potential overflow issue for the parameter "int size" used by
> "InitBufTable". This function is invoked in freelist.c as below:
>
> InitBufTable(NBuffers + NUM_BUFFER_PARTITIONS);
>
> The number of buffer blocks "NBuffers" is also defined as "int", and
> "NUM_BUFFER_PARTITIONS" has a default value of 128. In theory, it may
> get the chance to overflow the "size" parameter in "InitBufTable".

No, because guc.c limits NBuffers to INT_MAX/2.

> Therefore, it would be better to change "InitBufTable(int size)" to
> "InitBufTable(long size)".

If I were worried about this, that wouldn't be much of a fix, since on
many platforms "long" is not any wider than "int". (We really oughta try
to move away from relying on "long", because its size is so poorly
standardized.)

			regards, tom lane
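Since widening to "long" buys nothing on LLP64 platforms such as 64-bit
Windows, an explicitly checked addition illustrates the kind of
alternative Tom hints at. This is a hypothetical sketch, not proposed
code, and init_buf_table_checked is a made-up name:

	#include <limits.h>
	#include <stdio.h>
	#include <stdlib.h>

	extern void InitBufTable(int size);	/* existing declaration, unchanged */

	static void
	init_buf_table_checked(int nbuffers, int npartitions)
	{
		/* fail loudly if nbuffers + npartitions would wrap around */
		if (nbuffers > INT_MAX - npartitions)
		{
			/* in backend code this would be elog(ERROR, ...) */
			fprintf(stderr, "buffer table size would overflow int\n");
			abort();
		}
		InitBufTable(nbuffers + npartitions);
	}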
Re: Optimize memory allocation code
On Fri, Sep 25, 2020 at 7:32 PM Li Japin wrote:
>
> > On Sep 26, 2020, at 8:09 AM, Julien Rouhaud wrote:
> >
> > Hi,
> >
> > On Sat, Sep 26, 2020 at 12:14 AM Li Japin wrote:
> >>
> >> Hi, hackers!
> >>
> >> I find that palloc0() is similar to palloc(); we can use palloc()
> >> inside palloc0() to allocate space, thereby I think we can reduce
> >> duplication of code.
> >
> > The code is duplicated on purpose. There's a comment at the beginning
> > that mentions it:
> >
> > /* duplicates MemoryContextAllocZero to avoid increased overhead */
> >
> > Same for MemoryContextAllocZero() itself.
>
> Thanks! How big is this overhead? Is there any way I can test it?

Use a profiler, for example oprofile. In hot areas of the code (memory
allocation is very hot), profiling is the first step.

merlin
Re: Optimize memory allocation code
> On Sep 26, 2020, at 8:09 AM, Julien Rouhaud wrote:
>
> Hi,
>
> On Sat, Sep 26, 2020 at 12:14 AM Li Japin wrote:
>>
>> Hi, hackers!
>>
>> I find that palloc0() is similar to palloc(); we can use palloc()
>> inside palloc0() to allocate space, thereby I think we can reduce
>> duplication of code.
>
> The code is duplicated on purpose. There's a comment at the beginning
> that mentions it:
>
> /* duplicates MemoryContextAllocZero to avoid increased overhead */
>
> Same for MemoryContextAllocZero() itself.

Thanks! How big is this overhead? Is there any way I can test it?

Best regards!

--
Japin Li
Re: Dynamic gathering the values for seq_page_cost/xxx_cost
On Fri, Sep 25, 2020 at 5:15 PM Ashutosh Bapat wrote:

> On Tue, Sep 22, 2020 at 10:57 AM Andy Fan wrote:
> >
> > My tools set the random_page_cost to 8.6, but based on the fio data,
> > it should be set to 12.3 on the same hardware. And I do see the better
> > plan as well with 12.3. Looks too smooth to believe it is true..
> >
> > The attached result_fio_mytool.tar.gz is my test result. You can use
> > git show HEAD^^ to see the original plan with 8.6. git show HEAD^
> > shows the plan changes after we changed the random_page_cost. git show
> > HEAD shows the run time statistics changes for these queries. I also
> > uploaded the test tool [1] for this, for your double check.
>
> The scripts seem to start and stop the server and drop caches for every
> query. That's where you are seeing that setting random_page_cost to the
> fio based ratio provides better plans. But in practice, these costs
> need to be set on a server where the queries are run concurrently and
> repeatedly. That's where the caching behaviour plays an important role.
> Can we write a tool which can recommend costs for that scenario?

I totally agree with this. Actually the first thing I did was to define a
proper IO workload. At the very beginning, I used DIRECT_IO for both
sequential read and random read on my SSD, and then found the result was
pretty bad per testing (random_page_cost = ~1.6). Then I realized
PostgreSQL relies on prefetching, which DIRECT_IO disables. After I fixed
this, I tested again with the above scenario (cache hit ratio = 0) to
verify my IO model. Per testing, it looks good. I am also thinking that
random_page_cost = 1.1 doesn't provide a good result on my SSD because it
ignores the prefetch effect of sequential reads.

After I was OK with my IO model, I tested in the way you describe above,
but I also measured the latency of a file system cache hit, which is
handled by get_fs_cache_latency_us in my code (I ignored shared buffer
hits for now), and allowed the user to provide a cache_hit_ratio. The
final random_page_cost = real_random_lat / real_seq_lat, where
real_xxx_lat = cache_hit_ratio * fs_cache_lat + (1 - cache_hit_ratio) *
xxx_lat. See the functions cal_real_lat and cal_random_page_cost.

As for the testing with cache considered, I found it hard to estimate
the cache hit ratio, and hard to control a hit ratio to test against.
Recently I have been thinking of a method where we can get page_reads
and shared_buffer_hit counts from pg_kernel, plus the real IO (without
the file system cache hits) at the OS level (just as iotop/pidstat do).
Then we can know the shared_buffer hit ratio and file system cache hit
ratio (assuming they will be stable after a long run), and then do the
testing. However this would be another branch of manual work and I still
have not got it done until now.

I'd rather not share too many details, but "luckily" many of the cases I
have involve no file system cache, which makes things a bit easier. What
I am doing right now is to calculate the random_page_cost with the above
algorithm with only shared_buffers considered, and test the real
benefits with a real workload to see how it works. If it works well, I
think the only thing left is to handle the file system cache. The
testing is time consuming since I have to cooperate with many site
engineers, so any improvement on the design would be very helpful.

> How do the fio based costs perform when the queries are run repeatedly?

Probably not well, since I have 280GB+ of file system cache and I would
have to prepare much more than 280GB of data for testing.

--
Best Regards
Andy Fan
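The latency-mixing formula in the message is easy to state in code. Below
is a minimal C sketch of the arithmetic; the function names follow the
ones mentioned above (cal_real_lat, cal_random_page_cost), but the bodies
are reconstructions from the formula in the text, not the author's tool:

	/*
	 * real_lat = hit_ratio * fs_cache_lat + (1 - hit_ratio) * device_lat
	 * All latencies in the same unit, e.g. microseconds.
	 */
	static double
	cal_real_lat(double cache_hit_ratio, double fs_cache_lat,
				 double device_lat)
	{
		return cache_hit_ratio * fs_cache_lat +
			(1.0 - cache_hit_ratio) * device_lat;
	}

	static double
	cal_random_page_cost(double cache_hit_ratio, double fs_cache_lat,
						 double seq_lat, double random_lat)
	{
		double	real_seq_lat = cal_real_lat(cache_hit_ratio,
											fs_cache_lat, seq_lat);
		double	real_random_lat = cal_real_lat(cache_hit_ratio,
											   fs_cache_lat, random_lat);

		/* random_page_cost is relative to a sequential page read (cost 1) */
		return real_random_lat / real_seq_lat;
	}

Note that as cache_hit_ratio approaches 1 both effective latencies
converge on the cache latency, so the computed random_page_cost tends
toward 1, which matches the intuition that a fully cached workload makes
random and sequential access equally cheap.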
Re: Optimize memory allocation code
Hi,

On Sat, Sep 26, 2020 at 12:14 AM Li Japin wrote:
>
> Hi, hackers!
>
> I find that palloc0() is similar to palloc(); we can use palloc()
> inside palloc0() to allocate space, thereby I think we can reduce
> duplication of code.

The code is duplicated on purpose. There's a comment at the beginning
that mentions it:

/* duplicates MemoryContextAllocZero to avoid increased overhead */

Same for MemoryContextAllocZero() itself.
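To make the trade-off concrete, here is a minimal sketch of the proposed
simplification, heavily simplified from the real mcxt.c (which also
validates the size, handles allocation failure and records context
state). The actual palloc0 instead duplicates palloc's body and zeroes
inline, so each zeroed allocation avoids an extra call frame in this very
hot path:

	/* The proposal, as a sketch: delegate to palloc(), then zero. */
	void *
	palloc0_sketch(Size size)
	{
		void	   *ret = palloc(size);

		MemSetAligned(ret, 0, size);
		return ret;
	}

Whether the saved call is measurable is exactly what profiling (as Merlin
suggests downthread) would show.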
a potential size overflow issue
Hi hackers,

"InitBufTable" is the function used to initialize the buffer lookup table
for the buffer manager. With memory sizes increasing nowadays, there is a
potential overflow issue for the parameter "int size" used by
"InitBufTable". This function is invoked in freelist.c as below:

	InitBufTable(NBuffers + NUM_BUFFER_PARTITIONS);

The number of buffer blocks "NBuffers" is also defined as "int", and
"NUM_BUFFER_PARTITIONS" has a default value of 128. In theory, it may get
the chance to overflow the "size" parameter in "InitBufTable".

The "size" parameter is later used by "ShmemInitHash" as "init_size" and
"max_size", which are both defined as "long".

	SharedBufHash = ShmemInitHash("Shared Buffer Lookup Table",
								  size, size,
								  &info,
								  HASH_ELEM | HASH_BLOBS | HASH_PARTITION);

Therefore, it would be better to change "InitBufTable(int size)" to
"InitBufTable(long size)". A simple patch is attached and it passed the
"make installcheck-world" test.

--
David
Software Engineer
Highgo Software Inc. (Canada)
www.highgo.ca

From 7df445121d3cf2aa7c0c22c831a8a920bc818d6e Mon Sep 17 00:00:00 2001
From: David Zhang
Date: Fri, 25 Sep 2020 15:39:24 -0700
Subject: [PATCH] fix a potential overflow issue for InitBufTable

---
 src/backend/storage/buffer/buf_table.c | 2 +-
 src/include/storage/buf_internals.h    | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/backend/storage/buffer/buf_table.c b/src/backend/storage/buffer/buf_table.c
index 4953ae9f82..d4f58c1ce0 100644
--- a/src/backend/storage/buffer/buf_table.c
+++ b/src/backend/storage/buffer/buf_table.c
@@ -49,7 +49,7 @@ BufTableShmemSize(int size)
  * size is the desired hash table size (possibly more than NBuffers)
  */
 void
-InitBufTable(int size)
+InitBufTable(long size)
 {
 	HASHCTL		info;

diff --git a/src/include/storage/buf_internals.h b/src/include/storage/buf_internals.h
index 3377fa5676..25e1db6854 100644
--- a/src/include/storage/buf_internals.h
+++ b/src/include/storage/buf_internals.h
@@ -320,7 +320,7 @@ extern bool have_free_buffer(void);

 /* buf_table.c */
 extern Size BufTableShmemSize(int size);
-extern void InitBufTable(int size);
+extern void InitBufTable(long size);
 extern uint32 BufTableHashCode(BufferTag *tagPtr);
 extern int	BufTableLookup(BufferTag *tagPtr, uint32 hashcode);
 extern int	BufTableInsert(BufferTag *tagPtr, uint32 hashcode, int buf_id);
--
2.21.1 (Apple Git-122.3)
What does pg_stat_get_xact_function_self_time count exactly?
Hi,

Is the self-time of a tracked function (as reported by
pg_stat_get_xact_function_self_time) its total time minus:

- the time of other tracked functions it calls?
- the time of other tracked or untracked functions it calls?
- the time of other tracked/untracked functions or SPI functions it calls?
- something else?

Regards,
-Chap
Re: Libpq support to connect to standby server as priority
Greg Nancarrow writes:
> [ v19-0001-Enhance-libpq-target_session_attrs-and-add-target_se.patch ]

I started to look through this, and I find that I'm really pretty
disappointed in the direction the patch has gone of late. I think there
is no defensible reason for the choices that have been made to have
different behavior for v14-and-up servers than for older servers. It's
not necessary and it complicates life for users. We can use
pg_is_in_recovery() on every server version that has hot standby at all,
so there is no reason not to treat the GUC_REPORT indicator as an
optimization that lets us skip a separate inquiry transaction, rather
than something we have to have to make the feature work correctly.

So I think what we ought to have is the existing read-write vs read-only
distinction, implemented as it is now by checking "SHOW
transaction_read_only" if the server fails to send that as a GUC_REPORT
value; and orthogonally to that, a primary/standby distinction
implemented by checking pg_is_in_recovery(), again with a fast path if
we got a ParameterStatus report.

I do not like the addition of target_server_type. It seems unnecessary
and confusing, particularly because you've had to make a completely
arbitrary decision about how it interacts with target_session_attrs when
both are specified. I think the justification that "it's more like JDBC"
is risible. Any user of this will be writing C, not Java.

A couple of other thoughts:

* Could we call the GUC "in_hot_standby" rather than "in_recovery"? I
think "recovery" is a poorly chosen legacy term that we ought to avoid
exposing to users more than we already have. We're stuck with
pg_is_in_recovery() I suppose, but let's not double down on bad
decisions.

* I don't think you really need a hard-wired test on v14-or-not in the
libpq code. The internal state about read-only and hot-standby ought to
be "yes", "no", or "unknown", starting in the latter state. Receipt of
ParameterStatus changes it from "unknown" to one of the other two
states. If we need to know the value, and it's still "unknown", then we
send a probe query. We still need hard-coded version checks to know if
the probe query is safe, but those version breaks are far enough back to
be pretty well set in stone. (In the back of my mind here is that people
might well choose to back-port the GUC_REPORT marking of
transaction_read_only, and maybe even the other GUC if they were feeling
ambitious. So not having a hard-coded version assumption where we don't
particularly need it seems a good thing.)

* This can't be right, can it? Too many commas.

@@ -1618,7 +1619,7 @@ static struct config_bool ConfigureNamesBool[] =
 		{"transaction_read_only", PGC_USERSET, CLIENT_CONN_STATEMENT,
 			gettext_noop("Sets the current transaction's read-only status."),
 			NULL,
-			GUC_NO_RESET_ALL | GUC_NOT_IN_SAMPLE | GUC_DISALLOW_IN_FILE
+			GUC_REPORT, GUC_NO_RESET_ALL | GUC_NOT_IN_SAMPLE | GUC_DISALLOW_IN_FILE
 		},
 		&XactReadOnly,
 		false,

(The compiler will fail to bitch about that unfortunately, since there
are more struct fields that we leave uninitialized normally.)

BTW, I think it would be worth splitting this into separate server-side
and libpq patches. It looked to me like the server side is pretty nearly
committable, modulo bikeshedding about the new GUC name. We could get
that out of the way and then have a much smaller libpq patch to argue
about.

			regards, tom lane
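The "yes/no/unknown" idea is a small per-connection state machine. A
minimal standalone C sketch follows; the type and field names here are
illustrative assumptions, not libpq's actual internals:

	#include <string.h>

	typedef enum
	{
		PG_BOOL_UNKNOWN,			/* no ParameterStatus received yet */
		PG_BOOL_YES,
		PG_BOOL_NO
	} PGTernaryBool;

	typedef struct ConnStandbyState
	{
		PGTernaryBool in_hot_standby;	/* from the proposed GUC */
		PGTernaryBool read_only;		/* from transaction_read_only */
	} ConnStandbyState;

	/* Called when a ParameterStatus message arrives. */
	static void
	update_from_parameter_status(ConnStandbyState *st,
								 const char *name, const char *value)
	{
		if (strcmp(name, "in_hot_standby") == 0)
			st->in_hot_standby =
				(strcmp(value, "on") == 0) ? PG_BOOL_YES : PG_BOOL_NO;
		else if (strcmp(name, "transaction_read_only") == 0)
			st->read_only =
				(strcmp(value, "on") == 0) ? PG_BOOL_YES : PG_BOOL_NO;
	}

When the answer is needed and a field is still PG_BOOL_UNKNOWN, the
client falls back to a probe query such as "SELECT pg_is_in_recovery()"
or "SHOW transaction_read_only", version permitting, exactly as Tom
describes.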
Re: pg_upgrade: fail early if a tablespace dir already exists for new cluster version
Added at https://commitfest.postgresql.org/30/2739/

From 831246aa6d6837b2b0da7c96ad2f44ffd6cd3951 Mon Sep 17 00:00:00 2001
From: Justin Pryzby
Date: Thu, 24 Sep 2020 19:49:40 -0500
Subject: [PATCH v2] pg_upgrade --check to avoid tablespace failure mode

---
 src/bin/pg_upgrade/check.c | 37 +++++++++++++++++++++++++++++++++++++
 1 file changed, 37 insertions(+)

diff --git a/src/bin/pg_upgrade/check.c b/src/bin/pg_upgrade/check.c
index 2f7aa632c5..154e3d249e 100644
--- a/src/bin/pg_upgrade/check.c
+++ b/src/bin/pg_upgrade/check.c
@@ -13,6 +13,7 @@
 #include "fe_utils/string_utils.h"
 #include "mb/pg_wchar.h"
 #include "pg_upgrade.h"
+#include "common/relpath.h"

 static void check_new_cluster_is_empty(void);
 static void check_databases_are_compatible(void);
@@ -27,6 +28,7 @@ static void check_for_tables_with_oids(ClusterInfo *cluster);
 static void check_for_reg_data_type_usage(ClusterInfo *cluster);
 static void check_for_jsonb_9_4_usage(ClusterInfo *cluster);
 static void check_for_pg_role_prefix(ClusterInfo *cluster);
+static void check_for_existing_tablespace_dirs(ClusterInfo *new_cluster);
 static char *get_canonical_locale_name(int category, const char *locale);

@@ -187,6 +189,8 @@ check_new_cluster(void)
 	check_is_install_user(&new_cluster);

 	check_for_prepared_transactions(&new_cluster);
+
+	check_for_existing_tablespace_dirs(&new_cluster);
 }

@@ -542,6 +546,39 @@ create_script_for_cluster_analyze(char **analyze_script_file_name)
 }

+/*
+ * Check that tablespace dirs do not already exist for new cluster version.
+ * If they did, it'd cause an error while restoring global objects.
+ * This allows failing earlier rather than only after dumping schema.
+ *
+ * Note, v8.4 has no tablespace_suffix, which is fine so long as the version
+ * being upgraded *to* has a suffix.
+ */
+static void
+check_for_existing_tablespace_dirs(ClusterInfo *new_cluster)
+{
+	char		old_tablespace_dir[MAXPGPATH];
+
+	prep_status("Checking for pre-existing tablespace directories");
+
+	for (int tblnum = 0; tblnum < os_info.num_old_tablespaces; tblnum++)
+	{
+		struct stat statbuf;
+
+		snprintf(old_tablespace_dir, MAXPGPATH, "%s%s",
+				 os_info.old_tablespaces[tblnum],
+				 new_cluster->tablespace_suffix);
+
+		canonicalize_path(old_tablespace_dir);
+
+		// XXX: lstat ?
+		if (stat(old_tablespace_dir, &statbuf) == 0 || errno != ENOENT)
+			pg_fatal("tablespace directory already exists for new cluster version: \"%s\"\n",
+					 old_tablespace_dir);
+	}
+
+	check_ok();
+}

 /*
  * create_script_for_old_cluster_deletion()
--
2.17.0
Re: history file on replica and double switchover
The following review has been posted through the commitfest application:

make installcheck-world: tested, failed
Implements feature:      tested, passed
Spec compliant:          tested, passed
Documentation:           tested, passed

The "make installcheck-world" test was performed on branch
"REL_13_STABLE" with tag "REL_13_0". The regression failure was not
caused by the "history file" patch, since I get the same error in my
environment even without any patch. So the failure is not related; in
other words, the patch "history_replica_v4.patch" is good for me.

Below is the failure message when tested with and without
"history_replica_v4.patch".

t/004_timeline_switch.pl ............. ok
t/005_replay_delay.pl ................ ok
t/006_logical_decoding.pl ............ # Looks like your test exited with 29 before it could output anything.
t/006_logical_decoding.pl ............ Dubious, test returned 29 (wstat 7424, 0x1d00)
 Failed 14/14 subtests
t/007_sync_rep.pl .................... ok
t/008_fsm_truncation.pl .............. ok
t/009_twophase.pl .................... ok
t/010_logical_decoding_timelines.pl .. # Looks like your test exited with 29 before it could output anything.
t/010_logical_decoding_timelines.pl .. Dubious, test returned 29 (wstat 7424, 0x1d00)
 Failed 13/13 subtests
t/011_crash_recovery.pl .............. ok
t/012_subtransactions.pl ............. ok
t/013_crash_restart.pl ............... ok
t/014_unlogged_reinit.pl ............. ok
t/015_promotion_pages.pl ............. ok
t/016_min_consistency.pl ............. ok
t/017_shm.pl ......................... ok
t/018_wal_optimize.pl ................ ok
t/019_replslot_limit.pl .............. ok
t/020_archive_status.pl .............. ok

Test Summary Report
-------------------
t/006_logical_decoding.pl           (Wstat: 7424 Tests: 0 Failed: 0)
  Non-zero exit status: 29
  Parse errors: Bad plan.  You planned 14 tests but ran 0.
t/010_logical_decoding_timelines.pl (Wstat: 7424 Tests: 0 Failed: 0)
  Non-zero exit status: 29
  Parse errors: Bad plan.  You planned 13 tests but ran 0.
Files=20, Tests=202, 103 wallclock secs ( 0.18 usr  0.04 sys + 21.20 cusr 23.52 csys = 44.94 CPU)
Result: FAIL
make[2]: *** [installcheck] Error 1
make[2]: Leaving directory `/home/david/git/postgres/src/test/recovery'
make[1]: *** [installcheck-recovery-recurse] Error 2
make[1]: Leaving directory `/home/david/git/postgres/src/test'
make: *** [installcheck-world-src/test-recurse] Error 2
Re: extension patch of CREATE OR REPLACE TRIGGER
"osumi.takami...@fujitsu.com" writes: > [ CREATE_OR_REPLACE_TRIGGER_v11.patch ] I took a quick look through this. I think there's still work left to do. * I'm concerned by the fact that there doesn't seem to be any defense against somebody replacing a foreign-key trigger with something that does something else entirely, and thereby silently breaking their foreign key constraint. I think it might be a good idea to forbid replacing triggers for which tgisinternal is true; but I've not done the legwork to see if that's exactly the condition we want. * In the same vein, I'm not sure that the right things happen when fooling with triggers attached to partitioned tables. We presumably don't want to allow mucking directly with a child trigger. Perhaps refusing an update when tgisinternal might fix this too (although we'll have to be careful to make the error message not too confusing). * I don't think that you've fully thought through the implications of replacing a trigger for a table that the current transaction has already modified. Is it really sufficient, or even useful, to do this: +/* + * If this trigger has pending events, throw an error. + */ +if (trigger_deferrable && AfterTriggerPendingOnRel(RelationGetRelid(rel))) As an example, if we change a BEFORE trigger to an AFTER trigger, that's not going to affect the fact that we *already* fired that trigger. Maybe this is okay and we just need to document it, but I'm not convinced. * BTW, I don't think a trigger necessarily has to be deferrable in order to have pending AFTER events. The existing use of AfterTriggerPendingOnRel certainly doesn't assume that. But really, I think we probably ought to be applying CheckTableNotInUse which'd include that test. (Another way in which there's fuzzy thinking here is that AfterTriggerPendingOnRel isn't specific to *this* trigger.) * A lesser point is that I think you're overcomplicating the code by applying heap_modify_tuple. You might as well just build the new tuple normally in all cases, and then apply either CatalogTupleInsert or CatalogTupleUpdate. * Also, the search for an existing trigger tuple is being done the hard way. You're using an index on (tgrelid, tgname), so you could include the name in the index key and expect that there's at most one match. regards, tom lane
Re: scram-sha-256 broken with FIPS and OpenSSL 1.0.2
FIPS only specifies which algorithms are approved for use under it. For
instance, MD-5 is not approved at all under FIPS. I would say any
algorithm should produce the same result regardless of where it is run.

BTW, on Red Hat servers, the first algorithm listed for use with SSH is
MD-5. This causes the sshd daemon to abort when FIPS is enabled and that
config file has not been edited, so you can no longer connect with an
SSH client, as the daemon isn't running. Ask me how I know this.

Sent from my iPad

> On Sep 25, 2020, at 3:39 PM, Bruce Momjian wrote:
>
> On Fri, Sep 25, 2020 at 03:38:22PM -0400, John Scalia wrote:
>> Bruce,
>>
>> In my experience, any client is permitted to connect to a FIPS 140-2
>> compliant server. I set this up when I worked at SSA, at management's
>> request.
>
> My question is whether the hash output would match if using different
> code.
>
> --
>   Bruce Momjian        https://momjian.us
>   EnterpriseDB         https://enterprisedb.com
>
>   The usefulness of a cup is in its emptiness, Bruce Lee
Re: scram-sha-256 broken with FIPS and OpenSSL 1.0.2
On Fri, Sep 25, 2020 at 03:38:22PM -0400, John Scalia wrote:
> Bruce,
>
> In my experience, any client is permitted to connect to a FIPS 140-2
> compliant server. I set this up when I worked at SSA, at management's
> request.

My question is whether the hash output would match if using different
code.

--
  Bruce Momjian        https://momjian.us
  EnterpriseDB         https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee
Re: scram-sha-256 broken with FIPS and OpenSSL 1.0.2
Bruce,

In my experience, any client is permitted to connect to a FIPS 140-2
compliant server. I set this up when I worked at SSA, at management's
request.
—
Jay

Sent from my iPad

> On Sep 25, 2020, at 3:13 PM, Bruce Momjian wrote:
>
> On Fri, Sep 25, 2020 at 03:56:53PM +0900, Michael Paquier wrote:
>> On Fri, Sep 25, 2020 at 01:36:44AM -0400, Tom Lane wrote:
>>> Peter Eisentraut writes:
>>>> However, again, the SCRAM implementation would already appear to
>>>> fail that requirement because it uses a custom HMAC implementation,
>>>> and HMAC is listed in FIPS 140-2 as a covered algorithm.
>>>
>>> Ugh. But is there any available FIPS-approved library code that could
>>> be used instead?
>>
>> That's a good point, and I think that this falls down to using
>> OpenSSL's HMAC_* interface for this job when building with OpenSSL:
>> https://www.openssl.org/docs/man1.1.1/man3/HMAC.html
>>
>> Worth noting that these have been deprecated in 3.0.0 as per the
>> rather-recent commit dbde472, where they recommend the use of
>> EVP_MAC_*() instead.
>
> Would a FIPS server only be able to talk to a FIPS client, or would our
> internal code produce the same output?
>
> --
>   Bruce Momjian        https://momjian.us
>   EnterpriseDB         https://enterprisedb.com
>
>   The usefulness of a cup is in its emptiness, Bruce Lee
Re: gs_group_1 crashing on 13beta2/s390x
Hi,

On 2020-09-25 14:11:46 -0400, Tom Lane wrote:
> Christoph Berg writes:
> > Ok, but given that Debian is currently targeting 22 architectures, I
> > doubt the PostgreSQL buildfarm covers all of them with the extra JIT
> > option, so I should probably make sure to do that here when running
> > tests.
>
> +1. I rather doubt our farm is running this type of test on anything
> but x86_64.

There are quite a few arm animals and at least one mips animal that do
some minimal coverage of JITing (i.e. the queries that are actually
somewhat expensive). I pinged two owners asking whether one of the arm
animals could be changed to force JITing.

Greetings,

Andres Freund
Re: gs_group_1 crashing on 13beta2/s390x
Hi,

On 2020-09-25 19:05:52 +0200, Christoph Berg wrote:
> On September 25, 2020 at 18:42:04 CEST, Andres Freund wrote:
> >> * jit is not exercised enough by "make installcheck"
> >
> > So far we've exercised it more widely by setting up machines that use
> > it for all queries (by setting the config option). I'm doubtful it's
> > worth doing differently.
>
> Ok, but given that Debian is currently targeting 22 architectures, I
> doubt the PostgreSQL buildfarm covers all of them with the extra JIT
> option, so I should probably make sure to do that here when running
> tests.

Forcing a lot of queries that are otherwise really fast to be JITed
unfortunately has a significant time cost. Doing that on slow
architectures might be prohibitively slow.

I kinda wonder if we shouldn't just restrict JIT to a few architectures
that we have a bit more regular access to (x86, arm, maybe also ppc?).
It's not like anybody would run large analytics queries on mips.

Greetings,

Andres Freund
Re: scram-sha-256 broken with FIPS and OpenSSL 1.0.2
On Fri, Sep 25, 2020 at 03:56:53PM +0900, Michael Paquier wrote:
> On Fri, Sep 25, 2020 at 01:36:44AM -0400, Tom Lane wrote:
> > Peter Eisentraut writes:
> >> However, again, the SCRAM implementation would already appear to fail
> >> that requirement because it uses a custom HMAC implementation, and
> >> HMAC is listed in FIPS 140-2 as a covered algorithm.
> >
> > Ugh. But is there any available FIPS-approved library code that could
> > be used instead?
>
> That's a good point, and I think that this falls down to using
> OpenSSL's HMAC_* interface for this job when building with OpenSSL:
> https://www.openssl.org/docs/man1.1.1/man3/HMAC.html
>
> Worth noting that these have been deprecated in 3.0.0 as per the
> rather-recent commit dbde472, where they recommend the use of
> EVP_MAC_*() instead.

Would a FIPS server only be able to talk to a FIPS client, or would our
internal code produce the same output?

--
  Bruce Momjian        https://momjian.us
  EnterpriseDB         https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee
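One point worth noting for Bruce's question: HMAC is a deterministic
construction, so a correct custom implementation and a library
implementation produce byte-identical output for the same key and input;
the FIPS concern is only about which code path is validated, not about
interoperability. As a minimal sketch of the one-shot HMAC() interface
from the man page linked above (deprecated in OpenSSL 3.0.0 in favor of
EVP_MAC), with made-up key and message:

	#include <stdio.h>
	#include <openssl/evp.h>
	#include <openssl/hmac.h>

	int
	main(void)
	{
		const unsigned char key[] = "secret-key";
		const unsigned char msg[] = "client nonce and friends";
		unsigned char md[EVP_MAX_MD_SIZE];
		unsigned int md_len;

		/*
		 * One-shot HMAC-SHA-256 via the library, which is what a
		 * FIPS-validated OpenSSL would route through its module.
		 */
		if (HMAC(EVP_sha256(), key, sizeof(key) - 1,
				 msg, sizeof(msg) - 1, md, &md_len) == NULL)
			return 1;

		for (unsigned int i = 0; i < md_len; i++)
			printf("%02x", md[i]);
		printf("\n");
		return 0;
	}

Compile with -lcrypto; the output matches any other correct HMAC-SHA-256
implementation given the same inputs.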
Re: history file on replica and double switchover
On 2020/09/26 2:58, Grigory Smolkin wrote:
> Fujii Masao, David Zhang, Anastasia Lubennikova, many thanks to you for
> your efforts with this patch! Can I mark it as ready for committer?

Ok, but I attached the updated version of the patch. It would be helpful
if you could review that.

In the latest patch, I changed walreceiver so that it creates a .done
file for the streamed timeline history file when archive_mode is NOT
"always". Walreceiver does the same thing for the streamed WAL files, to
prevent them from being archived later. Without this, the streamed WAL
files can exist in pg_wal without any archive status files, and then
they will be archived later accidentally because of the lack of archive
status. OTOH, timeline history files will not be archived later even
without archive status files. So there is no strong reason to make
walreceiver create a .done file for the timeline history files. But at
least for me it's strange to keep the file in pg_wal without archive
status. So for now I'm just inclined to create .done files. Thoughts?

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

diff --git a/doc/src/sgml/high-availability.sgml b/doc/src/sgml/high-availability.sgml
index beb309e668..42f01c515f 100644
--- a/doc/src/sgml/high-availability.sgml
+++ b/doc/src/sgml/high-availability.sgml
@@ -1395,7 +1395,8 @@ synchronous_standby_names = 'ANY 2 (s1, s2, s3)'
     If archive_mode is set to on, the
     archiver is not enabled during recovery or standby mode. If the standby
     server is promoted, it will start archiving after the promotion, but
-    will not archive any WAL it did not generate itself. To get a complete
+    will not archive any WAL or timeline history files that
+    it did not generate itself. To get a complete
     series of WAL files in the archive, you must ensure that all WAL is
     archived, before it reaches the standby. This is inherently true with
     file-based log shipping, as the standby can only restore files that
diff --git a/src/backend/replication/walreceiver.c b/src/backend/replication/walreceiver.c
index 17f1a49f87..bb1d44ccb7 100644
--- a/src/backend/replication/walreceiver.c
+++ b/src/backend/replication/walreceiver.c
@@ -758,6 +758,15 @@ WalRcvFetchTimeLineHistoryFiles(TimeLineID first, TimeLineID last)
 			 */
 			writeTimeLineHistoryFile(tli, content, len);

+			/*
+			 * Mark the streamed history file as .done if archive_mode is
+			 * not always, and as ready for archiving otherwise.
+			 */
+			if (XLogArchiveMode != ARCHIVE_MODE_ALWAYS)
+				XLogArchiveForceDone(fname);
+			else
+				XLogArchiveNotify(fname);
+
 			pfree(fname);
 			pfree(content);
 		}
Re: Asynchronous Append on postgres_fdw nodes.
Your AsyncAppend doesn't switch to another source if data from the
current leader is available:

	/*
	 * The request for the next node cannot be sent before the leader
	 * responds. Finish the current leader if possible.
	 */
	if (PQisBusy(leader_state->s.conn))
	{
		int		rc = WaitLatchOrSocket(NULL,
									   WL_SOCKET_READABLE | WL_TIMEOUT |
									   WL_EXIT_ON_PM_DEATH,
									   PQsocket(leader_state->s.conn), 0,
									   WAIT_EVENT_ASYNC_WAIT);

		if (!(rc & WL_SOCKET_READABLE))
			available = false;
	}

	/* fetch the leader's data and enqueue it for the next request */
	if (available)
	{
		fetch_received_data(leader);
		add_async_waiter(leader);
	}

I don't understand why this is needed. If we have fdw connections with
different latency, then we will read data from the fast connection
first. I think this may be a source of skew and decrease the efficiency
of asynchronous append. For example, see the synthetic query below:

CREATE TABLE l (a integer) PARTITION BY LIST (a);
CREATE FOREIGN TABLE f1 PARTITION OF l FOR VALUES IN (1)
	SERVER lb OPTIONS (table_name 'l1');
CREATE FOREIGN TABLE f2 PARTITION OF l FOR VALUES IN (2)
	SERVER lb OPTIONS (table_name 'l2');
INSERT INTO l (a) SELECT 2 FROM generate_series(1,200) as gs;
INSERT INTO l (a) SELECT 1 FROM generate_series(1,1000) as gs;
EXPLAIN ANALYZE
	(SELECT * FROM f1) UNION ALL (SELECT * FROM f2) LIMIT 400;

Result:

 Limit  (cost=100.00..122.21 rows=400 width=4) (actual time=0.483..1.183 rows=400 loops=1)
   ->  Append  (cost=100.00..424.75 rows=5850 width=4) (actual time=0.482..1.149 rows=400 loops=1)
         ->  Foreign Scan on f1  (cost=100.00..197.75 rows=2925 width=4) (actual time=0.481..1.115 rows=400 loops=1)
         ->  Foreign Scan on f2  (cost=100.00..197.75 rows=2925 width=4) (never executed)

As you can see, the executor scans one input and doesn't try to scan the
other.

--
regards,
Andrey Lepikhov
Postgres Professional
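One way to address the skew would be to poll every connection with a
request in flight and consume from whichever is ready first, instead of
blocking on the current leader. This is a rough, self-contained libpq
sketch of that idea, not code from the patch; the fairness policy
("first ready wins") and the function name are illustrative:

	#include <libpq-fe.h>

	/*
	 * Sketch: given several async connections with queries in flight,
	 * return the index of one whose result can be read without blocking,
	 * consuming any buffered socket data along the way. Returns -1 if
	 * none is ready yet.
	 */
	static int
	first_ready_source(PGconn **conns, int nconns)
	{
		for (int i = 0; i < nconns; i++)
		{
			/* pull any pending bytes off the socket */
			if (PQconsumeInput(conns[i]) == 0)
				continue;		/* connection trouble; skip it here */

			/* PQisBusy says whether PQgetResult would block */
			if (!PQisBusy(conns[i]))
				return i;
		}
		return -1;				/* caller should wait on the sockets */
	}

A caller getting -1 would then wait on all the sockets at once (e.g.
with WaitEventSetWait in the backend) rather than on the leader's socket
alone, so a slow source never starves the fast ones and vice versa.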
Re: Problem of ko.po on branch REL9_5_STABLE
Hi,

On 2020-Sep-25, Yang, Rong wrote:

> When I checked the encoding of the po files, I noticed that
> src/bin/pg_config/po/ko.po on the REL9_5_STABLE branch seemed a little
> different. The 'Content-Type' of this file and the file's encoding are
> different from those under other modules, as follows:
>
> src/bin/pg_config/po/ko.po:
>     "Content-Type: text/plain; charset=euc-kr\n"
> src/bin/initdb/po/ko.po:
>     "Content-Type: text/plain; charset=UTF-8\n"

Yeah, each file declares in its header the encoding it's in, and of
course the entire file must be in that encoding. In the case of
pg_config's ko.po it was changed from euc-kr to UTF-8 sometime in
October 2016 (between 9.5 and 9.6).

It is not supposed to cause any problems. If it does, please do let us
know.

If the Korean translation interests you, perhaps we could persuade you
to update it for the 13 branch: https://babel.postgresql.org/

Thanks,

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: gs_group_1 crashing on 13beta2/s390x
Christoph Berg writes:
> Ok, but given that Debian is currently targeting 22 architectures, I
> doubt the PostgreSQL buildfarm covers all of them with the extra JIT
> option, so I should probably make sure to do that here when running
> tests.

+1. I rather doubt our farm is running this type of test on anything but
x86_64. Of course, we can't actually *fix* any LLVM bugs, but it'd be
nice to know whether they're there.

			regards, tom lane
Re: gs_group_1 crashing on 13beta2/s390x
On Fri, Sep 25, 2020 at 14:36, Ranier Vilela wrote:
>
> On Fri, Sep 25, 2020 at 11:30, Christoph Berg wrote:
>
>> Re: Tom Lane
>> > > Tom> It's hardly surprising that datumCopy would segfault when
>> > > Tom> given a null "value" and told it is pass-by-reference.
>> > > Tom> However, to get to the datumCopy call, we must have passed
>> > > Tom> the MemoryContextContains check on that very same pointer
>> > > Tom> value, and that would surely have segfaulted as well, one
>> > > Tom> would think.
>> >
>> > > Nope, because MemoryContextContains just returns "false" if passed
>> > > a NULL pointer.
>> >
>> > Ah, right. So you could imagine getting here if the finalfn had
>> > returned PointerGetDatum(NULL) with isnull = false. We have some
>> > aggregate transfns that are capable of doing that for internal-type
>> > transvalues, I think, but the finalfn never should do it.
>>
>> So I had another stab at this. As expected, the 13.0 upload to
>> Debian/unstable crashed again on the buildd, while a manual
>> everything-should-be-the-same build succeeded. I don't know why I
>> didn't try this before, but this time I took this manual build and
>> started a PG instance from it. Pasting the gs_group_1 queries made it
>> segfault instantly.
>>
>> So here we are:
>>
>> #0  datumCopy (value=0, typLen=-1, typByVal=false)
>>     at ./build/../src/backend/utils/adt/datum.c:142
>> #1  0x02aa3bf6322e in datumCopy (value=<optimized out>,
>>     typByVal=<optimized out>, typLen=<optimized out>)
>>     at ./build/../src/backend/utils/adt/datum.c:178
>> #2  0x02aa3bda6dd6 in finalize_aggregate
>>     (aggstate=aggstate@entry=0x2aa3defbfd0,
>>     peragg=peragg@entry=0x2aa3e0671f0,
>>     pergroupstate=pergroupstate@entry=0x2aa3e026b78,
>>     resultVal=resultVal@entry=0x2aa3e067108, resultIsNull=0x2aa3e06712a)
>>     at ./build/../src/backend/executor/nodeAgg.c:1153
>>
>> (gdb) p *resultVal
>> $3 = 0
>> (gdb) p *resultIsNull
>> $6 = false
>>
>> (gdb) p *peragg
>> $7 = {aggref = 0x2aa3deef218, transno = 2, finalfn_oid = 0,
>>   finalfn = {fn_addr = 0x0, fn_oid = 0, fn_nargs = 0, fn_strict = false,
>>     fn_retset = false, fn_stats = 0 '\000', fn_extra = 0x0,
>>     fn_mcxt = 0x0, fn_expr = 0x0}, numFinalArgs = 1, aggdirectargs = 0x0,
>>   resulttypeLen = -1, resulttypeByVal = false, shareable = true}
>>
>> Since finalfn_oid is 0, resultVal/resultIsNull were set by the `else`
>> branch of the if (OidIsValid) in finalize_aggregate():
>>
>>     else
>>     {
>>         /* Don't need MakeExpandedObjectReadOnly; datumCopy will copy it */
>>         *resultVal = pergroupstate->transValue;
>>         *resultIsNull = pergroupstate->transValueIsNull;
>>     }
>>
>> (gdb) p *pergroupstate
>> $12 = {transValue = 0, transValueIsNull = false, noTransValue = false}
>
> Shouldn't transValueIsNull be "true" here? That way datumCopy would be
> protected by the "!*resultIsNull" test.

Observe this excerpt (line 1129):

	/* don't call a strict function with NULL inputs */
	*resultVal = (Datum) 0;
	*resultIsNull = true;

Now, doesn't this state the principle? If all the values are null, the
null flags should be filled with true (1), and not 0 (false).

At line 4711, in function ExecReScanAgg:

	MemSet(econtext->ecxt_aggvalues, 0, sizeof(Datum) * node->numaggs);
	MemSet(econtext->ecxt_aggnulls, true, sizeof(bool) * node->numaggs);

Zero there would mean false, i.e. that aggvalues is not null.

regards,
Ranier Vilela
Re: history file on replica and double switchover
Fujii Masao, David Zhang, Anastasia Lubennikova, many thanks to you for
your efforts with this patch! Can I mark it as ready for committer?

On 9/25/20 10:07 AM, Fujii Masao wrote:
>
> On 2020/09/25 13:00, David Zhang wrote:
>> On 2020-09-24 5:00 p.m., Fujii Masao wrote:
>>> On 2020/09/25 8:15, David Zhang wrote:
>>>> Hi,
>>>>
>>>> My understanding is that the "archiver" won't even start if
>>>> "archive_mode = on" has been set on a "replica". Therefore, either
>>>> (XLogArchiveMode == ARCHIVE_MODE_ALWAYS) or (XLogArchiveMode !=
>>>> ARCHIVE_MODE_OFF) will produce the same result.
>>>
>>> Yes, the archiver isn't invoked in the standby when archive_mode=on.
>>> But, with the original patch, walreceiver creates a .ready archive
>>> status file even when archive_mode=on and no archiver is running.
>>> This causes the history file to be archived after the standby is
>>> promoted and the archiver is invoked. With my patch, walreceiver
>>> creates .ready archive status for the history file only when
>>> archive_mode=always, like it does for the streamed WAL files. This
>>> is the difference between those two patches. To prevent the streamed
>>> timeline history file from being archived, IMO we should use the
>>> condition archive_mode==always in the walreceiver.
>>
>> +1
>>
>> Please see how the "archiver" is started in
>> src/backend/postmaster/postmaster.c
>>
>> 5227     /*
>> 5228      * Start the archiver if we're responsible for (re-)archiving
>> 5229      * received files.
>> 5230      */
>> 5231     Assert(PgArchPID == 0);
>> 5232     if (XLogArchivingAlways())
>> 5233         PgArchPID = pgarch_start();
>>
>> I did run the nice script "double_switchover.sh" using either "always"
>> or "on" on patch v1 and v2. They all generate the same results below.
>> In other words, whether the history file (0003.history) is archived
>> or not depends on the "archive_mode" settings.
>>
>> echo "archive_mode = always" >> ${PGDATA_NODE2}/postgresql.auto.conf
>>
>> $ ls -l archive
>> -rw------- 1 david david 16777216 Sep 24 14:40 00010002
>> ... ...
>> -rw------- 1 david david 16777216 Sep 24 14:40 0002000A
>> -rw------- 1 david david       41 Sep 24 14:40 0002.history
>> -rw------- 1 david david       83 Sep 24 14:40 0003.history
>>
>> echo "archive_mode = on" >> ${PGDATA_NODE2}/postgresql.auto.conf
>>
>> $ ls -l archive
>> -rw------- 1 david david 16777216 Sep 24 14:47 00010002
>> ... ...
>> -rw------- 1 david david 16777216 Sep 24 14:47 0002000A
>> -rw------- 1 david david       41 Sep 24 14:47 0002.history
>>
>> Personally, I prefer patch v2 since it aligns with the statement here,
>> https://www.postgresql.org/docs/13/warm-standby.html#CONTINUOUS-ARCHIVING-IN-STANDBY
>> "If archive_mode is set to on, the archiver is not enabled during
>> recovery or standby mode. If the standby server is promoted, it will
>> start archiving after the promotion, but will not archive any WAL it
>> did not generate itself."
>>
>>>> By the way, I think the last part of the sentence should be changed
>>>> to something like below:
>>>> "but will not archive any WAL, which was not generated by itself."
>>>
>>> I'm not sure if this change is an improvement or not. But if we apply
>>> the patch I proposed, maybe we should mention the history file here
>>> too. For example, "but will not archive any WAL or timeline history
>>> files that it did not generate itself".
>>
>> makes sense for me
>
> So I included this change in the patch. Patch attached.
>
> Regards,

--
Grigory Smolkin
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Re: gs_group_1 crashing on 13beta2/s390x
On Fri, Sep 25, 2020 at 11:30, Christoph Berg wrote:

> Re: Tom Lane
> > > Tom> It's hardly surprising that datumCopy would segfault when given
> > > Tom> a null "value" and told it is pass-by-reference. However, to
> > > Tom> get to the datumCopy call, we must have passed the
> > > Tom> MemoryContextContains check on that very same pointer value,
> > > Tom> and that would surely have segfaulted as well, one would think.
> >
> > > Nope, because MemoryContextContains just returns "false" if passed a
> > > NULL pointer.
> >
> > Ah, right. So you could imagine getting here if the finalfn had
> > returned PointerGetDatum(NULL) with isnull = false. We have some
> > aggregate transfns that are capable of doing that for internal-type
> > transvalues, I think, but the finalfn never should do it.
>
> So I had another stab at this. As expected, the 13.0 upload to
> Debian/unstable crashed again on the buildd, while a manual
> everything-should-be-the-same build succeeded. I don't know why I
> didn't try this before, but this time I took this manual build and
> started a PG instance from it. Pasting the gs_group_1 queries made it
> segfault instantly.
>
> So here we are:
>
> #0  datumCopy (value=0, typLen=-1, typByVal=false)
>     at ./build/../src/backend/utils/adt/datum.c:142
> #1  0x02aa3bf6322e in datumCopy (value=<optimized out>,
>     typByVal=<optimized out>, typLen=<optimized out>)
>     at ./build/../src/backend/utils/adt/datum.c:178
> #2  0x02aa3bda6dd6 in finalize_aggregate
>     (aggstate=aggstate@entry=0x2aa3defbfd0,
>     peragg=peragg@entry=0x2aa3e0671f0,
>     pergroupstate=pergroupstate@entry=0x2aa3e026b78,
>     resultVal=resultVal@entry=0x2aa3e067108, resultIsNull=0x2aa3e06712a)
>     at ./build/../src/backend/executor/nodeAgg.c:1153
>
> (gdb) p *resultVal
> $3 = 0
> (gdb) p *resultIsNull
> $6 = false
>
> (gdb) p *peragg
> $7 = {aggref = 0x2aa3deef218, transno = 2, finalfn_oid = 0,
>   finalfn = {fn_addr = 0x0, fn_oid = 0, fn_nargs = 0, fn_strict = false,
>     fn_retset = false, fn_stats = 0 '\000', fn_extra = 0x0,
>     fn_mcxt = 0x0, fn_expr = 0x0}, numFinalArgs = 1, aggdirectargs = 0x0,
>   resulttypeLen = -1, resulttypeByVal = false, shareable = true}
>
> Since finalfn_oid is 0, resultVal/resultIsNull were set by the `else`
> branch of the if (OidIsValid) in finalize_aggregate():
>
>     else
>     {
>         /* Don't need MakeExpandedObjectReadOnly; datumCopy will copy it */
>         *resultVal = pergroupstate->transValue;
>         *resultIsNull = pergroupstate->transValueIsNull;
>     }
>
> (gdb) p *pergroupstate
> $12 = {transValue = 0, transValueIsNull = false, noTransValue = false}

Shouldn't transValueIsNull be "true" here? That way datumCopy would be
protected by the "!*resultIsNull" test.

regards,
Ranier Vilela
Re: New statistics for tuning WAL buffer size
On 2020/09/25 12:06, Masahiro Ikeda wrote:
> On 2020-09-18 11:11, Kyotaro Horiguchi wrote:
>> At Fri, 18 Sep 2020 09:40:11 +0900, Masahiro Ikeda wrote in
>>> Thanks. I confirmed that it causes HOT pruning or killing of dead
>>> index tuples if DecodeCommit() is called. As you said, DecodeCommit()
>>> may access the system table.
>>> ...
>>> The wals are generated only when logical replication is performed.
>>> So, I added pgstat_send_wal() in XLogSendLogical(). But, I am
>>> concerned that it causes poor performance since pgstat_send_wal() is
>>> called per wal record.
>>
>> I think that's too frequent. If we want to send any stats to the
>> collector, it is usually done at commit time using
>> pgstat_report_stat(), and the function avoids sending stats too
>> frequently. For logrep-worker, apply_handle_commit() is calling it.
>> It seems to be the place if we want to send the wal stats. Or it may
>> be better to call pgstat_send_wal() via pgstat_report_stat(), like
>> pg_stat_slru().
>
> Thanks for your comments. Since I changed to use pgstat_report_stat()
> and DecodeCommit() is calling it, the frequency of sending statistics
> is not so high.

On second thought, it's strange to include this change in the
pg_stat_wal patch, because pgstat_report_stat() sends various stats and
that change would affect not only pg_stat_wal but also other stats
views. That is, if we really want to make some processes call
pgstat_report_stat() newly, that should be implemented as a separate
patch. But I'm not sure how useful this change is, because probably the
stats are almost negligibly small in those processes.

This thought seems valid for pgstat_send_wal() too. I have changed my
mind and am inclined to be OK with not calling pgstat_send_wal() in some
background processes that are very unlikely to generate WAL, for
example, the logical-rep launcher, the logical-rep walsender, and the
autovacuum launcher. Thoughts?

> Currently logrep-launcher, logrep-worker and autovac-launcher (and
> some other processes?) don't seem (AFAICS) to send scan stats at all,
> but according to the discussion here, we should let such processes
> send stats. I added pgstat_report_stat() to logrep-launcher and
> autovac-launcher. As you said, logrep-worker already calls
> apply_handle_commit() and pgstat_report_stat().

Right.

> The checkpointer doesn't seem to call pgstat_report_stat() currently,
> but since there is a possibility that it sends wal statistics, I added
> pgstat_report_stat().

IMO it's better to call pgstat_send_wal() in the checkpointer instead,
because of the above reason.

Thanks for updating the patch! I'd like to share my review comments.

+            for details.

Like the description for pg_stat_bgwriter, tag should be used instead of .

+       Number of WAL writes when the wal_buffers are full

I prefer the following description. Thoughts?
"Number of times WAL data was written to the disk because wal_buffers
got full"

+            the pg_stat_archiver view ,or wal

The comma should be just after "view" (not just before "or").

+/*
+ * WAL global statistics counter.
+ * This counter is incremented by both each backend and background.
+ * And then, sent to the stat collector process.
+ */
+PgStat_MsgWal WalStats;

What about merging the comments for BgWriterStats and WalStats into one,
because they are almost the same? For example:

---------------
/*
 * BgWriter and WAL global statistics counters.
 * Stored directly in a stats message structure so they can be sent
 * without needing to copy things around. We assume these init to zeroes.
 */
PgStat_MsgBgWriter BgWriterStats;
PgStat_MsgWal WalStats;
---------------

BTW, originally there was the comment "(unused in other processes)" for
BgWriterStats. But it seems not true, so I removed it from the above
example.

+	rc = fwrite(&walStats, sizeof(walStats), 1, fpout);
+	(void) rc;				/* we'll check for error with ferror */

Since the patch changes the pgstat file format, shouldn't
PGSTAT_FILE_FORMAT_ID also be changed?

-	 * Clear out global and archiver statistics so they start from zero in
+	 * Clear out global, archiver and wal statistics so they start from zero in

This is not an issue of this patch, but isn't it better to mention the
SLRU stats here as well? That is, what about "Clear out global,
archiver, WAL and SLRU statistics so they start from zero in"?

I found "wal statistics" and "wal stats" in some comments in the patch,
but isn't it better to use "WAL statistics" and "WAL stats" instead, if
there is no special reason to use lowercase?

+	/*
+	 * Read wal stats struct
+	 */
+	if (fread(&walStats, 1, sizeof(walStats), fpin) != sizeof(walStats))

In pgstat_read_db_statsfile_timestamp(), a local variable myWalStats
should be declared and used to store the WAL stats read via fread()
instead.

+{ oid => '1136', descr => 'statistics: number of WAL writes when the wal buffers are full',

If we change the description of the wal_buffers_full column in the
document
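For the myWalStats point, a hedged sketch of the suggested shape in
pgstat_read_db_statsfile_timestamp() follows; the error-handling pattern
mirrors how that function treats the other structs it reads, but this is
a paraphrase, not the patch itself:

	/*
	 * Read into a local variable instead of the global walStats: this
	 * function only wants the file's timestamp and must not clobber
	 * live counters.
	 */
	PgStat_MsgWal myWalStats;

	if (fread(&myWalStats, 1, sizeof(myWalStats), fpin) != sizeof(myWalStats))
	{
		ereport(pgStatRunningInCollector ? LOG : WARNING,
				(errmsg("corrupted statistics file \"%s\"", statfile)));
		goto done;
	}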
But it seems not true, so I removed it from the above example. + rc = fwrite(, sizeof(walStats), 1, fpout); + (void) rc; /* we'll check for error with ferror */ Since the patch changes the pgstat file format, PGSTAT_FILE_FORMAT_ID should also be changed? -* Clear out global and archiver statistics so they start from zero in +* Clear out global, archiver and wal statistics so they start from zero in This is not the issue of this patch, but isn't it better to mention also SLRU stats here? That is, what about "Clear out global, archiver, WAL and SLRU statistics so they start from zero in"? I found "wal statistics" and "wal stats" in some comments in the patch, but isn't it better to use "WAL statistics" and "WAL stats", instead, if there is no special reason to use lowercase? + /* +* Read wal stats struct +*/ + if (fread(, 1, sizeof(walStats), fpin) != sizeof(walStats)) In pgstat_read_db_statsfile_timestamp(), the local variable myWalStats should be declared and be used to store the WAL stats read via fread(), instead. +{ oid => '1136', descr => 'statistics: number of WAL writes when the wal buffers are full', If we change the description of wal_buffers_full column in the document
Re: "cert" + clientcert=verify-ca in pg_hba.conf?
On Thu, Sep 24, 2020 at 09:59:50PM -0400, Tom Lane wrote:
> Kyotaro Horiguchi writes:
> > Thank you Bruce, Michael. This is a rebased version.
>
> I really strongly object to all the encoded data in this patch.
> One cannot read it, one cannot even easily figure out how long
> it is until the tests break by virtue of the certificates expiring.
>
> One can, however, be entirely certain that they *will* break at
> some point. I don't like the idea of time bombs in our test suite.
> That being the case, it'd likely be better to drop all the pre-made
> certificates and have the test scripts create them on the fly.
> That'd remove both the documentation problem (i.e., having readable
> info as to how the certificates were made) and the expiration problem.

I am not planning to apply the test parts of this patch. I think having
the committer test it is sufficient.

--
  Bruce Momjian        https://momjian.us
  EnterpriseDB         https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee
Re: gs_group_1 crashing on 13beta2/s390x
On September 25, 2020 at 18:42:04 CEST, Andres Freund wrote:
> >> * jit is not exercised enough by "make installcheck"
>
> So far we've exercised it more widely by setting up machines that use
> it for all queries (by setting the config option). I'm doubtful it's
> worth doing differently.

Ok, but given that Debian is currently targeting 22 architectures, I
doubt the PostgreSQL buildfarm covers all of them with the extra JIT
option, so I should probably make sure to do that here when running
tests.
Re: [HACKERS] [PATCH] Generic type subscripting
On Sun, Sep 20, 2020 at 17:46, Dmitry Dolgov <9erthali...@gmail.com> wrote: > > On Fri, Sep 18, 2020 at 07:23:11PM +0200, Pavel Stehule wrote: > > > > > In this way (returning an error on negative indices bigger than the > > > number of elements) functionality for assigning via subscripting will > > > already differ significantly from the original one via jsonb_set. Which > > > in turn could cause a new wave of something similar to "why assigning > an > > > SQL NULL as a value returns NULL instead of jsonb?". Taking into > account > > > that this is not an absolutely new interface, but rather a convenient > > > shortcut for the existing one, it probably makes sense to try to find a > > > balance between both consistency with regular array and similarity with > > > already existing jsonb modification functions. > > > > > > Having said that, my impression is that this balance should not be > fully > > > shifted towards consistency with the regular array type, as jsonb array > > > and regular array are fundamentally different in terms of > > > implementation. If any differences are of concern, they should be > > > addressed at a different level. At the same time I've already sort of > gave > > > up on this patch in the form I wanted to see it anyway, so anything > goes > > > if it helps bring it to the finish point. In case there are no > > > more arguments from other involved sides, I can post the next version > > > with your suggestion included. > > > > > > > This is a relatively new interface and at this moment we can decide if it > > will be consistent or not. I don't have a problem with different > > functions having different behaviors, but I don't like one interface with > > slightly different behaviors for different types. I understand your > > argument about implementing a lighter interface to some existing API. > But I > > think what is more important is consistency to the maximal possible > extent > > (where it makes sense). > > > > For me "jsonb" can be a very fundamental type in PLpgSQL development - it > > can bring a lot of dynamism to this environment (it can work perfectly > like > > a PL/SQL collection or a Perl dictionary), but for this purpose the > > behaviour should be consistent, without surprising elements. > > And here we are, the rebased version with the following changes: > > insert into test_jsonb_subscript values (1, '[]'); > update test_jsonb_subscript set test_json[5] = 1; > select * from test_jsonb_subscript; > id | test_json > +--- > 1 | [null, null, null, null, null, 1] > (1 row) > > update test_jsonb_subscript set test_json[-8] = 1; > ERROR: path element at position 1 is out of range: -8 > > Thanks for the suggestions! > Thank you for accepting my suggestions. I checked this set of patches and it looks good. I have only one minor comment. I understand the error message, but I am not sure it can be understood without deeper knowledge. +update test_jsonb_subscript set test_json[-8] = 1; +ERROR: path element at position 1 is out of range: -8 Maybe 'value of subscript "-8" is out of range'. The current error message is fully correct - but people probably have to think "what is a path element at position 1?" It doesn't look intuitive. Do you have some idea? My comment is minor, and I mark this patch with pleasure as ready for committer. patching and compiling - without problems; implemented functionality - I like it; building doc - without problems; make check-world - passed. Regards Pavel
Re: gs_group_1 crashing on 13beta2/s390x
Hi, On 2020-09-25 17:29:07 +0200, Christoph Berg wrote: > I guess that suggests two things: > * jit is not ready for prime time on s390x and I should disable it I don't know how good LLVM's support for s390x JITing is, and given that it's unrealistic for people to get access to s390x... > * jit is not exercised enough by "make installcheck" So far we've exercised it more widely by setting up machines that use it for all queries (by setting the config option). I'm doubtful it's worth doing differently. Greetings, Andres Freund
Re: Load TIME fields - proposed performance improvement
Peter Smith writes: > The patch has been re-implemented based on previous advice. > Please see attached. Hm, so: 1. The caching behavior really needs to account for the possibility of the timezone setting being changed intra-transaction. That's not very likely, perhaps, but we can't deliver a wrong answer. It seems best to make the dependency on session_timezone explicit instead of allowing it to be hidden inside timestamp2tm. 2. I'd had in mind just one cache, not several. Admittedly it doesn't gain that much for different code paths to share the result, but it seems silly not to, especially given the relative complexity of getting the caching right. That led me to refactor the patch as attached. (I'd first thought that we could just leave the j2date calculation in GetSQLCurrentDate out of the cache, but performance measurements showed that it is worthwhile to cache that. An advantage of the way it's done here is we probably won't have to redo j2date even across transactions.) > (represents 19% improvement for this worst case table/data) I'm getting a hair under 30% improvement on this admittedly artificial test case, for either your patch or mine. > Note: I patched the function GetCurrentTimeUsec consistently with the > others, but actually I was not able to discover any SQL syntax which > could cause that function to be invoked multiple times. Perhaps the > patch for that function should be removed? The existing callers for that are concerned with interpreting "now" in datetime input strings. It's certainly possible to have to do that more than once per transaction --- an example would be COPY IN where a lot of rows contain "now" as the value of a datetime column. regards, tom lane diff --git a/src/backend/utils/adt/date.c b/src/backend/utils/adt/date.c index eaaffa7137..057051fa85 100644 --- a/src/backend/utils/adt/date.c +++ b/src/backend/utils/adt/date.c @@ -299,20 +299,31 @@ EncodeSpecialDate(DateADT dt, char *str) DateADT GetSQLCurrentDate(void) { - TimestampTz ts; - struct pg_tm tt, - *tm = &tt; - fsec_t fsec; - int tz; + struct pg_tm tm; - ts = GetCurrentTransactionStartTimestamp(); + static int cache_year = 0; + static int cache_mon = 0; + static int cache_mday = 0; + static DateADT cache_date; - if (timestamp2tm(ts, &tz, tm, &fsec, NULL, NULL) != 0) - ereport(ERROR, -(errcode(ERRCODE_DATETIME_VALUE_OUT_OF_RANGE), - errmsg("timestamp out of range"))); + GetCurrentDateTime(&tm); - return date2j(tm->tm_year, tm->tm_mon, tm->tm_mday) - POSTGRES_EPOCH_JDATE; + /* + * date2j involves several integer divisions; moreover, unless our session + * lives across local midnight, we don't really have to do it more than + * once. So it seems worth having a separate cache here.
+ */ + if (tm.tm_year != cache_year || + tm.tm_mon != cache_mon || + tm.tm_mday != cache_mday) + { + cache_date = date2j(tm.tm_year, tm.tm_mon, tm.tm_mday) - POSTGRES_EPOCH_JDATE; + cache_year = tm.tm_year; + cache_mon = tm.tm_mon; + cache_mday = tm.tm_mday; + } + + return cache_date; } /* @@ -322,18 +333,12 @@ TimeTzADT * GetSQLCurrentTime(int32 typmod) { TimeTzADT *result; - TimestampTz ts; struct pg_tm tt, *tm = &tt; fsec_t fsec; int tz; - ts = GetCurrentTransactionStartTimestamp(); - - if (timestamp2tm(ts, &tz, tm, &fsec, NULL, NULL) != 0) - ereport(ERROR, -(errcode(ERRCODE_DATETIME_VALUE_OUT_OF_RANGE), - errmsg("timestamp out of range"))); + GetCurrentTimeUsec(tm, &fsec, &tz); result = (TimeTzADT *) palloc(sizeof(TimeTzADT)); tm2timetz(tm, fsec, tz, result); @@ -348,18 +353,12 @@ TimeADT GetSQLLocalTime(int32 typmod) { TimeADT result; - TimestampTz ts; struct pg_tm tt, *tm = &tt; fsec_t fsec; int tz; - ts = GetCurrentTransactionStartTimestamp(); - - if (timestamp2tm(ts, &tz, tm, &fsec, NULL, NULL) != 0) - ereport(ERROR, -(errcode(ERRCODE_DATETIME_VALUE_OUT_OF_RANGE), - errmsg("timestamp out of range"))); + GetCurrentTimeUsec(tm, &fsec, &tz); tm2time(tm, fsec, &result); AdjustTimeForTypmod(&result, typmod); diff --git a/src/backend/utils/adt/datetime.c b/src/backend/utils/adt/datetime.c index dec2fad82a..bc682584f0 100644 --- a/src/backend/utils/adt/datetime.c +++ b/src/backend/utils/adt/datetime.c @@ -339,35 +339,78 @@ j2day(int date) /* * GetCurrentDateTime() * - * Get the transaction start time ("now()") broken down as a struct pg_tm. + * Get the transaction start time ("now()") broken down as a struct pg_tm, + * according to the session timezone setting. */ void GetCurrentDateTime(struct pg_tm *tm) { - int tz; fsec_t fsec; - timestamp2tm(GetCurrentTransactionStartTimestamp(), &tz, tm, &fsec, - NULL, NULL); - /* Note: don't pass NULL tzp to timestamp2tm; affects behavior */ + /* This is just a convenience wrapper for GetCurrentTimeUsec */ + GetCurrentTimeUsec(tm, &fsec, NULL); } /* * GetCurrentTimeUsec() * * Get the transaction start time ("now()") broken down as a struct pg_tm, - * including fractional seconds and timezone
Optimize memory allocation code
Hi, hackers! I find that palloc0() is similar to palloc(); we could use palloc() inside palloc0() to allocate space, thereby reducing code duplication. Best regards! -- Japin Li 0001-Optimize-memory-allocation-code.patch Description: 0001-Optimize-memory-allocation-code.patch
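For readers following along, a minimal sketch of the proposed simplification (assuming mcxt.c's existing MemSetAligned() macro; this is an illustration of the idea, not the attached patch itself):

void *
palloc0(Size size)
{
	/* let palloc() do the context checks and the actual allocation */
	void	   *ret = palloc(size);

	/* the only extra work palloc0() has to do is zeroing the chunk */
	MemSetAligned(ret, 0, size);

	return ret;
}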
Re: Parallel INSERT (INTO ... SELECT ...)
On Fri, Sep 25, 2020 at 10:17 PM Amit Kapila wrote: > But we can tighten the condition in GetCurrentCommandId() such that it > Asserts for a parallel worker only when currentCommandIdUsed is not set > before the start of the parallel operation. I also find these changes in the > callers of GetCurrentCommandId() quite ad hoc and ugly even if they are > correct. Also, why don't we face similar problems for parallel copy? > For Parallel Insert, as part of query plan execution, GetCurrentCommandId(true) is being called as part of INSERT statement execution. Parallel Copy of course doesn't have to deal with this. That's why there's a difference. And also, it has its own parallel entry point (ParallelCopyMain), so it's in full control, it's not trying to fit in with the infrastructure for plan execution. > So are you facing problems in this area because we EnterParallelMode > before even assigning the xid in the leader? Because I don't think we > should ever reach this code in the worker. If so, there are two > possibilities that come to my mind (a) assign the xid in the leader before > entering parallel mode or (b) change the check so that we don't assign > the new xid in workers. In this case, I am again wondering how parallel > copy is dealing with this? > Again, there's a fundamental difference in the Parallel Insert case. Right at the top of ExecutePlan it calls EnterParallelMode(). For ParallelCopy(), there is no such problem. EnterParallelMode() is only called just before ParallelCopyMain() is called. So it can easily acquire the xid before this, because then parallel mode is not set. As it turns out, I think I have solved the commandId issue (and almost the xid issue) by realising that both the xid and cid are ALREADY being included as part of the serialized transaction state in the Parallel DSM. So actually I don't believe that there is any need for separately passing them in the DSM, and having to use those AssignForWorker() functions in the worker code - not even in the Parallel Copy case (? - need to check). GetCurrentCommandId(true) and GetCurrentFullTransactionId() need to be called prior to Parallel DSM initialization, so they are included in the serialized transaction state. I just needed to add a function to set currentCommandIdUsed=true in the worker initialization (for the INSERT case) and make a small tweak to the Assert in GetCurrentCommandId() to ensure that currentCommandIdUsed, in a parallel worker, never gets set to true when it is false. This is in line with the comment in that function, because we know that "currentCommandIdUsed was already true at the start of the parallel operation". With this in place, I don't need to change any of the original calls to GetCurrentCommandId(), so this addresses that issue raised by Andres. I am not sure yet how to get past the issue of the parallel mode being set at the top of ExecutePlan(). With that in place, it doesn't allow an xid to be acquired for the leader, without removing/changing that parallel-mode check in GetNewTransactionId(). Regards, Greg Nancarrow Fujitsu Australia
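To make the described tweak concrete, here is a rough sketch of GetCurrentCommandId() with the relaxed Assert (hypothetical; the exact form in the patch may differ):

CommandId
GetCurrentCommandId(bool used)
{
	/* this is global to a transaction, not subtransaction-local */
	if (used)
	{
		/*
		 * In a parallel worker, only allow marking the command ID as
		 * used if it was already marked used before the start of the
		 * parallel operation (the flag arrives serialized in the DSM).
		 */
		Assert(!IsParallelWorker() || currentCommandIdUsed);
		currentCommandIdUsed = true;
	}
	return currentCommandId;
}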
Re: gs_group_1 crashing on 13beta2/s390x
Re: To Tom Lane > I poked around with the SET in the offending tests, and the crash is > only present if `set jit_above_cost = 0;` is present. Removing that > makes it pass. Removing work_mem or enable_hashagg does not make a > difference. llvm version is 10.0.1. I put jit_above_cost=0 into postgresql.conf and ran "make installcheck" again. There are more crashes: From src/test/regress/sql/interval.sql: 2020-09-25 17:00:25.130 CEST [8135] LOG: server process (PID 8369) was terminated by signal 11: Segmentation fault 2020-09-25 17:00:25.130 CEST [8135] DETAIL: Failed process was running: select avg(f1) from interval_tbl; From src/test/regress/sql/tid.sql: 2020-09-25 17:01:20.593 CEST [8135] LOG: server process (PID 9015) was terminated by signal 11: Segmentation fault 2020-09-25 17:01:20.593 CEST [8135] DETAIL: Failed process was running: SELECT max(ctid) FROM tid_tab; From src/test/regress/sql/collate*.sql: 2020-09-25 17:07:17.852 CEST [8135] LOG: server process (PID 12232) was terminated by signal 11: Segmentation fault 2020-09-25 17:07:17.852 CEST [8135] DETAIL: Failed process was running: SELECT min(b), max(b) FROM collate_test1; From src/test/regress/sql/plpgsql.sql: 2020-09-25 17:11:56.495 CEST [8135] LOG: server process (PID 15562) was terminated by signal 11: Segmentation fault 2020-09-25 17:11:56.495 CEST [8135] DETAIL: Failed process was running: select * from returnqueryf(); Contrary to the others this one is not related to aggregates: -- test RETURN QUERY with dropped columns create table tabwithcols(a int, b int, c int, d int); insert into tabwithcols values(10,20,30,40),(50,60,70,80); create or replace function returnqueryf() returns setof tabwithcols as $$ begin return query select * from tabwithcols; return query execute 'select * from tabwithcols'; end; $$ language plpgsql; select * from returnqueryf(); alter table tabwithcols drop column b; select * from returnqueryf(); alter table tabwithcols drop column d; select * from returnqueryf(); alter table tabwithcols add column d int; select * from returnqueryf(); drop function returnqueryf(); drop table tabwithcols; src/test/regress/sql/rangefuncs.sql: 2020-09-25 17:16:04.209 CEST [8135] LOG: server process (PID 17372) was terminated by signal 11: Segmentation fault 2020-09-25 17:16:04.209 CEST [8135] DETAIL: Failed process was running: select * from usersview; src/test/regress/sql/alter_table.sql: 2020-09-25 17:21:36.187 CEST [8135] LOG: server process (PID 19217) was terminated by signal 11: Segmentation fault 2020-09-25 17:21:36.187 CEST [8135] DETAIL: Failed process was running: update atacc3 set test2 = 4 where test2 is null; src/test/regress/sql/polymorphism.sql: 2020-09-25 17:23:55.509 CEST [8135] LOG: server process (PID 21010) was terminated by signal 11: Segmentation fault 2020-09-25 17:23:55.509 CEST [8135] DETAIL: Failed process was running: select myleast(1.1, 0.22, 0.55); 2020-09-25 17:28:26.222 CEST [8135] LOG: server process (PID 22325) was terminated by signal 11: Segmentation fault 2020-09-25 17:28:26.222 CEST [8135] DETAIL: Failed process was running: select f.longname from fullname f; (stopping here) There are also a lot of these log lines (without prefix): ORC error: No callback manager available for s390x-ibm-linux-gnu Is that worrying? I'm not sure but I think I've seen these on other architectures as well.
I guess that suggests two things: * jit is not ready for prime time on s390x and I should disable it * jit is not exercised enough by "make installcheck" Christoph
Re: gs_group_1 crashing on 13beta2/s390x
I poked around with the SET in the offending tests, and the crash is only present if `set jit_above_cost = 0;` is present. Removing that makes it pass. Removing work_mem or enable_hashagg does not make a difference. llvm version is 10.0.1. Test file: -- -- Compare results between plans using sorting and plans using hash -- aggregation. Force spilling in both cases by setting work_mem low -- and altering the statistics. -- create table gs_data_1 as select g%1000 as g1000, g%100 as g100, g%10 as g10, g from generate_series(0,1999) g; analyze gs_data_1; alter table gs_data_1 set (autovacuum_enabled = 'false'); update pg_class set reltuples = 10 where relname='gs_data_1'; SET work_mem='64kB'; -- Produce results with sorting. set enable_hashagg = false; set jit_above_cost = 0; -- remove this to remove crash explain (costs off) select g100, g10, sum(g::numeric), count(*), max(g::text) from gs_data_1 group by cube (g1000, g100,g10); create table gs_group_1 as select g100, g10, sum(g::numeric), count(*), max(g::text) from gs_data_1 group by cube (g1000, g100,g10); Christoph
Re: gs_group_1 crashing on 13beta2/s390x
Re: Tom Lane > > Tom> It's hardly surprising that datumCopy would segfault when given a > > Tom> null "value" and told it is pass-by-reference. However, to get to > > Tom> the datumCopy call, we must have passed the MemoryContextContains > > Tom> check on that very same pointer value, and that would surely have > > Tom> segfaulted as well, one would think. > > > Nope, because MemoryContextContains just returns "false" if passed a > > NULL pointer. > > Ah, right. So you could imagine getting here if the finalfn had returned > PointerGetDatum(NULL) with isnull = false. We have some aggregate > transfns that are capable of doing that for internal-type transvalues, > I think, but the finalfn never should do it. So I had another stab at this. As expected, the 13.0 upload to Debian/unstable crashed again on the buildd, while a manual everything-should-be-the-same build succeeded. I don't know why I didn't try this before, but this time I took this manual build and started a PG instance from it. Pasting the gs_group_1 queries made it segfault instantly. So here we are: #0 datumCopy (value=0, typLen=-1, typByVal=false) at ./build/../src/backend/utils/adt/datum.c:142 #1 0x02aa3bf6322e in datumCopy (value=<optimized out>, typByVal=<optimized out>, typLen=<optimized out>) at ./build/../src/backend/utils/adt/datum.c:178 #2 0x02aa3bda6dd6 in finalize_aggregate (aggstate=aggstate@entry=0x2aa3defbfd0, peragg=peragg@entry=0x2aa3e0671f0, pergroupstate=pergroupstate@entry=0x2aa3e026b78, resultVal=resultVal@entry=0x2aa3e067108, resultIsNull=0x2aa3e06712a) at ./build/../src/backend/executor/nodeAgg.c:1153 (gdb) p *resultVal $3 = 0 (gdb) p *resultIsNull $6 = false (gdb) p *peragg $7 = {aggref = 0x2aa3deef218, transno = 2, finalfn_oid = 0, finalfn = {fn_addr = 0x0, fn_oid = 0, fn_nargs = 0, fn_strict = false, fn_retset = false, fn_stats = 0 '\000', fn_extra = 0x0, fn_mcxt = 0x0, fn_expr = 0x0}, numFinalArgs = 1, aggdirectargs = 0x0, resulttypeLen = -1, resulttypeByVal = false, shareable = true} Since finalfn_oid is 0, resultVal/resultIsNull were set by the `else` branch of the if (OidIsValid) in finalize_aggregate(): else { /* Don't need MakeExpandedObjectReadOnly; datumCopy will copy it */ *resultVal = pergroupstate->transValue; *resultIsNull = pergroupstate->transValueIsNull; } (gdb) p *pergroupstate $12 = {transValue = 0, transValueIsNull = false, noTransValue = false} That comes from finalize_aggregates: #3 0x02aa3bda7e10 in finalize_aggregates (aggstate=aggstate@entry=0x2aa3defbfd0, peraggs=peraggs@entry=0x2aa3e067140, pergroup=0x2aa3e026b58) at ./build/../src/backend/executor/nodeAgg.c:1369 /* * Run the final functions. */ for (aggno = 0; aggno < aggstate->numaggs; aggno++) { AggStatePerAgg peragg = &peraggs[aggno]; int transno = peragg->transno; AggStatePerGroup pergroupstate; pergroupstate = &pergroup[transno]; if (DO_AGGSPLIT_SKIPFINAL(aggstate->aggsplit)) finalize_partialaggregate(aggstate, peragg, pergroupstate, &aggvalues[aggno], &aggnulls[aggno]); else finalize_aggregate(aggstate, peragg, pergroupstate, &aggvalues[aggno], &aggnulls[aggno]); } ... but at that point I'm lost. Maybe "aggno" and "transno" got mixed up here? (I'll leave the gdb session open for further suggestions.) Christoph
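For context on the "just returns false" point above, a condensed sketch of MemoryContextContains() (paraphrasing mcxt.c from around this era; not a verbatim copy):

bool
MemoryContextContains(MemoryContext context, void *pointer)
{
	MemoryContext ptr_context;

	/* a NULL (or unaligned) pointer is simply reported as not contained */
	if (pointer == NULL || pointer != (void *) MAXALIGN(pointer))
		return false;

	/* fetch the owning context from the chunk header before the pointer */
	ptr_context = *(MemoryContext *) (((char *) pointer) - sizeof(void *));

	return ptr_context == context;
}

So a finalfn result of PointerGetDatum(NULL) with isnull = false sails past this check and only crashes later, inside datumCopy().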
Re: Probable documentation errors or improvements
Split one patch about text search, added another one (sequences), added some info to commit messages, and added here. https://commitfest.postgresql.org/30/2744/ -- Justin From adf050ac6cc7d0905fc1613dce1a04f76a892609 Mon Sep 17 00:00:00 2001 From: Justin Pryzby Date: Tue, 22 Sep 2020 21:40:17 -0500 Subject: [PATCH v2 01/11] extraneous comma --- doc/src/sgml/rules.sgml | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/doc/src/sgml/rules.sgml b/doc/src/sgml/rules.sgml index bcf860b68b..d6e3463ac2 100644 --- a/doc/src/sgml/rules.sgml +++ b/doc/src/sgml/rules.sgml @@ -769,7 +769,7 @@ SELECT t1.a, t2.b, t1.ctid FROM t1, t2 WHERE t1.a = t2.a; -The benefit of implementing views with the rule system is, +The benefit of implementing views with the rule system is that the planner has all the information about which tables have to be scanned plus the relationships between these tables plus the restrictive @@ -781,7 +781,7 @@ SELECT t1.a, t2.b, t1.ctid FROM t1, t2 WHERE t1.a = t2.a; the best path to execute the query, and the more information the planner has, the better this decision can be. And the rule system as implemented in PostgreSQL -ensures, that this is all information available about the query +ensures that this is all information available about the query up to that point. -- 2.17.0 From d6e81a48596edcd255015767095e8985464051e1 Mon Sep 17 00:00:00 2001 From: Justin Pryzby Date: Wed, 23 Sep 2020 13:09:11 -0500 Subject: [PATCH v2 02/11] semicolons after END in programlisting --- doc/src/sgml/func.sgml| 2 +- doc/src/sgml/plpgsql.sgml | 12 ++-- doc/src/sgml/rules.sgml | 2 +- 3 files changed, 8 insertions(+), 8 deletions(-) diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml index 461b748d89..1e18f332a3 100644 --- a/doc/src/sgml/func.sgml +++ b/doc/src/sgml/func.sgml @@ -26580,7 +26580,7 @@ BEGIN obj.object_name, obj.object_identity; END LOOP; -END +END; $$; CREATE EVENT TRIGGER test_event_trigger_for_drops ON sql_drop diff --git a/doc/src/sgml/plpgsql.sgml b/doc/src/sgml/plpgsql.sgml index 815912666d..a71a8e7e48 100644 --- a/doc/src/sgml/plpgsql.sgml +++ b/doc/src/sgml/plpgsql.sgml @@ -1143,7 +1143,7 @@ BEGIN SELECT users.userid INTO STRICT userid FROM users WHERE users.username = get_userid.username; RETURN userid; -END +END; $$ LANGUAGE plpgsql; On failure, this function might produce an error message such as @@ -1816,7 +1816,7 @@ BEGIN RETURN NEXT r; -- return current row of SELECT END LOOP; RETURN; -END +END; $BODY$ LANGUAGE plpgsql; @@ -1844,7 +1844,7 @@ BEGIN END IF; RETURN; - END + END; $BODY$ LANGUAGE plpgsql; @@ -1918,7 +1918,7 @@ DECLARE myvar int := 5; BEGIN CALL triple(myvar); RAISE NOTICE 'myvar = %', myvar; -- prints 15 -END +END; $$; @@ -3521,7 +3521,7 @@ BEGIN ROLLBACK; END IF; END LOOP; -END +END; $$; CALL transaction_test1(); @@ -5175,7 +5175,7 @@ DECLARE f1 int; BEGIN RETURN f1; -END +END; $$ LANGUAGE plpgsql; WARNING: variable "f1" shadows a previously defined variable LINE 3: f1 int; diff --git a/doc/src/sgml/rules.sgml b/doc/src/sgml/rules.sgml index d6e3463ac2..e81addcfa9 100644 --- a/doc/src/sgml/rules.sgml +++ b/doc/src/sgml/rules.sgml @@ -2087,7 +2087,7 @@ CREATE FUNCTION tricky(text, text) RETURNS bool AS $$ BEGIN RAISE NOTICE '% = %', $1, $2; RETURN true; -END +END; $$ LANGUAGE plpgsql COST 0.01; SELECT * FROM phone_number WHERE tricky(person, phone); -- 2.17.0 From db23e239d021e018c3b9d6cb4d983f5bbcae8b85 Mon Sep 17 00:00:00 2001 From: Justin Pryzby Date: Tue, 22 Sep 2020 22:51:47 -0500 Subject: [PATCH v2 03/11] punctuation ---
doc/src/sgml/catalogs.sgml | 2 +- doc/src/sgml/config.sgml | 4 ++-- doc/src/sgml/parallel.sgml | 2 +- 3 files changed, 4 insertions(+), 4 deletions(-) diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml index de9bacd34f..d1b8fc8010 100644 --- a/doc/src/sgml/catalogs.sgml +++ b/doc/src/sgml/catalogs.sgml @@ -1525,7 +1525,7 @@ Role can log in. That is, this role can be given as the initial - session authorization identifier + session authorization identifier. diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml index 8eabf93834..1c753ccb7e 100644 --- a/doc/src/sgml/config.sgml +++ b/doc/src/sgml/config.sgml @@ -10084,8 +10084,8 @@ LOG: CleanUpLock: deleting: lock(0xb7acd844) id(24688,24696,0,0,0,1) -If set, do not trace locks for tables below this OID. (use to avoid -output on system tables) +If set, do not trace locks for tables below this OID (used to avoid +output on system tables). This parameter is only available if the LOCK_DEBUG diff --git
Re: Dumping/restoring fails on inherited generated column
Peter Eisentraut writes: > The proposed patches will cause the last statement to be omitted, but > that still won't recreate the original state. The problem is that there > is no command to make a column generated afterwards, like the SET > DEFAULT command, so we can't dump it like this. Right. I'm not even sure what such a command should do ... would it run through all existing rows and update them to be the GENERATED value? > We could probably fix this by having ALTER TABLE ONLY / DROP EXPRESSION > update the attislocal column of direct children to true, to make the > catalog state look like something that can be restored. However, that's > a fair amount of complicated code, so for now I propose to just prohibit > this command, meaning you can't use ONLY in this command if there are > children. This is new in PG13, so this change would have very limited > impact in practice. +1. At this point we would want some fairly un-complicated fix for the v13 branch anyhow, and this seems to fit the bill. (Also, having child columns suddenly grow an attislocal property doesn't seem to meet the principle of least surprise.) regards, tom lane
Re: Transactions involving multiple postgres foreign servers, take 2
On Fri, 25 Sep 2020 at 18:21, tsunakawa.ta...@fujitsu.com wrote: > > From: Masahiko Sawada > > I don't think it's always possible to avoid raising errors in advance. > > Considering how postgres_fdw can implement your idea, I think > > postgres_fdw would need PG_TRY() and PG_CATCH() for its connection > > management. It has a connection cache in the local memory using HTAB. > > It needs to create an entry for the first time to connect (e.g., when > > prepare and commit prepared a transaction are performed by different > > processes) and it needs to re-connect the foreign server when the > > entry is invalidated. In both cases, ERROR could happen. I guess the > > same is true for other FDW implementations. Possibly other FDWs might > > need more work, for example cleanup or releasing resources. I think > > Why does the client backend have to create a new connection cache entry > during PREPARE or COMMIT PREPARED? Doesn't the client backend naturally > continue to use connections that it has used in its current transaction? I think there are two cases: a process executes PREPARE TRANSACTION and another process executes COMMIT PREPARED later; and, if the coordinator has cascaded foreign servers (i.e., a foreign server that has its own foreign server) and a temporary connection problem happens on the intermediate node after PREPARE, then another process on the intermediate node will execute COMMIT PREPARED on its foreign server. > > > > that the pros of your idea are to make the transaction manager simple > > since we don't need resolvers and launcher but the cons are to bring > > the complexity to FDW implementation codes instead. Also, IMHO I don't > > think it's a safe way that the FDW does neither re-throw an error nor > > abort the transaction when an error occurs. > > No, I didn't say the resolver is unnecessary. The resolver takes care of terminating remote transactions when the client backend encountered an error during COMMIT/ROLLBACK PREPARED. Understood. With your idea, we can remove at least the code for making backends wait and the inter-process communication between backends and resolvers. I think we need to consider whether it's really safe and what is needed to achieve your idea safely. > > > > Regarding the performance you're concerned about, I wonder if we can somewhat > > eliminate the bottleneck if multiple resolvers are able to run on one > > database in the future. For example, if we could launch as many resolver > > processes as connections on the database, individual backend > > processes could have one resolver process. Since there would be > > contention and inter-process communication it still brings some > > overhead but it might be negligible compared to a network round trip. > > Do you mean that if 200 concurrent clients each update data on two foreign > servers, there are 400 resolvers? ...That's overuse of resources. I think we would have 200 resolvers in this case, since there is one resolver process per backend process. Or another idea is that all processes queue foreign transactions to resolve into a shared memory queue and resolver processes fetch and resolve them, instead of assigning one distributed transaction to one resolver process. Using asynchronous execution, a resolver process can process a bunch of foreign transactions across distributed transactions, grouped by foreign server, at once. It might be more complex than the current approach, but having multiple resolver processes on one database would improve throughput well, especially in combination with asynchronous execution.
Regards, -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
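To make the shared-memory-queue idea above a bit more concrete, here is a purely hypothetical sketch of what one queue entry might carry (all names invented for illustration; nothing like this exists in the patch set):

typedef struct FdwXactQueueEntry
{
	TransactionId local_xid;	/* local transaction being resolved */
	Oid			serverid;		/* foreign server with the prepared xact */
	Oid			userid;			/* user mapping used for the connection */
	char		gid[200];		/* identifier given to PREPARE TRANSACTION */
} FdwXactQueueEntry;

Resolver processes would dequeue such entries, group them by serverid, and issue the COMMIT PREPARED / ROLLBACK PREPARED commands asynchronously.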
Re: Dumping/restoring fails on inherited generated column
> On 25 Sep 2020, at 15:07, Peter Eisentraut > wrote: > We could probably fix this by having ALTER TABLE ONLY / DROP EXPRESSION > update the attislocal column of direct children to true, to make the catalog > state look like something that can be restored. However, that's a fair > amount of complicated code, so for now I propose to just prohibit this > command, meaning you can't use ONLY in this command if there are children. > This is new in PG13, so this change would have very limited impact in > practice. That sounds a bit dramatic. Do you propose to do that in v13 as well or just in HEAD? If the latter, considering that the window until the 14 freeze is quite wide, shouldn't we try to fix it first? cheers ./daniel
Re: Parallel INSERT (INTO ... SELECT ...)
On Fri, Sep 25, 2020 at 5:47 PM Amit Kapila wrote: > > > > At least in the case of Parallel INSERT, the leader for the Parallel > > INSERT gets a new xid (GetCurrentFullTransactionId) and it is passed > > through and assigned to each of the workers during their > > initialization (so they are assigned the same xid). > > > > So are you facing problems in this area because we EnterParallelMode > before even assigning the xid in the leader? Because I don't think we > should ever reach this code in the worker. If so, there are two > possibilities that come to my mind (a) assign the xid in the leader before > entering parallel mode or (b) change the check so that we don't assign > the new xid in workers. In this case, I am again wondering how parallel > copy is dealing with this? > In parallel copy, we are doing option (a), i.e., the leader gets the full txn id before entering parallel mode and passes it to all workers. In the leader: full_transaction_id = GetCurrentFullTransactionId(); EnterParallelMode(); shared_info_ptr->full_transaction_id = full_transaction_id; In the workers: AssignFullTransactionIdForWorker(pcshared_info->full_transaction_id); Hence the below part of the code doesn't get hit. if (IsInParallelMode() || IsParallelWorker()) elog(ERROR, "cannot assign XIDs during a parallel operation"); We also deal with the command id similarly, i.e., the leader gets the command id, and workers use it while inserting. In the leader: shared_info_ptr->mycid = GetCurrentCommandId(true); In the workers: AssignCommandIdForWorker(pcshared_info->mycid, true); [1] void AssignFullTransactionIdForWorker(FullTransactionId fullTransactionId) { TransactionState s = CurrentTransactionState; Assert((IsInParallelMode() || IsParallelWorker())); s->fullTransactionId = fullTransactionId; } void AssignCommandIdForWorker(CommandId commandId, bool used) { Assert((IsInParallelMode() || IsParallelWorker())); /* this is global to a transaction, not subtransaction-local */ if (used) currentCommandIdUsed = true; currentCommandId = commandId; } With Regards, Bharath Rupireddy. EnterpriseDB: http://www.enterprisedb.com
Re: Dumping/restoring fails on inherited generated column
I have been analyzing this issue again. We have a few candidate patches that do very similar things for avoiding dumping the generation expression of table gtest1_1. We can figure out later which one of these we like best. But there is another issue lurking nearby. Consider the table hierarchy of gtest30, which is created in the regression tests like this: CREATE TABLE gtest30 ( a int, b int GENERATED ALWAYS AS (a * 2) STORED ); CREATE TABLE gtest30_1 () INHERITS (gtest30); ALTER TABLE ONLY gtest30 ALTER COLUMN b DROP EXPRESSION; This drops the generation expression from the parent table but not the child table. This is currently dumped like this: CREATE TABLE public.gtest30 ( a integer, b integer ); CREATE TABLE public.gtest30_1 ( ) INHERITS (public.gtest30); ALTER TABLE ONLY public.gtest30_1 ALTER COLUMN b SET DEFAULT (a * 2); The proposed patches will cause the last statement to be omitted, but that still won't recreate the original state. The problem is that there is no command to make a column generated afterwards, like the SET DEFAULT command, so we can't dump it like this. We would have to produce CREATE TABLE public.gtest30 ( a integer, b integer ); CREATE TABLE public.gtest30_1 ( b integer GENERATED ALWAYS AS (a * 2) STORED ) INHERITS (public.gtest30); but this will create the column "b" of gtest30_1 as attislocal, which the original command sequence does not. We could probably fix this by having ALTER TABLE ONLY / DROP EXPRESSION update the attislocal column of direct children to true, to make the catalog state look like something that can be restored. However, that's a fair amount of complicated code, so for now I propose to just prohibit this command, meaning you can't use ONLY in this command if there are children. This is new in PG13, so this change would have very limited impact in practice. Proposed patch attached. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services From 6c4209883dad527eb1140a61ed7ac6a610cf2a14 Mon Sep 17 00:00:00 2001 From: Peter Eisentraut Date: Fri, 25 Sep 2020 14:51:59 +0200 Subject: [PATCH] Disallow ALTER TABLE ONLY / DROP EXPRESSION The current implementation cannot handle this correctly, so just forbid it for now. GENERATED clauses must be attached to the column definition and cannot be added later like DEFAULT, so if a child table has a generation expression that the parent does not have, the child column will necessarily be an attislocal column. So to implement ALTER TABLE ONLY / DROP EXPRESSION, we'd need extra code to update attislocal of the direct child tables, somewhat similar to how DROP COLUMN does it, so that the resulting state can be properly dumped and restored.
--- src/backend/commands/tablecmds.c| 22 +++--- src/test/regress/expected/generated.out | 11 ++- src/test/regress/sql/generated.sql | 2 +- 3 files changed, 26 insertions(+), 9 deletions(-) diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c index 16285ad09f..b1b1b1e74a 100644 --- a/src/backend/commands/tablecmds.c +++ b/src/backend/commands/tablecmds.c @@ -412,7 +412,7 @@ static ObjectAddress ATExecAddIdentity(Relation rel, const char *colName, static ObjectAddress ATExecSetIdentity(Relation rel, const char *colName, Node *def, LOCKMODE lockmode); static ObjectAddress ATExecDropIdentity(Relation rel, const char *colName, bool missing_ok, LOCKMODE lockmode); -static void ATPrepDropExpression(Relation rel, AlterTableCmd *cmd, bool recursing); +static void ATPrepDropExpression(Relation rel, AlterTableCmd *cmd, bool recurse, bool recursing, LOCKMODE lockmode); static ObjectAddress ATExecDropExpression(Relation rel, const char *colName, bool missing_ok, LOCKMODE lockmode); static ObjectAddress ATExecSetStatistics(Relation rel, const char *colName, int16 colNum, Node *newValue, LOCKMODE lockmode); @@ -4142,7 +4142,7 @@ ATPrepCmd(List **wqueue, Relation rel, AlterTableCmd *cmd, case AT_DropExpression: /* ALTER COLUMN DROP EXPRESSION */ ATSimplePermissions(rel, ATT_TABLE | ATT_FOREIGN_TABLE); ATSimpleRecursion(wqueue, rel, cmd, recurse, lockmode, context); - ATPrepDropExpression(rel, cmd, recursing); + ATPrepDropExpression(rel, cmd, recurse, recursing, lockmode); pass = AT_PASS_DROP; break; case AT_SetStatistics: /* ALTER COLUMN SET STATISTICS */ @@ -7240,8 +7240,24 @@ ATExecDropIdentity(Relation rel, const char *colName, bool missing_ok, LOCKMODE * ALTER TABLE ALTER COLUMN DROP EXPRESSION */ static void -ATPrepDropExpression(Relation rel, AlterTableCmd
Re: Feature improvement for FETCH tab completion
On 2020/09/25 17:21, btnakamichin wrote: On 2020-09-25 15:38, Fujii Masao wrote: On 2020/09/25 14:24, btnakamichin wrote: Hello! I'd like to improve the FETCH tab completion. The FETCH tab completion is currently incomplete. Therefore, this patch fixes the problem. The previous code completed with one of FORWARD, BACKWARD, RELATIVE, ABSOLUTE, but now it completes with one of FORWARD, BACKWARD, RELATIVE, ABSOLUTE, ALL, NEXT, PRIOR, FIRST, LAST, and also handles the later IN and FROM clauses. Thanks for the patch! Here are review comments. + /* Complete FETCH BACKWARD or FORWARD with one of ALL */ + else if (Matches("FETCH|MOVE", "BACKWARD|FORWARD")) + COMPLETE_WITH("ALL"); Not only "ALL" but also "FROM" and "IN" should be displayed here because they also can follow "BACKWARD" and "FORWARD"? else if (Matches("FETCH|MOVE", MatchAny, MatchAny)) + else if (Matches("FETCH|MOVE", "ABSOLUTE|BACKWARD|FORWARD|RELATIVE", MatchAny)) + COMPLETE_WITH("FROM", "IN"); This change seems to cause "FETCH FORWARD FROM " to display "FROM" and "IN". To avoid this confusing tab-completion, we should use something like MatchAnyExcept("FROM|IN") here, instead? Regards, I'm sorry, I forgot to include pgsql-hackers in the cc, so I resend it. Thank you, I appreciate your comment. I have attached the patch with the new line. Thanks for updating the patch! The patch should include only the tab-complete change for FETCH, but accidentally it also includes the DEALLOCATE change that you proposed in another thread. So I excluded the DEALLOCATE part from the patch. Attached is the updated version of the patch. Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION diff --git a/src/bin/psql/tab-complete.c b/src/bin/psql/tab-complete.c index 9c6f5ecb6a..3cf7698c0b 100644 --- a/src/bin/psql/tab-complete.c +++ b/src/bin/psql/tab-complete.c @@ -3076,19 +3076,27 @@ psql_completion(const char *text, int start, int end) COMPLETE_WITH("SELECT", "INSERT", "DELETE", "UPDATE", "DECLARE"); /* FETCH && MOVE */ - /* Complete FETCH with one of FORWARD, BACKWARD, RELATIVE */ + + /* +* Complete FETCH with one of ABSOLUTE, BACKWARD, FORWARD, RELATIVE, ALL, +* NEXT, PRIOR, FIRST, LAST +*/ else if (Matches("FETCH|MOVE")) - COMPLETE_WITH("ABSOLUTE", "BACKWARD", "FORWARD", "RELATIVE"); - /* Complete FETCH with one of ALL, NEXT, PRIOR */ - else if (Matches("FETCH|MOVE", MatchAny)) - COMPLETE_WITH("ALL", "NEXT", "PRIOR"); + COMPLETE_WITH("ABSOLUTE", "BACKWARD", "FORWARD", "RELATIVE", + "ALL", "NEXT", "PRIOR", "FIRST", "LAST"); + + /* Complete FETCH BACKWARD or FORWARD with one of ALL, FROM, IN */ + else if (Matches("FETCH|MOVE", "BACKWARD|FORWARD")) + COMPLETE_WITH("ALL", "FROM", "IN"); /* -* Complete FETCH with "FROM" or "IN". These are equivalent, +* Complete FETCH with "FROM" or "IN". These are equivalent, * but we may as well tab-complete both: perhaps some users prefer one * variant or the other. */ - else if (Matches("FETCH|MOVE", MatchAny, MatchAny)) + else if (Matches("FETCH|MOVE", "ABSOLUTE|BACKWARD|FORWARD|RELATIVE", +MatchAnyExcept("FROM|IN")) || +Matches("FETCH|MOVE", "ALL|NEXT|PRIOR|FIRST|LAST")) COMPLETE_WITH("FROM", "IN"); /* FOREIGN DATA WRAPPER */
Re: Retry Cached Remote Connections for postgres_fdw in case remote backend gets killed/goes away
On Fri, Sep 25, 2020 at 3:21 PM Fujii Masao wrote: > > I think that we can simplify the code by merging the connection-retry > code into them, like the attached very WIP patch (based on yours) does. > +1. > > + else + ereport(ERROR, + (errcode(ERRCODE_SQLCLIENT_UNABLE_TO_ESTABLISH_SQLCONNECTION), +errmsg("could not connect to server \"%s\"", + server->servername), +errdetail_internal("%s", pchomp(PQerrorMessage(entry->conn))))); > > The above is not necessary? If this error occurs, connect_pg_server() > reports an error, before the above code is reached. Right? > Removed. Thanks for the comments. I think we need to have a volatile qualifier for need_reset_conn, because of the sigsetjmp. Instead of "need_reset_conn", "retry_conn" would be more meaningful, and likewise "retry:" instead of the goto label name "reset:". I changed "closing connection %p to reestablish connection" to "closing connection %p to reestablish a new one". I also adjusted the comments to be under the 80-char limit. I feel that, when we are about to close an existing connection and reestablish a new connection, it would be good to have a debug3 message saying that we "could not connect to postgres_fdw connection %p"[1]. Attaching the v5 patch that has the above changes. Both make check and make check-world regression tests pass. Please review. [1] This would tell the user that we are not able to connect using the cached connection. postgres=# SELECT * FROM foreign_tbl; DEBUG: starting remote transaction on connection 0x55ab0e416830 DEBUG: could not connect to postgres_fdw connection 0x55ab0e416830 DEBUG: closing connection 0x55ab0e416830 to reestablish a new one DEBUG: new postgres_fdw connection 0x55ab0e416830 for server "foreign_server" (user mapping oid 16407, userid 10) DEBUG: starting remote transaction on connection 0x55ab0e416830 DEBUG: closing remote transaction on connection 0x55ab0e416830 a1 | b1 -+- 100 | 200 Without the above message, it would look like we are starting a remote txn and closing the connection without any reason. postgres=# SELECT * FROM foreign_tbl; DEBUG: starting remote transaction on connection 0x55ab0e4c0d50 DEBUG: closing connection 0x55ab0e4c0d50 to reestablish a new one DEBUG: new postgres_fdw connection 0x55ab0e4c0d50 for server "foreign_server" (user mapping oid 16389, userid 10) DEBUG: starting remote transaction on connection 0x55ab0e4c0d50 DEBUG: closing remote transaction on connection 0x55ab0e4c0d50 a1 | b1 -+- 100 | 200 With Regards, Bharath Rupireddy. EnterpriseDB: http://www.enterprisedb.com v5-Retry-Cached-Remote-Connections-For-postgres_fdw.patch Description: Binary data
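As a standalone illustration (ordinary C, not postgres_fdw code) of why the volatile qualifier is needed around sigsetjmp() -- which is what PG_TRY()/PG_CATCH() use underneath:

#include <setjmp.h>
#include <stdio.h>

static sigjmp_buf env;

int
main(void)
{
	/*
	 * Without "volatile" this variable may live in a register, and a
	 * modification made between sigsetjmp() and siglongjmp() can be
	 * lost when the longjmp unwinds the stack.
	 */
	volatile int attempts = 0;

	if (sigsetjmp(env, 1) == 0)
	{
		attempts++;
		siglongjmp(env, 1);		/* simulate an error being thrown */
	}

	printf("attempts = %d\n", attempts);	/* reliably prints 1 */
	return 0;
}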
Re: Parallel INSERT (INTO ... SELECT ...)
On Fri, Sep 25, 2020 at 10:02 AM Greg Nancarrow wrote: > > Hi Andres, > > On Thu, Sep 24, 2020 at 12:21 PM Andres Freund wrote: > > > > > >> @@ -116,7 +117,7 @@ toast_save_datum(Relation rel, Datum value, > >> TupleDesc toasttupDesc; > >> Datum t_values[3]; > >> boolt_isnull[3]; > >> - CommandId mycid = GetCurrentCommandId(true); > >> + CommandId mycid = GetCurrentCommandId(!IsParallelWorker()); > >> struct varlena *result; > >> struct varatt_external toast_pointer; > >> union > > > >Hm? Why do we need this in the various places you have made this change? > > It's because for Parallel INSERT, we're assigning the same command-id > to each worker up-front during worker initialization (the commandId > has been retrieved by the leader and passed through to each worker) > and "currentCommandIdUsed" has been set true. See the > AssignCommandIdForWorker() function in the patch. > If you see the code of GetCurrentCommandId(), you'll see it Assert > that it's not being run by a parallel worker if the parameter is true. > I didn't want to remove yet another check, without being able to know > the context of the caller, because only for Parallel INSERT do I know > that "currentCommandIdUsed was already true at the start of the > parallel operation". See the comment in that function. Anyway, that's > why I'm passing "false" to relevant GetCurrentCommandId() calls if > they're being run by a parallel (INSERT) worker. > But we can tighten the condition in GetCurrentCommandId() such that it Asserts for a parallel worker only when currentCommandIdUsed is not set before the start of the parallel operation. I also find these changes in the callers of GetCurrentCommandId() quite ad hoc and ugly even if they are correct. Also, why don't we face similar problems for parallel copy? > > >> diff --git a/src/backend/access/transam/varsup.c > >> b/src/backend/access/transam/varsup.c > >> index a4944fa..9d3f100 100644 > >> --- a/src/backend/access/transam/varsup.c > >> +++ b/src/backend/access/transam/varsup.c > >> @@ -53,13 +53,6 @@ GetNewTransactionId(bool isSubXact) > >> TransactionId xid; > >> > >> /* > >> - * Workers synchronize transaction state at the beginning of each > >> parallel > >> - * operation, so we can't account for new XIDs after that point. > >> - */ > >> - if (IsInParallelMode()) > >> - elog(ERROR, "cannot assign TransactionIds during a parallel > >> operation"); > >> - > >> - /* > >>* During bootstrap initialization, we return the special bootstrap > >>* transaction id. > >>*/ > > > >Same thing, this code cannot just be allowed to be reachable. What > >prevents you from assigning two different xids from different workers > >etc? > > At least in the case of Parallel INSERT, the leader for the Parallel > INSERT gets a new xid (GetCurrentFullTransactionId) and it is passed > through and assigned to each of the workers during their > initialization (so they are assigned the same xid). > So are you facing problems in this area because we EnterParallelMode before even assigning the xid in the leader? Because I don't think we should ever reach this code in the worker. If so, there are two possibilities that come to my mind (a) assign the xid in the leader before entering parallel mode or (b) change the check so that we don't assign the new xid in workers. In this case, I am again wondering how parallel copy is dealing with this? -- With Regards, Amit Kapila.
Re: enable pg_stat_statements to track rows processed by REFRESH MATERIALIZED VIEW
oops, sorry. So +1 for this fix. Regards PAscal -- Sent from: https://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html
Re: Parallel INSERT (INTO ... SELECT ...)
On Fri, Sep 25, 2020 at 7:01 PM Bharath Rupireddy wrote: > I have a few points (inspired by the parallel copy feature work) to mention: > > 1. What if the target table is a foreign table or partitioned table? > 2. What happens if the target table has triggers (before statement, > after statement, before row, after row) that are parallel unsafe? > 3. Will each worker be doing single row insertions or multi inserts? > If single row insertions, will there be more buffer lock contention? > 5. How does it behave with toast column values? > 6. How does it behave if we have a RETURNING clause with INSERT INTO SELECT? > Hi Bharath, Thanks for pointing out more cases I need to exclude and things I need to investigate further. I have taken note of them, and will do more testing and improvement. At least the RETURNING clause with INSERT INTO SELECT is working! Regards, Greg Nancarrow Fujitsu Australia
Re: BUG #16419: wrong parsing BC year in to_date() function
The following review has been posted through the commitfest application:
make installcheck-world: tested, passed
Implements feature: tested, passed
Spec compliant: tested, passed
Documentation: not tested

Patch looks good to me. The new status of this patch is: Ready for Committer
Re: Resetting spilled txn statistics in pg_stat_replication
On Thu, Sep 24, 2020 at 5:44 PM Amit Kapila wrote: > > On Sat, Sep 19, 2020 at 1:48 PM Amit Kapila wrote: > > > > On Tue, Sep 8, 2020 at 7:02 PM Amit Kapila wrote: > > > > > > On Tue, Sep 8, 2020 at 7:53 AM Masahiko Sawada > > > wrote: > > > > I have fixed these review comments in the attached patch. > > > > > > Apart from the above, > > (a) fixed one bug in ReorderBufferSerializeTXN() where we were > > updating the stats even when we have not spilled anything. > > (b) made changes in pgstat_read_db_statsfile_timestamp to return false > > when the replication slot entry is corrupt. > > (c) moved the declaration and definitions in pgstat.c to make them > > consistent with existing code > > (d) made another couple of cosmetic fixes and changed a few comments > > (e) Tested the patch by using a guc which allows spilling all the > > changes. See v4-0001-guc-always-spill > > > > I have found a way to write the test case for this patch. This is > based on the idea we used in stats.sql. As of now, I have kept the > test as a separate patch. We can decide to commit the test part > separately as it is slightly timing dependent, but OTOH it is based > on existing logic in stats.sql, so there shouldn't be much of a problem. I > have not changed anything apart from the test patch in this version. > Note that the first patch is just a debugging kind of tool to test the > patch. > I have done some more testing of this patch, especially for the case where we spill before streaming the transaction, and found everything is working as expected. Additionally, I have changed a few more comments and ran pgindent. I am still not very sure whether we want to display physical slots in this view, as all the stats are for logical slots, but anyway we can add stats w.r.t. physical slots in the future. I am fine either way (don't show physical slots in this view, or show them but keep the stats as 0). Let me know if you have any thoughts on these points; other than that, I am happy with the current state of the patch. -- With Regards, Amit Kapila. v6-0001-Track-statistics-for-spilling-of-changes-from-Reo.patch Description: Binary data v6-0002-Test-stats.patch Description: Binary data
Re: enable pg_stat_statements to track rows processed by REFRESH MATERIALIZED VIEW
On 2020/09/25 19:04, legrand legrand wrote: Hi, isn't this already fixed in pg14 https://www.postgresql.org/message-id/e1k0mzg-0002vn...@gemulon.postgresql.org ? IIUC that commit handled CREATE TABLE AS, SELECT INTO, CREATE MATERIALIZED VIEW and FETCH commands, but not REFRESH MATERIALIZED VIEW. Katsuragi-san's patch is for REFRESH MATERIALIZED VIEW. Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
Re: Online checksums verification in the backend
On Wed, Sep 16, 2020 at 11:46 AM Michael Paquier wrote: > > On Fri, Sep 11, 2020 at 09:49:16AM +0200, Julien Rouhaud wrote: > > Thanks! > > I got some numbers out of my pocket, using the following base > configuration: > [...] > > Within parentheses is the amount of time taken by pg_relation_check() > for a single check. This is not completely exact and I saw some > variations, just to give an impression:
> Conns            64           128                256
> force_lock=true  60590 (7~8s) 55652 (10.2~10.5s) 46812 (9~10s)
> force_lock=false 58637 (5s)   54131 (6~7s)       37091 (1.1~1.2s)
>
> For connections higher than 128, I was kind of surprised to see > pg_relation_check being more aggressive without the optimization, with > much less contention on the buffer mapping LWLock actually, but that's > an impression from looking at pg_stat_activity. > > Looking at the wait events for each run, I saw many more hiccups with > the buffer mapping LWLock when forcing the lock rather than not; still, > I was able to see some contention when also not forcing the lock. Not > surprising, as this rejects a bunch of pages from shared buffers. > > > I used all default settings, but by default checksum_cost_delay is 0 > > so there shouldn't be any throttling. > > Thanks, so did I. From what I can see, there could also be > benefits in not using the optimization by default, as the relation > check then applies some natural throttling by making the checks actually > slower (there is a link between the individual runtime of > pg_relation_check and the TPS). Thanks a lot for the tests! I'm not surprised that forcing the lock will slow down the pg_check_relation() execution, but I'm a bit surprised that holding the buffer mapping lock longer in a workload that has a lot of evictions actually makes things faster. Do you have any idea why that's the case? > So it also seems to me that the > throttling parameters would be beneficial, but it looks to me that > there is as well a point to not include any throttling in a first > version if the optimization to go full speed is not around. Using > three new GUCs for those function calls is still too much IMO I'm assuming that you prefer to remove both the optimization and the throttling part? I'll do that with the next version unless there are objections. >, so > there is also the argument to move all this stuff into a new contrib/ > module, and have a bgworker implementation as part of it as it would > need shared_preload_libraries anyway. > > Also, I have been putting some thoughts into the APIs able to fetch a > buffer without going through the shared buffers. And neither > checksum.c, because it should be a place that those APIs depend on > and include only the most-internal logic for checksum algorithm and > computation, nor checksumfuncs.c, because we need to think about the > case of a background worker as well (that could spawn a set of dynamic > workers connecting to different databases able to do checksum > verifications?). It would be good to keep the handling of the buffer > mapping lock as well as the calls to smgrread() in a single place. > ReadBuffer_common() is a natural place for that, though it means for > our use case the addition of three new options: > - Being able to pass down directly a buffer pointer to save the page > contents. > - Being able to not verify directly a page, leaving the verification > to the caller upthread.
> - Addition of a new mode, that I am calling here RBM_PRIVATE, where we > actually read the page from disk if not yet in shared buffers, except > that we fill in the contents of the page using the pointer given by > the caller. That's different than the use of local buffers, as we > don't need this much complication, like temporary tables, of > course for per-page checks. > > Another idea would be to actually let ReadBuffer_common just do > the check by itself, with a different mode like a kind of > RBM_VALIDATE, where we just return a verification state of the page > that can be consumed by callers. > > This also comes with some more advantages: > - Tracking of reads from disk with track_io_timing. > - Addition of some private stats dedicated to this private mode, with > new fields in pgBufferUsage, all in a single place I agree that putting the code near ReadBuffer_common() would be a good idea. However, that means that I can't move all the code to contrib/. I'm wondering what you'd like to see going there. I can see some value in also having the SQL functions available in core rather than contrib, e.g. if you need to quickly check a relation on a standby, without first having to create the extension on the primary node. Then, I'm a bit worried about adding this code into ReadBuffer_common. What this code does is quite different, and I'm afraid that it'll make ReadBuffer_common more complex than needed, which is maybe not a good idea for something as critical as this.
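To make the shape of the per-page check concrete, here is a rough sketch under PG13-era APIs (smgrread(), PageIsVerified(), PGAlignedBlock); it deliberately omits the buffer-mapping-lock dance against concurrent writes, so it is an illustration rather than the actual implementation:

static bool
check_one_block(Relation rel, ForkNumber forknum, BlockNumber blkno)
{
	PGAlignedBlock buf;			/* properly aligned I/O buffer */

	/*
	 * A real version must first check, under the buffer mapping lock,
	 * whether the page sits dirty in shared buffers; in that case the
	 * on-disk copy may legitimately be stale or torn and cannot be
	 * verified like this.
	 */
	RelationOpenSmgr(rel);
	smgrread(rel->rd_smgr, forknum, blkno, buf.data);

	/* verifies the page checksum and basic header sanity */
	return PageIsVerified((Page) buf.data, blkno);
}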
Re: enable pg_stat_statements to track rows processed by REFRESH MATERIALIZED VIEW
Hi, isn't this already fixed in pg14 https://www.postgresql.org/message-id/e1k0mzg-0002vn...@gemulon.postgresql.org ? Regards PAscal -- Sent from: https://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html
Re: Retry Cached Remote Connections for postgres_fdw in case remote backend gets killed/goes away
On 2020/09/25 13:56, Bharath Rupireddy wrote: On Wed, Sep 23, 2020 at 8:19 PM Fujii Masao wrote: Please let me know if you are okay with the above agreed points; I will work on the new patch. Yes, please work on the patch! Thanks! I may revisit the above points later, though ;) Thanks, attaching v4 patch. Please consider this for further review. Thanks for updating the patch! In the original code, disconnect_pg_server() is called when invalidation occurs and connect_pg_server() is called when no valid connection exists. I think that we can simplify the code by merging the connection-retry code into them, like the attached very WIP patch (based on yours) does. Basically I'd like to avoid duplicating disconnect_pg_server(), connect_pg_server() and begin_remote_xact() if possible. Thought? Your patch adds several pieces of code into the PG_CATCH() section, but it's better to keep that section simple enough (as the source comment for PG_CATCH() explains). So what about moving some code out of the PG_CATCH() section? + else + ereport(ERROR, + (errcode(ERRCODE_SQLCLIENT_UNABLE_TO_ESTABLISH_SQLCONNECTION), +errmsg("could not connect to server \"%s\"", + server->servername), +errdetail_internal("%s", pchomp(PQerrorMessage(entry->conn))))); The above is not necessary? If this error occurs, connect_pg_server() reports an error, before the above code is reached. Right? Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION diff --git a/contrib/postgres_fdw/connection.c b/contrib/postgres_fdw/connection.c index 08daf26fdf..9f76261d99 100644 --- a/contrib/postgres_fdw/connection.c +++ b/contrib/postgres_fdw/connection.c @@ -108,6 +108,7 @@ PGconn * GetConnection(UserMapping *user, bool will_prep_stmt) { boolfound; + boolneed_reset_conn = false; ConnCacheEntry *entry; ConnCacheKey key; @@ -159,23 +160,25 @@ GetConnection(UserMapping *user, bool will_prep_stmt) /* Reject further use of connections which failed abort cleanup. */ pgfdw_reject_incomplete_xact_state_change(entry); +reset: /* * If the connection needs to be remade due to invalidation, disconnect as -* soon as we're out of all transactions. +* soon as we're out of all transactions. Also if previous attempt to start +* new remote transaction failed on the cached connection, disconnect +* it to reestablish new connection. */ - if (entry->conn != NULL && entry->invalidated && entry->xact_depth == 0) + if ((entry->conn != NULL && entry->invalidated && entry->xact_depth == 0) || + need_reset_conn) { - elog(DEBUG3, "closing connection %p for option changes to take effect", -entry->conn); + if (need_reset_conn) + elog(DEBUG3, "closing connection %p to reestablish connection", +entry->conn); + else + elog(DEBUG3, "closing connection %p for option changes to take effect", +entry->conn); disconnect_pg_server(entry); } - /* -* We don't check the health of cached connection here, because it would -* require some overhead. Broken connection will be detected when the -* connection is actually used. -*/ - /* * If cache entry doesn't have a connection, we have to establish a new * connection. (If connect_pg_server throws an error, the cache entry @@ -206,9 +209,31 @@ GetConnection(UserMapping *user, bool will_prep_stmt) } /* -* Start a new transaction or subtransaction if needed. +* We check the health of the cached connection here, while the remote +* xact is going to begin. Broken connection will be detected and the +* connection is retried.
We do this only once during each begin remote +* xact call, if the connection is still broken after the retry attempt, +* then the query will fail. */ - begin_remote_xact(entry); + PG_TRY(); + { + /* Start a new transaction or subtransaction if needed. */ + begin_remote_xact(entry); + need_reset_conn = false; + } + PG_CATCH(); + { + if (!entry->conn || + PQstatus(entry->conn) != CONNECTION_BAD || + entry->xact_depth > 0 || + need_reset_conn) + PG_RE_THROW(); + need_reset_conn = true; + } + PG_END_TRY(); + + if (need_reset_conn) +
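For readers following the thread, here is a minimal standalone sketch of the retry shape under discussion. The function and variable names come from the patches above, but the `retried` guard and the goto placement are my own illustration, not part of either patch:

    /*
     * Sketch only: keep the PG_CATCH() body down to a simple decision and do
     * the actual reconnect work outside of it, as suggested above.
     */
    volatile bool need_retry = false;
    volatile bool retried = false;

retry:
    PG_TRY();
    {
        begin_remote_xact(entry);
    }
    PG_CATCH();
    {
        /* Retry only once, for a broken connection at top transaction level. */
        if (entry->conn == NULL ||
            PQstatus(entry->conn) != CONNECTION_BAD ||
            entry->xact_depth > 0 ||
            retried)
            PG_RE_THROW();
        need_retry = true;
    }
    PG_END_TRY();

    if (need_retry)
    {
        need_retry = false;
        retried = true;
        disconnect_pg_server(entry);    /* drop the broken connection */
        /* ... reconnect via connect_pg_server() as in GetConnection() ... */
        goto retry;
    }

The point of this shape is that the error-handling block only records the decision to retry, so nothing in PG_CATCH() can itself fail.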
Re: [patch] Concurrent table reindex per-index progress reporting
On Fri, 25 Sep 2020 at 08:44, Michael Paquier wrote: > > On Thu, Sep 24, 2020 at 09:19:18PM +0200, Matthias van de Meent wrote: > > While working on a PG12-instance I noticed that the progress reporting of > > concurrent index creation for non-index relations fails to update the > > index/relation OIDs that it's currently working on in the > > pg_stat_progress_create_index view after creating the indexes. Please find > > attached a patch which properly sets these values at the appropriate places. > > > > Any thoughts? > > I agree that this is a bug and that we had better report correctly the > heap and index IDs worked on, as these could also be part of a toast > relation from the parent table reindexed. However, your patch is > visibly incorrect in the two places you are updating. Thanks for checking the patch, I should've checked it more than I did. > > > + * Configure progress reporting to report for this index. > > + * While we're at it, reset the phase as it is cleared by > > start_command. > > + */ > > + pgstat_progress_start_command(PROGRESS_COMMAND_CREATE_INDEX, > > heapId); > > + pgstat_progress_update_param(PROGRESS_CREATEIDX_INDEX_OID, > > newIndexId); > > + pgstat_progress_update_param(PROGRESS_CREATEIDX_PHASE, > > + > > PROGRESS_CREATEIDX_PHASE_WAIT_1); > > First, updating PROGRESS_CREATEIDX_PHASE here to WAIT_1 makes no > sense, because this is not a wait phase, and index_build() called > within index_concurrently_build() will set this state correctly a bit > after so IMO it is fine to leave that unset here, and the build phase > is by far the bulk of the operation that needs tracking. I think that > you are also missing to update PROGRESS_CREATEIDX_COMMAND to > PROGRESS_CREATEIDX_COMMAND_REINDEX_CONCURRENTLY, and > PROGRESS_CREATEIDX_ACCESS_METHOD_OID to the correct index AM OID, > similarly to reindex_index(). Updating to WAIT_1 was to set it back to whatever the state was before we would get into the loop and start_command was called. I quite apparently didn't take enough care to set all fields that could be set, though, so thanks for noticing. > > + /* > > + * Configure progress reporting to report for this index. > > + * While we're at it, reset the phase as it is cleared by > > start_command. > > + */ > > + pgstat_progress_start_command(PROGRESS_COMMAND_CREATE_INDEX, > > heapId); > > + pgstat_progress_update_param(PROGRESS_CREATEIDX_INDEX_OID, > > newIndexId); > > + pgstat_progress_update_param(PROGRESS_CREATEIDX_PHASE, > > + > > PROGRESS_CREATEIDX_PHASE_WAIT_2); > > + > > validate_index(heapId, newIndexId, snapshot); > > Basically the same three comments here: PROGRESS_CREATEIDX_PHASE set > to WAIT_2 makes no real sense, and validate_index() would update the > phase as it should be. This should set PROGRESS_CREATEIDX_COMMAND to > PROGRESS_CREATEIDX_COMMAND_REINDEX_CONCURRENTLY, and > PROGRESS_CREATEIDX_ACCESS_METHOD_OID to the correct index AM OID. Basically the same response: I didn't take enough care to correctly reset the state, while being conservative in what I did put back. > > While reviewing this code, I also noticed that the wait state set just > before dropping the indexes was wrong. The code was using WAIT_4, but > this has to be WAIT_5. > > And this gives me the attached. 
This also means that we still set the > table and relation OID to the last index worked on for WAIT_2, WAIT_4 > and WAIT_5, but we cannot set the command start to relationOid either > as given in input of ReindexRelationConcurrently() as this could be a > single index given for REINDEX INDEX CONCURRENTLY. Matthias, does > that look right to you? It looks great, thanks! -Matthias
Re: FETCH FIRST clause PERCENT option
Hi Michael,

On Thu, Sep 24, 2020 at 6:58 AM Michael Paquier wrote:
> On Mon, Aug 10, 2020 at 01:23:44PM +0300, Surafel Temesgen wrote:
>> I also implement the PERCENT WITH TIES option. The patch is attached.
>> I don't start a new thread because the patches share common code.
>
> This fails to apply per the CF bot. Could you send a rebase?
> --

Attached are the rebased patches.

regards
Surafel

diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index a31abce7c9..135b98da5d 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -3096,7 +3096,8 @@ estimate_path_cost_size(PlannerInfo *root,
         if (fpextra && fpextra->has_limit)
         {
             adjust_limit_rows_costs(&rows, &startup_cost, &total_cost,
-                                    fpextra->offset_est, fpextra->count_est);
+                                    LIMIT_OPTION_COUNT, fpextra->offset_est,
+                                    fpextra->count_est);
             retrieved_rows = rows;
         }
     }
diff --git a/doc/src/sgml/ref/select.sgml b/doc/src/sgml/ref/select.sgml
index b93e4ca208..9811380ec6 100644
--- a/doc/src/sgml/ref/select.sgml
+++ b/doc/src/sgml/ref/select.sgml
@@ -44,7 +44,7 @@ SELECT [ ALL | DISTINCT [ ON ( expression
     [ ORDER BY expression [ ASC | DESC | USING operator ] [ NULLS { FIRST | LAST } ] [, ...] ]
     [ LIMIT { count | ALL } ]
     [ OFFSET start [ ROW | ROWS ] ]
-    [ FETCH { FIRST | NEXT } [ count ] { ROW | ROWS } { ONLY | WITH TIES } ]
+    [ FETCH { FIRST | NEXT } [ count ] [ PERCENT ] { ROW | ROWS } { ONLY | WITH TIES } ]
     [ FOR { UPDATE | NO KEY UPDATE | SHARE | KEY SHARE } [ OF table_name [, ...] ] [ NOWAIT | SKIP LOCKED ] [...] ]

 where from_item can be one of:
@@ -1434,7 +1434,7 @@ OFFSET start
     which PostgreSQL also supports.  It is:
 OFFSET start { ROW | ROWS }
-FETCH { FIRST | NEXT } [ count ] { ROW | ROWS } { ONLY | WITH TIES }
+FETCH { FIRST | NEXT } [ count ] [ PERCENT ] { ROW | ROWS } { ONLY | WITH TIES }
     In this syntax, the start
     or count value is required by
@@ -1444,6 +1444,9 @@ FETCH { FIRST | NEXT } [ count ] {
     ambiguity.  If count is
     omitted in a FETCH clause, it defaults to 1.
+    With PERCENT, count specifies the maximum number of rows to return
+    as a percentage of the total row count, rounded up to the nearest integer.
+    Using PERCENT is best suited to returning single-digit percentages of the query's total row count.
     The WITH TIES option is used to return any additional
     rows that tie for the last place in the result set according to the
     ORDER BY clause; ORDER BY
diff --git a/src/backend/catalog/sql_features.txt b/src/backend/catalog/sql_features.txt
index b6e58e8493..68e0c5e2e1 100644
--- a/src/backend/catalog/sql_features.txt
+++ b/src/backend/catalog/sql_features.txt
@@ -344,7 +344,7 @@ F862	<result offset clause> in subqueries			YES
 F863	Nested <result offset clause> in <query expression>			YES
 F864	Top-level <result offset clause> in views			YES
 F865	<offset row count> in <result offset clause>			YES
-F866	FETCH FIRST clause: PERCENT option			NO
+F866	FETCH FIRST clause: PERCENT option			YES
 F867	FETCH FIRST clause: WITH TIES option			YES
 R010	Row pattern recognition: FROM clause			NO
 R020	Row pattern recognition: WINDOW clause			NO
diff --git a/src/backend/executor/nodeLimit.c b/src/backend/executor/nodeLimit.c
index d85cf7d93e..cfb28a9fd4 100644
--- a/src/backend/executor/nodeLimit.c
+++ b/src/backend/executor/nodeLimit.c
@@ -21,6 +21,8 @@
 #include "postgres.h"

+#include <math.h>
+
 #include "executor/executor.h"
 #include "executor/nodeLimit.h"
 #include "miscadmin.h"
@@ -29,6 +31,8 @@
 static void recompute_limits(LimitState *node);
 static int64 compute_tuples_needed(LimitState *node);

+#define IsPercentOption(opt) \
+    (opt == LIMIT_OPTION_PERCENT || opt == LIMIT_OPTION_PER_WITH_TIES)

 /*
  * ExecLimit
@@ -53,6 +57,7 @@ ExecLimit(PlanState *pstate)
      */
     direction = node->ps.state->es_direction;
     outerPlan = outerPlanState(node);
+    slot = node->subSlot;

     /*
      * The main logic is a simple state machine.
@@ -82,7 +87,15 @@ ExecLimit(PlanState *pstate)
             /*
              * Check for empty window; if so, treat like empty subplan.
              */
-            if (node->count <= 0 && !node->noCount)
+            if (IsPercentOption(node->limitOption))
+            {
+                if (node->percent == 0.0)
+                {
+                    node->lstate = LIMIT_EMPTY;
+                    return NULL;
+                }
+            }
+            else if (node->count <= 0 && !node->noCount)
             {
                 node->lstate = LIMIT_EMPTY;
                 return NULL;
@@ -118,6 +131,16 @@ ExecLimit(PlanState *pstate)
                     break;
                 }

+                /*
+                 * We may need this tuple in a subsequent scan, so put it
+                 * into the tuplestore.
+                 */
+                if (IsPercentOption(node->limitOption))
+                {
+                    tuplestore_puttupleslot(node->tupleStore, slot);
+                    tuplestore_advance(node->tupleStore, true);
+                }
+
                 /*
                  * Okay, we have the first tuple of the window.
                  */
@@ -135,6 +158,85 @@ ExecLimit(PlanState *pstate)
         case LIMIT_INWINDOW:
             if (ScanDirectionIsForward(direction))
             {
+                /*
+                 * In case of coming back from a backward scan the tuple is
+                 * already in the tuple store.
+                 */
+
RE: [Patch] Optimize dropping of relation buffers using dlist
On Friday, September 25, 2020 6:02 PM, Tsunakawa-san wrote:
> From: Jamison, Kirk/ジャミソン カーク
>> [Results]
>> Recovery/Failover performance (in seconds). 3 trial runs.
>>
>> | shared_buffers | master | patch  | %reg    |
>> |----------------|--------|--------|---------|
>> | 128MB          | 32.406 | 33.785 | 4.08%   |
>> | 1GB            | 36.188 | 32.747 | -10.51% |
>> | 2GB            | 41.996 | 32.88  | -27.73% |
>
> Thanks for sharing good results. We want to know if we can get as
> significant results as you gained before with hundreds of GBs of shared
> buffers, don't we?

Yes. But I don't have a high-spec machine I could use at the moment. I'll try to get one by next week. Or if someone would like to reproduce the test with their available higher-spec machines, it'd be much appreciated. The test case is upthread [1].

>> I also ran a similar performance benchmark to what Tomas did [1]: simple
>> "pgbench -S" tests (warmup and then 15 x 1-minute runs with 1, 8 and
>> 16 clients), but I'm not sure if my machine is reliable enough to
>> produce reliable results for 8 clients and more.
>
> Let me confirm just in case. Your patch should not affect pgbench
> performance, but you measured it. Is there anything you're concerned
> about?

Not really, because in the previous emails the argument was about the BufferAlloc overhead, and we don't have it in the latest patch. But just in case somebody asks about benchmark performance, I also posted the results.

[1] https://www.postgresql.org/message-id/OSBPR01MB2341683DEDE0E7A8D045036FEF360%40OSBPR01MB2341.jpnprd01.prod.outlook.com

Regards,
Kirk Jamison
RE: Transactions involving multiple postgres foreign servers, take 2
From: Masahiko Sawada
> I don't think it's always possible to avoid raising errors in advance.
> Considering how postgres_fdw could implement your idea, I think
> postgres_fdw would need PG_TRY() and PG_CATCH() for its connection
> management. It has a connection cache in local memory using HTAB.
> It needs to create an entry the first time it connects (e.g., when
> prepare and commit prepared of a transaction are performed by different
> processes) and it needs to re-connect to the foreign server when the
> entry is invalidated. In both cases, ERROR could happen. I guess the
> same is true for other FDW implementations. Possibly other FDWs might
> need more work, for example cleanup or releasing resources.

Why does the client backend have to create a new connection cache entry during PREPARE or COMMIT PREPARED? Doesn't the client backend naturally continue to use the connections that it has used in its current transaction?

> I think that the pros of your idea are that it makes the transaction manager
> simple, since we don't need resolvers and a launcher, but the cons are that
> it brings the complexity into the FDW implementation code instead. Also,
> IMHO I don't think it's safe for an FDW to neither re-throw an error nor
> abort the transaction when an error occurs.

No, I didn't say the resolver is unnecessary. The resolver takes care of terminating remote transactions when the client backend encountered an error during COMMIT/ROLLBACK PREPARED.

> In terms of the performance you're concerned about, I wonder if we can
> somewhat eliminate the bottleneck if multiple resolvers are able to run on
> one database in the future. For example, if we could launch as many resolver
> processes as there are connections on the database, individual backend
> processes could each have one resolver process. Since there would be
> contention and inter-process communication it still brings some overhead,
> but it might be negligible compared to a network round trip.

Do you mean that if 200 concurrent clients each update data on two foreign servers, there are 400 resolvers? ... That's an overuse of resources.

Regards
Takayuki Tsunakawa
Re: Dynamic gathering the values for seq_page_cost/xxx_cost
On Tue, Sep 22, 2020 at 10:57 AM Andy Fan wrote:
>
> My tool set the random_page_cost to 8.6, but based on the fio data, it should
> be set to 12.3 on the same hardware. And I do see better plans as well with
> 12.3. Looks too smooth to believe it is true..
>
> The attached result_fio_mytool.tar.gz is my test result. You can use git:
> "git show HEAD^^" shows the original plan with 8.6, "git show HEAD^" shows
> the plan changes after we changed the random_page_cost, and "git show HEAD"
> shows the run-time statistics changes for these queries.
> I also uploaded the test tool [1] for this for your double-checking.

The scripts seem to start and stop the server and drop caches for every query. That's why you are seeing that setting random_page_cost to the fio-based ratio provides better plans. But in practice, these costs need to be set on a server where the queries are run concurrently and repeatedly. That's where the caching behaviour plays an important role. Can we write a tool which can recommend costs for that scenario? How do the fio-based costs perform when the queries are run repeatedly?

--
Best Wishes,
Ashutosh Bapat
Re: Parallel INSERT (INTO ... SELECT ...)
On Tue, Sep 22, 2020 at 10:26 AM Greg Nancarrow wrote:
>
> For cases where it can't be allowed (e.g. INSERT into a table with
> foreign keys, or INSERT INTO ... SELECT ... ON CONFLICT ... DO UPDATE
> ...") it at least allows parallelism of the SELECT part.
>

Thanks Greg for the patch. I have a few points (inspired by the parallel copy feature work) to mention:

1. What if the target table is a foreign table or a partitioned table?
2. What happens if the target table has triggers (before statement, after statement, before row, after row) that are parallel unsafe?
3. Will each worker be doing single-row insertions or multi-inserts? If single-row insertions, will the buffer lock contention be higher?
4. How does it behave with TOAST column values?
5. How does it behave if we have a RETURNING clause with INSERT INTO ... SELECT?

I'm looking forward to seeing some initial numbers on execution times with and without the patch.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com
RE: [Patch] Optimize dropping of relation buffers using dlist
From: Jamison, Kirk/ジャミソン カーク
> [Results]
> Recovery/Failover performance (in seconds). 3 trial runs.
>
> | shared_buffers | master | patch  | %reg    |
> |----------------|--------|--------|---------|
> | 128MB          | 32.406 | 33.785 | 4.08%   |
> | 1GB            | 36.188 | 32.747 | -10.51% |
> | 2GB            | 41.996 | 32.88  | -27.73% |

Thanks for sharing good results. We want to know if we can get as significant results as you gained before with hundreds of GBs of shared buffers, don't we?

> I also ran a similar performance benchmark to what Tomas did [1]: simple
> "pgbench -S" tests (warmup and then 15 x 1-minute runs with 1, 8 and 16
> clients), but I'm not sure if my machine is reliable enough to produce
> reliable results for 8 clients and more.

Let me confirm just in case. Your patch should not affect pgbench performance, but you measured it. Is there anything you're concerned about?

Regards
Takayuki Tsunakawa
enable pg_stat_statements to track rows processed by REFRESH MATERIALIZED VIEW
Hi,

pg_stat_statements tracks the number of rows processed by some utility commands. But, currently, it does not track the number of rows processed by REFRESH MATERIALIZED VIEW. The attached patch enables pg_stat_statements to track the rows processed by REFRESH MATERIALIZED VIEW.

Regards,
Katsuragi Yuta

diff --git a/contrib/pg_stat_statements/expected/pg_stat_statements.out b/contrib/pg_stat_statements/expected/pg_stat_statements.out
index e0edb134f3..2a303a7f07 100644
--- a/contrib/pg_stat_statements/expected/pg_stat_statements.out
+++ b/contrib/pg_stat_statements/expected/pg_stat_statements.out
@@ -530,8 +530,8 @@ SELECT query, calls, rows FROM pg_stat_statements ORDER BY query COLLATE "C";
 --
 -- Track the total number of rows retrieved or affected by the utility
--- commands of COPY, FETCH, CREATE TABLE AS, CREATE MATERIALIZED VIEW
--- and SELECT INTO
+-- commands of COPY, FETCH, CREATE TABLE AS, CREATE MATERIALIZED VIEW,
+-- REFRESH MATERIALIZED VIEW and SELECT INTO
 --
 SELECT pg_stat_statements_reset();
  pg_stat_statements_reset
@@ -543,6 +543,7 @@
 CREATE TABLE pgss_ctas AS SELECT a, 'ctas' b FROM generate_series(1, 10) a;
 SELECT generate_series(1, 10) c INTO pgss_select_into;
 COPY pgss_ctas (a, b) FROM STDIN;
 CREATE MATERIALIZED VIEW pgss_matv AS SELECT * FROM pgss_ctas;
+REFRESH MATERIALIZED VIEW pgss_matv;
 BEGIN;
 DECLARE pgss_cursor CURSOR FOR SELECT * FROM pgss_matv;
 FETCH NEXT pgss_cursor;
@@ -586,10 +587,11 @@ SELECT query, plans, calls, rows FROM pg_stat_statements ORDER BY query COLLATE
  FETCH FORWARD 5 pgss_cursor | 0 | 1 | 5
  FETCH FORWARD ALL pgss_cursor | 0 | 1 | 7
  FETCH NEXT pgss_cursor | 0 | 1 | 1
+ REFRESH MATERIALIZED VIEW pgss_matv | 0 | 1 | 13
  SELECT generate_series(1, 10) c INTO pgss_select_into | 0 | 1 | 10
  SELECT pg_stat_statements_reset() | 0 | 1 | 1
  SELECT query, plans, calls, rows FROM pg_stat_statements ORDER BY query COLLATE "C" | 1 | 0 | 0
-(12 rows)
+(13 rows)

 --
 -- Track user activity and reset them
diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index 1eac9edaee..1c2ac24cf6 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -1173,11 +1173,12 @@ pgss_ProcessUtility(PlannedStmt *pstmt, const char *queryString,
             /*
              * Track the total number of rows retrieved or affected by
              * the utility statements of COPY, FETCH, CREATE TABLE AS,
-             * CREATE MATERIALIZED VIEW and SELECT INTO.
+             * CREATE MATERIALIZED VIEW, REFRESH MATERIALIZED VIEW and SELECT INTO.
              */
             rows = (qc && (qc->commandTag == CMDTAG_COPY ||
                            qc->commandTag == CMDTAG_FETCH ||
-                           qc->commandTag == CMDTAG_SELECT)) ?
+                           qc->commandTag == CMDTAG_SELECT ||
+                           qc->commandTag == CMDTAG_REFRESH_MATERIALIZED_VIEW)) ?
                 qc->nprocessed : 0;

             /* calc differences of buffer counters. */
diff --git a/contrib/pg_stat_statements/sql/pg_stat_statements.sql b/contrib/pg_stat_statements/sql/pg_stat_statements.sql
index 996a24a293..e9f5bb84e3 100644
--- a/contrib/pg_stat_statements/sql/pg_stat_statements.sql
+++ b/contrib/pg_stat_statements/sql/pg_stat_statements.sql
@@ -252,8 +252,8 @@ SELECT query, calls, rows FROM pg_stat_statements ORDER BY query COLLATE "C";
 --
 -- Track the total number of rows retrieved or affected by the utility
--- commands of COPY, FETCH, CREATE TABLE AS, CREATE MATERIALIZED VIEW
--- and SELECT INTO
+-- commands of COPY, FETCH, CREATE TABLE AS, CREATE MATERIALIZED VIEW,
+-- REFRESH MATERIALIZED VIEW and SELECT INTO
 --
 SELECT pg_stat_statements_reset();
@@ -265,6 +265,7 @@ COPY pgss_ctas (a, b) FROM STDIN;
 13	copy
 \.
 CREATE MATERIALIZED VIEW pgss_matv AS SELECT * FROM pgss_ctas;
+REFRESH MATERIALIZED VIEW pgss_matv;
 BEGIN;
 DECLARE pgss_cursor CURSOR FOR SELECT * FROM pgss_matv;
 FETCH NEXT pgss_cursor;
diff --git a/src/backend/commands/matview.c b/src/backend/commands/matview.c
index f80a9e96a9..cdc2e62ad4 100644
--- a/src/backend/commands/matview.c
+++ b/src/backend/commands/matview.c
@@ -136,7 +136,7 @@ SetMatViewPopulatedState(Relation relation, bool newstate)
  */
 ObjectAddress
 ExecRefreshMatView(RefreshMatViewStmt *stmt, const char *queryString,
-                   ParamListInfo params, QueryCompletion *qc)
+                   ParamListInfo params, QueryCompletion *qc, uint64 *processed)
 {
     Oid         matviewOid;
     Relation    matviewRel;
@@ -147,7 +147,6 @@ ExecRefreshMatView(RefreshMatViewStmt *stmt, const char *queryString,
     Oid         relowner;
     Oid         OIDNewHeap;
     DestReceiver *dest;
-    uint64      processed = 0;
     bool        concurrent;
     LOCKMODE    lockmode;
     char
RE: [Patch] Optimize dropping of relation buffers using dlist
From: Amit Kapila
> No, during recovery also we need to be careful. We need to ensure that
> we use cached value during recovery and cached value is always
> up-to-date. We can't rely on lseek and I have provided some scenario
> up thread [1] where such behavior can cause problem and then see the
> response from Tom Lane why the same can be true for recovery as well.
>
> The basic approach we are trying to pursue here is to rely on the
> cached value of 'number of blocks' (as that always gives correct value
> and even if there is a problem that will be our bug, we don't need to
> rely on OS for correct value and it will be better w.r.t performance
> as well). It is currently only possible during recovery so we are
> using it in recovery path and later once Thomas's patch to cache it
> for non-recovery cases is also done, we can use it for non-recovery
> cases as well.

Although I may still be confused, I understood that Kirk-san's patch should:

* Still focus on speeding up the replay of TRUNCATE during recovery.
* During recovery, DropRelFileNodeBuffers() gets the cached size of the relation fork. If it is cached, trust it and optimize the buffer invalidation. If it's not cached, we can't trust the return value of smgrnblocks() because it's the lseek(END) return value, so we avoid the optimization.
* Then, add a new function, say, smgrnblocks_cached(), that simply returns the cached block count, and have DropRelFileNodeBuffers() use it instead of smgrnblocks().

Regards
Takayuki Tsunakawa
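To make the third point concrete, here is a minimal sketch of what such a helper could look like. The function name comes from the mail above; the per-fork cached-size field (here called smgr_cached_nblocks) and the use of InvalidBlockNumber as the "not cached" marker are assumptions based on the ongoing relation-size caching work, not an actual patch:

    /*
     * Sketch: return the cached number of blocks for the given fork, or
     * InvalidBlockNumber if the size has never been cached, in which case
     * the caller must fall back to the unoptimized full buffer scan.
     */
    BlockNumber
    smgrnblocks_cached(SMgrRelation reln, ForkNumber forknum)
    {
        if (reln->smgr_cached_nblocks[forknum] != InvalidBlockNumber)
            return reln->smgr_cached_nblocks[forknum];

        return InvalidBlockNumber;
    }

The design point is that the helper never consults the OS (no lseek), so a caller like DropRelFileNodeBuffers() can tell "cached and trustworthy" apart from "unknown" and only apply the optimization in the former case.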
AppendStringInfoChar instead of appendStringInfoString
Hi,

In src/backend/replication/backup_manifest.c I found the following code:

	appendStringInfoString(&buf, "\n");
	appendStringInfoString(&buf, "\"");

Since only a single character is appended in each call, I think it would be better to call appendStringInfoChar.

Best regards,
houzj

0001-appendStringInfoChar-instead-of-appendStringInfoString.patch
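For reference, the suggested replacement would look like this (a sketch; `buf` stands for the StringInfo variable used at those call sites):

    /* Before: one-character strings appended via the string variant */
    appendStringInfoString(&buf, "\n");
    appendStringInfoString(&buf, "\"");

    /* After: appendStringInfoChar skips the strlen() call and makes the
     * single-character intent explicit. */
    appendStringInfoChar(&buf, '\n');
    appendStringInfoChar(&buf, '"');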
Re: Transactions involving multiple postgres foreign servers, take 2
On Thu, 24 Sep 2020 at 17:23, tsunakawa.ta...@fujitsu.com wrote:
>
> From: Masahiko Sawada
>> So with your idea, I think we require FDW developers to not call
>> ereport(ERROR) as much as possible. If they need to use a function
>> including palloc, lappend etc that could call ereport(ERROR), they
>> need to use PG_TRY() and PG_CATCH() and return the control along with
>> the error message to the transaction manager rather than raising an
>> error. Then the transaction manager will emit the error message at an
>> error level lower than ERROR (e.g., WARNING), and call commit/rollback
>> API again. But normally we do some cleanup on error but in this case
>> the retrying commit/rollback is performed without any cleanup. Is that
>> right? I'm not sure it's safe though.
>
> Yes. It's legitimate to require the FDW commit routine to return control,
> because the prepare of 2PC is a promise to commit successfully. The
> second-phase commit should avoid doing anything that could fail. For example,
> if some memory is needed for commit, it should be allocated in prepare or
> before.

I don't think it's always possible to avoid raising errors in advance. Considering how postgres_fdw could implement your idea, I think postgres_fdw would need PG_TRY() and PG_CATCH() for its connection management. It has a connection cache in local memory using HTAB. It needs to create an entry the first time it connects (e.g., when prepare and commit prepared of a transaction are performed by different processes) and it needs to re-connect to the foreign server when the entry is invalidated. In both cases, ERROR could happen. I guess the same is true for other FDW implementations. Possibly other FDWs might need more work, for example cleanup or releasing resources. I think that the pros of your idea are that it makes the transaction manager simple, since we don't need resolvers and a launcher, but the cons are that it brings the complexity into the FDW implementation code instead. Also, IMHO I don't think it's safe for an FDW to neither re-throw an error nor abort the transaction when an error occurs.

In terms of the performance you're concerned about, I wonder if we can somewhat eliminate the bottleneck if multiple resolvers are able to run on one database in the future. For example, if we could launch as many resolver processes as there are connections on the database, individual backend processes could each have one resolver process. Since there would be contention and inter-process communication it still brings some overhead, but it might be negligible compared to a network round trip.

Perhaps we can hear more opinions on that from other hackers to decide the FDW transaction API design.

Regards,

--
Masahiko Sawada
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Logical replication from PG v13 and below to PG v14 (devel version) is not working.
On Fri, Sep 25, 2020 at 8:12 AM Amit Kapila wrote:
>
> On Thu, Sep 24, 2020 at 6:33 PM Ashutosh Sharma wrote:
>>
>> Hi Amit,
>>
>>> Here, I think instead of using MySubscription->stream, we should use
>>> server/walrecv version number as we used at one place in tablesync.c.
>>
>> Should subscribers be setting the LR protocol value based on what is
>> the publisher version it is communicating with, or should it be set
>> based on whether streaming was enabled or not while creating that
>> subscription? AFAIU if we set this value based on the publisher
>> version (which is, let's say, >= 14), then it's quite possible that the
>> subscriber will start streaming changes for the in-progress
>> transactions even if streaming was disabled while creating the
>> subscription, won't it?
>
> No, that won't happen, because we send this option to the server
> (publisher in this case) only when the version is >= 14 and the user has
> specified this option. See the below check in function
> libpqrcv_startstreaming():
>
> {
> ..
>     if (options->proto.logical.streaming &&
>         PQserverVersion(conn->streamConn) >= 140000)
>         appendStringInfo(&cmd, ", streaming 'on'");
> ..
> }

Ok, I have modified it as per your suggestion.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

v4-0001-Bugfix-in-logical-protocol-version.patch
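For context, a sketch of the kind of change being discussed in the apply worker (the variable and constant names here follow src/backend/replication/logical/worker.c conventions and are my assumption, not taken from the attached patch): pick the protocol version from the connected publisher's server version rather than from the subscription's streaming flag.

    /*
     * Sketch only: choose the logical replication protocol version based on
     * the publisher's version, so a >= 14 publisher is asked for the
     * streaming-capable protocol regardless of MySubscription->stream.
     */
    options.proto.logical.proto_version =
        walrcv_server_version(wrconn) >= 140000 ?
        LOGICALREP_PROTO_STREAM_VERSION_NUM :
        LOGICALREP_PROTO_VERSION_NUM;

Whether streaming is actually used is then governed separately by the streaming option check in libpqrcv_startstreaming() quoted above.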
Re: Feature improvement for FETCH tab completion
On 2020-09-25 15:38, Fujii Masao wrote:
> On 2020/09/25 14:24, btnakamichin wrote:
>> Hello!
>>
>> I'd like to improve the FETCH tab completion. The FETCH tab completion is incomplete. Therefore, this patch fixes the problem.
>>
>> The previous function completed with one of FORWARD, BACKWARD, RELATIVE, ABSOLUTE, but now it completes with one of FORWARD, BACKWARD, RELATIVE, ABSOLUTE, ALL, NEXT, PRIOR, FIRST, LAST and handles the subsequent IN and FROM clauses.
>
> Thanks for the patch! Here are review comments.
>
> +	/* Complete FETCH BACKWARD or FORWARD with one of ALL */
> +	else if (Matches("FETCH|MOVE", "BACKWARD|FORWARD"))
> +		COMPLETE_WITH("ALL");
>
> Not only "ALL" but also "FROM" and "IN" should be displayed here because they also can follow "BACKWARD" and "FORWARD"?
>
> -	else if (Matches("FETCH|MOVE", MatchAny, MatchAny))
> +	else if (Matches("FETCH|MOVE", "ABSOLUTE|BACKWARD|FORWARD|RELATIVE", MatchAny))
> +		COMPLETE_WITH("FROM", "IN");
>
> This change seems to cause "FETCH FORWARD FROM " to display "FROM" and "IN". To avoid this confusing tab-completion, we should use something like MatchAnyExcept("FROM|IN") here, instead?
>
> Regards,

I'm sorry, I forgot to include pgsql-hackers in the cc, so I resend it.

Thank you, I appreciate your comments. I have attached the updated patch.

Regards,
Naoki Nakamichi

diff --git a/src/bin/psql/tab-complete.c b/src/bin/psql/tab-complete.c
index d877cc86f0..dafbae0366 100644
--- a/src/bin/psql/tab-complete.c
+++ b/src/bin/psql/tab-complete.c
@@ -2911,7 +2911,8 @@ psql_completion(const char *text, int start, int end)

 	/* DEALLOCATE */
 	else if (Matches("DEALLOCATE"))
-		COMPLETE_WITH_QUERY(Query_for_list_of_prepared_statements " UNION SELECT 'ALL'");
+		COMPLETE_WITH_QUERY(Query_for_list_of_prepared_statements
+							" UNION SELECT 'ALL'");

 	/* DECLARE */
 	else if (Matches("DECLARE", MatchAny))
@@ -3076,19 +3077,27 @@ psql_completion(const char *text, int start, int end)
 		COMPLETE_WITH("SELECT", "INSERT", "DELETE", "UPDATE", "DECLARE");

 	/* FETCH && MOVE */
-	/* Complete FETCH with one of FORWARD, BACKWARD, RELATIVE */
+	/*
+	 * Complete FETCH with one of FORWARD, BACKWARD, RELATIVE, ALL, NEXT,
+	 * PRIOR, FIRST, LAST
+	 */
 	else if (Matches("FETCH|MOVE"))
-		COMPLETE_WITH("ABSOLUTE", "BACKWARD", "FORWARD", "RELATIVE");
-	/* Complete FETCH with one of ALL, NEXT, PRIOR */
-	else if (Matches("FETCH|MOVE", MatchAny))
-		COMPLETE_WITH("ALL", "NEXT", "PRIOR");
+		COMPLETE_WITH("ABSOLUTE", "BACKWARD", "FORWARD", "RELATIVE", "ALL", "NEXT",
+					  "PRIOR", "FIRST", "LAST");
+	/* Complete FETCH BACKWARD or FORWARD with one of ALL, IN, FROM */
+	else if (Matches("FETCH|MOVE", "BACKWARD|FORWARD"))
+		COMPLETE_WITH("ALL", "FROM", "IN");

 	/*
 	 * Complete FETCH with "FROM" or "IN". These are equivalent,
 	 * but we may as well tab-complete both: perhaps some users prefer one
 	 * variant or the other.
 	 */
-	else if (Matches("FETCH|MOVE", MatchAny, MatchAny))
+	else if (Matches("FETCH|MOVE", "ABSOLUTE|BACKWARD|FORWARD|RELATIVE",
+					 MatchAnyExcept("FROM|IN")))
+		COMPLETE_WITH("FROM", "IN");
+
+	else if (Matches("FETCH|MOVE", "ALL|NEXT|PRIOR|FIRST|LAST"))
 		COMPLETE_WITH("FROM", "IN");

 	/* FOREIGN DATA WRAPPER */
RE: [Patch] Optimize dropping of relation buffers using dlist
Hi.

> I'll send performance measurement results in the next email. Thanks a lot for the reviews!

Below are the performance measurement results. I was only able to use a low-spec machine: CPU 4v, Memory 8GB, RHEL, xfs filesystem.

[Failover/Recovery Test]
1. (Master) Create tables (ex. 10,000 tables). Insert data into the tables.
2. (M) DELETE FROM TABLE (ex. all rows of 10,000 tables)
3. (Standby) To test with failover, pause the WAL replay on the standby server. (SELECT pg_wal_replay_pause();)
4. (M) psql -c "\timing on" (measures total execution of SQL queries)
5. (M) VACUUM (whole db)
6. (M) After vacuum finishes, stop the primary server: pg_ctl stop -w -mi
7. (S) Resume WAL replay and promote the standby. Because it's difficult to measure recovery time, I used the attached script (resume.sh) that prints a timestamp before and after promotion. It basically does the following:
- "SELECT pg_wal_replay_resume();" is executed and the WAL application is resumed.
- "pg_ctl promote" to promote the standby.
- The time for "select pg_is_in_recovery();" to change from "t" to "f" is measured.

[Results]
Recovery/Failover performance (in seconds). 3 trial runs.

| shared_buffers | master | patch  | %reg    |
|----------------|--------|--------|---------|
| 128MB          | 32.406 | 33.785 | 4.08%   |
| 1GB            | 36.188 | 32.747 | -10.51% |
| 2GB            | 41.996 | 32.88  | -27.73% |

There's a bit of a regression with the default shared_buffers (128MB), but as for the recovery time when we have large NBuffers, it's now at least almost constant, so the performance is boosted. IOW, we enter the optimization most of the time during recovery.

I also ran a similar performance benchmark to what Tomas did [1]: simple "pgbench -S" tests (warmup and then 15 x 1-minute runs with 1, 8 and 16 clients), but I'm not sure if my machine is reliable enough to produce reliable results for 8 clients and more.

| #          | master      | patch       | %reg   |
|------------|-------------|-------------|--------|
| 1 client   | 1676.937825 | 1707.018029 | -1.79% |
| 8 clients  | 7706.835401 | 7529.089044 | 2.31%  |
| 16 clients | 9823.65254  | 9991.184206 | -1.71% |

If there's additional/necessary performance measurement, kindly advise me. Thank you in advance.

[1] https://www.postgresql.org/message-id/flat/20200806213334.3bzadeirly3mdtzl%40development#473168a61e229de40eaf36326232f86c

Best regards,
Kirk Jamison

resume.sh
run.sh
[PATCH] Runtime control of CLOBBER_CACHE_ALWAYS
Hi all

While working on extensions I've often wanted to enable cache clobbering for a targeted piece of code, without paying the price of running it for all backends during postgres startup and any initial setup tasks that are required.

So here's a patch that, when CLOBBER_CACHE_ALWAYS or CLOBBER_CACHE_RECURSIVE are defined, adds a GUC named clobber_cache_depth. It defaults to 1 for CLOBBER_CACHE_ALWAYS or 3 for CLOBBER_CACHE_RECURSIVE to match the existing compiled-in behaviour. But with this change it's now possible to run Pg with clobber_cache_depth=0 then set it to 1 only for targeted tests.

clobber_cache_depth is treated as an unknown GUC if Pg was not built with CLOBBER_CACHE_ALWAYS or CLOBBER_CACHE_RECURSIVE defined.

-

On a side note, to make things like this easier to use, I personally patch all pg_regress tests to include the following at the start of each sql input file:

\set ECHO none
-- Put per-test settings or overrides here
\set ECHO queries

then patch the expected files accordingly. That way it's easy for me to make per-test adjustments while still running the whole suite. It's not always practical to run just one targeted test with TESTS=foo.

--
Craig Ringer
http://www.2ndQuadrant.com/
2ndQuadrant - PostgreSQL Solutions for the Enterprise

From 372defde443d178fb4a9c8cf4092dea7debf72ea Mon Sep 17 00:00:00 2001
From: Craig Ringer
Date: Tue, 22 Sep 2020 09:51:00 +0800
Subject: [PATCH v1] Add runtime control over CLOBBER_CACHE_ALWAYS

When CLOBBER_CACHE_ALWAYS or CLOBBER_CACHE_RECURSIVE are defined, a new GUC "clobber_cache_depth" becomes available. This allows runtime control over the behaviour of cache clobber builds in order to allow more targeted testing of new or changed features using aggressive cache clobbering.

clobber_cache_depth defaults to 1 if CLOBBER_CACHE_ALWAYS is defined, and to 3 if CLOBBER_CACHE_RECURSIVE is defined. That makes it match the previous hardcoded behaviour of these macros, ensuring buildfarm members won't get upset.

While we're at it, expose the macros in pg_config_manual.h
---
 src/backend/utils/cache/inval.c | 51 +++++++++++++++++++++-----------
 src/backend/utils/misc/guc.c    | 15 ++++++++++
 src/include/pg_config_manual.h  | 28 ++++++++++++++----
 src/include/utils/inval.h       |  8 ++++++
 4 files changed, 78 insertions(+), 24 deletions(-)

diff --git a/src/backend/utils/cache/inval.c b/src/backend/utils/cache/inval.c
index 628d6f5d0c..620c9558ac 100644
--- a/src/backend/utils/cache/inval.c
+++ b/src/backend/utils/cache/inval.c
@@ -109,6 +109,7 @@
 #include "storage/sinval.h"
 #include "storage/smgr.h"
 #include "utils/catcache.h"
+#include "utils/guc.h"
 #include "utils/inval.h"
 #include "utils/memdebug.h"
 #include "utils/memutils.h"
@@ -179,6 +180,9 @@ static SharedInvalidationMessage *SharedInvalidMessagesArray;
 static int	numSharedInvalidMessagesArray;
 static int	maxSharedInvalidMessagesArray;

+#ifdef CLOBBER_CACHE_ALWAYS
+int			clobber_cache_depth = 0;
+#endif

 /*
  * Dynamically-registered callback functions.  Current implementation
@@ -689,35 +693,38 @@ AcceptInvalidationMessages(void)
 	/*
 	 * Test code to force cache flushes anytime a flush could happen.
 	 *
-	 * If used with CLOBBER_FREED_MEMORY, CLOBBER_CACHE_ALWAYS provides a
+	 * CLOBBER_CACHE_ALWAYS (clobber_cache_depth = 1) provides a
 	 * fairly thorough test that the system contains no cache-flush hazards.
 	 * However, it also makes the system unbelievably slow --- the regression
-	 * tests take about 100 times longer than normal.
-	 *
-	 * If you're a glutton for punishment, try CLOBBER_CACHE_RECURSIVELY. This
-	 * slows things by at least a factor of 10000, so I wouldn't suggest
-	 * trying to run the entire regression tests that way.  It's useful to try
-	 * a few simple tests, to make sure that cache reload isn't subject to
-	 * internal cache-flush hazards, but after you've done a few thousand
-	 * recursive reloads it's unlikely you'll learn more.
+	 * tests take about 100 times longer than normal.  To mitigate the
+	 * slowdown it can be turned on and off per-backend or globally using
+	 * the clobber_cache_depth GUC to allow targeted testing of specific
+	 * code paths.
+	 *
+	 * If you're a glutton for punishment, try CLOBBER_CACHE_RECURSIVELY
+	 * (set clobber_cache_depth to > 1).  This slows things by at least a
+	 * factor of 10000, so I wouldn't suggest trying to run the entire
+	 * regression tests that way.  It's useful to try a few simple tests,
+	 * to make sure that cache reload isn't subject to internal cache-flush
+	 * hazards, but after you've done a few thousand recursive reloads it's
+	 * unlikely you'll learn more.
+	 *
+	 * You can use postgresql.conf or any other convenient means to disable
+	 * clobbering by default then re-enable for targeted sections of tests,
+	 * e.g. you can edit a specific pg_regress test to SET
+	 * clobber_cache_depth=1, then run the suite with:
+	 *
+	 * PGOPTIONS="-c clobber_cache_depth=0"
Re: Get memory contexts of an arbitrary backend process
Hi,

Thanks for all your comments, I updated the patch.

On Tue, Sep 1, 2020 at 12:03 AM Kasahara Tatsuhito wrote:
> - How about managing the status of signal send/receive and dump operations on a shared hash or others?
>   Sending and receiving signals, dumping memory information, and referencing dump information all work asynchronously. Therefore, it would be good to have management information to check the status of each process. A simple idea is that ..
>   - send a signal to dump to a PID; it first records the following information into the shared hash:
>       pid (specified pid)
>       loc (dump location, currently might be ASAP)
>       recv (did the pid process receive a signal? first false)
>       dumped (did the pid process dump its memory information? first false)
>   - the specified process receives the signal, updates the status in the shared hash, then dumps at the specified location.
>   - the specified process finishes dumping its memory information, and updates the status in the shared hash.

I added a shared hash table consisting of minimal members, mainly for managing whether the file is dumped or not. Some members like 'loc' seem useful in the future, but I haven't added them since they're not essential at this point.

On 2020-09-01 10:54, Andres Freund wrote:
>> CREATE VIEW pg_backend_memory_contexts AS
>> -SELECT * FROM pg_get_backend_memory_contexts();
>> +SELECT * FROM pg_get_backend_memory_contexts(-1);
>
> -1 is odd. Why not use NULL or even 0?

Changed it from -1 to NULL.

>> + rc = fwrite(&name_len, sizeof(int), 1, fpout);
>> + rc = fwrite(name, name_len, 1, fpout);
>> + rc = fwrite(&idlen, sizeof(int), 1, fpout);
>> + rc = fwrite(clipped_ident, idlen, 1, fpout);
>> + rc = fwrite(, sizeof(int), 1, fpout);
>> + rc = fwrite(&parent_len, sizeof(int), 1, fpout);
>> + rc = fwrite(parent, parent_len, 1, fpout);
>> + (void) rc; /* we'll check for error with ferror */
>> +
>> + }
>
> This format is not descriptive. How about serializing to json or something? Or at least having field names?

Added field names for each value.

>> + while (true)
>> + {
>> + CHECK_FOR_INTERRUPTS();
>> +
>> + pg_usleep(1L);
>> +
>
> Need better signalling back/forth here.

Added a shared hash table that has a flag for managing whether the file is dumped or not. I modified it to use this flag.

>> + /*
>> + * Open a temp file to dump the current memory context.
>> + */
>> + fpout = AllocateFile(tmpfile, PG_BINARY_W);
>> + if (fpout == NULL)
>> + {
>> + ereport(LOG,
>> + (errcode_for_file_access(),
>> + errmsg("could not write temporary memory context file \"%s\": %m",
>> + tmpfile)));
>> + return;
>> + }
>
> Probably should be opened with O_CREAT | O_TRUNC?

AllocateFile() calls fopen(), and AFAIU fopen() with mode "w" corresponds to open() with "O_WRONLY | O_CREAT | O_TRUNC". Do you mean I should not use fopen() here?

On 2020-09-24 13:01, Michael Paquier wrote:
> On Thu, Sep 10, 2020 at 09:11:21PM +0900, Kasahara Tatsuhito wrote:
>> I think it's fine to have an interface to delete in an emergency, but I agree that users shouldn't be made aware of the existence or deletion of dump files, basically.
>
> Per the CF bot, the number of tests needs to be tweaked, because we test each entry filtered out with is_deeply(), meaning that the number of tests needs to be updated to reflect that if the filtered list is changed:
>
> t/010_pg_basebackup.pl ... 104/109 # Looks like you planned 109 tests but ran 110.
> t/010_pg_basebackup.pl ... Dubious, test returned 255 (wstat 65280, 0xff00)
> All 109 subtests passed
>
> Simple enough to fix.

Incremented the number of tests.

Any thoughts?
Regards,

--
Atsushi Torikoshi
NTT DATA CORPORATION

From 4d3ff254a634895e8c23c83bb63f519a14785f06 Mon Sep 17 00:00:00 2001
From: Atsushi Torikoshi
Date: Fri, 25 Sep 2020 11:34:38 +0900
Subject: [PATCH] Enable pg_get_backend_memory_contexts() to collect an
 arbitrary backend process's memory contexts.

Previously, pg_get_backend_memory_contexts() could only get the local memory contexts. This patch makes it possible to get the memory contexts of the arbitrary process whose PID is specified by the argument.
---
 src/backend/access/transam/xlog.c            |   7 +
 src/backend/catalog/system_views.sql         |   4 +-
 src/backend/replication/basebackup.c         |   3 +
 src/backend/storage/ipc/ipci.c               |   2 +
 src/backend/storage/ipc/procsignal.c         |   4 +
 src/backend/tcop/postgres.c                  |   5 +
 src/backend/utils/adt/mcxtfuncs.c            | 381 ++++++++++++++++---
 src/backend/utils/init/globals.c             |   1 +
 src/bin/initdb/initdb.c                      |   3 +-
 src/bin/pg_basebackup/t/010_pg_basebackup.pl |   4 +-
 src/bin/pg_rewind/filemap.c                  |   3 +
 src/include/catalog/pg_proc.dat              |  10 +-
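As a reader's aid, here is a rough sketch of the kind of minimal shared hash entry described above. The field set follows Kasahara-san's suggestion pared down to the "dumped" flag the mail says was kept; the struct and field names are illustrative only, and the real definition in the attached patch may differ:

    /*
     * Sketch of a minimal per-request bookkeeping entry in the shared hash,
     * keyed by the PID of the backend being asked to dump its contexts.
     */
    typedef struct McxtDumpEntry
    {
        pid_t   pid;        /* hash key: PID of the target backend */
        bool    dumped;     /* has the target finished writing its dump file? */
    } McxtDumpEntry;

With such an entry, the requesting backend can poll the flag instead of sleeping blindly, and the target backend flips it once the temp file is complete.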
Re: Handing off SLRU fsyncs to the checkpointer
On Fri, Sep 25, 2020 at 12:53 PM Thomas Munro wrote: > Here's a new version. The final thing I'm contemplating before > pushing this is whether there may be hidden magical dependencies in > the order of operations in CheckPointGuts(), which I've changed > around. Andres, any comments? I nagged Andres off-list and he opined that it might be better to reorder it a bit so that ProcessSyncRequests() comes after almost everything else, so that if we ever teach more things to offload their fsync work it'll be in the right order. I reordered it like that; now only CheckPointTwoPhase() comes later, based on the comment that accompanies it. In any case, we can always reconsider the ordering of this function in later commits as required. Pushed like that.
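For readers following along, here is an illustrative sketch of the resulting ordering; the exact set of callees is an approximation of CheckPointGuts() rather than a copy of the committed code, and the point is only the relative ordering described above:

    /*
     * Sketch: everything that can enqueue fsync requests runs first, then
     * the accumulated requests are processed, and CheckPointTwoPhase()
     * stays last per the comment that accompanies it.
     */
    static void
    CheckPointGuts(XLogRecPtr checkPointRedo, int flags)
    {
        CheckPointRelationMap();
        CheckPointReplicationSlots();
        CheckPointSnapBuild();
        CheckPointLogicalRewriteHeap();
        CheckPointReplicationOrigin();

        /* Write out all dirty data in SLRUs and the main buffer pool */
        CheckPointCLOG();
        CheckPointCommitTs();
        CheckPointSUBTRANS();
        CheckPointMultiXact();
        CheckPointPredicate();
        CheckPointBuffers(flags);

        /* Perform all queued up fsyncs */
        ProcessSyncRequests();

        /* We deliberately delay 2PC checkpointing as long as possible */
        CheckPointTwoPhase(checkPointRedo);
    }

This way, any future subsystem that offloads its fsync work to the checkpointer only has to make sure its write-out call sits above ProcessSyncRequests().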
Re: history file on replica and double switchover
On 2020/09/25 13:00, David Zhang wrote:
> On 2020-09-24 5:00 p.m., Fujii Masao wrote:
>> On 2020/09/25 8:15, David Zhang wrote:
>>> Hi,
>>>
>>> My understanding is that the "archiver" won't even start if "archive_mode = on" has been set on a "replica". Therefore, either (XLogArchiveMode == ARCHIVE_MODE_ALWAYS) or (XLogArchiveMode != ARCHIVE_MODE_OFF) will produce the same result.
>>
>> Yes, the archiver isn't invoked in the standby when archive_mode=on. But, with the original patch, walreceiver creates the .ready archive status file even when archive_mode=on and no archiver is running. This causes the history file to be archived after the standby is promoted and the archiver is invoked. With my patch, walreceiver creates the .ready archive status for the history file only when archive_mode=always, like it does for the streamed WAL files. This is the difference between those two patches. To prevent the streamed timeline history file from being archived, IMO we should use the condition archive_mode==always in the walreceiver.
>
> +1
>
> Please see how the "archiver" is started in src/backend/postmaster/postmaster.c
>
> 5227 /*
> 5228  * Start the archiver if we're responsible for (re-)archiving received
> 5229  * files.
> 5230  */
> 5231 Assert(PgArchPID == 0);
> 5232 if (XLogArchivingAlways())
> 5233     PgArchPID = pgarch_start();
>
> I did run the nice script "double_switchover.sh" using either "always" or "on" on patch v1 and v2. They all generate the same results below. In other words, whether the history file (00000003.history) is archived or not depends on the "archive_mode" setting.
>
> echo "archive_mode = always" >> ${PGDATA_NODE2}/postgresql.auto.conf
>
> $ ls -l archive
> -rw------- 1 david david 16777216 Sep 24 14:40 000000010000000000000002
> ... ...
> -rw------- 1 david david 16777216 Sep 24 14:40 00000002000000000000000A
> -rw------- 1 david david 41 Sep 24 14:40 00000002.history
> -rw------- 1 david david 83 Sep 24 14:40 00000003.history
>
> echo "archive_mode = on" >> ${PGDATA_NODE2}/postgresql.auto.conf
>
> $ ls -l archive
> -rw------- 1 david david 16777216 Sep 24 14:47 000000010000000000000002
> ... ...
> -rw------- 1 david david 16777216 Sep 24 14:47 00000002000000000000000A
> -rw------- 1 david david 41 Sep 24 14:47 00000002.history
>
> Personally, I prefer patch v2 since it aligns with the statement here, https://www.postgresql.org/docs/13/warm-standby.html#CONTINUOUS-ARCHIVING-IN-STANDBY
>
> "If archive_mode is set to on, the archiver is not enabled during recovery or standby mode. If the standby server is promoted, it will start archiving after the promotion, but will not archive any WAL it did not generate itself."
>
> By the way, I think the last part of the sentence should be changed to something like below:
> "but will not archive any WAL, which was not generated by itself."
>
>> I'm not sure if this change is an improvement or not. But if we apply the patch I proposed, maybe we should also mention the history file here. For example, "but will not archive any WAL or timeline history files that it did not generate itself".
>
> makes sense to me

So I included this change in the patch. Patch attached.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

diff --git a/doc/src/sgml/high-availability.sgml b/doc/src/sgml/high-availability.sgml
index beb309e668..42f01c515f 100644
--- a/doc/src/sgml/high-availability.sgml
+++ b/doc/src/sgml/high-availability.sgml
@@ -1395,7 +1395,8 @@ synchronous_standby_names = 'ANY 2 (s1, s2, s3)'
     If archive_mode is set to on, the
     archiver is not enabled during recovery or standby mode. If the standby
     server is promoted, it will start archiving after the promotion, but
-    will not archive any WAL it did not generate itself. To get a complete
+    will not archive any WAL or timeline history files that
+    it did not generate itself. To get a complete
     series of WAL files in the archive, you must ensure that all WAL is
     archived, before it reaches the standby. This is inherently true with
     file-based log shipping, as the standby can only restore files that
diff --git a/src/backend/replication/walreceiver.c b/src/backend/replication/walreceiver.c
index 17f1a49f87..32693c56db 100644
--- a/src/backend/replication/walreceiver.c
+++ b/src/backend/replication/walreceiver.c
@@ -758,6 +758,10 @@ WalRcvFetchTimeLineHistoryFiles(TimeLineID first, TimeLineID last)
 			 */
 			writeTimeLineHistoryFile(tli, content, len);

+			/* Mark history file as ready for archiving */
+			if (XLogArchiveMode == ARCHIVE_MODE_ALWAYS)
+				XLogArchiveNotify(fname);
+
 			pfree(fname);
 			pfree(content);
 		}
Re: scram-sha-256 broken with FIPS and OpenSSL 1.0.2
On Fri, Sep 25, 2020 at 01:36:44AM -0400, Tom Lane wrote:
> Peter Eisentraut writes:
>> However, again, the SCRAM implementation would already appear to fail that
>> requirement because it uses a custom HMAC implementation, and HMAC is
>> listed in FIPS 140-2 as a covered algorithm.
>
> Ugh. But is there any available FIPS-approved library code that could be
> used instead?

That's a good point, and I think that this comes down to using OpenSSL's HMAC_* interface for this job when building with OpenSSL:
https://www.openssl.org/docs/man1.1.1/man3/HMAC.html

Worth noting that these have been deprecated in 3.0.0 as per the rather-recent commit dbde472, where they recommend the use of EVP_MAC_*() instead.
--
Michael
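For reference, the one-shot interface in question looks like this. This is a minimal self-contained sketch against OpenSSL 1.1.x; the key and message values are placeholders, not anything from SCRAM:

    #include <openssl/evp.h>
    #include <openssl/hmac.h>
    #include <stdio.h>

    int
    main(void)
    {
        const unsigned char key[] = "secret-key";       /* placeholder key */
        const unsigned char msg[] = "client-message";   /* placeholder data */
        unsigned char result[EVP_MAX_MD_SIZE];
        unsigned int resultlen = 0;

        /* One-shot HMAC-SHA-256, the primitive SCRAM-SHA-256 needs */
        if (HMAC(EVP_sha256(), key, sizeof(key) - 1,
                 msg, sizeof(msg) - 1, result, &resultlen) == NULL)
            return 1;

        for (unsigned int i = 0; i < resultlen; i++)
            printf("%02x", result[i]);
        printf("\n");
        return 0;
    }

Routing HMAC through the library like this is what lets a FIPS-validated OpenSSL module satisfy the coverage requirement, instead of PostgreSQL's own HMAC loop.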
Re: [patch] Fix checksum verification in base backups for zero page headers
Hi,

On Thursday, 2020-09-24 at 17:24 +0300, Anastasia Lubennikova wrote:
> So I mark this one as ReadyForCommitter.

Thanks!

> The only minor problem is a typo (?) in the proposed commit message.
> "If a page is all zero, consider that a checksum failure." It should be
> "If a page is NOT all zero...".

Oh right, I've fixed up the commit message in the attached V4.

Michael

--
Michael Banck
Projektleiter / Senior Berater
Tel.: +49 2166 9901-171
Fax: +49 2166 9901-100
Email: michael.ba...@credativ.de

credativ GmbH, HRB Mönchengladbach 12080
USt-ID-Nummer: DE204566209
Trompeterallee 108, 41189 Mönchengladbach
Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer

Unser Umgang mit personenbezogenen Daten unterliegt folgenden Bestimmungen: https://www.credativ.de/datenschutz

From af83f4a42403e8a994e101086affafa86d67b52a Mon Sep 17 00:00:00 2001
From: Michael Banck
Date: Tue, 22 Sep 2020 16:09:36 +0200
Subject: [PATCH] Fix checksum verification in base backups for zero page
 headers

We currently do not checksum a page if it is considered new by PageIsNew(). However, this means that if the whole page header is zero, PageIsNew() will consider that page new due to the pd_upper field being zero and no subsequent checksum verification is done.

To fix, factor out the all-zeroes check from PageIsVerified() into a new method PageIsZero() and call that for pages considered new by PageIsNew(). If a page is not totally empty, consider that a checksum failure.

Add one test to the pg_basebackup TAP tests to check this error.

Reported-By: Andres Freund
Reviewed-By: Asif Rehman, Anastasia Lubennikova
Discussion: https://postgr.es/m/20190319200050.ncuxejradurja...@alap3.anarazel.de
---
 src/backend/replication/basebackup.c         | 30 +++++++++++++
 src/backend/storage/page/bufpage.c           | 35 ++++++++-------
 src/bin/pg_basebackup/t/010_pg_basebackup.pl | 18 +++++++-
 src/include/storage/bufpage.h                | 11 +++--
 4 files changed, 73 insertions(+), 21 deletions(-)

diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
index b89df01fa7..c82765a429 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -1746,6 +1746,36 @@ sendFile(const char *readfilename, const char *tarfilename,
 											"be reported", readfilename)));
 					}
 				}
+				else
+				{
+					/*
+					 * We skip completely new pages after checking they are
+					 * all-zero, since they don't have a checksum yet.
+					 */
+					if (PageIsNew(page) && !PageIsZero(page))
+					{
+						/*
+						 * pd_upper is zero, but the page is not all zero.  We
+						 * cannot run pg_checksum_page() on the page as it
+						 * would throw an assertion failure.  Consider this a
+						 * checksum failure.
+						 */
+						checksum_failures++;
+
+						if (checksum_failures <= 5)
+							ereport(WARNING,
+									(errmsg("checksum verification failed in "
+											"file \"%s\", block %d: pd_upper "
+											"is zero but page is not all-zero",
+											readfilename, blkno)));
+						if (checksum_failures == 5)
+							ereport(WARNING,
+									(errmsg("further checksum verification "
+											"failures in file \"%s\" will not "
+											"be reported", readfilename)));
+					}
+				}
+
 				block_retry = false;
 				blkno++;
 			}
diff --git a/src/backend/storage/page/bufpage.c b/src/backend/storage/page/bufpage.c
index 4bc2bf955d..8be3cd6190 100644
--- a/src/backend/storage/page/bufpage.c
+++ b/src/backend/storage/page/bufpage.c
@@ -82,11 +82,8 @@ bool
 PageIsVerified(Page page, BlockNumber blkno)
 {
 	PageHeader	p = (PageHeader) page;
-	size_t	   *pagebytes;
-	int			i;
 	bool		checksum_failure = false;
 	bool		header_sane = false;
-	bool		all_zeroes = false;
 	uint16		checksum = 0;

 	/*
@@ -120,18 +117,7 @@ PageIsVerified(Page page, BlockNumber blkno)
 	}

 	/* Check all-zeroes case */
-	all_zeroes = true;
-	pagebytes = (size_t *) page;
-	for (i = 0; i < (BLCKSZ / sizeof(size_t)); i++)
-	{
-		if (pagebytes[i] != 0)
-		{
-			all_zeroes = false;
-			break;
-		}
-	}
-
-	if (all_zeroes)
+	if (PageIsZero(page))
 		return true;

 	/*
@@ -154,6 +140,25 @@ PageIsVerified(Page page, BlockNumber blkno)
 	return false;
 }

+/*
+ * PageIsZero
+ *		Check that the page consists only of zero bytes.
+ *
+ */
+bool
+PageIsZero(Page page)
+{
+	int			i;
+	size_t	   *pagebytes = (size_t *) page;
+
+	for (i = 0; i < (BLCKSZ / sizeof(size_t)); i++)
+	{
+		if (pagebytes[i] != 0)
+			return false;
+	}
+
+	return true;
+}

 /*
  * PageAddItemExtended
diff --git a/src/bin/pg_basebackup/t/010_pg_basebackup.pl b/src/bin/pg_basebackup/t/010_pg_basebackup.pl
index f674a7c94e..f5735569c5 100644
--- a/src/bin/pg_basebackup/t/010_pg_basebackup.pl
+++ b/src/bin/pg_basebackup/t/010_pg_basebackup.pl
@@ -6,7 +6,7 @@ use File::Basename qw(basename dirname);
 use File::Path qw(rmtree);
 use PostgresNode;
 use TestLib;
-use Test::More tests => 109;
+use
Re: [patch] Concurrent table reindex per-index progress reporting
On Thu, Sep 24, 2020 at 09:19:18PM +0200, Matthias van de Meent wrote: > While working on a PG12-instance I noticed that the progress reporting of > concurrent index creation for non-index relations fails to update the > index/relation OIDs that it's currently working on in the > pg_stat_progress_create_index view after creating the indexes. Please find > attached a patch which properly sets these values at the appropriate places. > > Any thoughts? I agree that this is a bug and that we had better report correctly the heap and index IDs worked on, as these could also be part of a toast relation from the parent table reindexed. However, your patch is visibly incorrect in the two places you are updating. > + * Configure progress reporting to report for this index. > + * While we're at it, reset the phase as it is cleared by > start_command. > + */ > + pgstat_progress_start_command(PROGRESS_COMMAND_CREATE_INDEX, > heapId); > + pgstat_progress_update_param(PROGRESS_CREATEIDX_INDEX_OID, > newIndexId); > + pgstat_progress_update_param(PROGRESS_CREATEIDX_PHASE, > + > PROGRESS_CREATEIDX_PHASE_WAIT_1); First, updating PROGRESS_CREATEIDX_PHASE here to WAIT_1 makes no sense, because this is not a wait phase, and index_build() called within index_concurrently_build() will set this state correctly a bit after so IMO it is fine to leave that unset here, and the build phase is by far the bulk of the operation that needs tracking. I think that you are also missing to update PROGRESS_CREATEIDX_COMMAND to PROGRESS_CREATEIDX_COMMAND_REINDEX_CONCURRENTLY, and PROGRESS_CREATEIDX_ACCESS_METHOD_OID to the correct index AM OID, similarly to reindex_index(). > + /* > + * Configure progress reporting to report for this index. > + * While we're at it, reset the phase as it is cleared by > start_command. > + */ > + pgstat_progress_start_command(PROGRESS_COMMAND_CREATE_INDEX, > heapId); > + pgstat_progress_update_param(PROGRESS_CREATEIDX_INDEX_OID, > newIndexId); > + pgstat_progress_update_param(PROGRESS_CREATEIDX_PHASE, > + > PROGRESS_CREATEIDX_PHASE_WAIT_2); > + > validate_index(heapId, newIndexId, snapshot); Basically the same three comments here: PROGRESS_CREATEIDX_PHASE set to WAIT_2 makes no real sense, and validate_index() would update the phase as it should be. This should set PROGRESS_CREATEIDX_COMMAND to PROGRESS_CREATEIDX_COMMAND_REINDEX_CONCURRENTLY, and PROGRESS_CREATEIDX_ACCESS_METHOD_OID to the correct index AM OID. While reviewing this code, I also noticed that the wait state set just before dropping the indexes was wrong. The code was using WAIT_4, but this has to be WAIT_5. And this gives me the attached. This also means that we still set the table and relation OID to the last index worked on for WAIT_2, WAIT_4 and WAIT_5, but we cannot set the command start to relationOid either as given in input of ReindexRelationConcurrently() as this could be a single index given for REINDEX INDEX CONCURRENTLY. Matthias, does that look right to you? 
--
Michael

diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index f1b5f87e6a..d43051aea9 100644
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -3409,6 +3409,7 @@ ReindexRelationConcurrently(Oid relationOid, int options)
 		Oid			oldIndexId = lfirst_oid(lc);
 		Oid			newIndexId = lfirst_oid(lc2);
 		Oid			heapId;
+		Oid			indexam;

 		/* Start new transaction for this index's concurrent build */
 		StartTransactionCommand();
@@ -3429,8 +3430,21 @@ ReindexRelationConcurrently(Oid relationOid, int options)
 		 */
 		indexRel = index_open(oldIndexId, ShareUpdateExclusiveLock);
 		heapId = indexRel->rd_index->indrelid;
+		/* The access method of the old and new indexes match */
+		indexam = indexRel->rd_rel->relam;
 		index_close(indexRel, NoLock);

+		/*
+		 * Update progress for the index to build, with the correct parent
+		 * table involved.
+		 */
+		pgstat_progress_start_command(PROGRESS_COMMAND_CREATE_INDEX, heapId);
+		pgstat_progress_update_param(PROGRESS_CREATEIDX_COMMAND,
+									 PROGRESS_CREATEIDX_COMMAND_REINDEX_CONCURRENTLY);
+		pgstat_progress_update_param(PROGRESS_CREATEIDX_INDEX_OID, newIndexId);
+		pgstat_progress_update_param(PROGRESS_CREATEIDX_ACCESS_METHOD_OID,
+									 indexam);
+
 		/* Perform concurrent build of new index */
 		index_concurrently_build(heapId, newIndexId);

@@ -3458,6 +3472,8 @@ ReindexRelationConcurrently(Oid relationOid, int options)
 		Oid			heapId;
 		TransactionId limitXmin;
 		Snapshot	snapshot;
+		Relation	newIndexRel;
+		Oid			indexam;

 		StartTransactionCommand();

@@ -3468,8 +3484,6 @@ ReindexRelationConcurrently(Oid relationOid,
Re: Feature improvement for FETCH tab completion
On 2020/09/25 14:24, btnakamichin wrote:

> Hello!
>
> I'd like to improve the FETCH tab completion. The FETCH tab completion currently completes only a few of the keywords that can follow FETCH. Therefore, this patch fixes the problem. The previous code completed with one of FORWARD, BACKWARD, RELATIVE, ABSOLUTE; now it completes with one of FORWARD, BACKWARD, RELATIVE, ABSOLUTE, ALL, NEXT, PRIOR, FIRST, LAST, and also handles the subsequent IN and FROM clauses.

Thanks for the patch! Here are review comments.

+	/* Complete FETCH BACKWARD or FORWARD with one of ALL */
+	else if (Matches("FETCH|MOVE", "BACKWARD|FORWARD"))
+		COMPLETE_WITH("ALL");

Shouldn't not only "ALL" but also "FROM" and "IN" be displayed here, because they can also follow "BACKWARD" and "FORWARD"?

	else if (Matches("FETCH|MOVE", MatchAny, MatchAny))
+	else if (Matches("FETCH|MOVE", "ABSOLUTE|BACKWARD|FORWARD|RELATIVE", MatchAny))
+		COMPLETE_WITH("FROM", "IN");

This change seems to cause "FETCH FORWARD FROM <tab>" to display "FROM" and "IN". To avoid this confusing tab completion, should we use something like MatchAnyExcept("FROM|IN") here instead?

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
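If both suggestions above were taken, the combined rules might end up looking something like this (a hypothetical sketch of revised tab-complete.c entries, not the final patch; Matches, COMPLETE_WITH and MatchAnyExcept are the existing tab-complete.c macros):

	/* Complete FETCH BACKWARD or FORWARD with ALL, FROM, or IN */
	else if (Matches("FETCH|MOVE", "BACKWARD|FORWARD"))
		COMPLETE_WITH("ALL", "FROM", "IN");

	/*
	 * Complete FETCH <direction> <count> with FROM or IN, but skip the
	 * suggestion when FROM or IN was already typed, as in
	 * "FETCH FORWARD FROM <tab>".
	 */
	else if (Matches("FETCH|MOVE", "ABSOLUTE|BACKWARD|FORWARD|RELATIVE",
					 MatchAnyExcept("FROM|IN")))
		COMPLETE_WITH("FROM", "IN");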
Re: Load TIME fields - proposed performance improvement
The patch has been re-implemented based on previous advice. Please see attached.

~

Test: A test table was created and 20 million rows inserted as follows:

test=# create table t1 (id int, a timestamp, b time without time zone default '01:02:03', c date default CURRENT_DATE, d time with time zone default CURRENT_TIME, e time with time zone default LOCALTIME);
CREATE TABLE

$ time psql -d test -c "insert into t1(id, a) values(generate_series(1,20000000), timestamp 'now');"

~

Observations:

BEFORE PATCH
perf results:
  6.18% GetSQLCurrentTime
  5.73% GetSQLCurrentDate
  5.20% GetSQLLocalTime
  4.67% GetCurrentDateTime
  -.--% GetCurrentTimeUsec
elapsed time:
  Run1 1m57s
  Run2 1m58s
  Run3 2m00s

AFTER PATCH
perf results:
  1.77% GetSQLCurrentTime
  0.12% GetSQLCurrentDate
  0.50% GetSQLLocalTime
  0.36% GetCurrentDateTime
  -.--% GetCurrentTimeUsec
elapsed time:
  Run1 1m36s
  Run2 1m36s
  Run3 1m36s

(This represents a 19% improvement for this worst-case table/data.)

~

Note: I patched the function GetCurrentTimeUsec consistently with the others, but I was not able to find any SQL syntax that would cause that function to be invoked multiple times. Perhaps the patch for that function should be removed?

---
Kind Regards,
Peter Smith
Fujitsu Australia

On Tue, Sep 22, 2020 at 1:06 PM Peter Smith wrote:
>
> Hi Tom.
>
> Thanks for your feedback.
>
> On Tue, Sep 22, 2020 at 12:44 PM Tom Lane wrote:
>
> > Still, for the size of the patch I'm envisioning, it'd be well
> > worth the trouble.
>
> The OP patch I gave was just a POC to test the effect and to see if
> the idea was judged as worthwhile...
>
> I will rewrite/fix it based on your suggestions.
>
> Kind Regards,
> Peter Smith.
> Fujitsu Australia.

PS_cache_pg_tm-v01.patch
Description: Binary data
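For readers following the thread, here is a minimal sketch of the caching idea being benchmarked above (hypothetical code, not the attached PS_cache_pg_tm-v01.patch): because the transaction start timestamp is fixed for the duration of the statement, the timezone conversion can be done once and reused for every row, as it might look inside src/backend/utils/adt/datetime.c:

	/*
	 * Hypothetical sketch: cache the pg_tm computed for the current
	 * transaction start time, and reuse it while both the timestamp and
	 * the session timezone are unchanged.
	 */
	static TimestampTz cache_ts = 0;
	static pg_tz   *cache_zone = NULL;
	static struct pg_tm cache_tm;
	static fsec_t	cache_fsec;
	static int		cache_tz;

	void
	GetCurrentDateTime(struct pg_tm *tm)
	{
		TimestampTz cur_ts = GetCurrentTransactionStartTimestamp();

		if (cur_ts != cache_ts || session_timezone != cache_zone)
		{
			/* Slow path: perform the real timezone conversion once. */
			if (timestamp2tm(cur_ts, &cache_tz, &cache_tm, &cache_fsec,
							 NULL, session_timezone) != 0)
				ereport(ERROR,
						(errcode(ERRCODE_DATETIME_VALUE_OUT_OF_RANGE),
						 errmsg("timestamp out of range")));
			cache_ts = cur_ts;
			cache_zone = session_timezone;
		}
		*tm = cache_tm;
	}

This would explain why the per-row cost of GetSQLCurrentDate and friends nearly vanishes in the AFTER PATCH profile: only the first row of the 20-million-row insert pays for the conversion.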
Problem of ko.po on branch REL9_5_STABLE
Hi,

When I checked the encoding of the .po files, I noticed that src/bin/pg_config/po/ko.po on the REL9_5_STABLE branch seemed a little different. The 'Content-Type' of this file and the file's encoding differ from those under the other modules, as follows:

src/bin/pg_config/po/ko.po: "Content-Type: text/plain; charset=euc-kr\n"
src/bin/initdb/po/ko.po: "Content-Type: text/plain; charset=UTF-8\n"

They also differ from the other branches, as follows:

REL9_5_STABLE src/bin/pg_config/po/ko.po: "Content-Type: text/plain; charset=euc-kr\n"
REL9_6_STABLE src/bin/pg_config/po/ko.po: "Content-Type: text/plain; charset=UTF-8\n"

Is there any particular reason for this? Or are there any compatibility concerns? Thanks in advance.

--
Yang Rong
Development Department II Software Division II
Nanjing Fujitsu Nanda Software Tech. Co., Ltd. (FNST)
ADDR.: No.6 Wenzhu Road, Software Avenue, Nanjing, 210012, China
TEL: +86+25-86630566-9431 COINS: 7998-9431 FAX: +86+25-83317685
MAIL: yangr.f...@cn.fujitsu.com
--
Re: Feature improvement of tab completion for DEALLOCATE
On 2020-09-25 14:30, Fujii Masao wrote:

> On 2020/09/25 13:45, btnakamichin wrote:
>> Hello!
>>
>> I'd like to improve the DEALLOCATE tab completion. The DEALLOCATE tab completion can't complete "ALL". Therefore, this patch fixes the problem.
>
> Thanks for the patch! It looks basically good, but I think it's better to add a newline just before UNION to avoid a long line, as follows.
>
> -	COMPLETE_WITH_QUERY(Query_for_list_of_prepared_statements);
> +	COMPLETE_WITH_QUERY(Query_for_list_of_prepared_statements
> +						" UNION SELECT 'ALL'");
>
> Regards,

Thank you, I appreciate your comment. I have attached a patch with the newline.

Regards,
NaokiNskamichi

diff --git a/src/bin/psql/tab-complete.c b/src/bin/psql/tab-complete.c
index d877cc86f0..75f81d66dc 100644
--- a/src/bin/psql/tab-complete.c
+++ b/src/bin/psql/tab-complete.c
@@ -2911,7 +2911,8 @@ psql_completion(const char *text, int start, int end)
 
 	/* DEALLOCATE */
 	else if (Matches("DEALLOCATE"))
-		COMPLETE_WITH_QUERY(Query_for_list_of_prepared_statements " UNION SELECT 'ALL'");
+		COMPLETE_WITH_QUERY(Query_for_list_of_prepared_statements
+							" UNION SELECT 'ALL'");
 
 	/* DECLARE */
 	else if (Matches("DECLARE", MatchAny))
Re: Logical replication from PG v13 and below to PG v14 (devel version) is not working.
On Fri, Sep 25, 2020 at 8:12 AM Amit Kapila wrote:
>
> On Thu, Sep 24, 2020 at 6:33 PM Ashutosh Sharma wrote:
> >
> > Hi Amit,
> >
> > > Here, I think instead of using MySubscription->stream, we should use
> > > the server/walrecv version number as we used at one place in tablesync.c.
> >
> > Should subscribers be setting the LR protocol value based on the
> > publisher version it is communicating with, or should it be set
> > based on whether streaming was enabled or not while creating that
> > subscription? AFAIU if we set this value based on the publisher
> > version (which is, let's say, >= 14), then it's quite possible that the
> > subscriber will start streaming changes for the in-progress
> > transactions even if streaming was disabled while creating the
> > subscription, won't it?
>
> No, that won't happen, because we send this option to the server
> (publisher in this case) only when the version is >= 14 and the user has
> specified this option. See the below check in function
> libpqrcv_startstreaming():
> {
> ..
> if (options->proto.logical.streaming &&
>     PQserverVersion(conn->streamConn) >= 14)
>     appendStringInfo(&cmd, ", streaming 'on'");
> ..
> }

Ah, okay, understood, thanks. So that means we can use the streaming protocol version if the server (publisher) supports it, regardless of whether the client (subscriber) has the streaming option enabled or not.

--
With Regards,
Ashutosh Sharma
EnterpriseDB:http://www.enterprisedb.com
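Following Amit's earlier suggestion, the subscriber-side change might then look roughly like this (a hypothetical sketch only, assuming the walrcv_server_version() interface and the PG14-development protocol version constants; not necessarily the committed fix):

	/* In the apply worker, when preparing the streaming options: */
	int		server_version = walrcv_server_version(wrconn);

	/*
	 * Pick the protocol version from what the publisher supports rather
	 * than from MySubscription->stream: a v14+ publisher understands the
	 * streaming protocol even when this subscription has streaming off.
	 */
	options.proto.logical.proto_version =
		(server_version >= 140000) ? LOGICALREP_PROTO_STREAM_VERSION_NUM :
		LOGICALREP_PROTO_VERSION_NUM;

This keeps the two decisions separate: the protocol version tracks the publisher's capabilities, while whether to actually request streaming of in-progress transactions remains governed by the subscription's streaming option.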