Re: standby recovery fails (tablespace related) (tentative patch and discussion)
On 2020/01/28 0:24, Fujii Masao wrote: On 2020/01/15 19:18, Paul Guo wrote: I further fixed the last test failure (due to a small bug in the test, not in code). Attached are the new patch series. Let's see the CI pipeline result. Thanks for updating the patches! I started reading the 0003 patch. I marked this patch as Waiting on Author in CF because there has been no update since my last review comments. Could you mark it as Needs Review again when you post an updated version of the patch? Regards, -- Fujii Masao NTT DATA CORPORATION Advanced Platform Technology Group Research and Development Headquarters
Re: Missing errcode() in ereport
Thomas Munro writes: > I think this caused anole to say: > "reloptions.c", line 1362: error #2042: operand types are incompatible > ("void" and "int") > errdetail_internal("%s", _(optenum->detailmsg)) : 0)); Yeah, I was just looking at that :-( We could revert the change to have these functions return void, or we could run around and change the places with this usage pattern to use "(void) 0" instead of just "0". The latter would be somewhat painful if only minority compilers warn, though. Also, I don't think that having to change ereport usage was part of the agreed-to plan here ... so I'm leaning to the former. regards, tom lane
Re: Missing errcode() in ereport
On Wed, Mar 25, 2020 at 9:30 AM Tom Lane wrote: > Andres Freund writes: > > On 2020-03-23 17:24:49 -0400, Tom Lane wrote: > >> Hearing no objections, I started to review Andres' patchset with > >> that plan in mind. > > > Thanks for pushing the first part! > > I pushed all of it, actually. I think this caused anole to say: "reloptions.c", line 1362: error #2042: operand types are incompatible ("void" and "int") errdetail_internal("%s", _(optenum->detailmsg)) : 0));
Re: replay pause vs. standby promotion
On 2020/03/25 0:17, Sergei Kornilov wrote: Hello I pushed the latest version of the patch. If you have further opinion about immediate promotion, let's keep discussing that! Thank you! Honestly, I forgot that the promotion is documented in high-availability.sgml as: Before failover, any WAL immediately available in the archive or in pg_wal will be restored, but no attempt is made to connect to the master. I mistakenly thought that promote should be "immediately"... If a promotion is triggered while recovery is paused, the paused state ends and a promotion continues. Could we add a few words in func.sgml to clarify the behavior? Especially for users from my example above. Something like: If a promotion is triggered while recovery is paused, the paused state ends, replay of any WAL immediately available in the archive or in pg_wal will be continued and then a promotion will be completed. This description is true if pause is requested by pg_wal_replay_pause(), but not if the recovery target is reached and pause is requested by recovery_target_action=pause. In the latter case, even if there are WAL data available in pg_wal or the archive, they are not replayed, i.e., the promotion completes immediately. Probably we should document those two cases explicitly to avoid confusion about promotion and recovery pause? Regards, -- Fujii Masao NTT DATA CORPORATION Advanced Platform Technology Group Research and Development Headquarters
Re: error context for vacuum to include block number
On Wed, 25 Mar 2020 at 14:08, Justin Pryzby wrote: > > On Wed, Mar 25, 2020 at 10:22:21AM +0530, Amit Kapila wrote: > > On Wed, Mar 25, 2020 at 10:05 AM Masahiko Sawada > > wrote: > > > > > > On Wed, 25 Mar 2020 at 12:44, Amit Kapila wrote: > > > > > > > > On Tue, Mar 24, 2020 at 7:51 PM Masahiko Sawada > > > > wrote: > > > > > > > > > > > > > > > I got the point. But if we set the error context before that, I think > > > > > we need to change the error context message. The error context message > > > > > of heap truncation phase is "while truncating relation \"%s.%s\" to %u > > > > > blocks", but cbarg->blkno will be the number of blocks of the current > > > > > relation. > > > > > > > > > >case VACUUM_ERRCB_PHASE_TRUNCATE: > > > > >if (BlockNumberIsValid(cbarg->blkno)) > > > > >errcontext("while truncating relation \"%s.%s\" to %u > > > > > blocks", > > > > > cbarg->relnamespace, cbarg->relname, > > > > > cbarg->blkno); > > > > >break; > > > > > > > > > > > > > Do you mean to say that actually we are just prefetching or reading > > > > the pages in count_nondeletable_pages() but the message doesn't have > > > > any such indication? If not that, what problem do you see with the > > > > message? What is your suggestion? > > > > > > I meant that with the patch, suppose that the table has 100 blocks and > > > we're truncating it to 50 blocks in RelationTruncate(), the error > > > context message will be "while truncating relation "aaa.bbb" to 100 > > > blocks", which is not correct. I think it should be "while truncating > > > relation "aaa.bbb" to 50 blocks". We can know the relation can be > > > truncated to 50 blocks by the result of count_nondeletable_pages(). So > > > if we update the arguments before it we will use the number of blocks > > > of relation before truncation. > > > > > > > Won't the latest patch by Justin will fix this as he has updated the > > block count after count_nondeletable_pages? 
Apart from that, I feel > > The issue is if the error happens *during* count_nondeletable_pages(). > We don't want it to say "truncating relation to 100 blocks". Right. > > > the first call to update_vacuum_error_cbarg in lazy_truncate_heap > > should have input parameter as vacrelstats->nonempty_pages instead of > > new_rel_pages to indicate the remaining pages after truncation? > > Yea, I think that addresses the issue. +1 Regards, -- Masahiko Sawada  http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: error context for vacuum to include block number
On Wed, Mar 25, 2020 at 10:22:21AM +0530, Amit Kapila wrote: > On Wed, Mar 25, 2020 at 10:05 AM Masahiko Sawada > wrote: > > > > On Wed, 25 Mar 2020 at 12:44, Amit Kapila wrote: > > > > > > On Tue, Mar 24, 2020 at 7:51 PM Masahiko Sawada > > > wrote: > > > > > > > > > > > > I got the point. But if we set the error context before that, I think > > > > we need to change the error context message. The error context message > > > > of heap truncation phase is "while truncating relation \"%s.%s\" to %u > > > > blocks", but cbarg->blkno will be the number of blocks of the current > > > > relation. > > > > > > > >case VACUUM_ERRCB_PHASE_TRUNCATE: > > > >if (BlockNumberIsValid(cbarg->blkno)) > > > >errcontext("while truncating relation \"%s.%s\" to %u > > > > blocks", > > > > cbarg->relnamespace, cbarg->relname, > > > > cbarg->blkno); > > > >break; > > > > > > > > > > Do you mean to say that actually we are just prefetching or reading > > > the pages in count_nondeletable_pages() but the message doesn't have > > > any such indication? If not that, what problem do you see with the > > > message? What is your suggestion? > > > > I meant that with the patch, suppose that the table has 100 blocks and > > we're truncating it to 50 blocks in RelationTruncate(), the error > > context message will be "while truncating relation "aaa.bbb" to 100 > > blocks", which is not correct. I think it should be "while truncating > > relation "aaa.bbb" to 50 blocks". We can know the relation can be > > truncated to 50 blocks by the result of count_nondeletable_pages(). So > > if we update the arguments before it we will use the number of blocks > > of relation before truncation. > > > > Won't the latest patch by Justin will fix this as he has updated the > block count after count_nondeletable_pages? Apart from that, I feel The issue is if the error happens *during* count_nondeletable_pages(). We don't want it to say "truncating relation to 100 blocks". 
> the first call to update_vacuum_error_cbarg in lazy_truncate_heap > should have input parameter as vacrelstats->nonempty_pages instead of > new_rel_pages to indicate the remaining pages after truncation? Yea, I think that addresses the issue. -- Justin
Re: bad logging around broken restore_command
On 2020/03/10 11:47, Kyotaro Horiguchi wrote: At Thu, 6 Feb 2020 23:23:42 +0900, Fujii Masao wrote in On 2020/02/06 1:10, Jeff Janes wrote: If the restore command claims to have succeeded, but fails to have created a file with the right name (due to typos or escaping or quoting issues, for example), no complaint is issued at the time, and it then fails later with a relatively uninformative error message like "could not locate required checkpoint record". ... I don't see why ENOENT is thought to deserve the silent treatment. It seems weird that success gets logged ("restored log file \"%s\" from archive"), but one particular type of unexpected failure does not. Agreed. In the first place, it is not perfectly silent, so that problem cannot happen. In the ENOENT case, the function reports "could not restore file \"%s\" from archive: %s", but at DEBUG2, then returns false, and the callers treat the failure properly. Yes. I've attached a patch which emits a LOG message for ENOENT. Isn't it better to use the "could not stat file" message even in the ENOENT case? If yes, something like the message that you used in the patch should be logged as a DETAIL or HINT message. So, what about the attached patch? If you want to see some log messages in this case, it is sufficient to raise the loglevel of the existing message from DEBUG2 to LOG. Isn't it too noisy to log every time we fail to restore an archived file? In the archive recovery case, it's common to fail to restore archive files and then try to replay WAL files in pg_wal. Regards, -- Fujii Masao NTT DATA CORPORATION Advanced Platform Technology Group Research and Development Headquarters
Re: error context for vacuum to include block number
On Wed, Mar 25, 2020 at 10:05 AM Masahiko Sawada wrote: > > On Wed, 25 Mar 2020 at 12:44, Amit Kapila wrote: > > > > On Tue, Mar 24, 2020 at 7:51 PM Masahiko Sawada > > wrote: > > > > > > > > > I got the point. But if we set the error context before that, I think > > > we need to change the error context message. The error context message > > > of heap truncation phase is "while truncating relation \"%s.%s\" to %u > > > blocks", but cbarg->blkno will be the number of blocks of the current > > > relation. > > > > > >case VACUUM_ERRCB_PHASE_TRUNCATE: > > >if (BlockNumberIsValid(cbarg->blkno)) > > >errcontext("while truncating relation \"%s.%s\" to %u > > > blocks", > > > cbarg->relnamespace, cbarg->relname, > > > cbarg->blkno); > > >break; > > > > > > > Do you mean to say that actually we are just prefetching or reading > > the pages in count_nondeletable_pages() but the message doesn't have > > any such indication? If not that, what problem do you see with the > > message? What is your suggestion? > > I meant that with the patch, suppose that the table has 100 blocks and > we're truncating it to 50 blocks in RelationTruncate(), the error > context message will be "while truncating relation "aaa.bbb" to 100 > blocks", which is not correct. I think it should be "while truncating > relation "aaa.bbb" to 50 blocks". We can know the relation can be > truncated to 50 blocks by the result of count_nondeletable_pages(). So > if we update the arguments before it we will use the number of blocks > of relation before truncation. > Won't the latest patch by Justin will fix this as he has updated the block count after count_nondeletable_pages? Apart from that, I feel the first call to update_vacuum_error_cbarg in lazy_truncate_heap should have input parameter as vacrelstats->nonempty_pages instead of new_rel_pages to indicate the remaining pages after truncation? -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: error context for vacuum to include block number
On Wed, Mar 25, 2020 at 01:34:43PM +0900, Masahiko Sawada wrote: > I meant that with the patch, suppose that the table has 100 blocks and > we're truncating it to 50 blocks in RelationTruncate(), the error > context message will be "while truncating relation "aaa.bbb" to 100 > blocks", which is not correct. Hm, yea, at that point it's:

   new_rel_pages = RelationGetNumberOfBlocks(onerel);

..so we can do better. > I think it should be "while truncating > relation "aaa.bbb" to 50 blocks". We can know the relation can be > truncated to 50 blocks by the result of count_nondeletable_pages(). So > if we update the arguments before it we will use the number of blocks > of relation before truncation. > My suggestion is either that we change the error message to, for > example, "while truncating relation "aaa.bbb" having 100 blocks", or > that we change the patch so that we can use "50 blocks" in the error > context message. We could do:

   update_vacuum_error_cbarg(vacrelstats, VACUUM_ERRCB_PHASE_TRUNCATE,
                             InvalidBlockNumber, NULL, false);
   new_rel_pages = count_nondeletable_pages(onerel, vacrelstats);
   vacrelstats->blkno = new_rel_pages;
   ...
   case VACUUM_ERRCB_PHASE_TRUNCATE:
       if (BlockNumberIsValid(cbarg->blkno))
           errcontext("while truncating relation \"%s.%s\" to %u blocks",
                      cbarg->relnamespace, cbarg->relname, cbarg->blkno);
       else
           /* Error happened before/during count_nondeletable_pages() */
           errcontext("while truncating relation \"%s.%s\"",
                      cbarg->relnamespace, cbarg->relname);
       break;

-- Justin
Re: error context for vacuum to include block number
On Wed, 25 Mar 2020 at 12:44, Amit Kapila wrote: > > On Tue, Mar 24, 2020 at 7:51 PM Masahiko Sawada > wrote: > > > > On Tue, 24 Mar 2020 at 22:37, Amit Kapila wrote: > > > > > > On Tue, Mar 24, 2020 at 6:18 PM Masahiko Sawada > > > wrote: > > > > > > > > > > > > I've read through the latest version patch (v31), here are my comments: > > > > > > > > 1. > > > > +/* Update error traceback information */ > > > > +olderrcbarg = *vacrelstats; > > > > +update_vacuum_error_cbarg(vacrelstats, > > > > + VACUUM_ERRCB_PHASE_TRUNCATE, > > > > new_rel_pages, NULL, > > > > + false); > > > > + > > > > /* > > > > * Scan backwards from the end to verify that the end pages > > > > actually > > > > * contain no tuples. This is *necessary*, not optional, > > > > because > > > > * other backends could have added tuples to these pages > > > > whilst we > > > > * were vacuuming. > > > > */ > > > > new_rel_pages = count_nondeletable_pages(onerel, vacrelstats); > > > > > > > > We need to set the error context after setting new_rel_pages. > > > > > > > > > > We want to cover the errors raised in count_nondeletable_pages(). In > > > an earlier version of the patch, we had TRUNCATE_PREFETCH phase which > > > use to cover those errors, but that was not good as we were > > > setting/resetting it multiple times and it was not clear such a > > > separate phase would add any value. > > > > I got the point. But if we set the error context before that, I think > > we need to change the error context message. The error context message > > of heap truncation phase is "while truncating relation \"%s.%s\" to %u > > blocks", but cbarg->blkno will be the number of blocks of the current > > relation. 
> > > >case VACUUM_ERRCB_PHASE_TRUNCATE: > >if (BlockNumberIsValid(cbarg->blkno)) > >errcontext("while truncating relation \"%s.%s\" to %u > > blocks", > > cbarg->relnamespace, cbarg->relname, > > cbarg->blkno); > >break; > > > > Do you mean to say that actually we are just prefetching or reading > the pages in count_nondeletable_pages() but the message doesn't have > any such indication? If not that, what problem do you see with the > message? What is your suggestion? I meant that with the patch, suppose that the table has 100 blocks and we're truncating it to 50 blocks in RelationTruncate(), the error context message will be "while truncating relation "aaa.bbb" to 100 blocks", which is not correct. I think it should be "while truncating relation "aaa.bbb" to 50 blocks". We can know the relation can be truncated to 50 blocks by the result of count_nondeletable_pages(). So if we update the arguments before it we will use the number of blocks of relation before truncation. My suggestion is either that we change the error message to, for example, "while truncating relation "aaa.bbb" having 100 blocks", or that we change the patch so that we can use "50 blocks" in the error context message. Regards, -- Masahiko Sawada  http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Fastpath while arranging the changes in LSN order in logical decoding
On Wed, Mar 25, 2020 at 9:23 AM Amit Kapila wrote: > > On Wed, Mar 25, 2020 at 12:46 AM Andres Freund wrote: > > > > On 2020-03-24 18:36:03 +0530, Dilip Kumar wrote: > > > IMHO, I have tried the best case but did not see any performance gain > > > so I am not planning to test this further. I have attached the patch > > > for removing the TODO. > > > > Pushed. Thanks! > > > > I have updated the CF entry. Thanks to all involved in this. Thanks! -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
Re: Fastpath while arranging the changes in LSN order in logical decoding
On Wed, Mar 25, 2020 at 12:46 AM Andres Freund wrote: > > On 2020-03-24 18:36:03 +0530, Dilip Kumar wrote: > > IMHO, I have tried the best case but did not see any performance gain > > so I am not planning to test this further. I have attached the patch > > for removing the TODO. > > Pushed. Thanks! > I have updated the CF entry. Thanks to all involved in this. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: error context for vacuum to include block number
On Wed, Mar 25, 2020 at 8:49 AM Justin Pryzby wrote: > > On Tue, Mar 24, 2020 at 09:47:30PM +0900, Masahiko Sawada wrote: > > We need to set the error context after setting new_rel_pages. > > Done > > > The type name ErrCbPhase seems to be very generic name, how about > > VacErrCbPhase or VacuumErrCbPhase? > > Done. > > Thanks for your review comments.

@@ -870,6 +904,12 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,
 	else
 		skipping_blocks = false;
 
+	/* Setup error traceback support for ereport() */
+	errcallback.callback = vacuum_error_callback;
+	errcallback.arg = vacrelstats;
+	errcallback.previous = error_context_stack;
+	error_context_stack =

I think by mistake you have re-introduced this chunk of code. We don't need this as we have done it in the caller. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: error context for vacuum to include block number
On Tue, Mar 24, 2020 at 7:51 PM Masahiko Sawada wrote: > > On Tue, 24 Mar 2020 at 22:37, Amit Kapila wrote: > > > > On Tue, Mar 24, 2020 at 6:18 PM Masahiko Sawada > > wrote: > > > > > > > > > I've read through the latest version patch (v31), here are my comments: > > > > > > 1. > > > +/* Update error traceback information */ > > > +olderrcbarg = *vacrelstats; > > > +update_vacuum_error_cbarg(vacrelstats, > > > + VACUUM_ERRCB_PHASE_TRUNCATE, > > > new_rel_pages, NULL, > > > + false); > > > + > > > /* > > > * Scan backwards from the end to verify that the end pages > > > actually > > > * contain no tuples. This is *necessary*, not optional, because > > > * other backends could have added tuples to these pages whilst > > > we > > > * were vacuuming. > > > */ > > > new_rel_pages = count_nondeletable_pages(onerel, vacrelstats); > > > > > > We need to set the error context after setting new_rel_pages. > > > > > > > We want to cover the errors raised in count_nondeletable_pages(). In > > an earlier version of the patch, we had TRUNCATE_PREFETCH phase which > > use to cover those errors, but that was not good as we were > > setting/resetting it multiple times and it was not clear such a > > separate phase would add any value. > > I got the point. But if we set the error context before that, I think > we need to change the error context message. The error context message > of heap truncation phase is "while truncating relation \"%s.%s\" to %u > blocks", but cbarg->blkno will be the number of blocks of the current > relation. > >case VACUUM_ERRCB_PHASE_TRUNCATE: >if (BlockNumberIsValid(cbarg->blkno)) >errcontext("while truncating relation \"%s.%s\" to %u blocks", > cbarg->relnamespace, cbarg->relname, cbarg->blkno); >break; > Do you mean to say that actually we are just prefetching or reading the pages in count_nondeletable_pages() but the message doesn't have any such indication? If not that, what problem do you see with the message? What is your suggestion? 
-- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: error context for vacuum to include block number
On Tue, Mar 24, 2020 at 09:47:30PM +0900, Masahiko Sawada wrote:
> We need to set the error context after setting new_rel_pages.

Done

> The type name ErrCbPhase seems to be very generic name, how about
> VacErrCbPhase or VacuumErrCbPhase?

Done.

Thanks for your review comments.

--
Justin

From 26c57039135896ebf29b96c172d35d869ed1ce69 Mon Sep 17 00:00:00 2001
From: Justin Pryzby
Date: Thu, 12 Dec 2019 20:54:37 -0600
Subject: [PATCH v32 1/3] Introduce vacuum errcontext to display additional
 information.

The additional information displayed will be block number for error
occurring while processing heap and index name for error occurring
while processing the index. This will help us in diagnosing the
problems that occur during a vacuum. For ex. due to corruption (either
caused by bad hardware or by some bug) if we get some error while
vacuuming, it can help us identify the block in heap and or additional
index information.

It sets up an error context callback to display additional information
with the error. During different phases of vacuum (heap scan, heap
vacuum, index vacuum, index clean up, heap truncate), we update the
error context callback to display appropriate information. We can
extend it to a bit more granular level like adding the phases for FSM
operations or for prefetching the blocks while truncating. However, I
felt that it requires adding many more error callback function calls
and can make the code a bit complex, so left those for now.
Author: Justin Pryzby, with few changes by Amit Kapila
Reviewed-by: Alvaro Herrera, Amit Kapila, Andres Freund, Michael Paquier and Sawada Masahiko
Discussion: https://www.postgresql.org/message-id/20191120210600.gc30...@telsasoft.com
---
 src/backend/access/heap/vacuumlazy.c | 260 ---
 src/tools/pgindent/typedefs.list     |   1 +
 2 files changed, 236 insertions(+), 25 deletions(-)

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 03c43efc32..cbea791968 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -144,6 +144,17 @@
  */
 #define ParallelVacuumIsActive(lps)	PointerIsValid(lps)
 
+/* Phases of vacuum during which we report error context. */
+typedef enum
+{
+	VACUUM_ERRCB_PHASE_UNKNOWN,
+	VACUUM_ERRCB_PHASE_SCAN_HEAP,
+	VACUUM_ERRCB_PHASE_VACUUM_INDEX,
+	VACUUM_ERRCB_PHASE_VACUUM_HEAP,
+	VACUUM_ERRCB_PHASE_INDEX_CLEANUP,
+	VACUUM_ERRCB_PHASE_TRUNCATE
+} VacErrCbPhase;
+
 /*
  * LVDeadTuples stores the dead tuple TIDs collected during the heap scan.
  * This is allocated in the DSM segment in parallel mode and in local memory
@@ -270,6 +281,8 @@ typedef struct LVParallelState
 typedef struct LVRelStats
 {
+	char	   *relnamespace;
+	char	   *relname;
 	/* useindex = true means two-pass strategy; false means one-pass */
 	bool		useindex;
 	/* Overall statistics about rel */
@@ -290,8 +303,12 @@ typedef struct LVRelStats
 	int			num_index_scans;
 	TransactionId latestRemovedXid;
 	bool		lock_waiter_detected;
-} LVRelStats;
 
+	/* Used for error callback */
+	char	   *indname;
+	BlockNumber blkno;			/* used only for heap operations */
+	VacErrCbPhase phase;
+} LVRelStats;
 
 /* A few variables that don't seem worth passing around as parameters */
 static int	elevel = -1;
@@ -314,10 +331,10 @@ static void lazy_vacuum_all_indexes(Relation onerel, Relation *Irel,
 									LVRelStats *vacrelstats, LVParallelState *lps,
 									int nindexes);
 static void lazy_vacuum_index(Relation indrel, IndexBulkDeleteResult **stats,
-							  LVDeadTuples *dead_tuples, double reltuples);
+							  LVDeadTuples *dead_tuples, double reltuples, LVRelStats *vacrelstats);
 static void lazy_cleanup_index(Relation indrel, IndexBulkDeleteResult **stats,
-							   double reltuples, bool estimated_count);
+							   double reltuples, bool estimated_count, LVRelStats *vacrelstats);
 static int	lazy_vacuum_page(Relation onerel, BlockNumber blkno, Buffer buffer,
 							 int tupindex, LVRelStats *vacrelstats, Buffer *vmbuffer);
 static bool should_attempt_truncation(VacuumParams *params,
@@ -337,13 +354,13 @@ static void lazy_parallel_vacuum_indexes(Relation *Irel, IndexBulkDeleteResult *
 										 int nindexes);
 static void parallel_vacuum_index(Relation *Irel, IndexBulkDeleteResult **stats,
 								  LVShared *lvshared, LVDeadTuples *dead_tuples,
-								  int nindexes);
+								  int nindexes, LVRelStats *vacrelstats);
 static void vacuum_indexes_leader(Relation *Irel, IndexBulkDeleteResult **stats,
 								  LVRelStats *vacrelstats, LVParallelState *lps,
 								  int nindexes);
 static void vacuum_one_index(Relation indrel, IndexBulkDeleteResult **stats,
 							 LVShared *lvshared, LVSharedIndStats *shared_indstats,
-							 LVDeadTuples *dead_tuples);
+							 LVDeadTuples *dead_tuples, LVRelStats *vacrelstats);
 static void lazy_cleanup_all_indexes(Relation *Irel, IndexBulkDeleteResult **stats,
 									 LVRelStats *vacrelstats,
Re: Collation versions on Windows (help wanted, apply within)
On Tue, Mar 24, 2020 at 7:56 AM Juan José Santamaría Flecha wrote: > On Mon, Mar 23, 2020 at 5:59 AM Thomas Munro wrote: >> Done in this new 0002 patch (untested). 0001 removes the comment that >> individual collations can't have a NULL version, reports NULL for >> Linux/glibc collations like C.UTF-8 by stripping the suffix and >> comparing with C and POSIX as suggested by Peter E. > > It applies and passes tests without a problem in Windows, and works as > expected. Thanks! Pushed. From the things we learned in this thread, I think there is an open item for someone to write a patch to call EnumSystemLocalesEx() and populate the initial set of collations, where we use "locale -a" on Unix. I'm not sure where the encoding is supposed to come from though, which is why I didn't try to write a patch myself.
Re: improve transparency of bitmap-only heap scans
On Wed, Mar 25, 2020 at 12:44 AM Tom Lane wrote: > > I took a quick look through this patch. While I see nothing to complain > about implementation-wise, I'm a bit befuddled as to why we need this > reporting when there is no comparable data provided for regular index-only > scans. Over there, you just get "Heap Fetches: n", and the existing > counts for bitmap scans seem to cover the same territory. > Isn't deducing similar information ("Skipped Heap Fetches: n") easier there than it is here? -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
[PATCH] Fix CommitTransactionCommand() to CallXactCallbacks() in TBLOCK_ABORT_END
Hi pghackers, This is my first time posting here ... Gilles Darold and I are developing a new FDW based on contrib/postgres_fdw. The postgres_fdw logic uses RegisterXactCallback() to propagate local transaction events to the remote server. But I found that a registered XactCallback is not always called for a successful client ROLLBACK statement. So a successful local ROLLBACK is not propagated to the remote server by the FDW, and the local and remote transaction states end up out of sync. The local database is "eating" the successful rollback. I attach a git format-patch file 0001-Fix-CommitTransactionCommand-to-CallXactCallbacks-in.patch. The patch fixes the problem and is ready to commit as far as I can tell. It adds some comment lines and three lines of code to CommitTransactionCommand() in the TBLOCK_ABORT_END case: (1) reset the transaction's blockState back to TBLOCK_ABORT, ahead of (2) a new call to CallXactCallbacks(). If the callbacks return successfully (which is usually the case), (3) the new code switches back to TBLOCK_ABORT_END, then continues with the old code to CleanupTransaction() as before. If any callback does error out, TBLOCK_ABORT is the correct block state for the error. I have not added a regression test since my scenario involves a C module ... I didn't see such a regression test, but somebody can teach me where to put the C module and Makefile if a regression test should be added. Heads up that the scenario to hit this is a bit complex (to me) ... I attach the following unit test files: 1) eat_rollback.c: _PG_init() installs my_utility_hook() with INFO logging for debugging, calls RegisterSubXactCallback(mySubtransactionCallback), which injects some logging and an ERROR for savepoints (i.e., negative testing, e.g., as if a remote FDW savepoint failed), and calls RegisterXactCallback(myTransactionCallback) with INFO logging.
2) Makefile, to build eat_rollback.so 3) eat_rollback2.sql, drives the failure sequence, especially the SAVEPOINT, which errors out; then later a successful ROLLBACK gets incorrectly eaten (no callback INFO logging, demonstrating the bug), then another successful ROLLBACK (now there is callback INFO logging). 4) eat_rollback2.out, output without the fix: the rollback at line 68 is successful but there is no myTransactionCallback() INFO output 5) eat_rollback2.fixed, output after the fix: the rollback at line 68 is successful, and now there is a myTransactionCallback() INFO log. Success! It first failed for me in v11.1, and I suspect it failed before then too. And it is still failing in current master. By the way, we worked around the bug in our FDW by handling sub/xact in the utility hook. A transaction callback is still needed for implicit, internal rollbacks. Another advantage of the workaround is (I think) that it reduces code complexity and improves performance, because the entire subxact stack is not unwound to drive each and every subtransaction level to the remote. Also, sub/transaction statements are sent remote as they arrive (local and remote are more "transactionally" synced, not stacked by the FDW for later). And of course, the workaround doesn't hit the above bug, since our utility hook correctly handles the client ROLLBACK. If it makes sense to the community, I could try and patch contrib/postgres_fdw to do what we are doing. But note that I don't need it myself: we have our own new FDW for remote DB2 z/OS (!) ... at LzLabs we are building a software defined mainframe with PostgreSQL (of all things). Hope it helps! Dave Sharpe I don't necessarily agree with everything I say.
(MM) www.lzlabs.com

Attachments:
  0001-Fix-CommitTransactionCommand-to-CallXactCallbacks-in.patch
  eat_rollback.c
  Makefile
  eat_rollback2.sql
  eat_rollback2.out
  eat_rollback2.fixed
Re: Run-time pruning for ModifyTable
Hi David, Sorry I couldn't get to this sooner. On Wed, Mar 25, 2020 at 9:49 AM David Rowley wrote: > On Wed, 25 Mar 2020 at 13:00, Tom Lane wrote: > > David Rowley writes: > > > I had a closer look at this today and the code I have in > > > inheritance_planner() is certainly not right. > > > > Although I didn't get around to it for v13, there's still a plan on the > > table for inheritance_planner() to get nuked from orbit [1]. > > > > Maybe this improvement should be put on hold till that's done? > > Possibly. I'm not really wedded to the idea of getting it in. However, > it would really only be the inheritance planner part that would need > to be changed later. I don't think any of the other code would need to > be adjusted. > > Amit shared his thoughts in [1]. If you'd rather I held off, then I will. > > David > > [1] > https://www.postgresql.org/message-id/CA%2BHiwqGhD7ieKsv5%2BGkmHgs-XhP2DoUhtESVb3MU-4j14%3DG6LA%40mail.gmail.com Actually, I was saying in that email that the update/delete planning overhaul being talked about will make the functionality this patch adds (the ModifyTable node being able to prune its subplans based on run-time parameter values) entirely redundant. That's because, with the overhaul, there won't be multiple subplans under ModifyTable, only one, which would take care of any pruning that's necessary. What I did say in favor of this patch though is that it does not seem that invasive, so maybe okay to get in for v13. -- Thank you, Amit Langote EnterpriseDB: http://www.enterprisedb.com
Re: some AppVeyor files
On Tue, Mar 24, 2020 at 5:05 AM Peter Eisentraut wrote: > You can also run this yourself without the detour through the commit > fest app. Attached are three patches that add .appveyor.yml files, for > MSVC, MinGW, and Cygwin respectively. (An open problem is to combine > them all into one.) I have been using these regularly over the last few > months to test code on these Windows variants. Thanks! I added a link to this thread to a Wiki page that tries to collect information on this topic[1]. Another thing you could be interested in is the ability to test on several different MSVC versions (I tried to find some appveyor.yml files I had around here somewhere to do that, but no cigar... it's just different paths for those .bat files that set up the environment). Here is my current wish list for Windows CI: 1. Run check-world with TAP tests. 2. Turn on the equivalent of -Werror (maybe). 3. Turn on asserts. 4. Print backtraces on crash. 5. Dump all potentially relevant logs on failure (initdb.log, regression.diffs, etc.). 6. Find a Windows thing that is like ccache and preserve its cache across builds (like Travis, which saves some build time). [1] https://wiki.postgresql.org/wiki/Continuous_Integration
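For reference, a minimal AppVeyor configuration for an MSVC build has roughly this shape (the image name, vcvars path, and scripts below are assumptions for illustration, not the contents of the attached patches):

```yaml
image: Visual Studio 2019
build_script:
  - call "C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Auxiliary\Build\vcvars64.bat"
  - cd src\tools\msvc
  - build
test_script:
  - cd src\tools\msvc
  - vcregress check
on_failure:
  # wish-list item 5: keep the interesting logs around on failure
  - appveyor PushArtifact src\test\regress\regression.diffs
```

Testing different MSVC versions then mostly amounts to swapping the `image` and the path to the vcvars .bat file, as noted above.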
Re: pg_upgrade fails with non-standard ACL
Hello David, On 3/25/2020 2:08 AM, David Steele wrote: On 12/17/19 3:10 AM, Arthur Zakirov wrote: I attached a new version of the patch. It still uses pg_identify_object(); I'm not sure about other ways to build identities yet. This patch applies and builds but fails regression tests on Linux and Windows: https://travis-ci.org/postgresql-cfbot/postgresql/builds/666134656 https://ci.appveyor.com/project/postgresql-cfbot/postgresql/build/1.0.85292 Regression tests fail because cfbot applies "test_rename_catalog_objects.patch". Regression tests pass without it. This patch shouldn't be applied by cfbot, but I'm not sure how to arrange that. Maybe it is possible to use a different extension name for the test patch, not ".patch". -- Artur
AllocSetEstimateChunkSpace()
Attached is a small patch that introduces a simple function, AllocSetEstimateChunkSpace(), and uses it to improve the estimate for the size of a hash entry for hash aggregation. Getting reasonable estimates for the hash entry size (including the TupleHashEntryData, the group key tuple, the per-group state, and by- ref transition values) is important for multiple reasons: * It helps the planner know when hash aggregation is likely to spill, and how to cost it. * It helps hash aggregation regulate itself by setting a group limit (separate from the memory limit) to account for growing by-ref transition values. * It helps choose a good initial size for the hash table. Too large of a hash table will crowd out space that could be used for the group keys or the per-group state. Each group requires up to three palloc chunks: one for the group key tuple, one for the per-group states, and one for a by-ref transition value. Each chunk can have a lot of overhead: in addition to the chunk header (16 bytes overhead), it also rounds the value up to a power of two (~25% overhead). So, it's important to account for this chunk overhead. I considered making it a method of a memory context, but the planner will call this function before the hash agg memory context is created. It seems to make more sense for this to just be an AllocSet-specific function for now. Regards, Jeff Davis diff --git a/src/backend/executor/nodeAgg.c b/src/backend/executor/nodeAgg.c index 2d94d7dc25e..7fa71ad2d65 100644 --- a/src/backend/executor/nodeAgg.c +++ b/src/backend/executor/nodeAgg.c @@ -1637,16 +1637,37 @@ find_hash_columns(AggState *aggstate) /* * Estimate per-hash-table-entry overhead. + * + * NB: This assumes the memory context at execution time will be an + * AllocSetContext. 
*/ Size -hash_agg_entry_size(int numAggs, Size tupleWidth, Size transitionSpace) +hash_agg_entry_size(int numTrans, Size tupleWidth, Size transitionSpace) { + Size tupleChunkSize; + Size pergroupChunkSize; + Size transitionChunkSize; + Size tupleSize = (MAXALIGN(SizeofMinimalTupleHeader) + + tupleWidth); + Size pergroupSize = numTrans * sizeof(AggStatePerGroupData); + + tupleChunkSize = AllocSetEstimateChunkSpace(tupleSize); + + if (pergroupSize > 0) + pergroupChunkSize = AllocSetEstimateChunkSpace(pergroupSize); + else + pergroupChunkSize = 0; + + if (transitionSpace > 0) + transitionChunkSize = AllocSetEstimateChunkSpace(transitionSpace); + else + transitionChunkSize = 0; + return - MAXALIGN(SizeofMinimalTupleHeader) + - MAXALIGN(tupleWidth) + - MAXALIGN(sizeof(TupleHashEntryData) + - numAggs * sizeof(AggStatePerGroupData)) + - transitionSpace; + sizeof(TupleHashEntryData) + + tupleChunkSize + + pergroupChunkSize + + transitionChunkSize; } /* diff --git a/src/backend/utils/mmgr/aset.c b/src/backend/utils/mmgr/aset.c index c0623f106d2..678ddd77912 100644 --- a/src/backend/utils/mmgr/aset.c +++ b/src/backend/utils/mmgr/aset.c @@ -560,6 +560,30 @@ AllocSetContextCreateInternal(MemoryContext parent, return (MemoryContext) set; } +/* + * Estimate total memory consumed for a chunk of the requested size. + */ +Size +AllocSetEstimateChunkSpace(Size size) +{ + Size chunk_size; + + if (size > ALLOC_CHUNK_LIMIT) + { + chunk_size = MAXALIGN(size); + + return chunk_size + ALLOC_BLOCKHDRSZ + ALLOC_CHUNKHDRSZ; + } + else + { + int fidx = AllocSetFreeIndex(size); + + chunk_size = (1 << ALLOC_MINBITS) << fidx; + + return chunk_size + ALLOC_CHUNKHDRSZ; + } +} + /* * AllocSetReset * Frees all memory which is allocated in the given set. 
diff --git a/src/include/executor/nodeAgg.h b/src/include/executor/nodeAgg.h index a5b8a004d1e..c2b55728bfa 100644 --- a/src/include/executor/nodeAgg.h +++ b/src/include/executor/nodeAgg.h @@ -314,7 +314,7 @@ extern AggState *ExecInitAgg(Agg *node, EState *estate, int eflags); extern void ExecEndAgg(AggState *node); extern void ExecReScanAgg(AggState *node); -extern Size hash_agg_entry_size(int numAggs, Size tupleWidth, +extern Size hash_agg_entry_size(int numTrans, Size tupleWidth, Size transitionSpace); extern void hash_agg_set_limits(double hashentrysize, uint64 input_groups, int used_bits, Size *mem_limit, diff --git a/src/include/utils/memutils.h b/src/include/utils/memutils.h index 909bc2e9888..67e53e4b977 100644 --- a/src/include/utils/memutils.h +++ b/src/include/utils/memutils.h @@ -155,6 +155,7 @@ extern MemoryContext AllocSetContextCreateInternal(MemoryContext parent, Size minContextSize, Size initBlockSize, Size maxBlockSize); +extern Size AllocSetEstimateChunkSpace(Size size); /* * This wrapper macro exists to check for non-constant strings used as context
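To make the chunk-overhead arithmetic above concrete, here is a small Python sketch of the small-chunk path of AllocSetEstimateChunkSpace() (the constants are assumptions matching typical aset.c values: an 8-byte minimum chunk and a 16-byte chunk header):

```python
ALLOC_MINBITS = 3      # smallest chunk is 1 << 3 = 8 bytes (assumed)
ALLOC_CHUNKHDRSZ = 16  # assumed chunk header size

def estimate_chunk_space(size):
    """Round a request up to the allocator's power-of-two chunk size,
    then add the per-chunk header, mirroring the small-chunk path."""
    chunk = 1 << ALLOC_MINBITS
    while chunk < size:
        chunk <<= 1
    return chunk + ALLOC_CHUNKHDRSZ

# A 40-byte group key rounds up to a 64-byte chunk plus a 16-byte
# header: the palloc actually consumes 80 bytes, double the request.
print(estimate_chunk_space(40))  # -> 80
```

This is exactly why the old estimate, which only MAXALIGN'd each piece, could undercount per-group memory badly enough to matter for spill decisions.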
Re: [PATCH] Incremental sort (was: PoC: Partial sort)
On Tue, Mar 24, 2020 at 8:23 PM James Coleman wrote: > While working on finding a test case to show rescan isn't implemented > properly yet, I came across a bug. At the top of > ExecInitIncrementalSort, we assert that eflags does not contain > EXEC_FLAG_REWIND. But the following query (with merge and hash joins > disabled) breaks that assertion: > > select * from t join (select * from t order by a, b) s on s.a = t.a > where t.a in (1,2); > > The comments about this flag in src/include/executor/executor.h say: > > * REWIND indicates that the plan node should try to efficiently support > * rescans without parameter changes. (Nodes must support ExecReScan calls > * in any case, but if this flag was not given, they are at liberty to do it > * through complete recalculation. Note that a parameter change forces a > * full recalculation in any case.) > > Now we know that except in rare cases (as just discussed recently up > thread) we can't implement rescan efficiently. > > So is this a planner bug (i.e., should we try not to generate > incremental sort plans that require efficient rewind)? Or can we just > remove that part of the assertion and know that we'll implement the > rescan, albeit inefficiently? We already explicitly declare that we > don't support backwards scanning, but I don't see a way to declare the > same for rewind. Other nodes seem to get a materialization node placed above them to support this case "better". Is that something we should be doing? > I'm going to try the latter approach now to see if it at least solves > the immediate problem... So a couple of interesting results here. First, it does seem to fix the assertion failure, and rescan is being used in this case (as I expected). The plans have a bit of a weird shape, at least in my mind. 
First, to get the incremental sort on the right side of the join I had to:

set enable_mergejoin = off;
set enable_hashjoin = off;

and got this plan:

Gather (cost=1000.47..108541.96 rows=2 width=16)
  Workers Planned: 2
  -> Nested Loop (cost=0.47..107541.76 rows=1 width=16)
     Join Filter: (t.a = t_1.a)
     -> Parallel Seq Scan on t (cost=0.00..9633.33 rows=1 width=8)
        Filter: (a = ANY ('{1,2}'::integer[]))
     -> Incremental Sort (cost=0.47..75408.43 rows=100 width=8)
        Sort Key: t_1.a, t_1.b
        Presorted Key: t_1.a
        -> Index Scan using idx_t_a on t t_1 (cost=0.42..30408.42 rows=100 width=8)

To get rid of the parallelism but keep the same basic plan shape I had to further:

set enable_seqscan = off;
set enable_material = off;

and got this plan:

Nested Loop (cost=0.89..195829.74 rows=2 width=16)
  Join Filter: (t.a = t_1.a)
  -> Index Scan using idx_t_a on t (cost=0.42..12.88 rows=2 width=8)
     Index Cond: (a = ANY ('{1,2}'::integer[]))
  -> Incremental Sort (cost=0.47..75408.43 rows=100 width=8)
     Sort Key: t_1.a, t_1.b
     Presorted Key: t_1.a
     -> Index Scan using idx_t_a on t t_1 (cost=0.42..30408.42 rows=100 width=8)

Two observations: 1. Ideally the planner would have realized that the join condition can be safely pushed down into both index scans. I was surprised this didn't happen, but... maybe that's just not supported? 2. Ideally the nested loop node would have the smarts to know that the right child is ordered, and therefore it can stop pulling tuples from it as soon as it stops matching the join condition for each iteration in the loop. I'm less surprised this isn't supported; it seems like a fairly advanced optimization (OTOH it is the kind of interesting optimization incremental sort opens up in more cases). I don't *think* either of these is an issue with the patch, but I wanted to mention them in case it piqued someone's curiosity or in case one actually is a bug [in our patch or otherwise]. James
Re: Run-time pruning for ModifyTable
On Wed, 25 Mar 2020 at 13:00, Tom Lane wrote: > > David Rowley writes: > > I had a closer look at this today and the code I have in > > inheritance_planner() is certainly not right. > > Although I didn't get around to it for v13, there's still a plan on the > table for inheritance_planner() to get nuked from orbit [1]. > > Maybe this improvement should be put on hold till that's done? Possibly. I'm not really wedded to the idea of getting it in. However, it would really only be the inheritance planner part that would need to be changed later. I don't think any of the other code would need to be adjusted. Amit shared his thoughts in [1]. If you'd rather I held off, then I will. David [1] https://www.postgresql.org/message-id/CA%2BHiwqGhD7ieKsv5%2BGkmHgs-XhP2DoUhtESVb3MU-4j14%3DG6LA%40mail.gmail.com
Re: Include sequence relation support in logical replication
On 2020-03-24 16:19:21 -0700, Cary Huang wrote: > Hi > > > > From the PG logical replication documentation, I see that there is a > listed limitation that sequence relation is not replicated > logically. After some examination, I see that retrieving the next > value from a sequence using the nextval() call will emits a WAL update > every 32 calls to nextval(). In fact, when it emits a WAL update, it > will write a future value 32 increments from now, and maintain a > internal cache for delivering sequence numbers. It is done this way to > minimize the write operation to WAL record at a risk of losing some > values during a crash. So if we were to replicate the sequence, the > subscriber will receive a future value (32 calls to nextval()) from > now, and it obviously does not reflect current status. Sequence > changes caused by other sequence-related SQL functions like setval() > or ALTER SEQUENCE xxx, will always emit a WAL update, so replicating > changes caused by these should not be a problem. > > > > I have shared a patch that allows sequence relation to be supported in > logical replication via the decoding plugin ( test_decoding for example ); it > does not support sequence relation in logical replication between a PG > publisher and a PG subscriber via pgoutput plugin as it will require much > more work. For the replication to make sense, the patch actually disables the > WAL update at every 32 nextval() calls, so every call to nextval() will emit > a WAL update for proper replication. This is done by setting SEQ_LOG_VALS to > 0 in sequence.c > > > > I think the question is that should we minimize WAL update frequency (every > 32 calls) for getting next value in a sequence at a cost of losing values > during crash or being able to replicate a sequence relation properly at a > cost or more WAL updates? > > > > > > Cary Huang > > - > > HighGo Software Inc. 
(Canada) > > mailto:cary.hu...@highgo.ca > > http://www.highgo.ca > diff --git a/contrib/test_decoding/test_decoding.c > b/contrib/test_decoding/test_decoding.c > index 93c948856e..7a7e572d6c 100644 > --- a/contrib/test_decoding/test_decoding.c > +++ b/contrib/test_decoding/test_decoding.c > @@ -466,6 +466,15 @@ pg_decode_change(LogicalDecodingContext *ctx, > ReorderBufferTXN *txn, > > >data.tp.oldtuple->tuple, > true); > break; > + case REORDER_BUFFER_CHANGE_SEQUENCE: > + appendStringInfoString(ctx->out, " > SEQUENCE:"); > + if (change->data.sequence.newtuple == > NULL) > + > appendStringInfoString(ctx->out, " (no-tuple-data)"); > + else > + tuple_to_stringinfo(ctx->out, > tupdesc, > + > >data.sequence.newtuple->tuple, > + > false); > + break; > default: > Assert(false); > } > diff --git a/src/backend/commands/sequence.c b/src/backend/commands/sequence.c > index 6aab73bfd4..941015e4aa 100644 > --- a/src/backend/commands/sequence.c > +++ b/src/backend/commands/sequence.c > @@ -49,11 +49,10 @@ > > > /* > - * We don't want to log each fetching of a value from a sequence, > - * so we pre-log a few fetches in advance. In the event of > - * crash we can lose (skip over) as many values as we pre-logged. > + * Sequence replication is now supported and we will now need to log each > sequence > + * update to WAL such that the standby can properly receive the sequence > change > */ > -#define SEQ_LOG_VALS 32 > +#define SEQ_LOG_VALS 0 > > /* > * The "special area" of a sequence's buffer page looks like this. 
> diff --git a/src/backend/replication/logical/decode.c > b/src/backend/replication/logical/decode.c > index c2e5e3abf8..3dc14ead08 100644 > --- a/src/backend/replication/logical/decode.c > +++ b/src/backend/replication/logical/decode.c > @@ -42,6 +42,7 @@ > #include "replication/reorderbuffer.h" > #include "replication/snapbuild.h" > #include "storage/standby.h" > +#include "commands/sequence.h" > > typedef struct XLogRecordBuffer > { > @@ -70,9 +71,11 @@ static void DecodeCommit(LogicalDecodingContext *ctx, > XLogRecordBuffer *buf, >xl_xact_parsed_commit *parsed, > TransactionId xid); > static void DecodeAbort(LogicalDecodingContext *ctx, XLogRecordBuffer *buf, > xl_xact_parsed_abort *parsed, > TransactionId xid); > +static void DecodeSequence(LogicalDecodingContext *ctx,
Re: Index Skip Scan
On Wed, Mar 25, 2020 at 12:41 AM Dmitry Dolgov <9erthali...@gmail.com> wrote: > > On Wed, Mar 11, 2020 at 06:56:09PM +0800, Andy Fan wrote: > > > > There was a dedicated thread [1] where David explain his idea very > > detailed, and you can also check that messages around that message for > > the context. hope it helps. > > Thanks for pointing out to this thread! Somehow I've missed it, and now > looks like we need to make some efforts to align patches for index skip > scan and distincClause elimination. > Yes :). It looks like index skip scan is a way of producing a distinct result without a real distinct node; it happens after the UniqueKeys check (where I try to see if the result is unique already) and before the place where we create a unique node for the distinct clause (with index skip scan we don't need it at all). Currently in my patch, the logic here is: 1) Check the UniqueKeys to see if the result is not unique already; if not, go to the next step. 2) After the distinct paths are created, I add the result of the distinct path as a unique key. Will you add the index skip scan path during create_distinct_paths and add the UniqueKey to RelOptInfo? If so, I guess my current patch can handle it, since it cares about the result of the distinct path but is not concerned with how we achieve it. Best Regards Andy Fan
Re: PATCH: add support for IN and @> in functional-dependency statistics use
On Thu, Mar 19, 2020 at 07:53:39PM +, Dean Rasheed wrote: On Wed, 18 Mar 2020 at 00:29, Tomas Vondra wrote: OK, I took a look. I think from the correctness POV the patch is OK, but I think the dependencies_clauselist_selectivity() function now got a bit too complex. I've been able to parse it now, but I'm sure I'll have trouble in the future :-( Can we refactor / split it somehow and move bits of the logic to smaller functions, or something like that? Yeah, it has gotten a bit long. It's somewhat tricky splitting it up, because of the number of shared variables used throughout the function, but here's an updated patch splitting it into what seemed like the 2 most logical pieces. The first piece (still in dependencies_clauselist_selectivity()) works out what dependencies can/should be applied, and the second piece in a new function does the actual work of applying the list of functional dependencies to the clause list. I think that has made it easier to follow, and it has also reduced the complexity of the final "no applicable stats" branch. Seems OK to me. I'd perhaps name deps_clauselist_selectivity differently, it's a bit too similar to dependencies_clauselist_selectivity. Perhaps something like clauselist_apply_dependencies? But that's a minor detail. Another thing I'd like to suggest is keeping the "old" formula, and instead of just replacing it with P(a,b) = f * Min(P(a), P(b)) + (1-f) * P(a) * P(b) but explaining how the old formula may produce nonsensical selectivity, and how the new formula addresses that issue. I think this is purely a comment issue? I've added some more extensive comments attempting to justify the formulae. Yes, it was purely a comment issue. Seems fine now. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
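For illustration, the combination formula above can be written out directly (a sketch only; f is the degree of functional dependence, and p_a, p_b the per-clause selectivities):

```python
def dep_selectivity(f, p_a, p_b):
    """P(a,b) = f * min(P(a), P(b)) + (1 - f) * P(a) * P(b)"""
    return f * min(p_a, p_b) + (1 - f) * p_a * p_b

# Full dependence (f = 1) degenerates to min(), which can never exceed
# either input; independence (f = 0) degenerates to the plain product.
print(dep_selectivity(1.0, 0.1, 0.2))  # -> 0.1
print(dep_selectivity(0.0, 0.1, 0.2))
```

The min() term is what keeps the combined estimate from the nonsensical case where the "dependent" contribution exceeds the selectivity of either clause on its own.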
Re: [PATCH] Incremental sort (was: PoC: Partial sort)
On Tue, Mar 24, 2020 at 7:08 PM Tomas Vondra wrote: > > On Tue, Mar 24, 2020 at 06:26:11PM -0400, James Coleman wrote: > >On Mon, Mar 23, 2020 at 11:44 PM Alvaro Herrera > > wrote: > >> > >> On 2020-Mar-23, James Coleman wrote: > >> > >> > 4. nodeIncrementalSort.c ExecReScanIncrementalSort: This whole chunk > >> > is suspect. I've mentioned previously I don't have a great mental > >> > model of how rescan works and its invariants (IIRC someone said it was > >> > about moving around a result set in a cursor). Regardless I'm pretty > >> > sure this code just doesn't work correctly. > >> > >> I don't think that's the whole of it. My own vague understanding of > >> ReScan is that it's there to support running a node again, possibly with > >> different parameters. For example if you have a join of an indexscan > >> on the outer side and an incremental sort on the inner side, and the > >> values from the index are used as parameters to the incremental sort, > >> then the incremental sort is going to receive ReScan calls for each of > >> the values that the index returns. Sometimes the index could give you > >> the same values as before (because there's a dupe in the index), so you > >> can just return the same values from the incremental sort; but other > >> times it's going to return different values so you need to reset the > >> incremental sort to "start from scratch" using the new values as > >> parameters. > >> > >> Now, if you have a cursor reading from the incremental sort and fetch > >> all tuples, then rewind completely and fetch all again, then that's > >> going to be a rescan as well. > >> > >> I agree with you that the code doesn't seem to implement that. > > > >I grepped the codebase for rescan, and noted this relevant info in > >src/backend/executor/README: > > > >* Rescan command to reset a node and make it generate its output sequence > >over again. > > > >* Parameters that can alter a node's results. 
After adjusting a parameter, > >the rescan command must be applied to that node and all nodes above it. > >There is a moderately intelligent scheme to avoid rescanning nodes > >unnecessarily (for example, Sort does not rescan its input if no parameters > >of the input have changed, since it can just reread its stored sorted data). > > > >That jives pretty well with what you're saying. > > > >The interesting thing with incremental sort, as the comments in the > >patch already note, is that even if the params haven't changed, we > >can't regenerate the same values again *unless* we know that we're > >still in the same batch, or, have only processed a single full batch > >(and the tuples are still in the full sort state) or we've > >transitioned to prefix mode and have only transferred tuples from the > >full sort state for a single prefix key group. > > > >That's a pretty narrow range of applicability of not needing to > >re-execute the entire node, at least based on my assumptions about > >when rescanning will typically happen. > > > >So, two followup questions: > > > >1. Given the narrow applicability, might it make sense to just say > >"we're only going to do a total reset and rescan and not try to > >implement a smart 'don't rescan if we don't have to'"? > > > > I think that's a sensible approach. > > >2. What would be a typical or good way to test this? Should I > >basically repeat many of the existing implementation tests but with a > >cursor and verify that rescanning produces the same results? That's > >probably the path I'm going to take if there are no objections. Of > >course we would need even more testing if we wanted to have the "smart > >rescan" functionality. > > > > I haven't checked, but how are we testing it for the other nodes? I haven't checked yet, but figured I'd ask in case someone had ideas off the top of their head. While working on finding a test case to show rescan isn't implemented properly yet, I came across a bug. 
At the top of ExecInitIncrementalSort, we assert that eflags does not contain EXEC_FLAG_REWIND. But the following query (with merge and hash joins disabled) breaks that assertion: select * from t join (select * from t order by a, b) s on s.a = t.a where t.a in (1,2); The comments about this flag in src/include/executor/executor.h say: * REWIND indicates that the plan node should try to efficiently support * rescans without parameter changes. (Nodes must support ExecReScan calls * in any case, but if this flag was not given, they are at liberty to do it * through complete recalculation. Note that a parameter change forces a * full recalculation in any case.) Now we know that except in rare cases (as just discussed recently up thread) we can't implement rescan efficiently. So is this a planner bug (i.e., should we try not to generate incremental sort plans that require efficient rewind)? Or can we just remove that part of the assertion and know that we'll implement the rescan, albeit inefficiently? We already explicitly declare that we don't support
Re: Run-time pruning for ModifyTable
David Rowley writes: > I had a closer look at this today and the code I have in > inheritance_planner() is certainly not right. Although I didn't get around to it for v13, there's still a plan on the table for inheritance_planner() to get nuked from orbit [1]. Maybe this improvement should be put on hold till that's done? regards, tom lane [1] https://www.postgresql.org/message-id/357.1550612935%40sss.pgh.pa.us
Re: Run-time pruning for ModifyTable
On Tue, 10 Mar 2020 at 00:13, David Rowley wrote: > Over in inheritance_planner(), I noticed that the RT index of the > SELECT query and the UPDATE/DELETE query can differ. There was some > code that performed translations. I changed that code slightly so that > it's a bit more optimal. It was building two lists, one for the old > RT index and one for the new. It added elements to this list > regardless of if the RT indexes were the same or not. I've now changed > that to only add to the list if they differ, which I feel should never > be slower and most likely always faster. I'm also now building a > translation map between the old and new RT indexes, however, I only > found one test in the regression tests which require any sort of > translation of these RT indexes. This was with an inheritance table, > so I need to do a bit more work to find a case where this happens with > a partitioned table to ensure all this works. I had a closer look at this today and the code I have in inheritance_planner() is certainly not right. It's pretty easy to make the SELECT and UPDATE/DELETE's RT indexes differ with something like:

drop table part_t cascade;
create table part_t (a int, b int, c int) partition by list (a);
create table part_t12 partition of part_t for values in(1,2) partition by list (a);
create table part_t12_1 partition of part_t12 for values in(1);
create table part_t12_2 partition of part_t12 for values in(2);
create table part_t3 partition of part_t for values in(3);
create view vw_part_t as select * from part_t;
explain analyze update vw_part_t set a = t2.a +0 from part_t t2 where t2.a = vw_part_t.a and vw_part_t.a = (select 1);

In this case, the sub-partitioned table changes RT index. I can't just take the RelOptInfos from the partition_root's simple_rel_array and put them in the correct element in the root's simple_rel_array, as the RT indexes stored within also need to be translated. 
I'll be having another look at this to see what the best fix is going to be. David
Re: NOT IN subquery optimization
"Li, Zheng" writes: > * I find it entirely unacceptable to stick some planner temporary > fields into struct SubLink. If you need that storage you'll have > to find some other place to put it. But in point of fact I don't > think you need it; it doesn't look to me to be critical to generate > the subquery's plan any earlier than make_subplan would have done it. > Moreover, you should really strive to *not* do that, because it's > likely to get in the way of other future optimizations. As the > existing comment in make_subplan already suggests, we might want to > delay subplan planning even further than that in future. > The reason for calling make_subplan this early is that we want to > Call subplan_is_hashable(plan), to decide whether to proceed with the proposed > transformation. Well, you're going to have to find another way, because this one won't do. If you really need to get whacked over the head with a counterexample for this approach, consider what happens if some part of the planner decides to pass the SubLink through copyObject, expression_tree_mutator, etc in between where you've done the planning and where make_subplan looks at it. Since you haven't taught copyfuncs.c about these fields, they'll semi-accidentally wind up as NULL afterwards, meaning you lost the information anyway. (In fact, I wouldn't be surprised if that's happening already in some cases; you couldn't really tell, since make_subplan will just repeat the work.) On the other hand, you can't have copyfuncs.c copying such fields either --- we don't have copyfuncs support for PlannerInfo, and if we did, the case would end up as infinite recursion. Nor would it be particularly cool to try to fake things out by copying the pointers as scalars; that will lead to dangling pointers later on. BTW, so far as I can see, the only reason you're bothering with the whole thing is to compare the size of the subquery output with work_mem, because that's all that subplan_is_hashable does. 
I wonder whether that consideration is even still necessary in the wake of 1f39bce02. If it is, I wonder whether there isn't a cheaper way to figure it out. (Note similar comment in make_subplan.) Also ... > We try to stick with the fast hashed subplan when possible to avoid > potential performance degradation from the transformation which may > restrict the planner to choose Nested Loop Anti Join in order to handle > Null properly, But can't you detect that case directly? It seems like you'd need to figure out the NULL situation anyway to know whether the transformation to antijoin is valid in the first place. regards, tom lane
Include sequence relation support in logical replication
Hi From the PG logical replication documentation, I see that there is a listed limitation that sequence relations are not replicated logically. After some examination, I see that retrieving the next value from a sequence using the nextval() call will emit a WAL update only every 32 calls to nextval(). In fact, when it emits a WAL update, it will write a future value 32 increments from now, and maintain an internal cache for delivering sequence numbers. It is done this way to minimize write operations to the WAL at the risk of losing some values during a crash. So if we were to replicate the sequence, the subscriber would receive a future value (32 calls to nextval() from now), which obviously does not reflect the current status. Sequence changes caused by other sequence-related SQL functions like setval() or ALTER SEQUENCE xxx will always emit a WAL update, so replicating changes caused by these should not be a problem. I have shared a patch that allows sequence relations to be supported in logical replication via a decoding plugin (test_decoding, for example); it does not support sequence relations in logical replication between a PG publisher and a PG subscriber via the pgoutput plugin, as that will require much more work. For the replication to make sense, the patch actually disables the WAL update at every 32 nextval() calls, so every call to nextval() will emit a WAL update for proper replication. This is done by setting SEQ_LOG_VALS to 0 in sequence.c I think the question is: should we minimize WAL update frequency (every 32 calls) for getting the next value in a sequence at the cost of losing values during a crash, or be able to replicate a sequence relation properly at the cost of more WAL updates? Cary Huang - HighGo Software Inc. (Canada) mailto:cary.hu...@highgo.ca http://www.highgo.ca sequence_replication.patch Description: Binary data
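The pre-logging behavior described above can be illustrated with a toy model (a deliberate simplification for illustration, not the actual sequence.c logic):

```python
SEQ_LOG_VALS = 32  # fetches pre-logged per WAL record (0 in the patch)

class ToySequence:
    def __init__(self):
        self.value = 0      # last value handed out locally
        self.log_cnt = 0    # cached fetches left before the next WAL write
        self.wal_value = 0  # what WAL (and thus a logical decoder) sees

    def nextval(self):
        if self.log_cnt == 0:
            # One WAL record covers this fetch plus SEQ_LOG_VALS more,
            # so WAL records a value 32 increments ahead of "now".
            self.wal_value = self.value + SEQ_LOG_VALS + 1
            self.log_cnt = SEQ_LOG_VALS + 1
        self.log_cnt -= 1
        self.value += 1
        return self.value

seq = ToySequence()
seq.nextval()         # the caller gets 1 ...
print(seq.wal_value)  # ... but WAL already records 33
```

With SEQ_LOG_VALS set to 0, every nextval() hits the log_cnt == 0 branch, so wal_value tracks the real value exactly, which is what the patch relies on for faithful replication.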
Re: Ltree syntax improvement
Nikita Glukhov writes: > Attached new version of the patch. I spent a little bit of time looking through this, and have a few comments: * You have a lot of places where tests for particular ASCII characters are done like this: if ((charlen == 1) && t_iseq(src, '\\')) This is a tedious and expensive way to spell if (*src == '\\') because charlen is necessarily 1 if you are looking at an ASCII character; there is no allowed backend encoding in which an ASCII character can be the first byte of a multibyte character. Aside from the direct simplifications of the tests that this makes possible, I see some places where you'd not have to pass charlen around, either. * I spent a fair amount of time thinking that a lot of the added code was wrong because it was only considering escaping and not double-quoting. I eventually concluded that the idea is to convert double-quoting to a pure escape-based representation during input and store it that way. However, I don't really see why that is either necessary or a good idea --- the internal storage already has a length counter for each label. So I think what you really ought to be doing here is simplifying out both quotes and escapes during ltree_in and just storing the notionally-represented string internally. (If I've misunderstood what the plan is, well the utter lack of documentation in the patch isn't helping.) * The added test cases seem a bit excessive and repetitive. regards, tom lane
Re: [PATCH] Incremental sort (was: PoC: Partial sort)
On Tue, Mar 24, 2020 at 06:26:11PM -0400, James Coleman wrote: On Mon, Mar 23, 2020 at 11:44 PM Alvaro Herrera wrote: On 2020-Mar-23, James Coleman wrote: > 4. nodeIncrementalSort.c ExecReScanIncrementalSort: This whole chunk > is suspect. I've mentioned previously I don't have a great mental > model of how rescan works and its invariants (IIRC someone said it was > about moving around a result set in a cursor). Regardless I'm pretty > sure this code just doesn't work correctly. I don't think that's the whole of it. My own vague understanding of ReScan is that it's there to support running a node again, possibly with different parameters. For example if you have a join of an indexscan on the outer side and an incremental sort on the inner side, and the values from the index are used as parameters to the incremental sort, then the incremental sort is going to receive ReScan calls for each of the values that the index returns. Sometimes the index could give you the same values as before (because there's a dupe in the index), so you can just return the same values from the incremental sort; but other times it's going to return different values so you need to reset the incremental sort to "start from scratch" using the new values as parameters. Now, if you have a cursor reading from the incremental sort and fetch all tuples, then rewind completely and fetch all again, then that's going to be a rescan as well. I agree with you that the code doesn't seem to implement that. I grepped the codebase for rescan, and noted this relevant info in src/backend/executor/README: * Rescan command to reset a node and make it generate its output sequence over again. * Parameters that can alter a node's results. After adjusting a parameter, the rescan command must be applied to that node and all nodes above it. 
There is a moderately intelligent scheme to avoid rescanning nodes unnecessarily (for example, Sort does not rescan its input if no parameters of the input have changed, since it can just reread its stored sorted data). That jives pretty well with what you're saying. The interesting thing with incremental sort, as the comments in the patch already note, is that even if the params haven't changed, we can't regenerate the same values again *unless* we know that we're still in the same batch, or, have only processed a single full batch (and the tuples are still in the full sort state) or we've transitioned to prefix mode and have only transferred tuples from the full sort state for a single prefix key group. That's a pretty narrow range of applicability of not needing to re-execute the entire node, at least based on my assumptions about when rescanning will typically happen. So, two followup questions: 1. Given the narrow applicability, might it make sense to just say "we're only going to do a total reset and rescan and not try to implement a smart 'don't rescan if we don't have to'"? I think that's a sensible approach. 2. What would be a typical or good way to test this? Should I basically repeat many of the existing implementation tests but with a cursor and verify that rescanning produces the same results? That's probably the path I'm going to take if there are no objections. Of course we would need even more testing if we wanted to have the "smart rescan" functionality. I haven't checked, but how are we testing it for the other nodes? regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: NOT IN subquery optimization
Hi Tom, Thanks for the feedback. * I find it entirely unacceptable to stick some planner temporary fields into struct SubLink. If you need that storage you'll have to find some other place to put it. But in point of fact I don't think you need it; it doesn't look to me to be critical to generate the subquery's plan any earlier than make_subplan would have done it. Moreover, you should really strive to *not* do that, because it's likely to get in the way of other future optimizations. As the existing comment in make_subplan already suggests, we might want to delay subplan planning even further than that in future. The reason for calling make_subplan this early is that we want to call subplan_is_hashable(plan) to decide whether to proceed with the proposed transformation. We try to stick with the fast hashed subplan when possible to avoid potential performance degradation from the transformation, which may restrict the planner to choosing a Nested Loop Anti Join in order to handle NULL properly. The following is an example from subselect.out:

explain (costs false) select * from s where n not in (select u from l);
                 QUERY PLAN
---------------------------------------------
 Nested Loop Anti Join
   InitPlan 1 (returns $0)
     ->  Seq Scan on l l_1
   ->  Seq Scan on s
         Filter: ((n IS NOT NULL) OR (NOT $0))
   ->  Index Only Scan using l_u on l
         Index Cond: (u = s.n)

However, if the subplan is not hashable, the above Nested Loop Anti Join is actually faster.
I also wonder whether it's safe to make the parsetree changes it makes earlier than normal, and whether it will be problematic to run it twice on the same tree, and whether its rather indirect connection to distribute_qual_to_rels is going to misbehave. The rearrangement of reduce_outer_joins was to make the null test function is_node_nonnullable() more accurate. Later we added strict-predicate logic in is_node_nonnullable(), so I think we can get rid of the rearrangement of reduce_outer_joins now without losing accuracy. * The proposed test additions seem to about triple the runtime of subselect.sql. This seems excessive. I also wonder why it's necessary for this test to build its own large tables; couldn't it re-use ones that already exist in the regression database? I added a lot of test cases. But yes, I can reuse an existing large table if there is one that doesn't fit in 64kB of work_mem. * Not really sure that we need a new planner GUC for this, but if we do, it needs to be documented. The new GUC is there just in case anything goes wrong; the user can easily turn the transformation off. Regards, Zheng
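The NULL handling that forces the (n IS NOT NULL) OR (NOT $0) filter comes from NOT IN's three-valued semantics. A quick sketch of those semantics (illustrative Python, not planner code):

```python
def sql_not_in(x, subquery_rows):
    """Evaluate SQL's  x NOT IN (SELECT ...)  under three-valued logic.
    Returns True, False, or None (standing in for SQL's NULL/unknown)."""
    if not subquery_rows:
        return True                       # empty subquery: always true
    if x is None:
        return None                       # NULL probe value: never true
    if any(v == x for v in subquery_rows if v is not None):
        return False                      # a definite match
    if any(v is None for v in subquery_rows):
        return None                       # a NULL might have matched
    return True

assert sql_not_in(1, [2, 3]) is True
assert sql_not_in(2, [2, 3]) is False
assert sql_not_in(1, [2, None]) is None   # why a plain anti-join is unsafe
assert sql_not_in(None, [2, 3]) is None
assert sql_not_in(None, []) is True
```

A plain anti-join would return the row in both `None` cases above, which is why the transformed plan needs extra guards against NULLs on either side of the comparison.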
Re: WIP: WAL prefetch (another approach)
Hi, On 2020-03-18 18:18:44 +1300, Thomas Munro wrote: > From 1b03eb5ada24c3b23ab8ca6db50e0c5d90d38259 Mon Sep 17 00:00:00 2001 > From: Thomas Munro > Date: Mon, 9 Dec 2019 17:22:07 +1300 > Subject: [PATCH 3/5] Add WalRcvGetWriteRecPtr() (new definition). > > A later patch will read received WAL to prefetch referenced blocks, > without waiting for the data to be flushed to disk. To do that, it > needs to be able to see the write pointer advancing in shared memory. > > The function formerly bearing this name was recently renamed to > WalRcvGetFlushRecPtr(), which better described what it does. Hm. I'm a bit wary of reusing the name with a different meaning. If there are any external references, this'll hide that they need to adapt. Perhaps, even if it's a bit clunky, name it GetUnflushedRecPtr? > From c62fde23f70ff06833d743a1c85716e15f3c813c Mon Sep 17 00:00:00 2001 > From: Thomas Munro > Date: Tue, 17 Mar 2020 17:26:41 +1300 > Subject: [PATCH 4/5] Allow PrefetchBuffer() to report what happened. > > Report whether a prefetch was actually initiated due to a cache miss, so > that callers can limit the number of concurrent I/Os they try to issue, > without counting the prefetch calls that did nothing because the page > was already in our buffers. > > If the requested block was already cached, return a valid buffer. This > might enable future code to avoid a buffer mapping lookup, though it > will need to recheck the buffer before using it because it's not pinned > so could be reclaimed at any time. > > Report neither hit nor miss when a relation's backing file is missing, > to prepare for use during recovery. This will be used to handle cases > of relations that are referenced in the WAL but have been unlinked > already due to actions covered by WAL records that haven't been replayed > yet, after a crash.
We probably should take this into account in nodeBitmapHeapscan.c > diff --git a/src/backend/storage/buffer/bufmgr.c > b/src/backend/storage/buffer/bufmgr.c > index d30aed6fd9..4ceb40a856 100644 > --- a/src/backend/storage/buffer/bufmgr.c > +++ b/src/backend/storage/buffer/bufmgr.c > @@ -469,11 +469,13 @@ static int ts_ckpt_progress_comparator(Datum a, > Datum b, void *arg); > /* > * Implementation of PrefetchBuffer() for shared buffers. > */ > -void > +PrefetchBufferResult > PrefetchSharedBuffer(struct SMgrRelationData *smgr_reln, >ForkNumber forkNum, >BlockNumber blockNum) > { > + PrefetchBufferResult result = { InvalidBuffer, false }; > + > #ifdef USE_PREFETCH > BufferTag newTag; /* identity of requested block */ > uint32 newHash;/* hash value for newTag */ > @@ -497,7 +499,23 @@ PrefetchSharedBuffer(struct SMgrRelationData *smgr_reln, > > /* If not in buffers, initiate prefetch */ > if (buf_id < 0) > - smgrprefetch(smgr_reln, forkNum, blockNum); > + { > + /* > + * Try to initiate an asynchronous read. This returns false in > + * recovery if the relation file doesn't exist. > + */ > + if (smgrprefetch(smgr_reln, forkNum, blockNum)) > + result.initiated_io = true; > + } > + else > + { > + /* > + * Report the buffer it was in at that time. The caller may be > able > + * to avoid a buffer table lookup, but it's not pinned and it > must be > + * rechecked! > + */ > + result.buffer = buf_id + 1; Perhaps it'd be better to name this "last_buffer" or such, to make it clearer that it may be outdated? 
> -void > +PrefetchBufferResult > PrefetchBuffer(Relation reln, ForkNumber forkNum, BlockNumber blockNum) > { > #ifdef USE_PREFETCH > @@ -540,13 +564,17 @@ PrefetchBuffer(Relation reln, ForkNumber forkNum, > BlockNumber blockNum) >errmsg("cannot access temporary tables > of other sessions"))); > > /* pass it off to localbuf.c */ > - PrefetchLocalBuffer(reln->rd_smgr, forkNum, blockNum); > + return PrefetchLocalBuffer(reln->rd_smgr, forkNum, blockNum); > } > else > { > /* pass it to the shared buffer version */ > - PrefetchSharedBuffer(reln->rd_smgr, forkNum, blockNum); > + return PrefetchSharedBuffer(reln->rd_smgr, forkNum, blockNum); > } > +#else > + PrefetchBuffer result = { InvalidBuffer, false }; > + > + return result; > #endif /* USE_PREFETCH > */ > } Hm. Now that results are returned indicating whether the buffer is in s_b - shouldn't the return value be accurate regardless of USE_PREFETCH? > +/* > + * Type returned by PrefetchBuffer(). > + */ > +typedef struct PrefetchBufferResult > +{ > + Buffer buffer; /* If valid,
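The API shape under review — report a hit with an unpinned (and therefore recheck-required) buffer, report that an I/O was initiated, or report neither when the backing file is missing during recovery — can be sketched as follows (a Python mock-up of the control flow only; the real implementation is the C code in bufmgr.c quoted above):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class PrefetchBufferResult:
    buffer: Optional[int] = None   # buffer id at lookup time; NOT pinned, recheck!
    initiated_io: bool = False

class FakeBufferPool:
    """Toy stand-in for the shared buffer pool, for illustration only."""
    def __init__(self):
        self.lookup = {}           # (fork, block) -> buffer id
        self.missing_files = set() # relation forks already unlinked (recovery)

    def prefetch(self, fork, block):
        result = PrefetchBufferResult()
        if (fork, block) in self.lookup:
            # Cache hit: report the buffer it was in *at that time*.
            result.buffer = self.lookup[(fork, block)]
        elif fork in self.missing_files:
            # File gone: neither hit nor miss, so callers don't count it
            # against their concurrent-I/O budget.
            pass
        else:
            result.initiated_io = True  # pretend to start an async read
        return result

pool = FakeBufferPool()
pool.lookup[("main", 7)] = 42
pool.missing_files.add("gone")

assert pool.prefetch("main", 7) == PrefetchBufferResult(buffer=42)
assert pool.prefetch("main", 8) == PrefetchBufferResult(initiated_io=True)
assert pool.prefetch("gone", 1) == PrefetchBufferResult()
```

The "recheck" caveat in the comment is the same one Andres's `last_buffer` naming suggestion is driving at: the returned id is only a hint about where the block was, not a pin on it.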
Re: Option to dump foreign data in pg_dump
On 2020-Mar-24, Alvaro Herrera wrote: > On 2020-Mar-23, Alvaro Herrera wrote: > > > COPY public.ft1 (c1, c2, c3) FROM stdin; > > pg_dump: error: query failed: ERROR: foreign-data wrapper "dummy" has no > handler > pg_dump: error: query was: COPY (SELECT c1, c2, c3 FROM public.ft1 ) TO > stdout; > > > > Maybe what we should do is just verify that you do get that error (and no > > other errors). > > Done that way. Will be pushing this shortly. Hmm, but travis is failing on the cfbot, and I can't see why ... -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Improve checking for pg_index.xmin
On Tue, Mar 24, 2020 at 3:38 PM Amit Kapila wrote: > On Sun, Jan 12, 2020 at 3:33 AM Alexander Korotkov > wrote: > > > > On Wed, Jan 8, 2020 at 4:37 PM Tom Lane wrote: > > > Heikki Linnakangas writes: > > > > On 01/11/2019 01:50, Alexander Korotkov wrote: > > > >> This happens so, because we're checking that there is no broken HOT > > > >> chains after index creation by comparison pg_index.xmin and > > > >> TransactionXmin. So, we check that pg_index.xmin is in the past for > > > >> current transaction in lossy way by comparison just xmins. Attached > > > >> patch changes this check to XidInMVCCSnapshot(). > > > >> With patch the issue is gone. My doubt about this patch is that it > > > >> changes check with TransactionXmin to check with GetActiveSnapshot(), > > > >> which might be more recent. However, query shouldn't be executer with > > > >> older snapshot than one it was planned with. > > > > > > > Hmm. Maybe you could construct a case like that with a creative mix of > > > > stable and volatile functions? Using GetOldestSnapshot() would be safer. > > > > > > I really wonder if this is safe at all. > > > > > > (1) Can we assume that the query will be executed with same-or-newer > > > snapshot as what was used by the planner? There's no such constraint > > > in the plancache, I'm pretty sure. > > > > > > (2) Is committed-ness of the index-creating transaction actually > > > sufficient to ensure that none of the broken HOT chains it saw are > > > a problem for the onlooker transaction? This is, at best, really > > > un-obvious. Some of those HOT chains could involve xacts that were > > > still not committed when the index build finished, I believe. > > > > > > (3) What if the index was made with CREATE INDEX CONCURRENTLY --- > > > which xid is actually on the pg_index row, and how does that factor > > > into (1) and (2)? > > > > Thank you for pointing. I'll investigate these issues in detail. > > > > Are you planning to work on this patch [1] for current CF? 
> If not, then I think it is better to either move this to the next CF or mark it as RWF. I didn't manage to investigate this subject and provide a new patch version, so I'm marking this RWF. -- Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
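For context on why the bare-xmin comparison is described upthread as "lossy": a snapshot can see an old transaction as finished while a newer one is still in progress, so comparing just xmins rejects some indexes that are actually usable. A rough sketch of the visibility rule (simplified; the real XidInMVCCSnapshot also handles subtransactions and overflowed snapshots):

```python
def xid_in_mvcc_snapshot(xid, snapshot):
    """True if `xid` counts as still running for `snapshot`, i.e. its
    effects are invisible.  Simplified model for illustration only."""
    if xid >= snapshot["xmax"]:
        return True                  # started after the snapshot was taken
    if xid < snapshot["xmin"]:
        return False                 # finished before every running xact
    return xid in snapshot["xip"]    # otherwise: consult in-progress list

snap = {"xmin": 100, "xmax": 200, "xip": {150}}

assert xid_in_mvcc_snapshot(90, snap) is False
# A lossy "pg_index.xmin < TransactionXmin" test would treat 120 as
# possibly still running, yet the snapshot can already see it committed:
assert xid_in_mvcc_snapshot(120, snap) is False
assert xid_in_mvcc_snapshot(150, snap) is True
assert xid_in_mvcc_snapshot(250, snap) is True
```

The xid 120 case is exactly the situation the proposed patch targets; Tom's questions (1)-(3) are about whether relying on the active snapshot for this check is actually safe.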
Re: [PATCH] Incremental sort (was: PoC: Partial sort)
On Mon, Mar 23, 2020 at 11:44 PM Alvaro Herrera wrote: > > On 2020-Mar-23, James Coleman wrote: > > > 4. nodeIncrementalSort.c ExecReScanIncrementalSort: This whole chunk > > is suspect. I've mentioned previously I don't have a great mental > > model of how rescan works and its invariants (IIRC someone said it was > > about moving around a result set in a cursor). Regardless I'm pretty > > sure this code just doesn't work correctly. > > I don't think that's the whole of it. My own vague understanding of > ReScan is that it's there to support running a node again, possibly with > different parameters. For example if you have a join of an indexscan > on the outer side and an incremental sort on the inner side, and the > values from the index are used as parameters to the incremental sort, > then the incremental sort is going to receive ReScan calls for each of > the values that the index returns. Sometimes the index could give you > the same values as before (because there's a dupe in the index), so you > can just return the same values from the incremental sort; but other > times it's going to return different values so you need to reset the > incremental sort to "start from scratch" using the new values as > parameters. > > Now, if you have a cursor reading from the incremental sort and fetch > all tuples, then rewind completely and fetch all again, then that's > going to be a rescan as well. > > I agree with you that the code doesn't seem to implement that. I grepped the codebase for rescan, and noted this relevant info in src/backend/executor/README: * Rescan command to reset a node and make it generate its output sequence over again. * Parameters that can alter a node's results. After adjusting a parameter, the rescan command must be applied to that node and all nodes above it. 
There is a moderately intelligent scheme to avoid rescanning nodes unnecessarily (for example, Sort does not rescan its input if no parameters of the input have changed, since it can just reread its stored sorted data). That jives pretty well with what you're saying. The interesting thing with incremental sort, as the comments in the patch already note, is that even if the params haven't changed, we can't regenerate the same values again *unless* we know that we're still in the same batch, or, have only processed a single full batch (and the tuples are still in the full sort state) or we've transitioned to prefix mode and have only transferred tuples from the full sort state for a single prefix key group. That's a pretty narrow range of applicability of not needing to re-execute the entire node, at least based on my assumptions about when rescanning will typically happen. So, two followup questions: 1. Given the narrow applicability, might it make sense to just say "we're only going to do a total reset and rescan and not try to implement a smart 'don't rescan if we don't have to'"? 2. What would be a typical or good way to test this? Should I basically repeat many of the existing implementation tests but with a cursor and verify that rescanning produces the same results? That's probably the path I'm going to take if there are no objections. Of course we would need even more testing if we wanted to have the "smart rescan" functionality. Thoughts? James
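The executor README's "moderately intelligent scheme" for plain Sort can be sketched like this (illustrative Python, not executor code; the real logic lives in ExecReScanSort in nodeSort.c): the node keeps its sorted output and only rebuilds it when a parameter of its input has changed:

```python
class FakeSortNode:
    """Toy model of Sort's rescan behavior: reread the stored sorted output
    unless the input's parameters changed.  Illustration only."""
    def __init__(self, input_fn):
        self.input_fn = input_fn   # produces the unsorted input rows
        self.stored = None         # materialized sorted output
        self.params = None
        self.sort_count = 0        # how many times we actually re-sorted

    def rescan(self, params):
        if self.stored is None or params != self.params:
            self.stored = sorted(self.input_fn(params))
            self.params = params
            self.sort_count += 1   # had to re-execute the whole sort
        return list(self.stored)

node = FakeSortNode(lambda p: [p + 2, p, p + 1])

assert node.rescan(10) == [10, 11, 12]
assert node.rescan(10) == [10, 11, 12]   # same params: reuse stored output
assert node.sort_count == 1
assert node.rescan(20) == [20, 21, 22]   # new params: must re-sort
assert node.sort_count == 2
```

Incremental sort can't offer the same-params shortcut in general because it throws away each prefix-key batch as it streams results, which is the narrow-applicability argument for just doing a total reset on every rescan.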
Re: PostgreSQL proposal of Google Summer of Code
On Tue, Mar 24, 2020 at 7:07 PM Maryam Farrukh wrote: > Dear Sir/Madam, > > I am very much interested in working on a project of PostgreSQL for Google > summer internship. While I was writing a proposal, I came across some > guidelines by the company to get in touch about the nature of the project > and then draft the proposal. I would be very much interested in learning > more about the project so I can come up with a reasonable proposal. > > Hi Maryam, You can start having a look on the following links: https://wiki.postgresql.org/wiki/GSoC https://wiki.postgresql.org/wiki/GSoC_2020 As an old PostgreSQL GSoC student I can tell you it's an amazing experience. Regards, -- Fabrízio de Royes Mello Timbira - http://www.timbira.com.br/ PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento
PostgreSQL proposal of Google Summer of Code
Dear Sir/Madam, I am very much interested in working on a project of PostgreSQL for Google summer internship. While I was writing a proposal, I came across some guidelines by the company to get in touch about the nature of the project and then draft the proposal. I would be very much interested in learning more about the project so I can come up with a reasonable proposal. Best regards, Maryam Farrukh
Re: allow online change primary_conninfo
I think the startup SIGHUP handler should be in startup.c, not xlog.c, which has enough random code already. We can add an accessor in xlog.c to allow changing the walrcv status flag, to be called from the signal handler. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: vacuum verbose detail logs are unclear; log at *start* of each stage
Justin Pryzby writes: > On Wed, Jan 22, 2020 at 02:34:57PM +0900, Michael Paquier wrote: >> From patch 0003: >> /* >> +* Indent multi-line DETAIL if being sent to client (verbose) >> +* We don't know if it's sent to the client (client_min_messages); >> +* Also, that affects output to the logfile, too; assume that it's >> more >> +* important to format messages requested by the client than to make >> +* verbose logs pretty when also sent to the logfile. >> +*/ >> + msgprefix = elevel==INFO ? "!\t" : ""; >> Such stuff gets a -1 from me. This is not project-like, and you make >> the translation of those messages much harder than they should be. > I don't see why its harder to translate ? The really fundamental problem with this is that you are trying to make the server do what is properly the client's job, namely format messages nicely. Please read the message style guidelines [1], particularly the bit about "Formatting", which basically says "don't": Formatting Don't put any specific assumptions about formatting into the message texts. Expect clients and the server log to wrap lines to fit their own needs. In long messages, newline characters (\n) can be used to indicate suggested paragraph breaks. Don't end a message with a newline. Don't use tabs or other formatting characters. (In error context displays, newlines are automatically added to separate levels of context such as function calls.) Rationale: Messages are not necessarily displayed on terminal-type displays. In GUI displays or browsers these formatting instructions are at best ignored. regards, tom lane [1] https://www.postgresql.org/docs/devel/error-style-guide.html
Re: Additional improvements to extended statistics
On Sun, Mar 15, 2020 at 3:23 PM Tomas Vondra wrote: > On Sun, Mar 15, 2020 at 02:48:02PM +1300, Thomas Munro wrote: > >Stimulated by some bad plans involving JSON, I found my way to your > >WIP stats-on-expressions patch in this thread. Do I understand > >correctly that it will eventually also support single expressions, > >like CREATE STATISTICS t_distinct_abc (ndistinct) ON > >(my_jsonb_column->>'abc') FROM t? It looks like that would solve > >problems that otherwise require a generated column or an expression > >index just to get ndistinct. > > Yes, I think that's generally the plan. I was also thinking about > inventing some sort of special JSON statistics (e.g. extracting paths > from the JSONB and computing frequencies, or something like that). But > stats on expressions are one of the things I'd like to do in PG14. Interesting idea. If you had simple single-expression statistics, I suppose a cave-person version of this would be to write a script/stored procedure that extracts the distinct set of JSON paths and does CREATE STATISTICS for expressions to access each path. That said, I suspect that in many cases there's a small set of a paths and a human designer would know what to do. I didn't manage to try your WIP stats-on-expressions patch due to bitrot and unfinished parts, but I am hoping it just needs to remove the "if (numcols < 2) ereport(ERROR ...)" check to get a very very useful thing.
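As a tiny illustration of why ndistinct on an expression matters (Python sketch; the table shape and the 'abc' path are invented): without expression statistics the planner has no way to know the distinct count of my_jsonb_column->>'abc' and must fall back to a default guess, while the true number can be far smaller than the row count:

```python
import random

random.seed(0)
# Hypothetical table: 10,000 rows whose JSON column has a low-cardinality path
rows = [{"j": {"abc": f"label-{random.randrange(5)}"}} for _ in range(10_000)]

# What CREATE STATISTICS ... (ndistinct) ON (j->>'abc') would capture:
# the expression has only 5 distinct values, not anything like 10,000.
ndistinct = len({row["j"]["abc"] for row in rows})
assert ndistinct == 5
```

A generated column or expression index gets you the same ndistinct number today, but at the cost of maintaining an extra on-disk structure, which is the problem the stats-on-expressions patch is meant to remove.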
Re: Missing errcode() in ereport
Andres Freund writes: > On 2020-03-23 17:24:49 -0400, Tom Lane wrote: >> Hearing no objections, I started to review Andres' patchset with >> that plan in mind. > Thanks for pushing the first part! I pushed all of it, actually. regards, tom lane
Re: Option to dump foreign data in pg_dump
On 2020-Mar-23, Alvaro Herrera wrote: > COPY public.ft1 (c1, c2, c3) FROM stdin; > pg_dump: error: query failed: ERROR: foreign-data wrapper "dummy" has no > handler > pg_dump: error: query was: COPY (SELECT c1, c2, c3 FROM public.ft1 ) TO > stdout; > > Maybe what we should do just verify that you do get that error (and no > other errors). Done that way. Will be pushing this shortly. -- Álvaro Herrerahttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services >From 4c9456992640431c45ed47aec488ac20cae9a4b0 Mon Sep 17 00:00:00 2001 From: Alvaro Herrera Date: Fri, 20 Mar 2020 18:42:03 -0300 Subject: [PATCH v9 1/2] pg_dump: Allow dumping data of specific foreign servers Author: Luis Carril Discussion: https://postgr.es/m/lejpr01mb0185483c0079d2f651b16231e7...@lejpr01mb0185.deuprd01.prod.outlook.de --- doc/src/sgml/ref/pg_dump.sgml | 30 ++ src/bin/pg_dump/pg_dump.c | 110 -- src/bin/pg_dump/pg_dump.h | 1 + 3 files changed, 136 insertions(+), 5 deletions(-) diff --git a/doc/src/sgml/ref/pg_dump.sgml b/doc/src/sgml/ref/pg_dump.sgml index 13bd320b31..a9bc397165 100644 --- a/doc/src/sgml/ref/pg_dump.sgml +++ b/doc/src/sgml/ref/pg_dump.sgml @@ -767,6 +767,36 @@ PostgreSQL documentation + + --include-foreign-data=foreignserver + + +Dump the data for any foreign table with a foreign server +matching foreignserver +pattern. Multiple foreign servers can be selected by writing multiple +--include-foreign-data switches. +Also, the foreignserver parameter is +interpreted as a pattern according to the same rules used by +psql's \d commands (see ), +so multiple foreign servers can also be selected by writing wildcard characters +in the pattern. When using wildcards, be careful to quote the pattern +if needed to prevent the shell from expanding the wildcards; see +. +The only exception is that an empty pattern is disallowed. + + + + + When --include-foreign-data is specified, + pg_dump does not check that the foreign + table is writeable. 
Therefore, there is no guarantee that the + results of a foreign table dump can be successfully restored. + + + + + --inserts diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c index 959b36a95c..1849dfe3d7 100644 --- a/src/bin/pg_dump/pg_dump.c +++ b/src/bin/pg_dump/pg_dump.c @@ -119,6 +119,8 @@ static SimpleStringList table_exclude_patterns = {NULL, NULL}; static SimpleOidList table_exclude_oids = {NULL, NULL}; static SimpleStringList tabledata_exclude_patterns = {NULL, NULL}; static SimpleOidList tabledata_exclude_oids = {NULL, NULL}; +static SimpleStringList foreign_servers_include_patterns = {NULL, NULL}; +static SimpleOidList foreign_servers_include_oids = {NULL, NULL}; /* placeholders for the delimiters for comments */ @@ -153,6 +155,9 @@ static void expand_schema_name_patterns(Archive *fout, SimpleStringList *patterns, SimpleOidList *oids, bool strict_names); +static void expand_foreign_server_name_patterns(Archive *fout, +SimpleStringList *patterns, +SimpleOidList *oids); static void expand_table_name_patterns(Archive *fout, SimpleStringList *patterns, SimpleOidList *oids, @@ -385,6 +390,7 @@ main(int argc, char **argv) {"no-sync", no_argument, NULL, 7}, {"on-conflict-do-nothing", no_argument, _nothing, 1}, {"rows-per-insert", required_argument, NULL, 10}, + {"include-foreign-data", required_argument, NULL, 11}, {NULL, 0, NULL, 0} }; @@ -600,6 +606,11 @@ main(int argc, char **argv) dopt.dump_inserts = (int) rowsPerInsert; break; + case 11: /* include foreign data */ +simple_string_list_append(_servers_include_patterns, + optarg); +break; + default: fprintf(stderr, _("Try \"%s --help\" for more information.\n"), progname); exit_nicely(1); @@ -641,6 +652,12 @@ main(int argc, char **argv) exit_nicely(1); } + if (dopt.schemaOnly && foreign_servers_include_patterns.head != NULL) + fatal("options -s/--schema-only and --include-foreign-data cannot be used together"); + + if (numWorkers > 1 && foreign_servers_include_patterns.head != NULL) + 
fatal("option --include-foreign-data is not supported with parallel backup"); + if (dopt.dataOnly && dopt.outputClean) { pg_log_error("options -c/--clean and -a/--data-only cannot be used together"); @@ -808,6 +825,9 @@ main(int argc, char **argv) _exclude_oids, false); + expand_foreign_server_name_patterns(fout, _servers_include_patterns, + _servers_include_oids); + /* non-matching exclusion patterns aren't an error */ /* @@ -1011,6 +1031,9 @@ help(const char *progname)
Re: Add A Glossary
On Tue, Mar 24, 2020 at 3:40 PM Jürgen Purtz wrote: > On 24.03.20 19:46, Robert Haas wrote: > Do we use shared_buffers for WAL ? > > No. > > What about the explanation in > https://www.postgresql.org/docs/12/runtime-config-wal.html : "wal_buffers > (integer) The amount of shared memory used for WAL data that has not yet > been written to disk. The default setting of -1 selects a size equal to > 1/32nd (about 3%) of shared_buffers, ... " ? My understanding was that the > parameter wal_buffers grabs some of the existing shared_buffers for its own > purpose. Is this a misinterpretation? Are shared_buffers and wal_buffers two > different shared memory areas? Yes. The code adds up the shared memory requests from all of the different subsystems and then allocates one giant chunk of shared memory which is divided up between them. The overwhelming majority of that memory goes into shared_buffers, but not all of it. You can use the new pg_get_shmem_allocations() function to see how it's used. For example, with shared_buffers=4GB:

rhaas=# select name, pg_size_pretty(size) from pg_get_shmem_allocations() order by size desc limit 10;
         name         | pg_size_pretty
----------------------+----------------
 Buffer Blocks        | 4096 MB
 Buffer Descriptors   | 32 MB
                      | 32 MB
 XLOG Ctl             | 16 MB
 Buffer IO Locks      | 16 MB
 Checkpointer Data    | 12 MB
 Checkpoint BufferIds | 10 MB
 clog                 | 2067 kB
                      | 1876 kB
 subtrans             | 261 kB
(10 rows)

rhaas=# select count(*), pg_size_pretty(sum(size)) from pg_get_shmem_allocations();
 count | pg_size_pretty
-------+----------------
    54 | 4219 MB
(1 row)

So, in this configuration, the whole shared memory segment is 4219 MB, of which 4096 MB is allocated to shared_buffers and the rest to dozens of smaller allocations, with 1876 kB left over that might get snapped up later by an extension that wants some shared memory. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Add A Glossary
On 24.03.20 19:46, Robert Haas wrote: Do we use shared_buffers for WAL ? No. What's about the explanation in https://www.postgresql.org/docs/12/runtime-config-wal.html : "wal_buffers (integer) The amount of shared memory used for WAL data that has not yet been written to disk. The default setting of -1 selects a size equal to 1/32nd (about 3%) of shared_buffers, ... " ? My understanding was, that the parameter wal_buffers grabs some of the existing shared_buffers for its own purpose. Is this a misinterpretation? Are shared_buffers and wal_buffers two different shared memory areas? Kind regards, Jürgen
Re: Missing errcode() in ereport
On 2020-03-23 17:24:49 -0400, Tom Lane wrote: > Hearing no objections, I started to review Andres' patchset with > that plan in mind. Thanks for pushing the first part!
Re: Add A Glossary
> > > > > + Records to the file system and creates a special > > > > Does the checkpointer actually write WAL ? > > Yes. > > > An FK doesn't require the values in its table to be unique, right ? > > I believe it does require that the values are unique. > > > I think there's some confusion. Constraints are not objects, right ? > > I think constraints are definitely objects. They have names and you > can, for example, COMMENT on them. > > > Do we use shared_buffers for WAL ? > > No. > > (I have not reviewed the patch; these are just a few comments on your > comments.) > > I'm going to be coalescing the feedback into an updated patch very soon (tonight/tomorrow), so please keep the feedback on the text/wording coming until then. If anyone has a first attempt at all the ACID definitions, I'd love to see those as well.
Re: Fastpath while arranging the changes in LSN order in logical decoding
On 2020-03-24 18:36:03 +0530, Dilip Kumar wrote: > IMHO, I have tried the best case but did not see any performance gain > so I am not planning to test this further. I have attached the patch > for removing the TODO. Pushed. Thanks!
Re: improve transparency of bitmap-only heap scans
I took a quick look through this patch. While I see nothing to complain about implementation-wise, I'm a bit befuddled as to why we need this reporting when there is no comparable data provided for regular index-only scans. Over there, you just get "Heap Fetches: n", and the existing counts for bitmap scans seem to cover the same territory. I agree with the original comment that it's pretty strange that EXPLAIN doesn't identify an index-only BMS at all; but fixing that is a different patch. regards, tom lane
Re: Adding a test for speculative insert abort case
On 2020-03-24 18:03:57 +0530, Amit Kapila wrote: > Can we change the status of CF entry for this patch [1] to committed > or is there any work pending? Done!
Re: How to only auto-restart BGW only on crash or _PG_init
On Tue, Mar 24, 2020 at 2:33 PM Jeremy Finzel wrote: > I would be grateful for some direction on how to use Background workers to > have a worker automatically restart *only* in certain cases, i.e. on > postmaster start (_PG_init) or a soft crash. I run into all sorts of trouble > if I set bgw_restart_time to actually restart on sigterm, because in most > cases I don't want it to restart (i.e. it was launched with invalid config, > the SQL becomes invalid...). But I *do* want it to auto-restart in any kind > of crash. If I set bgw_restart_time to never restart, then it doesn't > restart after a soft crash, which I want. > > This is for my extension pglogical_ticker, and specifically within this main > function where a sigterm might happen: > https://github.com/enova/pglogical_ticker/blob/ef9b68fd6b5b99787034520009577f8cfec0049c/pglogical_ticker.c#L85-L201 > > I have tried several things unsuccessfully (checking result of SPI_execute or > SPI_connect) , usually resulting in a constantly restarting and failing > worker. So, is there a straightforward way to only have the worker > auto-restart in a very narrow range of cases? I think what you can do is configure the worker to always restart, but then have it exit(0) in the cases where you don't want it to restart, and exit(1) in the cases where you do want it to restart. See: https://git.postgresql.org/pg/commitdiff/be7558162acc5578d0b2cf0c8d4c76b6076ce352 -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
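The exit-code contract from that commit — a worker exiting with status 0 is not restarted regardless of bgw_restart_time, while any other exit triggers the restart policy — can be modeled outside the server like this (Python sketch of the policy only, not the postmaster's code; `supervise` and `max_runs` are invented names):

```python
import subprocess
import sys

def supervise(argv, max_runs=5):
    """Run `argv` repeatedly, mimicking the bgworker rule: exit code 0
    means 'done, never restart'; anything else means 'restart me'.
    Returns how many times the worker was launched."""
    runs = 0
    while runs < max_runs:
        runs += 1
        if subprocess.run(argv).returncode == 0:
            break              # clean exit: do not restart
    return runs

# A worker that detects bad config should exit(0): launched exactly once.
assert supervise([sys.executable, "-c", "raise SystemExit(0)"]) == 1
# A worker hitting a transient failure exits nonzero: restarted up to the cap.
assert supervise([sys.executable, "-c", "raise SystemExit(1)"], max_runs=3) == 3
```

Translated back to pglogical_ticker: set bgw_restart_time to the desired retry interval, then end the main loop with proc_exit(0) for the "don't restart me" cases (invalid config, bad SQL) and proc_exit(1) for anything that should be retried.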
Re: Add A Glossary
On Fri, Mar 20, 2020 at 3:58 PM Justin Pryzby wrote: > > + A process that writes dirty pages and WAL > > + Records to the file system and creates a special > > Does the chckpointer actually write WAL ? Yes. > An FK doesn't require the values in its table to be unique, right ? I believe it does require that the values are unique. > I think there's some confusion. Constraints are not objects, right ? I think constraints are definitely objects. They have names and you can, for example, COMMENT on them. > Do we use shared_buffers for WAL ? No. (I have not reviewed the patch; these are just a few comments on your comments.) -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
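[Editor's note: the foreign-key question above is easy to check directly. PostgreSQL requires the *referenced* columns to carry a unique or primary-key constraint, while values on the *referencing* side may repeat. A minimal sketch (table names are made up):]

```sql
CREATE TABLE parent (id int);           -- no unique constraint yet
CREATE TABLE child  (parent_id int);

-- fails: "there is no unique constraint matching given keys
--         for referenced table "parent""
ALTER TABLE child ADD FOREIGN KEY (parent_id) REFERENCES parent (id);

ALTER TABLE parent ADD PRIMARY KEY (id);
ALTER TABLE child  ADD FOREIGN KEY (parent_id) REFERENCES parent (id);  -- ok

INSERT INTO parent VALUES (1);
INSERT INTO child  VALUES (1), (1);     -- duplicates on the referencing side are fine
```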
Re: [PATCH] Implement INSERT SET syntax
I wrote: > No doubt that's all fixable, but the realization that some cases of > this syntax are *not* just syntactic sugar for standards-compliant > syntax is giving me pause. Do we really want to get out front of > the SQL committee on extending INSERT in an incompatible way? One compromise that might be worth thinking about is to disallow multiassignments in this syntax, so as to (1) avoid the possibility of generating something that can't be represented by standard INSERT and (2) get something done in time for v13. The end of March is not that far off. Perhaps somebody would come back and extend it later, or perhaps not. A slightly more ambitious compromise would be to allow multiassignment only when the source can be pulled apart into independent subexpressions, comparable to the restriction we used to have in UPDATE itself (before 8f889b108 or thereabouts). In either case the transformation could be done right in gram.y and a helpful error thrown for unsupported cases. regards, tom lane
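[Editor's note: for reference, the distinction being drawn above, based on the examples posted upthread (this syntax is proposed, not committed). The simple form is pure syntactic sugar for a standard INSERT; the multiassignment form has no direct standard equivalent:]

```sql
-- simple form: trivially rewritable to standard INSERT ... VALUES
INSERT INTO foo SET f1 = 1, f2 = 2, f3 = 3;
-- equivalent to: INSERT INTO foo (f1, f2, f3) VALUES (1, 2, 3);

-- multiassignment form: cannot be represented as a standard INSERT
INSERT INTO bar SET (f1, f2, f3) = (SELECT f1, f2, f3 FROM foo);
```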
How to only auto-restart BGW only on crash or _PG_init
Good afternoon! I would be grateful for some direction on how to use Background workers to have a worker automatically restart *only* in certain cases, i.e. on postmaster start (_PG_init) or a soft crash. I run into all sorts of trouble if I set bgw_restart_time to actually restart on sigterm, because in most cases I don't want it to restart (i.e. it was launched with invalid config, the SQL becomes invalid...). But I *do* want it to auto-restart in any kind of crash. If I set bgw_restart_time to never restart, then it doesn't restart after a soft crash, which I want. This is for my extension pglogical_ticker, and specifically within this main function where a sigterm might happen: https://github.com/enova/pglogical_ticker/blob/ef9b68fd6b5b99787034520009577f8cfec0049c/pglogical_ticker.c#L85-L201 I have tried several things unsuccessfully (checking result of SPI_execute or SPI_connect) , usually resulting in a constantly restarting and failing worker. So, is there a straightforward way to only have the worker auto-restart in a very narrow range of cases? Many thanks! Jeremy
Re: Add A Glossary
On 2020-03-20 01:11, Alvaro Herrera wrote: I gave this a look. I first reformatted it so I could read it; that's 0001. Second I changed all the long <link> items into <xref>s, which are shorter and don't have to repeat the title of the referred-to page. (Of course, this changes the link to be in the same style as every other link in our documentation; some people don't like it. But it's our style.) AFAICT, all the <link> elements in this patch should be changed to <xref>. If there is something undesirable about the output style, we can change that, but it's not this patch's job to make up its own rules. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [PATCH] Implement INSERT SET syntax
David Steele writes: > On 12/3/19 4:44 AM, Gareth Palmer wrote: >> Attached is a fixed version. > Does this version of the patch address your concerns? No. I still find the reliance on a FROM clause being present to be pretty arbitrary. Also, I don't believe that ruleutils.c requires no changes, because it's not going to be possible to transform every usage of this syntax to old-style. I tried to prove the point with this trivial example: regression=# create table foo (f1 int ,f2 int, f3 int); CREATE TABLE regression=# create table bar (f1 int ,f2 int, f3 int); CREATE TABLE regression=# create rule r1 as on insert to foo do instead regression-# insert into bar set (f1,f2,f3) = (select f1,f2,f3 from foo); intending to show that the rule decompilation was bogus, but I didn't get that far because the parser crashed: TRAP: FailedAssertion("pstate->p_multiassign_exprs == NIL", File: "parse_target.c", Line: 287) postgres: postgres regression [local] CREATE RULE(ExceptionalCondition+0x55)[0x8fb6e5] postgres: postgres regression [local] CREATE RULE[0x5bd0c3] postgres: postgres regression [local] CREATE RULE[0x583def] postgres: postgres regression [local] CREATE RULE(transformStmt+0x2d5)[0x582665] postgres: postgres regression [local] CREATE RULE(transformRuleStmt+0x2ad)[0x5bf2ad] postgres: postgres regression [local] CREATE RULE(DefineRule+0x17)[0x793847] If I do it like this, I get a different assertion: regression=# insert into bar set (f1,f2,f3) = (select f1,f2,f3) from foo; server closed the connection unexpectedly TRAP: FailedAssertion("exprKind == EXPR_KIND_UPDATE_SOURCE", File: "parse_target.c", Line: 209) postgres: postgres regression [local] INSERT(ExceptionalCondition+0x55)[0x8fb6e5] postgres: postgres regression [local] INSERT(transformTargetList+0x1a7)[0x5bd277] postgres: postgres regression [local] INSERT(transformStmt+0xbe0)[0x582f70] postgres: postgres regression [local] INSERT[0x5839f3] postgres: postgres regression [local] 
INSERT(transformStmt+0x2d5)[0x582665] postgres: postgres regression [local] INSERT(transformTopLevelStmt+0xd)[0x58411d] postgres: postgres regression [local] INSERT(parse_analyze+0x69)[0x584269] No doubt that's all fixable, but the realization that some cases of this syntax are *not* just syntactic sugar for standards-compliant syntax is giving me pause. Do we really want to get out front of the SQL committee on extending INSERT in an incompatible way? regards, tom lane
Re: Columns correlation and adaptive query optimization
On 12/24/19 3:15 AM, Konstantin Knizhnik wrote: New version of the patch implicitly adds multicolumn statistics in the auto_explain extension and uses them in the optimizer for more precise estimation of join selectivity. This patch fixes a race condition while adding statistics and restricts generated statistics names to fit in 64 bytes (NameData). This patch no longer applies: https://commitfest.postgresql.org/27/2386/ The CF entry has been updated to Waiting on Author. Regards, -- -David da...@pgmasters.net
Re: pg_upgrade fails with non-standard ACL
On 12/17/19 3:10 AM, Arthur Zakirov wrote: I attached new version of the patch. It still uses pg_identify_object(), I'm not sure about other ways to build identities yet. This patch applies and builds but fails regression tests on Linux and Windows: https://travis-ci.org/postgresql-cfbot/postgresql/builds/666134656 https://ci.appveyor.com/project/postgresql-cfbot/postgresql/build/1.0.85292 The CF entry has been updated to Waiting on Author. Regards, -- -David da...@pgmasters.net
Re: COPY FREEZE and setting PD_ALL_VISIBLE/visibility map bits
On Fri, Mar 13, 2020 at 6:58 AM Justin Pryzby wrote: > > Thanks for picking up this patch. There's a minor typo: > > +* readable outside of this sessoin. Therefore > doing IO here isn't > > => session > > -- > Justin > Thanks, please see the updated and rebased patch. (master 17a28b03645e27d73bf69a95d7569b61e58f06eb) -- Ibrar Ahmed diff --git a/contrib/pg_visibility/expected/pg_visibility.out b/contrib/pg_visibility/expected/pg_visibility.out index f0dcb897c4..6ac3e525eb 100644 --- a/contrib/pg_visibility/expected/pg_visibility.out +++ b/contrib/pg_visibility/expected/pg_visibility.out @@ -131,6 +131,69 @@ select pg_truncate_visibility_map('test_partition'); (1 row) +-- test copy freeze +create table copyfreeze (a int, b char(1500)); +-- load all rows via COPY FREEZE and ensure that all pages are set all-visible +-- and all-frozen. +begin; +truncate copyfreeze; +copy copyfreeze from stdin freeze; +commit; +select * from pg_visibility_map('copyfreeze'); + blkno | all_visible | all_frozen +---+-+ + 0 | t | t + 1 | t | t + 2 | t | t +(3 rows) + +select * from pg_check_frozen('copyfreeze'); + t_ctid + +(0 rows) + +-- load half the rows via regular COPY and rest via COPY FREEZE. The pages +-- which are touched by regular COPY must not be set all-visible/all-frozen. On +-- the other hand, pages allocated by COPY FREEZE should be marked +-- all-frozen/all-visible. +begin; +truncate copyfreeze; +copy copyfreeze from stdin; +copy copyfreeze from stdin freeze; +commit; +select * from pg_visibility_map('copyfreeze'); + blkno | all_visible | all_frozen +---+-+ + 0 | f | f + 1 | f | f + 2 | t | t +(3 rows) + +select * from pg_check_frozen('copyfreeze'); + t_ctid + +(0 rows) + +-- Try a mix of regular COPY and COPY FREEZE. 
+begin; +truncate copyfreeze; +copy copyfreeze from stdin freeze; +copy copyfreeze from stdin; +copy copyfreeze from stdin freeze; +commit; +select * from pg_visibility_map('copyfreeze'); + blkno | all_visible | all_frozen +---+-+ + 0 | t | t + 1 | f | f + 2 | t | t +(3 rows) + +select * from pg_check_frozen('copyfreeze'); + t_ctid + +(0 rows) + -- cleanup drop table test_partitioned; drop view test_view; @@ -140,3 +203,4 @@ drop server dummy_server; drop foreign data wrapper dummy; drop materialized view matview_visibility_test; drop table regular_table; +drop table copyfreeze; diff --git a/contrib/pg_visibility/sql/pg_visibility.sql b/contrib/pg_visibility/sql/pg_visibility.sql index c2a7f1d9e4..01a65fdab4 100644 --- a/contrib/pg_visibility/sql/pg_visibility.sql +++ b/contrib/pg_visibility/sql/pg_visibility.sql @@ -72,6 +72,82 @@ select count(*) > 0 from pg_visibility_map_summary('test_partition'); select * from pg_check_frozen('test_partition'); -- hopefully none select pg_truncate_visibility_map('test_partition'); +-- test copy freeze +create table copyfreeze (a int, b char(1500)); + +-- load all rows via COPY FREEZE and ensure that all pages are set all-visible +-- and all-frozen. +begin; +truncate copyfreeze; +copy copyfreeze from stdin freeze; +1 '1' +2 '2' +3 '3' +4 '4' +5 '5' +6 '6' +7 '7' +8 '8' +9 '9' +10 '10' +11 '11' +12 '12' +\. +commit; +select * from pg_visibility_map('copyfreeze'); +select * from pg_check_frozen('copyfreeze'); + +-- load half the rows via regular COPY and rest via COPY FREEZE. The pages +-- which are touched by regular COPY must not be set all-visible/all-frozen. On +-- the other hand, pages allocated by COPY FREEZE should be marked +-- all-frozen/all-visible. +begin; +truncate copyfreeze; +copy copyfreeze from stdin; +1 '1' +2 '2' +3 '3' +4 '4' +5 '5' +6 '6' +\. +copy copyfreeze from stdin freeze; +7 '7' +8 '8' +9 '9' +10 '10' +11 '11' +12 '12' +\. 
+commit; +select * from pg_visibility_map('copyfreeze'); +select * from pg_check_frozen('copyfreeze'); + +-- Try a mix of regular COPY and COPY FREEZE. +begin; +truncate copyfreeze; +copy copyfreeze from stdin freeze; +1 '1' +2 '2' +3 '3' +4 '4' +5 '5' +\. +copy copyfreeze from stdin; +6 '6' +\. +copy copyfreeze from stdin freeze; +7 '7' +8 '8' +9 '9' +10 '10' +11 '11' +12 '12' +\. +commit; +select * from pg_visibility_map('copyfreeze'); +select * from pg_check_frozen('copyfreeze'); + -- cleanup drop table test_partitioned; drop view test_view; @@ -81,3 +157,4 @@ drop server dummy_server; drop foreign data wrapper dummy; drop materialized view matview_visibility_test; drop table regular_table; +drop table copyfreeze; diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c index 29694b8aa4..614958e5ee 100644 --- a/src/backend/access/heap/heapam.c +++ b/src/backend/access/heap/heapam.c @@ -2114,6 +2114,7 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples, int ndone; PGAlignedBlock
Re: pg_upgrade fails with non-standard ACL
On 17.12.2019 11:10, Arthur Zakirov wrote: On 2019/12/05 11:31, Michael Paquier wrote: On Wed, Dec 04, 2019 at 06:15:52PM +0900, Arthur Zakirov wrote: Ah, I thought that pg_identify_object() gives properly quoted identity, and it could be used to make SQL script. It depends on the object type. For columns I can see in your patch that you are using a dedicated code path, but I don't think that we should assume that there is an exact one-one mapping between the object type and the grammar of GRANT/REVOKE for the object type supported because both are completely independent things and facilities. Take for example foreign data wrappers. Of course for this patch this depends if we have system object of the type which would be impacted. That's not the case of foreign data wrappers and likely it will never be, but there is no way to be sure that this won't get broken silently in the future. I attached new version of the patch. It still uses pg_identify_object(), I'm not sure about other ways to build identities yet. Michael, do I understand it correctly that your concerns about pg_identify_object() relate only to the revoke sql script? Now, I tend to agree that we should split this patch into two separate parts, to make it simpler. The first patch is to find affected objects and print warnings and the second is to generate script. I added "test_rename_catalog_objects.patch" and "test_add_acl_to_catalog_objects.sql". So testing steps are following: - initialize new source instance (e.g. pg v12) and run "test_add_acl_to_catalog_objects.sql" on it - apply "pg_upgrade_ACL_check_v6.patch" and "test_add_acl_to_catalog_objects.sql" for HEAD - initialize new target instance for HEAD - run pg_upgrade, it should create revoke_objects.sql file "test_rename_catalog_objects.patch" should be applied after "pg_upgrade_ACL_check_v6.patch". 
Renamed objects are the following:
- table pg_subscription -> pg_sub
- columns pg_subscription.subenabled -> subactive, pg_subscription.subconninfo -> subconn
- view pg_stat_subscription -> pg_stat_sub
- columns pg_stat_subscription.received_lsn -> received_location, pg_stat_subscription.latest_end_lsn -> latest_end_location
- function pg_stat_get_subscription -> pg_stat_get_sub
- language sql -> pgsql
I've tried to test it. The script is attached. The described test case works. If a new cluster contains renamed objects, /pg_upgrade --check/ successfully finds them and generates a script to revoke non-default ACLs. However, without test_rename_catalog_objects.patch, /pg_upgrade --check/ still generates the same message, which is a false positive. I am going to fix it and send the updated patch later this week. -- Anastasia Lubennikova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company pg_upgrade_ACL_test.sh Description: application/shellscript
Re: backup manifests
On Mon, Mar 23, 2020 at 11:43 PM Amit Kapila wrote: > All others except one are passing now. See the summary of the failed > test below and attached are failed run logs. > > Test Summary Report > --- > t/003_corruption.pl (Wstat: 65280 Tests: 14 Failed: 0) > Non-zero exit status: 255 > Parse errors: Bad plan. You planned 44 tests but ran 14. > Files=6, Tests=123, 164 wallclock secs ( 0.06 usr + 0.02 sys = 0.08 CPU) > Result: FAIL Hmm. It looks like it's trying to remove the symlink that points to the tablespace directory, and failing with no error message. I could set that permutation to be skipped on Windows, or maybe there's an alternate method you can suggest that would work? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [PATCH] Implement INSERT SET syntax
Hi Tom, On 12/3/19 4:44 AM, Gareth Palmer wrote: On Sun, Dec 1, 2019 at 4:32 PM Michael Paquier wrote: On Fri, Nov 22, 2019 at 12:24:15PM +1300, Gareth Palmer wrote: Attached is an updated patch with for_locking_clause added, test-cases re-use existing tables and the comments and documentation have been expanded. Per the automatic patch tester, documentation included in the patch does not build. Could you please fix that? I have moved the patch to next CF, waiting on author. Attached is a fixed version. Does this version of the patch address your concerns? Regards, -- -David da...@pgmasters.net
Re: Index Skip Scan
> On Wed, Mar 11, 2020 at 06:56:09PM +0800, Andy Fan wrote: > > There was a dedicated thread [1] where David explain his idea very > detailed, and you can also check that messages around that message for > the context. hope it helps. Thanks for pointing out this thread! Somehow I missed it; it now looks like we need to make some effort to align the patches for index skip scan and distinctClause elimination.
Re: Index Skip Scan
> On Wed, Mar 11, 2020 at 11:17:51AM +1300, David Rowley wrote: > > Yes, I was complaining that a ProjectionPath breaks the optimisation > and I don't believe there's any reason that it should. > > I believe the way to make that work correctly requires paying > attention to the Path's uniquekeys rather than what type of path it > is. Thanks for the suggestion. As a result of the discussion I've modified the patch; does it look similar to what you had in mind? In this version, if all conditions are met and there are corresponding unique keys, a new index skip scan path will be added to unique_pathlists. If the requested distinct clauses match the unique keys, create_distinct_paths can choose this path without needing to know what kind of path it is. Also, unique_keys are passed through ProjectionPath, so the optimization for the example mentioned earlier in this thread should now work (I've added one test for that). I haven't changed anything about the UniqueKey structure itself (one of the suggestions was about Expr instead of EquivalenceClass), but I believe we need to figure out how the two existing implementations (in this patch and from [1]) of this idea can be connected. [1]: https://www.postgresql.org/message-id/flat/CAKU4AWrwZMAL%3DuaFUDMf4WGOVkEL3ONbatqju9nSXTUucpp_pw%40mail.gmail.com >From c7af8157da82db1cedf02e6ec0de355b56275680 Mon Sep 17 00:00:00 2001 From: Dmitrii Dolgov <9erthali...@gmail.com> Date: Tue, 24 Mar 2020 17:04:32 +0100 Subject: [PATCH v33 1/2] Unique key Design by David Rowley. 
Author: Jesper Pedersen --- src/backend/nodes/outfuncs.c | 14 ++ src/backend/nodes/print.c | 39 +++ src/backend/optimizer/path/Makefile | 3 +- src/backend/optimizer/path/allpaths.c | 8 +++ src/backend/optimizer/path/indxpath.c | 41 src/backend/optimizer/path/pathkeys.c | 71 ++- src/backend/optimizer/plan/planagg.c | 1 + src/backend/optimizer/plan/planmain.c | 1 + src/backend/optimizer/plan/planner.c | 37 +- src/backend/optimizer/util/pathnode.c | 46 + src/include/nodes/nodes.h | 1 + src/include/nodes/pathnodes.h | 19 +++ src/include/nodes/print.h | 1 + src/include/optimizer/pathnode.h | 2 + src/include/optimizer/paths.h | 11 + 15 files changed, 272 insertions(+), 23 deletions(-) diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c index d76fae44b8..16083e7a7e 100644 --- a/src/backend/nodes/outfuncs.c +++ b/src/backend/nodes/outfuncs.c @@ -1723,6 +1723,7 @@ _outPathInfo(StringInfo str, const Path *node) WRITE_FLOAT_FIELD(startup_cost, "%.2f"); WRITE_FLOAT_FIELD(total_cost, "%.2f"); WRITE_NODE_FIELD(pathkeys); + WRITE_NODE_FIELD(uniquekeys); } /* @@ -2205,6 +2206,7 @@ _outPlannerInfo(StringInfo str, const PlannerInfo *node) WRITE_NODE_FIELD(eq_classes); WRITE_BOOL_FIELD(ec_merging_done); WRITE_NODE_FIELD(canon_pathkeys); + WRITE_NODE_FIELD(canon_uniquekeys); WRITE_NODE_FIELD(left_join_clauses); WRITE_NODE_FIELD(right_join_clauses); WRITE_NODE_FIELD(full_join_clauses); @@ -2214,6 +2216,7 @@ _outPlannerInfo(StringInfo str, const PlannerInfo *node) WRITE_NODE_FIELD(placeholder_list); WRITE_NODE_FIELD(fkey_list); WRITE_NODE_FIELD(query_pathkeys); + WRITE_NODE_FIELD(query_uniquekeys); WRITE_NODE_FIELD(group_pathkeys); WRITE_NODE_FIELD(window_pathkeys); WRITE_NODE_FIELD(distinct_pathkeys); @@ -2401,6 +2404,14 @@ _outPathKey(StringInfo str, const PathKey *node) WRITE_BOOL_FIELD(pk_nulls_first); } +static void +_outUniqueKey(StringInfo str, const UniqueKey *node) +{ + WRITE_NODE_TYPE("UNIQUEKEY"); + + WRITE_NODE_FIELD(eq_clause); +} + static void 
_outPathTarget(StringInfo str, const PathTarget *node) { @@ -4092,6 +4103,9 @@ outNode(StringInfo str, const void *obj) case T_PathKey: _outPathKey(str, obj); break; + case T_UniqueKey: +_outUniqueKey(str, obj); +break; case T_PathTarget: _outPathTarget(str, obj); break; diff --git a/src/backend/nodes/print.c b/src/backend/nodes/print.c index 42476724d8..d286b34544 100644 --- a/src/backend/nodes/print.c +++ b/src/backend/nodes/print.c @@ -459,6 +459,45 @@ print_pathkeys(const List *pathkeys, const List *rtable) printf(")\n"); } +/* + * print_uniquekeys - + * uniquekeys list of UniqueKeys + */ +void +print_uniquekeys(const List *uniquekeys, const List *rtable) +{ + ListCell *l; + + printf("("); + foreach(l, uniquekeys) + { + UniqueKey *unique_key = (UniqueKey *) lfirst(l); + EquivalenceClass *eclass = (EquivalenceClass *) unique_key->eq_clause; + ListCell *k; + bool first = true; + + /* chase up */ + while (eclass->ec_merged) + eclass = eclass->ec_merged; + + printf("("); + foreach(k, eclass->ec_members) + { + EquivalenceMember *mem = (EquivalenceMember *) lfirst(k); + + if (first) +first = false; + else +printf(", "); + print_expr((Node *) mem->em_expr,
Re: Built-in connection pooler
Hi David, On 24.03.2020 16:26, David Steele wrote: Hi Konstantin, On 11/14/19 2:06 AM, Konstantin Knizhnik wrote: Attached please find rebased patch with this bug fixed. This patch no longer applies: http://cfbot.cputube.org/patch_27_2067.log CF entry has been updated to Waiting on Author. Rebased version of the patch is attached. -- Konstantin Knizhnik Postgres Professional: http://www.postgrespro.com The Russian Postgres Company diff --git a/contrib/spi/refint.c b/contrib/spi/refint.c index 6fbfef2..27aa6cb 100644 --- a/contrib/spi/refint.c +++ b/contrib/spi/refint.c @@ -11,6 +11,7 @@ #include "commands/trigger.h" #include "executor/spi.h" +#include "storage/proc.h" #include "utils/builtins.h" #include "utils/memutils.h" #include "utils/rel.h" @@ -94,6 +95,8 @@ check_primary_key(PG_FUNCTION_ARGS) else tuple = trigdata->tg_newtuple; + MyProc->is_tainted = true; + trigger = trigdata->tg_trigger; nargs = trigger->tgnargs; args = trigger->tgargs; @@ -286,6 +289,8 @@ check_foreign_key(PG_FUNCTION_ARGS) /* internal error */ elog(ERROR, "check_foreign_key: cannot process INSERT events"); + MyProc->is_tainted = true; + /* Have to check tg_trigtuple - tuple being deleted */ trigtuple = trigdata->tg_trigtuple; diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml index 355b408..23210ba 100644 --- a/doc/src/sgml/config.sgml +++ b/doc/src/sgml/config.sgml @@ -732,6 +732,169 @@ include_dir 'conf.d' + + max_sessions (integer) + + max_sessions configuration parameter + + + + + The maximum number of client sessions that can be handled by + one connection proxy when session pooling is enabled. + This parameter does not add any memory or CPU overhead, so + specifying a large max_sessions value + does not affect performance. + If the max_sessions limit is reached new connections are not accepted. + + + The default value is 1000. This parameter can only be set at server start. 
+ + + + + + session_pool_size (integer) + + session_pool_size configuration parameter + + + + + Enables session pooling and defines the maximum number of + backends that can be used by client sessions for each database/user combination. + Launched non-tainted backends are never terminated even if there are no active sessions. + Backend is considered as tainted if client updates GUCs, creates temporary table or prepared statements. + Tainted backend can server only one client. + + + The default value is 10, so up to 10 backends will serve each database, + + + + + + proxy_port (integer) + + proxy_port configuration parameter + + + + + Sets the TCP port for the connection pooler. + Clients connected to main "port" will be assigned dedicated backends, + while client connected to proxy port will be connected to backends through proxy which + performs transaction level scheduling. + + + The default value is 6543. + + + + + + connection_proxies (integer) + + connection_proxies configuration parameter + + + + + Sets number of connection proxies. + Postmaster spawns separate worker process for each proxy. Postmaster scatters connections between proxies using one of scheduling policies (round-robin, random, load-balancing). + Each proxy launches its own subset of backends. + So maximal number of non-tainted backends is session_pool_size*connection_proxies*databases*roles. + + + The default value is 0, so session pooling is disabled. + + + + + + session_schedule (enum) + + session_schedule configuration parameter + + + + + Specifies scheduling policy for assigning session to proxies in case of + connection pooling. Default policy is round-robin. + + + With round-robin policy postmaster cyclicly scatter sessions between proxies. + + + With random policy postmaster randomly choose proxy for new session. + + + With load-balancing policy postmaster choose proxy with lowest load average. 
+ Load average of proxy is estimated by number of clients connection assigned to this proxy with extra weight for SSL connections. + + + + + + idle_pool_worker_timeout (integer) + + idle_pool_worker_timeout configuration parameter + + + + + Terminate an idle connection pool worker after the specified number of milliseconds. + The default value is 0, so
Re: Control your disk usage in PG: Introduction to Disk Quota Extension
On 12/2/19 1:39 AM, Haozhou Wang wrote: Thank you very much for your email. I rebased the code with the newest master and attached it in the attachment. This patch applies but no longer builds on Linux or Windows: https://travis-ci.org/github/postgresql-cfbot/postgresql/builds/666113036 https://ci.appveyor.com/project/postgresql-cfbot/postgresql/build/1.0.85284 The CF entry has been updated to Waiting on Author. Regards, -- -David da...@pgmasters.net
Re: Creating foreign key on partitioned table is too slow
On 2020-Mar-24, David Steele wrote: > This patch still applies but there seems to be some disagreement on > how to proceed. Actually, I don't think there's any disagreement regarding the patch I last posted. (There was disagreement on the previous patches, which were very different). Tom suggested to look at the heuristics used for RECOVER_RELATION_BUILD_MEMORY, and the patch does exactly that. It would be great if Kato Sho can try the original test case with my latest patch (the one in https://postgr.es/m/20191113214544.GA16060@alvherre.pgsql ) and let us know if it improves things. The patch as posted generates these warnings in my current GCC that it didn't when I checked last, but they're harmless -- if/when I push, it'll be without the parens. /pgsql/source/master/src/backend/utils/cache/relcache.c:1064:21: warning: equality comparison with extraneous parentheses [-Wparentheses-equality] if ((relp->relkind == RELKIND_PARTITIONED_TABLE) ~~^~~~ /pgsql/source/master/src/backend/utils/cache/relcache.c:1064:21: note: remove extraneous parentheses around the comparison to silence this warning if ((relp->relkind == RELKIND_PARTITIONED_TABLE) ~ ^ ~ /pgsql/source/master/src/backend/utils/cache/relcache.c:1064:21: note: use '=' to turn this equality comparison into an assignment if ((relp->relkind == RELKIND_PARTITIONED_TABLE) ^~ = /pgsql/source/master/src/backend/utils/cache/relcache.c:1242:33: warning: equality comparison with extraneous parentheses [-Wparentheses-equality] if ((relation->rd_rel->relkind == RELKIND_PARTITIONED_TABLE) ~~^~~~ /pgsql/source/master/src/backend/utils/cache/relcache.c:1242:33: note: remove extraneous parentheses around the comparison to silence this warning if ((relation->rd_rel->relkind == RELKIND_PARTITIONED_TABLE) ~ ^ ~ /pgsql/source/master/src/backend/utils/cache/relcache.c:1242:33: note: use '=' to turn this equality comparison into an assignment if ((relation->rd_rel->relkind == RELKIND_PARTITIONED_TABLE) ^~ = 2 warnings generated. 
-- Álvaro Herrerahttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: replay pause vs. standby promotion
Hello > I pushed the latest version of the patch. If you have further opinion > about immediate promotion, let's keep discussing that! Thank you! Honestly, I forgot that the promotion is documented in high-availability.sgml as: > Before failover, any WAL immediately available in the archive or in pg_wal > will be > restored, but no attempt is made to connect to the master. I mistakenly thought that promote should be "immediately"... > If a promotion is triggered while recovery is paused, the paused state ends > and a promotion continues. Could we add a few words in func.sgml to clarify the behavior? Especially for users from my example above. Something like: > If a promotion is triggered while recovery is paused, the paused state ends, > replay of any WAL immediately available in the archive or in pg_wal will be > continued and then a promotion will be completed. regards, Sergei
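[Editor's note: a sketch of the sequence under discussion, using the existing recovery control functions (assuming a standby on a release that has pg_promote, i.e. v12 or later):]

```sql
-- on the standby
SELECT pg_wal_replay_pause();
SELECT pg_is_wal_replay_paused();   -- t

-- triggering promotion while paused: the paused state ends, any WAL
-- immediately available in pg_wal or the archive is replayed, and
-- only then does the promotion complete
SELECT pg_promote(wait => true);
```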
Re: documentation pdf build fail (HEAD)
Peter Eisentraut writes: > On 2020-03-24 15:31, Tom Lane wrote: >> The problem seems to be that cedffbdb8 >> introduced some broken table markup. I wonder why xmllint >> failed to catch it? > It's not a validity issue in the DocBook markup. The error comes from > FOP, which complains because it requires the column count, but other > processors don't necessarily require it. Maybe not, but if the count is there, shouldn't it be checked? In this particular case, the table was obviously broken if you looked at the rendered HTML, but I'd kind of expect the toolchain to provide basic sanity checks without having to do that. regards, tom lane
Re: extension patch of CREATE OR REPLACE TRIGGER
On 12/1/19 8:56 PM, osumi.takami...@fujitsu.com wrote: The latest patch includes calls to heap_open(), causing its compilation to fail. Could you please send a rebased version of the patch? I have moved the entry to next CF, waiting on author. Thanks. I've fixed what you pointed out. This patch no longer applies: http://cfbot.cputube.org/patch_27_2307.log CF entry has been updated to Waiting on Author. Also, I'm waiting for other kinds of feedback from anyone. Hopefully a re-based patch will help with that. Regards, -- -David da...@pgmasters.net
Re: pgbench - add \aset to store results of a combined query
On 11/29/19 4:34 AM, Fabien COELHO wrote: It seems to me that there is no point to have the variable aset in Command because this structure includes already MetaCommand, so the information is duplicated. [...] Perhaps I am missing something? Yep. ISTM that you are missing that aset is not an independent meta command like most others but really changes the state of the previous SQL command, so that it needs to be stored into that with some additional fields. This is the same with "gset" which is tagged by a non-null "varprefix". So I cannot remove the "aset" field. And I would suggest to change readCommandResponse() to use a MetaCommand in argument. MetaCommand is not enough: we need varprefix, and then distinguishing between aset and gset. Although this last point can be done with a MetaCommand, ISTM that a bool is_aset is clear and good enough. It is possible to switch if you insist on it, but I do not think it is desirable. Michael, do you agree with Fabien's comments? Attached v4 removes an unwanted rebased comment duplication and does minor changes while re-reading the code. This patch no longer applies: http://cfbot.cputube.org/patch_27_2091.log CF entry has been updated to Waiting on Author. Regards, -- -David da...@pgmasters.net
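[Editor's note: for context, the behavioral difference being discussed, as proposed in the patch (syntax not necessarily final): \gset stores the result columns of a single query into variables, while \aset applies to every query of a combined (\;) query. A pgbench script sketch:]

```sql
-- \gset: one query, its columns become variables
SELECT 1 AS one \gset
-- :one is now set

-- \aset: each query in the combined batch contributes variables
SELECT 2 AS two \; SELECT 3 AS three \aset
-- :two and :three are now set
```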
Re: [patch]socket_timeout in interfaces/libpq
On 11/29/19 12:22 AM, nagaura.ryo...@fujitsu.com wrote: From: Michael Paquier On Wed, Jun 26, 2019 at 11:56:28AM +, nagaura.ryo...@fujitsu.com wrote: It seems that you did not think so at that time. # Please refer to [1] I don't think all the reviewers are completely negative. I recall having a negative impression on the patch when first looking at it, and still have the same impression when looking at the last version. Just with a quick look, assuming that you can bypass all cleanup operations normally taken by pqDropConnection() through a hijacking of pqWait() is not fine as this routine explicitely assumes to *never* have a timeout for its wait. > I couldn't understand what you meant. Do you say that we shouldn't change pqWait() behavior? Or should I modify my patch to use pqDropConnection()? This patch no longer applies: http://cfbot.cputube.org/patch_27_2175.log CF entry has been updated to Waiting on Author. More importantly it looks like there is still no consensus on this patch, which is an uncommitted part of a previous patch [1]. Unless somebody chimes in I'll mark this Returned with Feedback at the end of the CF. Regards, -- -David da...@pgmasters.net [1] https://www.postgresql.org/message-id/raw/20190406065428.GA2145%40paquier.xyz
Re: psql - improve test coverage from 41% to 88%
Hi Fabien, On 11/27/19 11:01 PM, Michael Paquier wrote: On Wed, Nov 27, 2019 at 10:14:16AM +0100, Fabien COELHO wrote: Indeed, I did not notice. This patch no longer applies: http://cfbot.cputube.org/patch_27_2262.log CF entry has been updated to Waiting on Author. Regards, -- -David da...@pgmasters.net
Re: documentation pdf build fail (HEAD)
On 2020-03-24 15:31, Tom Lane wrote: The problem seems to be that cedffbdb8 introduced some broken table markup. I wonder why xmllint failed to catch it? It's not a validity issue in the DocBook markup. The error comes from FOP, which enforces the declared column count, while other processors don't necessarily check it. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Creating foreign key on partitioned table is too slow
On 11/13/19 4:45 PM, Alvaro Herrera wrote: > But it is not totally unreasonable to have lots of partitions, and as we improve the system, more and more people will want to. +1 This patch still applies but there seems to be some disagreement on how to proceed. Any thoughts? Regards, -- -David da...@pgmasters.net
Re: color by default
On Mon, Mar 23, 2020 at 9:32 AM Peter Eisentraut < peter.eisentr...@2ndquadrant.com> wrote: > > I'm giving up on making color the default, since there is clearly no > consensus. > > Attached is the documentation patch reworked. > I think there is also some value in adding the functionality proposed in terminal_supports_color(). Regards, Juan José Santamaría Flecha
Re: Additional improvements to extended statistics
On Tue, Mar 24, 2020 at 01:20:07PM +, Dean Rasheed wrote: On Tue, 24 Mar 2020 at 01:08, Tomas Vondra wrote: Hmmm. So let's consider a simple OR clause with two arguments, both covered by a single statistics object. Something like this: CREATE TABLE t (a int, b int); INSERT INTO t SELECT mod(i, 10), mod(i, 10) FROM generate_series(1,10); CREATE STATISTICS s (mcv) ON a,b FROM t; SELECT * FROM t WHERE a = 0 OR b = 0; Which is estimated correctly... Hmm, the reason that is estimated correctly is that the MCV values cover all values in the table, so mcv_totalsel is 1 (or pretty close to 1), and other_sel is capped to nearly 0, and so the result is basically just mcv_sel -- i.e., in this case the MCV estimates are returned more-or-less as-is, rather than being applied as a correction, and so the result is independent of how many times extended stats are applied. The more interesting (and probably more realistic) case is where the MCV values do not cover all the values in the table, as in the new mcv_lists_partial examples in the regression tests, for example this test case, which produces a significant overestimate: SELECT * FROM mcv_lists_partial WHERE a = 0 OR b = 0 OR c = 0 although actually, I think there's another reason for that (in addition to the extended stats correction being applied twice) -- the way the MCV code makes use of base selectivity doesn't seem really appropriate for OR clauses, because the way base_frequency is computed is really based on the assumption that every column would be matched. I'm not sure that there's any easy answer to that though. It feels like what's needed when computing the match bitmaps in mcv.c is to produce a bitmap (would it fit in 1 byte?) per value, to indicate which columns match, and base_frequency values per column. That would be significantly more work though, so almost certainly isn't feasible for PG13. Good point. I haven't thought about the base frequencies.
I think 1 byte should be enough, as we limit the number of columns to 8. IIUC the problem you have in mind is that we end up calling statext_mcv_clauselist_selectivity twice, once for the top-level AND clause with a single argument, and then recursively for the expanded OR clause. Indeed, that seems to be applying the correction twice. >I'm not sure what's the best way to resolve that. Perhaps >statext_mcv_clauselist_selectivity() / statext_is_compatible_clause() >should ignore OR clauses from an AND-list, on the basis that they will >get processed recursively later. Or perhaps estimatedclauses can >somehow be used to prevent this, though I'm not sure exactly how that >would work. I don't know. I feel uneasy about just ignoring some of the clauses, because what happens for complex clauses, where the OR is not directly in the AND clause, but is negated or something like that? Isn't it the case that clauselist_selectivity_simple (and the OR variant) should ignore extended stats entirely? That is, we'd need to add a flag (or _simple variant) to clause_selectivity, so that it calls clauselist_selectivity_simple_or. So the calls would look like this:

clauselist_selectivity
  statext_clauselist_selectivity
    statext_mcv_clauselist_selectivity
      clauselist_selectivity_simple <--- disable extended stats
        clause_selectivity
          clauselist_selectivity_simple_or
            clause_selectivity
            clause_selectivity
      mcv_clauselist_selectivity
  clauselist_selectivity_simple
    already estimated

I've only quickly hacked clause_selectivity, but it does not seem very invasive (of course, it means disruption to clause_selectivity callers, but I suppose most call clauselist_selectivity). Sounds like a reasonable approach, but I think it would be better to preserve the current public API by having clauselist_selectivity() become a thin wrapper around a new function that optionally applies extended stats. OK, makes sense. I'll take a stab at it.
regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
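To make the estimation issue above concrete, here is a toy model in Python (for illustration only; this is not PostgreSQL code) of the first example in this thread, where a and b are perfectly correlated and the series has 10 rows as written. Assuming independence of the two columns overestimates the OR clause; that is the error the multi-column MCV list corrects, and why it matters that the correction be applied exactly once:

```python
# Toy model of the thread's example: t(a, b) with a = b = mod(i, 10),
# i.e. the two columns are perfectly correlated.
rows = [(i % 10, i % 10) for i in range(1, 11)]
n = len(rows)

# Per-column selectivities for "a = 0" and "b = 0"
s_a = sum(1 for a, b in rows if a == 0) / n   # 0.1
s_b = sum(1 for a, b in rows if b == 0) / n   # 0.1

# Independence assumption: s(a OR b) = s_a + s_b - s_a * s_b
s_or_indep = s_a + s_b - s_a * s_b            # ~0.19, an overestimate

# True selectivity, which a multi-column MCV list can capture exactly
s_or_true = sum(1 for a, b in rows if a == 0 or b == 0) / n   # 0.1
```

Because the MCV list here covers every value in the table, the MCV estimate (0.1) is returned essentially as-is, matching Dean's explanation above of why this particular case is estimated correctly no matter how many times the extended-stats path runs.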
Re: documentation pdf build fail (HEAD)
Erikjan Rijkers writes: > I build the pdf (for HEAD) almost daily without problems, but at the > moment I get the error below. > I am not sure whether to blame my particular setup (debian stretch), or > whether there might be an error in the .sgml. The html files still > build OK. Yeah, I see it too. The problem seems to be that cedffbdb8 introduced some broken table markup. I wonder why xmllint failed to catch it? While catching morerows mistakes might be hard in general, it shouldn't have been difficult to notice that this table row contained more columns than the table spec allowed. > If anyone has a suggestion on how to tackle this I'd be grateful. The "position" noted in the error report seems to be a line number and column number in the .fo file. Once you go there and look around at surrounding text, you can locate the matching .sgml input and then try to eyeball what's wrong with it. Fix pushed. regards, tom lane
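For readers hitting the same FOP error, here is a hypothetical DocBook fragment (not the actual markup from cedffbdb8) showing the kind of mistake involved: the tgroup declares two columns, but the row carries three entries. As Peter notes above, this is not a validity issue at the DocBook level, so xmllint passes it, while FOP rejects it at render time.

```xml
<table>
 <title>Example</title>
 <tgroup cols="2">
  <tbody>
   <row>
    <entry>one</entry>
    <entry>two</entry>
    <entry>three</entry> <!-- overflows the declared cols="2" -->
   </row>
  </tbody>
 </tgroup>
</table>
```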
Re: error context for vacuum to include block number
On Tue, 24 Mar 2020 at 22:37, Amit Kapila wrote: > > On Tue, Mar 24, 2020 at 6:18 PM Masahiko Sawada > wrote: > > > > > > I've read through the latest version patch (v31), here are my comments: > > > > 1. > > +/* Update error traceback information */ > > +olderrcbarg = *vacrelstats; > > +update_vacuum_error_cbarg(vacrelstats, > > + VACUUM_ERRCB_PHASE_TRUNCATE, > > new_rel_pages, NULL, > > + false); > > + > > /* > > * Scan backwards from the end to verify that the end pages > > actually > > * contain no tuples. This is *necessary*, not optional, because > > * other backends could have added tuples to these pages whilst we > > * were vacuuming. > > */ > > new_rel_pages = count_nondeletable_pages(onerel, vacrelstats); > > > > We need to set the error context after setting new_rel_pages. > > > > We want to cover the errors raised in count_nondeletable_pages(). In > an earlier version of the patch, we had a TRUNCATE_PREFETCH phase which > used to cover those errors, but that was not good as we were > setting/resetting it multiple times and it was not clear such a > separate phase would add any value. I got the point. But if we set the error context before that, I think we need to change the error context message. The error context message of the heap truncation phase is "while truncating relation \"%s.%s\" to %u blocks", but cbarg->blkno will be the number of blocks of the current relation. case VACUUM_ERRCB_PHASE_TRUNCATE: if (BlockNumberIsValid(cbarg->blkno)) errcontext("while truncating relation \"%s.%s\" to %u blocks", cbarg->relnamespace, cbarg->relname, cbarg->blkno); break; > > > 2. > > + vacrelstats->relnamespace = > > get_namespace_name(RelationGetNamespace(onerel)); > > + vacrelstats->relname = pstrdup(RelationGetRelationName(onerel)); > > > > I think we can pfree these two variables to avoid a memory leak during > > vacuum on multiple relations. > > > > Yeah, I had also thought about it but I noticed that we are not > freeing for vacrelstats. 
Also, I think the memory is allocated in > TopTransactionContext which should be freed via > CommitTransactionCommand before vacuuming of the next relation, so not > sure if there is much value in freeing those variables. Right, thank you for the explanation. I was concerned about a memory leak because the relation name and schema name could be large compared to vacrelstats, but I agree with freeing them at commit time. > > > 3. > > +/* Phases of vacuum during which we report error context. */ > > +typedef enum > > +{ > > + VACUUM_ERRCB_PHASE_UNKNOWN, > > + VACUUM_ERRCB_PHASE_SCAN_HEAP, > > + VACUUM_ERRCB_PHASE_VACUUM_INDEX, > > + VACUUM_ERRCB_PHASE_VACUUM_HEAP, > > + VACUUM_ERRCB_PHASE_INDEX_CLEANUP, > > + VACUUM_ERRCB_PHASE_TRUNCATE > > +} ErrCbPhase; > > > > Compared to the vacuum progress phases, there is no phase for > > final cleanup which includes updating the relation stats. Is there any > > reason why we don't have that phase for ErrCbPhase? > > > > I think we can add it if we want, but it was not clear to me if that is > helpful. Yeah, me too. The most likely place where an error happens is vac_update_relstats but the error message seems to be enough. Regards, -- Masahiko Sawada  http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
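For illustration only, here is a Python sketch (not the C patch itself) of how the phase-driven error context discussed above behaves. The UNKNOWN/SCAN_HEAP members and the `None` check standing in for `!BlockNumberIsValid()` are modeled on the enum and switch quoted in this thread; everything else is an assumption of the sketch:

```python
from enum import Enum, auto

class VacErrCbPhase(Enum):   # the rename suggested in this thread
    UNKNOWN = auto()
    SCAN_HEAP = auto()
    TRUNCATE = auto()

def error_context(phase, relnamespace, relname, blkno):
    """Mimic the errcontext() switch quoted above: the reported text
    depends on the current phase and on blkno being up to date."""
    if blkno is None:        # stands in for !BlockNumberIsValid(cbarg->blkno)
        return None
    if phase is VacErrCbPhase.TRUNCATE:
        return (f'while truncating relation '
                f'"{relnamespace}.{relname}" to {blkno} blocks')
    return None

error_context(VacErrCbPhase.TRUNCATE, "public", "t", 128)
# 'while truncating relation "public.t" to 128 blocks'
```

Sawada's point above is then simply that since the TRUNCATE message reads "to N blocks", blkno must hold the intended target length (new_rel_pages), not the relation's current length, when the phase is entered.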
Re: Internal key management system
On Tue, Mar 24, 2020 at 02:29:57PM +0900, Masahiko Sawada wrote: > That seems to work fine. > > So we will have pg_cryptokeys within PGDATA and each key is stored > into separate file named the key id such as "sql", "tde-wal" and > "tde-block". I'll update the patch and post. Yes, that makes sense to me. -- Bruce Momjian https://momjian.us EnterpriseDB https://enterprisedb.com + As you are, so once was I. As I am, so you will be. + + Ancient Roman grave inscription +
Re: documentation pdf build fail (HEAD)
Ubuntu 18.04: no crash, but possibly a side effect: [INFO] FOUserAgent - Rendered page #2685. [INFO] FOUserAgent - Rendered page #2686. [INFO] FOUserAgent - Rendered page #2687. [WARN] FOUserAgent - Destination: Unresolved ID reference "function-encode" found. [WARN] FOUserAgent - Destination: Unresolved ID reference "function-decode" found. [WARN] FOUserAgent - Destination: Unresolved ID reference "sql-altercollation-notes-title" found. [WARN] FOUserAgent - Destination: Unresolved ID reference "sql-altertable-notes-title" found. [WARN] FOUserAgent - Destination: Unresolved ID reference "sql-createaggregate-notes-title" found. [WARN] FOUserAgent - Destination: Unresolved ID reference "sql-createindex-storage-parameters-title" found. [WARN] FOUserAgent - Destination: Unresolved ID reference "sql-createindex-concurrently-title" found. [WARN] FOUserAgent - Destination: Unresolved ID reference "sql-createtable-storage-parameters-title" found. [WARN] FOUserAgent - Destination: Unresolved ID reference "sql-createtable-compatibility-title" found. [WARN] FOUserAgent - Destination: Unresolved ID reference "sql-declare-notes-title" found. [WARN] FOUserAgent - Destination: Unresolved ID reference "sql-inserting-params-title" found. [WARN] FOUserAgent - Destination: Unresolved ID reference "sql-on-conflict-title" found. [WARN] FOUserAgent - Destination: Unresolved ID reference "sql-prepare-examples-title" found. [WARN] FOUserAgent - Destination: Unresolved ID reference "sql-reindex-concurrently-title" found. [WARN] FOUserAgent - Destination: Unresolved ID reference "sql-with-title" found. [WARN] FOUserAgent - Destination: Unresolved ID reference "sql-from-title" found. [WARN] FOUserAgent - Destination: Unresolved ID reference "sql-where-title" found. [WARN] FOUserAgent - Destination: Unresolved ID reference "sql-groupby-title" found. [WARN] FOUserAgent - Destination: Unresolved ID reference "sql-having-title" found. 
[WARN] FOUserAgent - Destination: Unresolved ID reference "sql-window-title" found. [WARN] FOUserAgent - Destination: Unresolved ID reference "sql-select-list-title" found. [WARN] FOUserAgent - Destination: Unresolved ID reference "sql-distinct-title" found. [WARN] FOUserAgent - Destination: Unresolved ID reference "sql-union-title" found. [WARN] FOUserAgent - Destination: Unresolved ID reference "sql-intersect-title" found. [WARN] FOUserAgent - Destination: Unresolved ID reference "sql-except-title" found. [WARN] FOUserAgent - Destination: Unresolved ID reference "sql-orderby-title" found. [WARN] FOUserAgent - Destination: Unresolved ID reference "sql-limit-title" found. [WARN] FOUserAgent - Destination: Unresolved ID reference "sql-for-update-share-title" found. [WARN] FOUserAgent - Destination: Unresolved ID reference "pg-dump-examples-title" found. [WARN] FOUserAgent - Destination: Unresolved ID reference "app-psql-patterns-title" found. [WARN] FOUserAgent - Destination: Unresolved ID reference "app-psql-variables-title" found. [WARN] FOUserAgent - Destination: Unresolved ID reference "app-psql-interpolation-title" found. [WARN] FOUserAgent - Destination: Unresolved ID reference "app-psql-prompting-title" found. [WARN] FOUserAgent - Destination: Unresolved ID reference "app-psql-environment-title" found. [WARN] FOUserAgent - Destination: Unresolved ID reference "app-psql-examples-title" found. [WARN] FOUserAgent - Destination: Unresolved ID reference "app-postgres-single-user-title" found. [INFO] FOUserAgent - Rendered page #2688. [WARN] FOUserAgent - Page 226: Unresolved ID reference "function-decode" found. [WARN] FOUserAgent - Page 226: Unresolved ID reference "function-encode" found. Kind regards, J. Purtz
Re: improve transparency of bitmap-only heap scans
On Tue, Mar 24, 2020 at 1:24 AM Amit Kapila wrote: > > On Fri, Mar 20, 2020 at 7:09 AM James Coleman wrote: > > > > Awesome, thanks for confirming with an actual plan. > > > > > I don't think it matters in nontext mode, but at least in text mode, I > > > think > > > maybe the Unfetched blocks should be output after the exact and lossy > > > blocks, > > > in case someone is parsing it, and because bitmap-only is a relatively new > > > feature. Its output is probably less common than exact/lossy. > > > > I tweaked that (and a comment that didn't reference the change); see > > attached. > > > > A few comments: > 1. > - > - if (tbmres->ntuples >= 0) > + else if (tbmres->ntuples >= 0) > node->exact_pages++; > > How is this change related to this patch? > 2. > + * unfetched_pages  total number of pages not retrieved due to vm > * prefetch_iterator  iterator for prefetching ahead of current page > * prefetch_pages  # pages prefetch iterator is ahead of current > * prefetch_target  current target prefetch distance > @@ -1591,6 +1592,7 @@ typedef struct BitmapHeapScanState > Buffer pvmbuffer; > long exact_pages; > long lossy_pages; > + long unfetched_pages; > > Can we name it as skipped_pages? That seems easy enough to do. > 3. Can we add a test or two for this functionality? From what I can tell the current lossy page count isn't tested either; would we expect the explain output from such a test to be stable across different architectures etc.? James
Re: error context for vacuum to include block number
On Tue, Mar 24, 2020 at 7:18 PM Justin Pryzby wrote: > > On Tue, Mar 24, 2020 at 07:07:03PM +0530, Amit Kapila wrote: > > On Tue, Mar 24, 2020 at 6:18 PM Masahiko Sawada > > wrote: > > > 1. > > > +/* Update error traceback information */ > > > +olderrcbarg = *vacrelstats; > > > +update_vacuum_error_cbarg(vacrelstats, > > > + VACUUM_ERRCB_PHASE_TRUNCATE, > > > new_rel_pages, NULL, > > > + false); > > > + > > > /* > > > * Scan backwards from the end to verify that the end pages > > > actually > > > * contain no tuples. This is *necessary*, not optional, because > > > * other backends could have added tuples to these pages whilst > > > we > > > * were vacuuming. > > > */ > > > new_rel_pages = count_nondeletable_pages(onerel, vacrelstats); > > > > > > We need to set the error context after setting new_rel_pages. > > > > We want to cover the errors raised in count_nondeletable_pages(). In > > an earlier version of the patch, we had a TRUNCATE_PREFETCH phase which > > used to cover those errors, but that was not good as we were > > setting/resetting it multiple times and it was not clear such a > > separate phase would add any value. > > I insisted on covering count_nondeletable_pages since it calls ReadBuffer(), > but I think we need to at least set vacrelstats->blkno = new_rel_pages, since > it > may be different, right? > Yeah, that makes sense.
> > One small reason to free them is that (as Tom mentioned upthread) it's good to > ensure that those variables are their own allocation, and not depending on > being able to access relcache or anything else during an unexpected error. > That is a good reason to allocate them separately, but not a reason to free them individually, especially when the caller of the function will free the context in which they are allocated. I think Sawada-San's concern was that it would leak memory across the vacuum of multiple relations, but that is not the case here. Won't it look odd if we are freeing memory for members of vacrelstats but not for vacrelstats itself? -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: error context for vacuum to include block number
On Tue, Mar 24, 2020 at 07:07:03PM +0530, Amit Kapila wrote: > On Tue, Mar 24, 2020 at 6:18 PM Masahiko Sawada > wrote: > > 1. > > +/* Update error traceback information */ > > +olderrcbarg = *vacrelstats; > > +update_vacuum_error_cbarg(vacrelstats, > > + VACUUM_ERRCB_PHASE_TRUNCATE, > > new_rel_pages, NULL, > > + false); > > + > > /* > > * Scan backwards from the end to verify that the end pages > > actually > > * contain no tuples. This is *necessary*, not optional, because > > * other backends could have added tuples to these pages whilst we > > * were vacuuming. > > */ > > new_rel_pages = count_nondeletable_pages(onerel, vacrelstats); > > > > We need to set the error context after setting new_rel_pages. > > We want to cover the errors raised in count_nondeletable_pages(). In > an earlier version of the patch, we had a TRUNCATE_PREFETCH phase which > used to cover those errors, but that was not good as we were > setting/resetting it multiple times and it was not clear such a > separate phase would add any value. I insisted on covering count_nondeletable_pages since it calls ReadBuffer(), but I think we need to at least set vacrelstats->blkno = new_rel_pages, since it may be different, right? > > 2. > > + vacrelstats->relnamespace = > > get_namespace_name(RelationGetNamespace(onerel)); > > + vacrelstats->relname = pstrdup(RelationGetRelationName(onerel)); > > > > I think we can pfree these two variables to avoid a memory leak during > > vacuum on multiple relations. > > Yeah, I had also thought about it but I noticed that we are not > freeing for vacrelstats. Also, I think the memory is allocated in > TopTransactionContext which should be freed via > CommitTransactionCommand before vacuuming of the next relation, so not > sure if there is much value in freeing those variables. 
One small reason to free them is that (as Tom mentioned upthread) it's good to ensure that those variables are their own allocation, rather than depending on being able to access the relcache or anything else during an unexpected error. -- Justin
documentation pdf build fail (HEAD)
Hello, I build the pdf (for HEAD) almost daily without problems, but at the moment I get the error below. I am not sure whether to blame my particular setup (debian stretch), or whether there might be an error in the .sgml. The html files still build OK. If anyone has a suggestion on how to tackle this I'd be grateful. thanks, Erik Rijkers [...] [INFO] FOUserAgent - Rendered page #526. [INFO] FOUserAgent - Rendered page #527. [INFO] FOUserAgent - Rendered page #528. [INFO] FOUserAgent - Rendered page #529. [ERROR] FOP - Exception org.apache.fop.fo.ValidationException: The column-number or number of cells in the row overflows the number of fo:table-columns specified for the table. (See position 47337:52207) javax.xml.transform.TransformerException: org.apache.fop.fo.ValidationException: The column-number or number of cells in the row overflows the number of fo:table-columns specified for the table. (See position 47337:52207) org.apache.fop.apps.FOPException: org.apache.fop.fo.ValidationException: The column-number or number of cells in the row overflows the number of fo:table-columns specified for the table. (See position 47337:52207) javax.xml.transform.TransformerException: org.apache.fop.fo.ValidationException: The column-number or number of cells in the row overflows the number of fo:table-columns specified for the table. (See position 47337:52207) at org.apache.fop.cli.InputHandler.transformTo(InputHandler.java:289) at org.apache.fop.cli.InputHandler.renderTo(InputHandler.java:115) at org.apache.fop.cli.Main.startFOP(Main.java:186) at org.apache.fop.cli.Main.main(Main.java:217) Caused by: javax.xml.transform.TransformerException: org.apache.fop.fo.ValidationException: The column-number or number of cells in the row overflows the number of fo:table-columns specified for the table. 
(See position 47337:52207) at org.apache.xalan.transformer.TransformerIdentityImpl.transform(TransformerIdentityImpl.java:502) at org.apache.fop.cli.InputHandler.transformTo(InputHandler.java:286) ... 3 more Caused by: org.apache.fop.fo.ValidationException: The column-number or number of cells in the row overflows the number of fo:table-columns specified for the table. (See position 47337:52207) at org.apache.fop.events.ValidationExceptionFactory.createException(ValidationExceptionFactory.java:38) at org.apache.fop.events.EventExceptionManager.throwException(EventExceptionManager.java:58) at org.apache.fop.events.DefaultEventBroadcaster$1.invoke(DefaultEventBroadcaster.java:175) at com.sun.proxy.$Proxy4.tooManyCells(Unknown Source) at org.apache.fop.fo.flow.table.TableCellContainer.addTableCellChild(TableCellContainer.java:75) at org.apache.fop.fo.flow.table.TableRow.addChildNode(TableRow.java:95) at org.apache.fop.fo.FOTreeBuilder$MainFOHandler.startElement(FOTreeBuilder.java:324) at org.apache.fop.fo.FOTreeBuilder.startElement(FOTreeBuilder.java:179) at org.apache.xalan.transformer.TransformerIdentityImpl.startElement(TransformerIdentityImpl.java:1073) at org.apache.xerces.parsers.AbstractSAXParser.startElement(Unknown Source) at org.apache.xerces.xinclude.XIncludeHandler.startElement(Unknown Source) at org.apache.xerces.impl.XMLNSDocumentScannerImpl.scanStartElement(Unknown Source) at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown Source) at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source) at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source) at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source) at org.apache.xerces.parsers.XMLParser.parse(Unknown Source) at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source) at org.apache.xerces.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown Source) at 
org.apache.xalan.transformer.TransformerIdentityImpl.transform(TransformerIdentityImpl.java:485) ... 4 more
Re: error context for vacuum to include block number
On Tue, Mar 24, 2020 at 6:18 PM Masahiko Sawada wrote: > > > I've read through the latest version patch (v31), here are my comments: > > 1. > +/* Update error traceback information */ > +olderrcbarg = *vacrelstats; > +update_vacuum_error_cbarg(vacrelstats, > + VACUUM_ERRCB_PHASE_TRUNCATE, > new_rel_pages, NULL, > + false); > + > /* > * Scan backwards from the end to verify that the end pages actually > * contain no tuples. This is *necessary*, not optional, because > * other backends could have added tuples to these pages whilst we > * were vacuuming. > */ > new_rel_pages = count_nondeletable_pages(onerel, vacrelstats); > > We need to set the error context after setting new_rel_pages. > We want to cover the errors raised in count_nondeletable_pages(). In an earlier version of the patch, we had a TRUNCATE_PREFETCH phase which used to cover those errors, but that was not good as we were setting/resetting it multiple times and it was not clear such a separate phase would add any value. > 2. > + vacrelstats->relnamespace = > get_namespace_name(RelationGetNamespace(onerel)); > + vacrelstats->relname = pstrdup(RelationGetRelationName(onerel)); > > I think we can pfree these two variables to avoid a memory leak during > vacuum on multiple relations. > Yeah, I had also thought about it but I noticed that we are not freeing for vacrelstats. Also, I think the memory is allocated in TopTransactionContext which should be freed via CommitTransactionCommand before vacuuming of the next relation, so not sure if there is much value in freeing those variables. > 3. > +/* Phases of vacuum during which we report error context. 
*/ > +typedef enum > +{ > + VACUUM_ERRCB_PHASE_UNKNOWN, > + VACUUM_ERRCB_PHASE_SCAN_HEAP, > + VACUUM_ERRCB_PHASE_VACUUM_INDEX, > + VACUUM_ERRCB_PHASE_VACUUM_HEAP, > + VACUUM_ERRCB_PHASE_INDEX_CLEANUP, > + VACUUM_ERRCB_PHASE_TRUNCATE > +} ErrCbPhase; > > Compared to the vacuum progress phases, there is no phase for > final cleanup which includes updating the relation stats. Is there any > reason why we don't have that phase for ErrCbPhase? > I think we can add it if we want, but it was not clear to me if that is helpful. > The type name ErrCbPhase seems to be a very generic name, how about > VacErrCbPhase or VacuumErrCbPhase? > It sounds like a better name. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: truncating timestamps on arbitrary intervals
Hello John, this looks like a nice feature. I'm wondering how it relates to the following use-case: When drawing charts, the user can select pre-defined time widths (like "15 min", "1 hour"). The data for these slots is always fitted to intervals that start at 0 in the slot, e.g. if the user selects "15 min", the interval always starts at xx:00, xx:15, xx:30 or xx:45. This is to aid caching of the resulting data, and so that requesting the same chart at xx:00 and xx:01 actually draws the same chart, and not some bar with only one minute of data at the end. In PSQL, this works out to using this as GROUP BY and then summing up all values: SELECT FLOOR(EXTRACT(EPOCH from thetime) / 3600) * 3600, SUM(events) FROM mytable ... GROUP BY 1; where 3600 means "hourly". It is of course easy for things like "1 hour", but for yearly I had to come up with things like: EXTRACT(YEAR FROM thetime) * 12 + EXTRACT(MONTH FROM thetime) and it gets even more involved for weeks, weekdays or odd things like fortnights. So would that mean one could replace this by: date_trunc_interval('3600 seconds'::interval, thetime) and it would be easier, and (hopefully) faster? Best regards, Tels
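Tels' epoch-flooring workaround can be sketched outside SQL as well. Here is an illustrative Python equivalent (function name hypothetical) of FLOOR(EXTRACT(EPOCH FROM thetime) / width) * width, which is roughly what a date_trunc_interval() with a pure-seconds interval would compute:

```python
from datetime import datetime, timezone

def trunc_to_interval(ts: datetime, width_seconds: int) -> datetime:
    """Floor ts to the start of its width_seconds-wide bucket,
    counted from the Unix epoch (mirrors the SQL expression above)."""
    epoch = int(ts.timestamp())
    return datetime.fromtimestamp(epoch - epoch % width_seconds,
                                  tz=timezone.utc)

ts = datetime(2020, 3, 24, 10, 7, 30, tzinfo=timezone.utc)
trunc_to_interval(ts, 900)   # 15-minute buckets: floors to xx:00/xx:15/xx:30/xx:45
```

As Tels notes, this only stays simple for fixed-length widths; months and years don't have a fixed number of seconds, which is exactly why a date_trunc_interval() taking a real interval is attractive.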
Re: [Proposal] Global temporary tables
Hi Wenjing, Please check my findings (on gtt_v20.patch) below: *TestCase1:* (cache lookup failed on GTT) -- Session1: postgres=# create global temporary table gtt1(c1 int) on commit delete rows; CREATE TABLE -- Session2: postgres=# drop table gtt1 ; DROP TABLE -- Session1: postgres=# create global temporary table gtt1(c1 int) on commit delete rows; ERROR: cache lookup failed for relation 16384 *TestCase2:* -- Session1: postgres=# create global temporary table gtt (c1 integer) on commit preserve rows; CREATE TABLE postgres=# insert into gtt values(10); INSERT 0 1 -- Session2: postgres=# drop table gtt; DROP TABLE I think "Session2" should not be allowed to perform the "DROP" operation on a GTT that has data. *Behavior of GTT in Oracle Database in such a scenario:* For a committed transaction on a GTT with on_commit_delete_rows='FALSE' that has data in a session, we will not be able to DROP it from any session; we need to TRUNCATE the data first in order to DROP the table. SQL> drop table gtt; drop table gtt * ERROR at line 1: ORA-14452: attempt to create, alter or drop an index on temporary table already in use On Tue, Mar 17, 2020 at 9:16 AM 曾文旌(义从) wrote: > > > On March 11, 2020 at 3:52 PM, Prabhat Sahu wrote: > > On Mon, Mar 9, 2020 at 10:02 PM 曾文旌(义从) > wrote: > >> >> >> Fixed in global_temporary_table_v18-pg13.patch. >> > Hi Wenjing, > Thanks for the patch. I have verified the previous issues with > "gtt_v18_pg13.patch" and those are resolved. 
> Please find below case: > > postgres=# create sequence seq; > CREATE SEQUENCE > > postgres=# CREATE GLOBAL TEMPORARY TABLE gtt1(c1 int PRIMARY KEY) ON > COMMIT DELETE ROWS; > CREATE TABLE > > postgres=# CREATE GLOBAL TEMPORARY TABLE gtt2(c1 int PRIMARY KEY) ON > COMMIT PRESERVE ROWS; > CREATE TABLE > > postgres=# alter table gtt1 add c2 int default nextval('seq'); > ERROR: cannot reindex global temporary tables > > postgres=# alter table gtt2 add c2 int default nextval('seq'); > ERROR: cannot reindex global temporary tables > > reindex GTT is already supported > > Please check global_temporary_table_v20-pg13.patch > > > Wenjing > > > > > *Note*: We are getting this error if we have a key column(PK/UNIQUE) in a > GTT, and trying to add a column with a default sequence into it. > > -- > > With Regards, > Prabhat Kumar Sahu > EnterpriseDB: http://www.enterprisedb.com > > > -- With Regards, Prabhat Kumar Sahu EnterpriseDB: http://www.enterprisedb.com
Re: Built-in connection pooler
Hi Konstantin, On 11/14/19 2:06 AM, Konstantin Knizhnik wrote: Attached please find rebased patch with this bug fixed. This patch no longer applies: http://cfbot.cputube.org/patch_27_2067.log CF entry has been updated to Waiting on Author. Regards, -- -David da...@pgmasters.net
Re: Additional improvements to extended statistics
On Tue, 24 Mar 2020 at 01:08, Tomas Vondra wrote: > > Hmmm. So let's consider a simple OR clause with two arguments, both > covered by a single statistics object. Something like this: > >CREATE TABLE t (a int, b int); >INSERT INTO t SELECT mod(i, 10), mod(i, 10) > FROM generate_series(1,10); >CREATE STATISTICS s (mcv) ON a,b FROM t; > >SELECT * FROM t WHERE a = 0 OR b = 0; > > Which is estimated correctly... > Hmm, the reason that is estimated correctly is that the MCV values cover all values in the table, so mcv_totalsel is 1 (or pretty close to 1), and other_sel is capped to nearly 0, and so the result is basically just mcv_sel -- i.e., in this case the MCV estimates are returned more-or-less as-is, rather than being applied as a correction, and so the result is independent of how many times extended stats are applied. The more interesting (and probably more realistic) case is where the MCV values do not cover all the values in the table, as in the new mcv_lists_partial examples in the regression tests, for example this test case, which produces a significant overestimate: SELECT * FROM mcv_lists_partial WHERE a = 0 OR b = 0 OR c = 0 although actually, I think there's another reason for that (in addition to the extended stats correction being applied twice) -- the way the MCV code makes use of base selectivity doesn't seem really appropriate for OR clauses, because the way base_frequency is computed is really based on the assumption that every column would be matched. I'm not sure that there's any easy answer to that though. It feels like what's needed when computing the match bitmaps in mcv.c is to produce a bitmap (would it fit in 1 byte?) per value, to indicate which columns match, and base_frequency values per column. That would be significantly more work though, so almost certainly isn't feasible for PG13. 
> IIUC the problem you have in mind is that we end up calling
> statext_mcv_clauselist_selectivity twice, once for the top-level AND
> clause with a single argument, and then recursively for the expanded OR
> clause. Indeed, that seems to be applying the correction twice.
>
> I'm not sure what's the best way to resolve that. Perhaps
> statext_mcv_clauselist_selectivity() / statext_is_compatible_clause()
> should ignore OR clauses from an AND-list, on the basis that they will
> get processed recursively later. Or perhaps estimatedclauses can
> somehow be used to prevent this, though I'm not sure exactly how that
> would work.
>
> I don't know. I feel uneasy about just ignoring some of the clauses,
> because what happens for complex clauses, where the OR is not directly
> in the AND clause, but is negated or something like that?
>
> Isn't it the case that clauselist_selectivity_simple (and the OR
> variant) should ignore extended stats entirely? That is, we'd need to
> add a flag (or _simple variant) to clause_selectivity, so that it calls
> clauselist_selectivity_simple_or. So the calls would look like this:
>
>   clauselist_selectivity
>     statext_clauselist_selectivity
>       statext_mcv_clauselist_selectivity
>         clauselist_selectivity_simple   <--- disable extended stats
>           clause_selectivity
>             clauselist_selectivity_simple_or
>               clause_selectivity
>               clause_selectivity
>               mcv_clauselist_selectivity
>     clauselist_selectivity_simple
>       already estimated
>
> I've only quickly hacked clause_selectivity, but it does not seem very
> invasive (of course, it means disruption to clause_selectivity callers,
> but I suppose most call clauselist_selectivity).
>

Sounds like a reasonable approach, but I think it would be better to preserve the current public API by having clauselist_selectivity() become a thin wrapper around a new function that optionally applies extended stats.

Regards,
Dean
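The capping behaviour Dean describes earlier in this thread (mcv_totalsel near 1 makes the correction term vanish) can be sketched as a toy model. This is not the actual statext_mcv_clauselist_selectivity code, just illustrative arithmetic with made-up parameter names:

```python
def corrected_selectivity(simple_sel, mcv_sel, mcv_basesel, mcv_totalsel):
    """Toy model of applying an extended-stats MCV list as a correction:
    the part of the simple per-column estimate not already accounted for
    by the MCV items is clamped to the frequency the MCV list does NOT
    cover, then added to the MCV-derived selectivity."""
    other_sel = simple_sel - mcv_basesel
    other_sel = max(0.0, min(other_sel, 1.0 - mcv_totalsel))
    return mcv_sel + other_sel

# When the MCV list covers (almost) every row, mcv_totalsel ~= 1, the
# correction term is capped to ~0, and the result is simply mcv_sel --
# which is why the fully-covered example is insensitive to applying the
# correction more than once.
```

When the MCV list is only partial (mcv_totalsel well below 1), the clamp no longer neutralises the correction, so applying it twice moves the estimate twice -- the overestimate described above.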
Re: Fastpath while arranging the changes in LSN order in logical decoding
On Tue, Mar 24, 2020 at 6:16 PM Amit Kapila wrote: > > On Mon, Mar 9, 2020 at 11:07 PM Andres Freund wrote: > > > > On 2020-03-07 11:15:27 +0530, Dilip Kumar wrote: > > > IMHO, if we conclude that because there is no performance gain so we > > > don't want to add one extra path in the code then we might want to > > > remove that TODO from the code so that we don't spend time for > > > optimizing this in the future. > > > > +1 > > > > Dilip, are you planning to do more tests for this? Anyone else wants > to do more tests? If not, based on current results, we can remove that > TODO and in future, if someone comes with a test case to show benefit > for adding fastpath, then we can consider the patch proposed by Dilip. IMHO, I have tried the best case but did not see any performance gain so I am not planning to test this further. I have attached the patch for removing the TODO. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com 0001-Remove-TODO-comments-for-fast-path.patch Description: Binary data
Re: error context for vacuum to include block number
On Tue, 24 Mar 2020 at 18:19, Amit Kapila wrote:
>
> On Tue, Mar 24, 2020 at 2:37 PM Masahiko Sawada wrote:
> >
> > On Tue, 24 Mar 2020 at 13:53, Amit Kapila wrote:
> > >
> > > On Tue, Mar 24, 2020 at 9:46 AM Justin Pryzby wrote:
> > > >
> > > > On Mon, Mar 23, 2020 at 02:25:14PM +0530, Amit Kapila wrote:
> > > > > > Yea, and it would be misleading if we reported "while scanning
> > > > > > block..of relation" if we actually failed while writing its FSM.
> > > > > >
> > > > > > My previous patches did this:
> > > > > >
> > > > > > + case VACUUM_ERRCB_PHASE_VACUUM_FSM:
> > > > > >     errcontext("while vacuuming free space map of relation \"%s.%s\"",
> > > > > >                cbarg->relnamespace, cbarg->relname);
> > > > > >     break;
> > > > >
> > > > > In what kind of errors will this help?
> > > >
> > > > If there's an I/O error on an _fsm file, for one.
> > >
> > > If there is a read or write failure, then we give an error like the one
> > > below, which already has the required information.
> > >
> > > ereport(ERROR,
> > >         (errcode_for_file_access(),
> > >          errmsg("could not read block %u in file \"%s\": %m",
> > >                 blocknum, FilePathName(v->mdfd_vfd))));
> >
> > Yeah, you're right. We, however, cannot see whether the error happened
> > while recording freespace or while vacuuming the freespace map, but
> > perhaps we can tell the situation from the error message in most cases.
> > So I'm okay with the current set of phases.
> >
> > I'll also test the current version of the patch today.
>
> Okay, thanks.

I've read through the latest version of the patch (v31); here are my comments:

1.
+/* Update error traceback information */
+olderrcbarg = *vacrelstats;
+update_vacuum_error_cbarg(vacrelstats,
+                          VACUUM_ERRCB_PHASE_TRUNCATE, new_rel_pages, NULL,
+                          false);
+
 /*
  * Scan backwards from the end to verify that the end pages actually
  * contain no tuples.
This is *necessary*, not optional, because
 * other backends could have added tuples to these pages whilst we
 * were vacuuming.
 */
new_rel_pages = count_nondeletable_pages(onerel, vacrelstats);

We need to set the error context after setting new_rel_pages.

2.
+ vacrelstats->relnamespace = get_namespace_name(RelationGetNamespace(onerel));
+ vacrelstats->relname = pstrdup(RelationGetRelationName(onerel));

I think we can pfree these two variables to avoid a memory leak during vacuum on multiple relations.

3.
+/* Phases of vacuum during which we report error context. */
+typedef enum
+{
+	VACUUM_ERRCB_PHASE_UNKNOWN,
+	VACUUM_ERRCB_PHASE_SCAN_HEAP,
+	VACUUM_ERRCB_PHASE_VACUUM_INDEX,
+	VACUUM_ERRCB_PHASE_VACUUM_HEAP,
+	VACUUM_ERRCB_PHASE_INDEX_CLEANUP,
+	VACUUM_ERRCB_PHASE_TRUNCATE
+} ErrCbPhase;

Compared to the vacuum progress phases, there is no phase for final cleanup, which includes updating the relation stats. Is there any reason why we don't have that phase for ErrCbPhase? Also, the type name ErrCbPhase seems very generic; how about VacErrCbPhase or VacuumErrCbPhase?

Regards,

--
Masahiko Sawada
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
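The error-context mechanism under review above can be sketched as follows. This is a simplified toy model, not the patch itself: the names are modelled on the patch (update_vacuum_error_cbarg, the phase constants) and the message templates are illustrative, not the patch's exact wording. Vacuum keeps a small "current phase" record up to date as it moves between heap scan, index vacuum, and truncation, and a callback formats that record into a context line if an error is raised:

```python
# Illustrative message templates keyed by phase (hypothetical wording).
PHASE_MESSAGES = {
    "SCAN_HEAP":   'while scanning block {blkno} of relation "{ns}.{rel}"',
    "VACUUM_HEAP": 'while vacuuming block {blkno} of relation "{ns}.{rel}"',
    "TRUNCATE":    'while truncating relation "{ns}.{rel}" to {blkno} blocks',
}

class VacErrCbArg:
    """Mutable state shared between vacuum proper and the error callback."""
    def __init__(self, ns, rel):
        self.ns, self.rel = ns, rel
        self.phase, self.blkno = "UNKNOWN", None

def update_vacuum_error_cbarg(arg, phase, blkno):
    # Cheap to call often: just record where vacuum currently is.
    arg.phase, arg.blkno = phase, blkno

def vacuum_error_callback(arg):
    # Invoked only on error: turn the recorded state into a context line.
    template = PHASE_MESSAGES.get(arg.phase)
    if template is None:
        return None  # UNKNOWN phase: emit no context line
    return template.format(blkno=arg.blkno, ns=arg.ns, rel=arg.rel)
```

This also illustrates Sawada-san's comment 1: whatever block number was last recorded is what the callback reports, so the TRUNCATE phase must be recorded only after new_rel_pages holds its final value.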
Re: Fastpath while arranging the changes in LSN order in logical decoding
On Mon, Mar 9, 2020 at 11:07 PM Andres Freund wrote: > > On 2020-03-07 11:15:27 +0530, Dilip Kumar wrote: > > IMHO, if we conclude that because there is no performance gain so we > > don't want to add one extra path in the code then we might want to > > remove that TODO from the code so that we don't spend time for > > optimizing this in the future. > > +1 > Dilip, are you planning to do more tests for this? Anyone else wants to do more tests? If not, based on current results, we can remove that TODO and in future, if someone comes with a test case to show benefit for adding fastpath, then we can consider the patch proposed by Dilip. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: Improve checking for pg_index.xmin
On Sun, Jan 12, 2020 at 3:33 AM Alexander Korotkov wrote:
>
> On Wed, Jan 8, 2020 at 4:37 PM Tom Lane wrote:
> > Heikki Linnakangas writes:
> > > On 01/11/2019 01:50, Alexander Korotkov wrote:
> > >> This happens because we're checking that there are no broken HOT
> > >> chains after index creation by comparing pg_index.xmin and
> > >> TransactionXmin. So, we check that pg_index.xmin is in the past for
> > >> the current transaction in a lossy way, by comparing just the xmins.
> > >> The attached patch changes this check to XidInMVCCSnapshot().
> > >> With the patch the issue is gone. My doubt about this patch is that
> > >> it changes a check against TransactionXmin to a check against
> > >> GetActiveSnapshot(), which might be more recent. However, a query
> > >> shouldn't be executed with an older snapshot than the one it was
> > >> planned with.
> > >
> > > Hmm. Maybe you could construct a case like that with a creative mix of
> > > stable and volatile functions? Using GetOldestSnapshot() would be safer.
> >
> > I really wonder if this is safe at all.
> >
> > (1) Can we assume that the query will be executed with a same-or-newer
> > snapshot than what was used by the planner? There's no such constraint
> > in the plancache, I'm pretty sure.
> >
> > (2) Is committed-ness of the index-creating transaction actually
> > sufficient to ensure that none of the broken HOT chains it saw are
> > a problem for the onlooker transaction? This is, at best, really
> > un-obvious. Some of those HOT chains could involve xacts that were
> > still not committed when the index build finished, I believe.
> >
> > (3) What if the index was made with CREATE INDEX CONCURRENTLY ---
> > which xid is actually on the pg_index row, and how does that factor
> > into (1) and (2)?
>
> Thank you for pointing these out. I'll investigate these issues in detail.
>

Are you planning to work on this patch [1] for the current CF? If not, then I think it is better to either move it to the next CF or mark it as RWF.
[1] - https://commitfest.postgresql.org/27/2337/ -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
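The "lossy" check Alexander describes in the quoted message can be sketched with a toy model (simplified: real transaction IDs wrap around and snapshot checks are more involved; all names here are illustrative):

```python
def xid_in_mvcc_snapshot(xid, snap_xmin, snap_xmax, in_progress):
    """True if xid was still in progress as of the snapshot (i.e. its
    effects are invisible to it) -- a toy version of XidInMVCCSnapshot."""
    if xid < snap_xmin:
        return False        # completed before the snapshot was taken
    if xid >= snap_xmax:
        return True         # started after the snapshot was taken
    return xid in in_progress

def index_usable_lossy(index_xmin, transaction_xmin):
    """The lossy check: only trust indexes whose pg_index.xmin precedes
    TransactionXmin, the oldest xid still running anywhere."""
    return index_xmin < transaction_xmin

def index_usable_mvcc(index_xmin, snap_xmin, snap_xmax, in_progress):
    """Snapshot-based check: usable as soon as the index-creating
    transaction is visible to OUR snapshot, even if some unrelated
    older transaction is still running."""
    return not xid_in_mvcc_snapshot(index_xmin, snap_xmin, snap_xmax,
                                    in_progress)

# xid 100 created the index and committed, but unrelated xid 90 is still
# open, so TransactionXmin == 90 and the lossy check rejects the index
# even though our snapshot can see xid 100's commit:
index_usable_lossy(100, 90)             # False: lossy comparison loses
index_usable_mvcc(100, 90, 105, {90})   # True: 100 is not in progress
```

This shows why the xmin comparison is lossy: a single long-running transaction holds TransactionXmin back for everyone, making freshly built indexes unusable until it finishes, while the snapshot-based test only cares whether the creating transaction itself is visible.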
Re: Adding a test for speculative insert abort case
On Wed, Feb 12, 2020 at 6:50 AM Melanie Plageman wrote: > > > On Tue, Feb 11, 2020 at 4:45 PM Andres Freund wrote: >> >> >> I additionally restricted the controller_print_speculative_locks step to >> the current database and made a bunch of debug output more >> precise. Survived ~150 runs locally. >> >> Lets see what the buildfarm says... >> > > Thanks so much for finishing the patch and checking for race > conditions! > Can we change the status of CF entry for this patch [1] to committed or is there any work pending? [1] - https://commitfest.postgresql.org/27/2200/ -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: Silence compiler warnings with Python 3.9
On 2020-03-02 14:22, Peter Eisentraut wrote: Starting with Python 3.9, the Python headers contain inline functions that fall afoul of our -Wdeclaration-after-statement coding style. In order to silence those warnings, I've added some GCC-specific contortions to disable that warning for Python.h only. Clang doesn't appear to warn about this at all; maybe it recognizes that this is an external header file. We could also write a configure check for this if we want to be more flexible. (Attempts to convince upstream to change the coding style were unsuccessful (https://bugs.python.org/issue39615).) My fix in cpython was accepted after all and the issue is no longer present in the latest alpha (3.9.0a5), so this can be considered closed. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: truncating timestamps on arbitrary intervals
On Sun, Mar 15, 2020 at 2:26 PM I wrote:
>
> To get more logical behavior, perhaps the optional parameter is better
> as an offset instead of an origin. Alternatively (or additionally),
> the function could do the math on int64 timestamps directly.

For v6, I changed the algorithm to use pg_tm for months and years, and int64 for all smaller units. Despite the split, I think it's easier to read now, and certainly shorter.

This has the advantage that mixing units is now allowed, as long as you don't mix months/years with days or smaller units, which often doesn't make sense and is not very practical. (This is not yet documented.)

One consequence of this is that when operating on months/years, and the origin contains smaller units, the smaller units are ignored. Example:

select date_trunc_interval('12 months'::interval, timestamp '2012-03-01 01:21:01', timestamp '2011-03-22');
 date_trunc_interval
---------------------
 2012-03-01 00:00:00
(1 row)

Even though not quite a full year has passed, it ignores the days in the origin time and detects a difference of 12 calendar months. That might be fine, although we could also throw an error and say origins must be in the form of '-01-01 00:00:00' when truncating on months and/or years.

I added a sketch of documentation for the origin parameter and more tests.

On Fri, Mar 13, 2020 at 7:48 PM Isaac Morland wrote:
> I'm confused by this. If my calendars are correct, both 1900-01-02 and
> 2020-02-11 are Tuesdays. So if the date being adjusted and the origin are
> both Tuesday, shouldn't the day part be left alone when truncating to 7 days?
> Also, I'd like to confirm that the default starting point for 7 day periods
> (weeks) is Monday, per ISO.

This is fixed.
select date_trunc_interval('7 days'::interval, timestamp '2020-02-11 01:01:01.0', TIMESTAMP '1900-01-02');
 date_trunc_interval
---------------------
 2020-02-11 00:00:00
(1 row)

select date_trunc_interval('7 days'::interval, d), count(*)
from generate_series('2020-02-01'::timestamp, '2020-03-31', '1 day') d
group by 1 order by 1;
 date_trunc_interval | count
---------------------+-------
 2020-01-27 00:00:00 |     2
 2020-02-03 00:00:00 |     7
 2020-02-10 00:00:00 |     7
 2020-02-17 00:00:00 |     7
 2020-02-24 00:00:00 |     7
 2020-03-02 00:00:00 |     7
 2020-03-09 00:00:00 |     7
 2020-03-16 00:00:00 |     7
 2020-03-23 00:00:00 |     7
 2020-03-30 00:00:00 |     2
(10 rows)

> Perhaps the starting point for dates should be either 0001-01-01 (the
> proleptic beginning of the CE calendar) or 2001-01-01 (the beginning of the
> current 400-year repeating cycle of leap years and weeks, and a Monday,
> giving the appropriate ISO result for truncating to 7 day periods).

I went ahead with 2001-01-01 for the time being.

On Thu, Mar 19, 2020 at 4:20 PM Artur Zakirov wrote:
>
> =# select date_trunc_interval('1.1 year'::interval, TIMESTAMP
> '2020-02-01 01:21:01');
> ERROR: only one interval unit allowed for truncation

For any lingering cases like this (I don't see any), maybe an error hint is in order. The following works now, as expected for 1 year 1 month:

select date_trunc_interval('1.1 year'::interval, timestamp '2002-05-01 01:21:01');
 date_trunc_interval
---------------------
 2002-02-01 00:00:00

I'm going to look into implementing timezone support while awaiting comments on v6.

--
John Naylor
https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

v6-datetrunc_interval.patch
Description: Binary data
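The int64 branch described above (used for units smaller than a month) amounts to integer arithmetic relative to the origin. A sketch with Python datetimes, assuming the 2001-01-01 default origin discussed in this message; month/year strides need calendar (pg_tm-style) math instead and are not modelled here:

```python
from datetime import datetime, timedelta

# 2001-01-01 is a Monday and starts the current 400-year cycle of leap
# years and weeks (see the quoted suggestion above).
DEFAULT_ORIGIN = datetime(2001, 1, 1)

def trunc_interval(stride, ts, origin=DEFAULT_ORIGIN):
    """Truncate ts down to the nearest multiple of stride counted from
    origin -- the sub-month branch of the algorithm, in sketch form."""
    steps = (ts - origin) // stride    # floor division of timedeltas
    return origin + steps * stride

# 1900-01-02 and 2020-02-11 are both Tuesdays, so with that origin a
# 7-day truncation leaves the day part alone, matching the first psql
# example above:
trunc_interval(timedelta(days=7), datetime(2020, 2, 11, 1, 1, 1),
               origin=datetime(1900, 1, 2))   # -> 2020-02-11 00:00:00
```

With the default origin the same timestamp truncates to the preceding Monday, 2020-02-10, matching the generate_series bucket output above.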