Re: Strange coding in _mdfd_openseg()
On Thu, Apr 4, 2019 at 4:16 PM Kyotaro HORIGUCHI wrote:
> At Wed, 3 Apr 2019 13:47:46 -0700, Andres Freund wrote in
> <20190403204746.2yumq7c2mirmo...@alap3.anarazel.de>
> > Yea, I totally agree it's weird. I'm not sure if I'd go for an assertion
> > of equality, or just invert the >= (which I agree I probably just
> > screwed up and didn't notice when reviewing the patch because it looked
> > close enough to correct and it didn't have a measurable effect).
>
> I looked there and agreed. _mdfd_openseg is always called just to
> add one new segment after the last opened segment by the two
> callers. So I think == is better.

Thanks. Some other little things I noticed while poking around in this area:

Callers of _mdfd_getseg(EXTENSION_RETURN_NULL) expect errno to be set if it
returns NULL, and it expects the same of mdopen(EXTENSION_RETURN_NULL), and
yet the latter does:

    fd = PathNameOpenFile(path, O_RDWR | PG_BINARY);
    if (fd < 0)
    {
        if ((behavior & EXTENSION_RETURN_NULL) &&
            FILE_POSSIBLY_DELETED(errno))
        {
            pfree(path);
            return NULL;
        }

1. I guess that needs save_errno treatment to protect it from being clobbered
   by pfree()?
2. It'd be nice if the function documentation explicitly said which functions
   set errno on error (and perhaps which syscalls).
3. Why does some code use "file < 0" and other code "file <= 0" to detect
   errors from fd.c functions that return File?

--
Thomas Munro
https://enterprisedb.com
Re: COPY FREEZE and setting PD_ALL_VISIBLE/visibility map bits
Hi,

On 2019-04-05 08:38:34 +0300, Darafei "Komяpa" Praliaskouski wrote:
> On Fri, Apr 5, 2019 at 6:58 AM Tom Lane wrote:
> > Andres Freund writes:
> > > I think the right approach would be to do all of this in heap_insert and
> > > heap_multi_insert. Whenever starting to work on a page, if INSERT_FROZEN
> > > is specified, remember whether it is either currently empty, or is
> > > already marked as all-visible. If previously empty, mark it as all
> > > visible at the end. If already all visible, there's no need to change
> > > that. If not yet all-visible, no need to do anything, since it can't
> > > have been inserted with COPY FREEZE. Do you see any problem doing it
> > > that way?
> >
> > Do we want to add overhead to these hot-spot routines for this purpose?
>
> Sizing the overhead: workflows right now don't end with COPY FREEZE - you
> need another VACUUM to set maps. Anything that lets you skip that VACUUM
> (and faster than that VACUUM itself) is helping. You specifically asked
> for it to be skippable with FREEZE anyway.

Tom's point was that the routines I was suggesting to adapt above aren't
just used for COPY FREEZE.

Greetings,

Andres Freund
Re: GiST VACUUM
Hi!

> On 4 Apr 2019, at 20:15, Heikki Linnakangas wrote:
>
> On 25/03/2019 15:20, Heikki Linnakangas wrote:
>> On 24/03/2019 18:50, Andrey Borodin wrote:
>>> I was working on a new version of the gist check in amcheck and
>>> understood one more thing:
>>>
>>> /* Can this page be recycled yet? */
>>> bool
>>> gistPageRecyclable(Page page)
>>> {
>>>     return PageIsNew(page) ||
>>>         (GistPageIsDeleted(page) &&
>>>          TransactionIdPrecedes(GistPageGetDeleteXid(page),
>>>                                RecentGlobalXmin));
>>> }
>>>
>>> Here RecentGlobalXmin can wrap around and the page will become
>>> unrecyclable for half of the xid cycle. Can we prevent it by resetting
>>> PageDeleteXid to InvalidTransactionId before doing
>>> RecordFreeIndexPage()? (Seems like the same applies to GIN...)
>>
>> True, and B-tree has the same issue. I thought I saw a comment somewhere
>> in the B-tree code about that earlier, but now I can't find it. I
>> must've imagined it.
>> We could reset it, but that would require dirtying the page. That would
>> be just extra I/O overhead, if the page gets reused before XID
>> wraparound. We could avoid that if we stored the full XID+epoch, not
>> just the XID. I think we should do that in GiST, at least, where this is
>> new. In the B-tree, it would require some extra code to deal with
>> backwards compatibility, but maybe it would be worth it even there.
>
> I suggest that we do the attached. It fixes this for GiST. The patch
> expands the "deletion XID" to 64 bits, and changes where it's stored.
> Instead of storing it in pd_prune_xid, it's stored in the page contents.
> Luckily, a deleted page has no real content.

So, we store the full xid right after the page header?
+static inline void
+GistPageSetDeleteXid(Page page, FullTransactionId deletexid)
+{
+	Assert(PageIsEmpty(page));
+	((PageHeader) page)->pd_lower = MAXALIGN(SizeOfPageHeaderData) +
+		sizeof(FullTransactionId);
+
+	*((FullTransactionId *) PageGetContents(page)) = deletexid;
+}

Usually we leave one ItemId (located at an invalid offset number) untouched.
I do not know whether that is done for a reason or not.

Also, I did not understand this optimization:

+	/*
+	 * We can skip this if the page was deleted so long ago, that no scan can
+	 * possibly still see it, even in a standby. One measure might be anything
+	 * older than the table's frozen-xid, but we don't have that at hand here.
+	 * But anything older than 2 billion, from the next XID, is surely old
+	 * enough, because you would hit XID wraparound at that point.
+	 */
+	nextxid = ReadNextFullTransactionId();
+	diff = U64FromFullTransactionId(nextxid) -
+		U64FromFullTransactionId(latestRemovedXid);
+	if (diff < 0x7fff)
+		return;

A standby can be lagging months behind the primary and, theoretically, close
the gap in one sudden WAL leap... Also, I think that the comparison sign
should be >, not <.

Best regards, Andrey Borodin.
Re: COPY FREEZE and setting PD_ALL_VISIBLE/visibility map bits
On Fri, Apr 5, 2019 at 6:58 AM Tom Lane wrote:
> Andres Freund writes:
> > I think the right approach would be to do all of this in heap_insert and
> > heap_multi_insert. Whenever starting to work on a page, if INSERT_FROZEN
> > is specified, remember whether it is either currently empty, or is
> > already marked as all-visible. If previously empty, mark it as all
> > visible at the end. If already all visible, there's no need to change
> > that. If not yet all-visible, no need to do anything, since it can't
> > have been inserted with COPY FREEZE. Do you see any problem doing it
> > that way?
>
> Do we want to add overhead to these hot-spot routines for this purpose?

Sizing the overhead: workflows right now don't end with COPY FREEZE - you
need another VACUUM to set maps. Anything that lets you skip that VACUUM
(and faster than that VACUUM itself) is helping. You specifically asked for
it to be skippable with FREEZE anyway.

--
Darafei Praliaskouski
Support me: http://patreon.com/komzpa
Re: Why does "toast_tuple_target" allow values below TOAST_TUPLE_TARGET?
On Fri, 5 Apr 2019 at 18:25, Amit Langote wrote: > > On 2019/04/05 14:09, David Rowley wrote: > > Or rather, why does the reloption allow values below the compile-time > > constant? > > Maybe there is already a discussion in progress on the topic? > > * Caveats from reloption toast_tuple_target * > https://www.postgresql.org/message-id/flat/CABOikdMt%3DmOtzW_ax_8pa9syEPo5Lji%3DLJrN2dunht8K-SLWzg%40mail.gmail.com Doh. Okay, thanks. I'll drop this thread then. Seems what I wanted to ask and say has been asked and said already. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: Why does "toast_tuple_target" allow values below TOAST_TUPLE_TARGET?
On 2019/04/05 14:09, David Rowley wrote: > Or rather, why does the reloption allow values below the compile-time > constant? Maybe there is already a discussion in progress on the topic? * Caveats from reloption toast_tuple_target * https://www.postgresql.org/message-id/flat/CABOikdMt%3DmOtzW_ax_8pa9syEPo5Lji%3DLJrN2dunht8K-SLWzg%40mail.gmail.com Thanks, Amit
Re: [GSoC] application ideas
Hi! We are discussing GSoC details off-list, but I'll put some
recommendations on your proposal to the list.

> On 3 Apr 2019, at 2:53, pantilimonov misha wrote:
>
> Andrey, thank you for your reply.
>
>> 24.03.2019, 12:12, "Andrey Borodin":
>>
>> I like the idea of a more efficient BufferTag->Page data structure. I'm
>> not sure cache locality is a real problem there, but I believe this idea
>> deserves giving it a shot.
>> I'd happily review your proposal and co-mentor the project, if it is
>> chosen for GSoC.
>
> Here it is:
>
> https://docs.google.com/document/d/1HmhOs07zE8Q1TX1pOdtjxHSjjAUjaO2tp9NSmry8muY/edit?usp=sharing

While your project is clearly research-oriented, it is planned within the
PostgreSQL development process. Can you please add some information about
which patches you are going to put on the commitfest? Also, please plan to
review one or more patches. This is important for integrating into the
community.

BTW, there is a somewhat related IntegerSet data structure added recently
[0]. In my version it was implemented as a radix tree. I think it is a good
example of how a generic data structure can be presented and then reused by
the buffer manager.

Best regards, Andrey Borodin.

[0] https://github.com/postgres/postgres/commit/df816f6ad532ad685a3897869a2e64d3a53fe312
Why does "toast_tuple_target" allow values below TOAST_TUPLE_TARGET?
Or rather, why does the reloption allow values below the compile-time
constant? It looks like we currently allow values as low as 128.

The problem there is that heap_update() and heap_prepare_insert() both only
bother calling toast_insert_or_update() when the tuple's length is above
TOAST_TUPLE_TARGET, so the reloption seems to have no effect when set to a
lower value.

I don't think we can change heap_update() and heap_prepare_insert() to do
"tup->t_len > RelationGetToastTupleTarget(relation, TOAST_TUPLE_TARGET)"
instead, as such a table might not even have a TOAST relation, since
needs_toast_table() will return false if it thinks the tuple length can't
be above TOAST_TUPLE_TARGET.

It does not seem possible to add/remove the toast table when the reloption
is changed either, as we're only obtaining a ShareUpdateExclusiveLock to
set it. We'd likely need to upgrade that to an AccessExclusiveLock to do
that.

I understand from reading [1] that Simon was mostly interested in keeping
values inline for longer. I saw no mention of moving them out of line
sooner.

[1] https://postgr.es/m/CANP8+jKsVmw6CX6YP9z7zqkTzcKV1+Uzr3XjKcZW=2ya00o...@mail.gmail.com

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Block level parallel vacuum
On Fri, Apr 5, 2019 at 4:51 AM Robert Haas wrote: > > On Thu, Apr 4, 2019 at 6:28 AM Masahiko Sawada wrote: > > These patches conflict with the current HEAD. Attached the updated patches. > > They'll need another rebase. > Thank you for the notice. Rebased. Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center From 87061bbc5b0c2d7c47b820ed97e6d738fbd1781a Mon Sep 17 00:00:00 2001 From: Masahiko Sawada Date: Thu, 4 Apr 2019 11:42:25 +0900 Subject: [PATCH v23 1/2] Add parallel option to VACUUM command This change adds PARALLEL option to VACUUM command that enable us to perform index vacuuming and index cleanup with parallel workers. Indivisual indexes are processed by one vacuum process. Therefore parallel vacuum can be used when the table has at least two indexes and we cannot specify larger parallel degree than the number of indexes that the table has. The parallel degree is either specified by user or determined based on the number of indexes that the table has, and further limited by max_parallel_maintenance_workers. The table size and index size don't affect it. --- doc/src/sgml/config.sgml | 14 +- doc/src/sgml/ref/vacuum.sgml | 31 ++ src/backend/access/heap/vacuumlazy.c | 890 ++ src/backend/access/transam/parallel.c | 4 + src/backend/commands/vacuum.c | 27 ++ src/backend/parser/gram.y | 1 + src/backend/postmaster/autovacuum.c | 1 + src/bin/psql/tab-complete.c | 2 +- src/include/access/heapam.h | 2 + src/include/commands/vacuum.h | 5 + src/test/regress/expected/vacuum.out | 12 +- src/test/regress/sql/vacuum.sql | 6 + 12 files changed, 889 insertions(+), 106 deletions(-) diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml index bc1d0f7..0b65d9b 100644 --- a/doc/src/sgml/config.sgml +++ b/doc/src/sgml/config.sgml @@ -2226,13 +2226,13 @@ include_dir 'conf.d' Sets the maximum number of parallel workers that can be - started by a single utility command. 
Currently, the only - parallel utility command that supports the use of parallel - workers is CREATE INDEX, and only when - building a B-tree index. Parallel workers are taken from the - pool of processes established by , limited by . Note that the requested + started by a single utility command. Currently, the parallel + utility commands that support the use of parallel workers are + CREATE INDEX only when building a B-tree index, + and VACUUM without FULL + option. Parallel workers are taken from the pool of processes + established by , limited + by . Note that the requested number of workers may not actually be available at run time. If this occurs, the utility operation will run with fewer workers than expected. The default value is 2. Setting this diff --git a/doc/src/sgml/ref/vacuum.sgml b/doc/src/sgml/ref/vacuum.sgml index fdd8151..a0dd997 100644 --- a/doc/src/sgml/ref/vacuum.sgml +++ b/doc/src/sgml/ref/vacuum.sgml @@ -33,6 +33,7 @@ VACUUM [ FULL ] [ FREEZE ] [ VERBOSE ] [ ANALYZE ] [ boolean ] SKIP_LOCKED [ boolean ] INDEX_CLEANUP [ boolean ] +PARALLEL [ integer ] and table_and_columns is: @@ -144,6 +145,22 @@ VACUUM [ FULL ] [ FREEZE ] [ VERBOSE ] [ ANALYZE ] [ integer background + workers (for the detail of each vacuum phases, please refer to + ). Only one worker can be used per index. So + parallel workers are launched only when there are at least 2 + indexes in the table. Workers for vacuum launches before starting each phases + and exit at the end of the phase. These behaviors might change in a future release. + This option can not use with FULL option. + + + + + DISABLE_PAGE_SKIPPING @@ -219,6 +236,20 @@ VACUUM [ FULL ] [ FREEZE ] [ VERBOSE ] [ ANALYZE ] [ integer + + + Specifies parallel degree for PARALLEL option. The + value must be at least 1. If the parallel degree + integer is omitted, then + VACUUM decides the number of workers based on number of + indexes on the relation which further limited by + . 
+ + + + + table_name diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c index c9d8312..ae077ab 100644 --- a/src/backend/access/heap/vacuumlazy.c +++ b/src/backend/access/heap/vacuumlazy.c @@ -22,6 +22,20 @@ * of index scans performed. So we don't use maintenance_work_mem memory for * the TID array, just enough to hold as many heap tuples as fit on one page. * + * Lazy vacuum supports parallel execution with parallel worker processes. In + * parallel lazy vacuum, we perform both index vacuuming and index cleanup with + * parallel worker processes. Individual indexes is
RE: Timeout parameters
On Thursday, April 4, 2019 5:20 PM (GMT+9), Ryohei Nagaura wrote:
> > From: Fabien COELHO
> > * About socket_timeout v12 patch, I'm not sure there is a consensus.
>
> I think so too. I just made the patch ready to be committed at any time.
>
> Finally, I rebased all the patches because they no longer applied to the
> updated PG.

I just checked and confirmed that the TCP USER TIMEOUT patch set v20 works.
Although you should capitalize "linux" to "Linux", as already mentioned
before. The committer can also just fix that very minor part, if the patch
is deemed committable.

Note to committer: The "Ready for Committer" status is mainly intended for
the tcp user timeout parameter. OTOH, unless there is consensus on
socket_timeout, for now the socket_timeout patch status remains "Needs
Review".

Regards,
Kirk Jamison
Re: Caveats from reloption toast_tuple_target
On Thu, Apr 4, 2019 at 11:36 AM Michael Paquier wrote:
>
> I mean that toast_tuple_target is broken as-is, because it should be
> used on the new tuples of a relation as a threshold to decide if this
> tuple should be toasted or not, but we don't actually use the
> reloption value for that decision-making: the default threshold
> TOAST_TUPLE_THRESHOLD gets used instead all the time! The code does
> not even create a toast table in some cases.

I think the problem with the existing code is that while it allows users to
set toast_tuple_target to be less than TOAST_TUPLE_THRESHOLD, that is not
honoured while deciding whether to toast a row or not. AFAICS it works OK
when toast_tuple_target is greater than or equal to TOAST_TUPLE_THRESHOLD,
i.e. it won't toast the rows unless they are larger than
toast_tuple_target.

IMV it makes sense to simply cap the lower limit of toast_tuple_target to
the compile-time default and update the docs to reflect that. Otherwise, we
need to deal with the possibility of dynamically creating the toast table
if the relation option is lowered after creating the table. Your proposed
patch handles the case where toast_tuple_target is specified at CREATE
TABLE, but we would still have a problem with ALTER TABLE, no?

But there might be side effects of changing the lower limit for
pg_dump/pg_restore. So we would need to think about that too.

Thanks,
Pavan

--
Pavan Deolasee http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: COPY FREEZE and setting PD_ALL_VISIBLE/visibility map bits
Hi,

On 2019-04-04 23:57:58 -0400, Tom Lane wrote:
> Andres Freund writes:
> > I think the right approach would be to do all of this in heap_insert and
> > heap_multi_insert. Whenever starting to work on a page, if INSERT_FROZEN
> > is specified, remember whether it is either currently empty, or is
> > already marked as all-visible. If previously empty, mark it as all
> > visible at the end. If already all visible, there's no need to change
> > that. If not yet all-visible, no need to do anything, since it can't
> > have been inserted with COPY FREEZE. Do you see any problem doing it
> > that way?
>
> Do we want to add overhead to these hot-spot routines for this purpose?

For heap_multi_insert I can't see it being a problem - it's only used from
copy.c, and the cost should be "smeared" over many tuples.

I'd assume that compared with locking a page, WAL logging, etc., it'd not
even meaningfully show up for heap_insert. Especially because we already
have codepaths for options & HEAP_INSERT_FROZEN in heap_prepare_insert(),
and I'd assume those could be combined.

I think we should measure it, but I don't think that one or two additional,
very well predicted, branches are going to be measurable in those routines.

The patch, as implemented, has modifications in
RelationGetBufferForTuple(); those seem like they'd be more expensive.

Greetings,

Andres Freund
Re: COPY FREEZE and setting PD_ALL_VISIBLE/visibility map bits
Hi,

On 2019-04-05 09:20:36 +0530, Pavan Deolasee wrote:
> On Fri, Apr 5, 2019 at 9:05 AM Andres Freund wrote:
> > I think the right approach would be to do all of this in heap_insert and
> > heap_multi_insert. Whenever starting to work on a page, if INSERT_FROZEN
> > is specified, remember whether it is either currently empty, or is
> > already marked as all-visible. If previously empty, mark it as all
> > visible at the end. If already all visible, there's no need to change
> > that. If not yet all-visible, no need to do anything, since it can't
> > have been inserted with COPY FREEZE.
>
> We're doing roughly the same. If we are running INSERT_FROZEN, whenever
> we're about to switch to a new page, we check if the previous page should
> be marked all-frozen and do it that way. The special code in copy.c is
> necessary to take care of the last page, which we don't get to handle in
> the regular code path.

Well, it's not the same, because you need extra code in copy.c, extra lock
cycles, and extra WAL logging.

> Or are you suggesting that we don't even rescan the page for all-frozen
> tuples at the end and just simply mark it all-frozen at the start, when
> the first tuple is inserted, and then don't touch the
> PD_ALL_VISIBLE/visibility map bit as we go on inserting more tuples into
> the page?

Correct. If done right, that should be cheaper (no extra scans, less WAL
logging), without requiring some new dispatch logic in copy.c.

> Anyways, if major architectural changes are required then it's probably
> too late to consider this for PG12, even though it's more of a bug fix
> and a candidate for back-patching too.

Let's just see how bad it looks? I don't feel we'd need to be super strict
about it. If it looks simple enough I'd, e.g., be OK with merging it soon
after the freeze, and back-patching around maybe 12.1 or so.

Greetings,

Andres Freund
Re: COPY FREEZE and setting PD_ALL_VISIBLE/visibility map bits
Andres Freund writes:
> I think the right approach would be to do all of this in heap_insert and
> heap_multi_insert. Whenever starting to work on a page, if INSERT_FROZEN
> is specified, remember whether it is either currently empty, or is
> already marked as all-visible. If previously empty, mark it as all
> visible at the end. If already all visible, there's no need to change
> that. If not yet all-visible, no need to do anything, since it can't
> have been inserted with COPY FREEZE. Do you see any problem doing it
> that way?

Do we want to add overhead to these hot-spot routines for this purpose?

regards, tom lane
Re: fix the spelling mistakes of comments
Michael Paquier writes: > On Thu, Apr 04, 2019 at 07:57:02AM -0400, Robert Haas wrote: >> On Wed, Apr 3, 2019 at 6:55 AM Liu, Huailing >> wrote: * This module contains the code for waiting and release★ of backends. >>> I think the word marked★ should be 'releasing'. >> It could be changed, but I don't think it's really wrong as written. > Indeed. Well, the problem is the lack of grammatical agreement between "waiting" and "release". It's poor English at least. regards, tom lane
Re: COPY FREEZE and setting PD_ALL_VISIBLE/visibility map bits
On Fri, Apr 5, 2019 at 8:37 AM Andres Freund wrote:
> Hi,
>
> On 2019-04-05 00:06:04 -0300, Alvaro Herrera wrote:
> >
> > Hmm, isn't there already a critical section in visibilitymap_set itself?
>
> There is, but the proposed code sets all visible on the page, and marks
> the buffer dirty, before calling visibilitymap_set.

How is it any different from what we're doing at vacuumlazy.c:1322? We set
the page-level bit, mark the buffer dirty and then call
visibilitymap_set(), all outside a critical section.

1300         /* mark page all-visible, if appropriate */
1301         if (all_visible && !all_visible_according_to_vm)
1302         {
1303             uint8       flags = VISIBILITYMAP_ALL_VISIBLE;
1304
1305             if (all_frozen)
1306                 flags |= VISIBILITYMAP_ALL_FROZEN;
1307
1308             /*
1309              * It should never be the case that the visibility map page is set
1310              * while the page-level bit is clear, but the reverse is allowed
1311              * (if checksums are not enabled). Regardless, set the both bits
1312              * so that we get back in sync.
1313              *
1314              * NB: If the heap page is all-visible but the VM bit is not set,
1315              * we don't need to dirty the heap page. However, if checksums
1316              * are enabled, we do need to make sure that the heap page is
1317              * dirtied before passing it to visibilitymap_set(), because it
1318              * may be logged. Given that this situation should only happen in
1319              * rare cases after a crash, it is not worth optimizing.
1320              */
1321             PageSetAllVisible(page);
1322             MarkBufferDirty(buf);
1323             visibilitymap_set(onerel, blkno, buf, InvalidXLogRecPtr,
1324                               vmbuffer, visibility_cutoff_xid, flags);
1325         }

As the first para in that comment says, I thought it's OK for the
page-level bit to be set while the visibility map bit is clear, but not
vice versa. The proposed code does not introduce any new behaviour AFAICS.
But I might be missing something.

Thanks,
Pavan

--
Pavan Deolasee http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: COPY FREEZE and setting PD_ALL_VISIBLE/visibility map bits
Hi,

On Fri, Apr 5, 2019 at 9:05 AM Andres Freund wrote:
>
> I think the right approach would be to do all of this in heap_insert and
> heap_multi_insert. Whenever starting to work on a page, if INSERT_FROZEN
> is specified, remember whether it is either currently empty, or is
> already marked as all-visible. If previously empty, mark it as all
> visible at the end. If already all visible, there's no need to change
> that. If not yet all-visible, no need to do anything, since it can't
> have been inserted with COPY FREEZE.

We're doing roughly the same. If we are running INSERT_FROZEN, whenever
we're about to switch to a new page, we check if the previous page should
be marked all-frozen and do it that way. The special code in copy.c is
necessary to take care of the last page, which we don't get to handle in
the regular code path.

Or are you suggesting that we don't even rescan the page for all-frozen
tuples at the end and just simply mark it all-frozen at the start, when the
first tuple is inserted, and then don't touch the PD_ALL_VISIBLE/visibility
map bit as we go on inserting more tuples into the page?

Anyway, if major architectural changes are required, then it's probably too
late to consider this for PG12, even though it's more of a bug fix and a
candidate for back-patching too.

Thanks,
Pavan

--
Pavan Deolasee http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: COPY FREEZE and setting PD_ALL_VISIBLE/visibility map bits
Hi,

On 2019-04-05 00:06:04 -0300, Alvaro Herrera wrote:
> On 2019-Apr-04, Andres Freund wrote:
> > I still think this is the wrong architecture.
>
> Hmm.

I think the right approach would be to do all of this in heap_insert and
heap_multi_insert. Whenever starting to work on a page, if INSERT_FROZEN is
specified, remember whether it is either currently empty, or is already
marked as all-visible. If previously empty, mark it as all visible at the
end. If already all visible, there's no need to change that. If not yet
all-visible, no need to do anything, since it can't have been inserted with
COPY FREEZE. Do you see any problem doing it that way?

Greetings,

Andres Freund
Re: speeding up planning with partitions
On 2019/04/05 12:18, David Rowley wrote: > On Fri, 5 Apr 2019 at 16:09, Amit Langote > wrote: >> Although, we still have ways >> to go in terms of scaling generic plan execution to larger partition >> counts, solution(s) for which have been proposed by David but haven't made >> it into master yet. > > Is that a reference to the last paragraph in [1]? That idea has not > gone beyond me writing that text yet! :-( It was more of a passing > comment on the only way I could think of to solve the problem. > > [1] > https://www.postgresql.org/message-id/CAKJS1f-y1HQK+VjG7=C==vGcLnzxjN8ysD5NmaN8Wh4=vsy...@mail.gmail.com Actually, I meant to refer to the following: https://commitfest.postgresql.org/22/1897/ Of course, we should pursue all available options. :) Thanks, Amit
Re: Refactoring the checkpointer's fsync request queue
On Thu, Apr 4, 2019 at 11:47 PM Thomas Munro wrote: > Pushed. Thanks for all the work on this! I managed to break this today while testing with RELSEG_SIZE set to 1 block (= squillions of 8kb files). The problem is incorrect arguments to _mdfd_getseg(), in code added recently by me. Without the EXTENSION_DONT_CHECK_SIZE flag, it refuses to open segments following segments that have been truncated, leading to a checkpointer fsync panic. It's also passing segnum where a blocknum is wanted. It should have used exactly the same arguments as in the old code, but didn't. I will push a fix shortly. -- Thomas Munro https://enterprisedb.com
Re: speeding up planning with partitions
On Fri, 5 Apr 2019 at 16:09, Amit Langote wrote: > Although, we still have ways > to go in terms of scaling generic plan execution to larger partition > counts, solution(s) for which have been proposed by David but haven't made > it into master yet. Is that a reference to the last paragraph in [1]? That idea has not gone beyond me writing that text yet! :-( It was more of a passing comment on the only way I could think of to solve the problem. [1] https://www.postgresql.org/message-id/CAKJS1f-y1HQK+VjG7=C==vGcLnzxjN8ysD5NmaN8Wh4=vsy...@mail.gmail.com -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
RE: Problem during Windows service start
Hi,

Thank you for picking this up, and I'm sorry for the delayed reply.

>> If wait_for_postmaster returns POSTMASTER_STILL_STARTING will it
>> be correct to set the status of windows service to SERVICE_START_PENDING ?

Yes, I think this is the best. Currently, I do not have a good solution for
changing the status from SERVICE_START_PENDING to SERVICE_RUNNING after
recovery is completed.

>> I would like to take this up if no one is working on this.

Thank you for volunteering! I would be glad if you could take it, as I do
not have much time to think about this topic now...

Regards,
Higuchi
Re: speeding up planning with partitions
On 2019/04/05 6:59, David Rowley wrote:
> On Fri, 5 Apr 2019 at 07:33, Floris Van Nee wrote:
>> I had a question about the performance of pruning of functions like now()
>> and current_date. I know these are handled differently, as they cannot be
>> excluded during the first phases of planning. However, currently, this
>> new patch makes the performance difference between the static timestamp
>> variant and now() very obvious (even more than before). Consider
>>
>>   select * from partitioned_table where ts >= now()
>> or
>>   select * from partitioned_table where ts >= '2019-04-04'
>>
>> The second plans in less than a millisecond, whereas the first takes
>> about 180 ms for a table with 1000 partitions. Both end up with the same
>> plan.
>
> The patch here only aims to improve the performance of queries to
> partitioned tables when partitions can be pruned during planning. The
> now() version of the query is unable to do that since we don't know
> what that value will be during the execution of the query. In that
> version, you're most likely seeing "Subplans Removed: ". This means
> run-time pruning did some pruning and the planner generated subplans
> for what you see plus others. Since planning for all partitions is
> still slow, you're getting a larger performance difference than
> before, but only due to the fact that the other plan is now faster to
> generate.

Yeah, the time for generating a plan for a query that *can* use pruning,
but not during planning, is still very much dependent on the number of
partitions, because access plans must be created for all partitions, even
if only one of those plans will actually be used and the rest pruned away
during execution.

> If you're never using prepared statements,

Or, if using prepared statements is an option, the huge planning cost
mentioned above need not be paid repeatedly. Although, we still have ways
to go in terms of scaling generic plan execution to larger partition
counts, solution(s) for which have been proposed by David but haven't made
it into master yet.

Thanks,
Amit
Re: COPY FREEZE and setting PD_ALL_VISIBLE/visibility map bits
Hi,

On 2019-04-05 00:06:04 -0300, Alvaro Herrera wrote:
> On 2019-Apr-04, Andres Freund wrote:
> > On 2019-04-04 12:23:08 -0700, Andres Freund wrote:
> > > Also, how is this code even close to correct?
> > > CheckAndSetPageAllVisible() modifies the buffer in a crucial way, and
> > > there's no WAL logging? Without even a comment arguing why that's OK
> > > (I don't think it is)?
> >
> > Peter Geoghegan just reminded me over IM that there's actually logging
> > inside log_heap_visible(), called from visibilitymap_set(). Still lacks
> > a critical section though.
>
> Hmm, isn't there already a critical section in visibilitymap_set itself?

There is, but the proposed code sets all visible on the page, and marks the
buffer dirty, before calling visibilitymap_set.

Greetings,

Andres Freund
Re: [PATCH v20] GSSAPI encryption support
Greetings,

* Peter Eisentraut (peter.eisentr...@2ndquadrant.com) wrote:
> Kerberos tests are now failing for me (macOS). I'm seeing
>
> psql: error: could not connect to server: Over-size error packet sent by
> the server.
> not ok 3 - GSS encryption without auth
>
> # Failed test 'GSS encryption without auth'
> # at t/002_enc.pl line 170.
> # got: '2'
> # expected: '0'
>
> (and repeated for several other tests).

Alright, that over-size error was a bug in the error-handling code, which
I've just pushed a fix for. That said...

* Peter Eisentraut (peter.eisentr...@2ndquadrant.com) wrote:
> On 2019-04-04 17:35, Stephen Frost wrote:
> > Ok, it looks like there's a server-side error happening here, and it
> > would be good to see what that is, so can you send the server logs?
>
> These errors appear several times in the server logs:
>
> FATAL: GSSAPI context error
> DETAIL: Miscellaneous failure (see text): Decrypt integrity check
> failed for checksum type hmac-sha1-96-aes256, key type
> aes256-cts-hmac-sha1-96
>
> FATAL: accepting GSS security context failed
> DETAIL: Miscellaneous failure (see text): Decrypt integrity check
> failed for checksum type hmac-sha1-96-aes256, key type
> aes256-cts-hmac-sha1-96

This looks like a real issue, and it's unclear what's going on here. I
wonder: are you certain that you're using all the same Kerberos libraries
for the KDC, the server, and psql? If you go back to before the GSSAPI
encryption patch, does it work?

I've certainly seen interesting issues on macOS, in particular, due to
different Kerberos libraries/tools being installed, and I wonder if that's
what is going on here. Maybe you could check klist vs. psql with respect to
which libraries are linked in?

Thanks,

Stephen
RE: libpq debug log
Hi, > The basic idea being: > > - Each line is a whole message. > - The line begins with <<< for a message received and >>> for a message sent. > - Strings in single quotes are those sent/received as a fixed number of bytes. > - Strings in double quotes are those sent/received as a string. > - 4-byte integers are printed unadorned. > - 2-byte integers are prefixed by #. > - I guess 1-byte integers would need some other prefix, maybe @ or ##. I created a v1 patch to improve PQtrace(); it outputs each log message on one line. Please find my attached patch. Log output examples; In the current PQtrace log: To backend> Msg Q To backend> "SELECT pg_catalog.set_config('search_path', '', false)" To backend> Msg complete, length 60 I changed it like this: 2019-04-04 02:39:51.488 UTC > Query 59 "SELECT pg_catalog.set_config('search_path', '', false)" In the current PQtrace log: From backend> T From backend (#4)> 35 From backend (#2)> 1 From backend> "set_config" From backend (#4)> 0 From backend (#2)> 0 From backend (#4)> 25 From backend (#2)> 65535 From backend (#4)> -1 From backend (#2)> 0 I changed it like this: 2019-04-04 02:39:51.489 UTC < RowDescription 35 #1 "set_config" 0 #0 25 #65535 -1 #0 > I would like to add timestamp per line and when command processing function > start/end. > I think it is useful to know the application process start/end for diagnosis. > So I will implement like this; > > 2019-03-03 07:24:54.142 UTC PQgetResult start > 2019-03-03 07:24:54.143 UTC < 'T' 35 1 "set_config" 0 #0 25 #65535 -1 #0 > 2019-03-03 07:24:54.144 UTC PQgetResult end I would like to add this in the next patch if there is no disagreement. Regards, Aya Iwata v1-0001-libpq-PQtrace-output_one_line.patch
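The one-line notation proposed above (unadorned 4-byte integers, `#`-prefixed 2-byte integers, double-quoted strings, single-quoted fixed-length byte runs) can be illustrated with a small sketch. This is purely illustrative Python, not libpq code; the field kinds and function names are hypothetical:

```python
# Toy formatter for the proposed one-line trace notation (not libpq code).

def format_field(kind, value):
    """Render one protocol-message field in the proposed notation."""
    if kind == "int4":      # 4-byte integers: printed unadorned
        return str(value)
    if kind == "int2":      # 2-byte integers: prefixed by '#'
        return "#" + str(value)
    if kind == "string":    # strings: double quotes
        return '"%s"' % value
    if kind == "bytes":     # fixed-length byte runs: single quotes
        return "'%s'" % value
    raise ValueError("unknown field kind: %s" % kind)

def format_message(direction, msg_type, fields):
    """One whole message per line; '<' received, '>' sent."""
    arrow = "<" if direction == "recv" else ">"
    body = " ".join(format_field(k, v) for k, v in fields)
    return "%s %s %s" % (arrow, msg_type, body)

# The RowDescription example from the mail, reconstructed:
line = format_message("recv", "RowDescription",
                      [("int4", 35), ("int2", 1), ("string", "set_config"),
                       ("int4", 0), ("int2", 0), ("int4", 25),
                       ("int2", 65535), ("int4", -1), ("int2", 0)])
print(line)
# < RowDescription 35 #1 "set_config" 0 #0 25 #65535 -1 #0
```

This reproduces the proposed single-line form of the ten-line "From backend" trace shown above.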
RE: Speed up transaction completion faster after many relations are accessed in a transaction
On Fri, Apr 5, 2019 at 0:05 AM, Tsunakawa, Takayuki wrote: > From: Peter Eisentraut [mailto:peter.eisentr...@2ndquadrant.com] > > I can't detect any performance improvement with the patch applied to > > current master, using the test case from Yoshikazu Imai (2019-03-19). > > That's strange... Peter, Imai-san, can you compare your test procedures? Just to make sure, I've described my test procedure in detail. I installed and set up HEAD and the patched build as follows. [HEAD(413ccaa)] (git pull) ./configure --prefix=/usr/local/pgsql-dev --enable-depend make clean make make install su postgres export PATH=/usr/local/pgsql-dev/bin:$PATH initdb -D /var/lib/pgsql/data-dev vi /var/lib/pgsql/data-dev/postgresql.conf port = 44201 plan_cache_mode = 'auto' or 'force_custom_plan' max_parallel_workers = 0 max_parallel_workers_per_gather = 0 max_locks_per_transaction = 4096 pg_ctl -D /var/lib/pgsql/data-dev start [HEAD(413ccaa) + patch] (git pull) patch -u -p1 < 0002.patch ./configure --prefix=/usr/local/pgsql-locallock --enable-depend make clean make make install su postgres export PATH=/usr/local/pgsql-locallock/bin:$PATH initdb -D /var/lib/pgsql/data-locallock vi /var/lib/pgsql/data-locallock/postgresql.conf port = 44301 plan_cache_mode = 'auto' or 'force_custom_plan' max_parallel_workers = 0 max_parallel_workers_per_gather = 0 max_locks_per_transaction = 4096 pg_ctl -D /var/lib/pgsql/data-locallock start And I tested as follows.
(creating partitioned table for port 44201) (creating partitioned table for port 44301) (creating select4096.sql) for i in `seq 1 5`; do pgbench -n -f select4096.sql -T 60 -M prepared -p 44201 | grep including; pgbench -n -f select4096.sql -T 60 -M prepared -p 44301 | grep including; done tps = 8146.039546 (including connections establishing) tps = 9021.340872 (including connections establishing) tps = 8011.186017 (including connections establishing) tps = 8926.191054 (including connections establishing) tps = 8006.769690 (including connections establishing) tps = 9028.716806 (including connections establishing) tps = 8057.709961 (including connections establishing) tps = 9017.713714 (including connections establishing) tps = 7956.332863 (including connections establishing) tps = 9126.650533 (including connections establishing) Thanks -- Yoshikazu Imai
RE: Speed up transaction completion faster after many relations are accessed in a transaction
Hi Amit-san, Imai-san, From: Amit Langote [mailto:langote_amit...@lab.ntt.co.jp] > I was able to detect it as follows. > plan_cache_mode = auto > > HEAD: 1915 tps > Patched: 2394 tps > > plan_cache_mode = custom (non-problematic: generic plan is never created) > > HEAD: 2402 tps > Patched: 2393 tps Thanks a lot for the very quick confirmation. I'm relieved to still see the good results. Regards Takayuki Tsunakawa
Re: fix the spelling mistakes of comments
On Thu, Apr 04, 2019 at 07:57:02AM -0400, Robert Haas wrote: > On Wed, Apr 3, 2019 at 6:55 AM Liu, Huailing > wrote: >> * This module contains the code for waiting and release★ of backends. >> * All code in this module executes on the primary. The core streaming >> * replication transport remains within WALreceiver/WALsender modules. >> >> I think the word marked★ should be 'releasing'. > > It could be changed, but I don't think it's really wrong as written. Indeed. > > * Walk★ the specified queue from head. Set the state of any backends that > > > > I think the word marked★ should be 'Wake'. > > I think it's correct as written. Yes, it's correct as-is. The code *walks* through the shmem queue to *wake* some of its elements. -- Michael
Re: pg_rewind vs superuser
On Thu, Apr 04, 2019 at 01:19:44PM +0200, Magnus Hagander wrote: > All of it, or just the checkpoint part? I assume just the checkpoint part? > AFAIK it does require superuser in those earlier versions? I meant of course the checkpoint part down to 9.5, and the rest down to 11, so done this way. -- Michael
RE: Speed up transaction completion faster after many relations are accessed in a transaction
On Fri, Apr 5, 2019 at 1:31 AM, Amit Langote wrote: > On 2019/04/05 5:42, Peter Eisentraut wrote: > > On 2019-04-04 06:58, Amit Langote wrote: > >> Also, since the "speed up partition planning" patch went in > >> (428b260f8), it might be possible to see the performance boost even > >> with the partitioning example you cited upthread. > > > > I can't detect any performance improvement with the patch applied to > > current master, using the test case from Yoshikazu Imai (2019-03-19). > > I was able to detect it as follows. > > * partitioned table setup: > > $ cat ht.sql > drop table ht cascade; > create table ht (a int primary key, b int, c int) partition by hash (a); > select 'create table ht' || x::text || ' partition of ht for values with > (modulus 8192, remainder ' || (x)::text || ');' from generate_series(0, > 8191) x; > \gexec > > * pgbench script: > > $ cat select.sql > \set param random(1, 8192) > select * from ht where a = :param > > * pgbench (5 minute run with -M prepared) > > pgbench -n -M prepared -T 300 -f select.sql > > * tps: > > plan_cache_mode = auto > >HEAD: 1915 tps > Patched: 2394 tps > > plan_cache_mode = custom (non-problematic: generic plan is never created) > >HEAD: 2402 tps > Patched: 2393 tps Amit-san, thanks for testing this. I also re-ran my tests(3/19) with HEAD(413ccaa) and HEAD(413ccaa) + patched, and I can still detect the performance difference with plan_cache_mode = auto. Thanks -- Yoshikazu Imai
Re: Server Crash due to assertion failure in _bt_check_unique()
On Thu, Apr 4, 2019 at 10:12 PM Peter Geoghegan wrote: > On Thu, Apr 4, 2019 at 4:06 AM Ashutosh Sharma > wrote: > > Attached is the patch with above changes. Please let me know if my > understanding is wrong. Thanks. > > You have it right. This bug slipped in towards the end of development, > when the insertstate struct was introduced. > > I have pushed something very close to the patch you posted. (I added > an additional assertion, and tweaked the comments.) > > Thanks Peter :) > Thanks for the report and patch! > -- > Peter Geoghegan > -- With Regards, Ashutosh Sharma EnterpriseDB: http://www.enterprisedb.com
Re: Speed up transaction completion faster after many relations are accessed in a transaction
On 2019/04/05 5:42, Peter Eisentraut wrote: > On 2019-04-04 06:58, Amit Langote wrote: >> Also, since the "speed up partition planning" patch went in (428b260f8), >> it might be possible to see the performance boost even with the >> partitioning example you cited upthread. > > I can't detect any performance improvement with the patch applied to > current master, using the test case from Yoshikazu Imai (2019-03-19). I was able to detect it as follows. * partitioned table setup: $ cat ht.sql drop table ht cascade; create table ht (a int primary key, b int, c int) partition by hash (a); select 'create table ht' || x::text || ' partition of ht for values with (modulus 8192, remainder ' || (x)::text || ');' from generate_series(0, 8191) x; \gexec * pgbench script: $ cat select.sql \set param random(1, 8192) select * from ht where a = :param * pgbench (5 minute run with -M prepared) pgbench -n -M prepared -T 300 -f select.sql * tps: plan_cache_mode = auto HEAD: 1915 tps Patched: 2394 tps plan_cache_mode = custom (non-problematic: generic plan is never created) HEAD: 2402 tps Patched: 2393 tps Thanks, Amit
Re: New vacuum option to do only freezing
On Fri, Apr 5, 2019 at 4:06 AM Robert Haas wrote: > > On Wed, Apr 3, 2019 at 10:32 PM Masahiko Sawada wrote: > > Attached the updated version patch. > > Committed with a little bit of documentation tweaking. > Thank you for committing them! Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
Re: Re: reloption to prevent VACUUM from truncating empty pages at the end of relation
On Thu, Apr 4, 2019 at 10:07 PM Julien Rouhaud wrote: > > On Thu, Apr 4, 2019 at 1:23 PM Masahiko Sawada wrote: > > > > On Thu, Apr 4, 2019 at 1:26 PM Tsunakawa, Takayuki > > wrote: > > > > > > From: Fujii Masao [mailto:masao.fu...@gmail.com] > > > > reloption for TOAST is also required? > > > > > > # I've come back to the office earlier than planned... > > > > > > Hm, there's no reason to not provide toast.vacuum_shrink_enabled. Done > > > with the attached patch. > > > > > > > Thank you for updating the patch! > > +1! > > > +vacuum_shrink_enabled, > > toast.vacuum_shrink_enabled > > (boolean) > > + > > + > > + Enables or disables shrinking the table when it's vacuumed. > > + This also applies to autovacuum. > > + The default is true. If true, VACUUM frees empty pages at the > > end of the table. > > > > "VACUUM" needs or "vacuum" is more appropriate here? > > also, the documentation should point out that freeing is not > guaranteed. Something like > > + The default is true. If true, VACUUM will try to free empty > pages at the end of the table. +1 > > > I'm not sure the consensus we got here but we don't make the vacuum > > command option for this? > > I don't think here's a clear consensus, but my personal vote is to add > it, with SHRINK_TABLE = [ force_on | force_off | default ] (unless a > better proposal has been made already) As the INDEX_CLEANUP option has been added by commit a96c41f, the new option for this feature could also accept zero or one boolean argument, that is, SHRINK_TABLE [true|false], true by default. Explicit options on the VACUUM command override options set by reloptions. And if the boolean argument is omitted, the option depends on the reloption. FWIW, I would like to defer to the committer on the naming of the new option, but another possible comment is that the term 'truncate' might be more suitable than 'shrink' in the context of lazy vacuum.
As Tsunakawa-san mentioned, the term 'shrink' is used in the PostgreSQL documentation, but we use it mostly in the context of VACUUM FULL. I found two paragraphs that use the term 'shrink'. vacuum.sgml: The FULL option is not recommended for routine use, but might be useful in special cases. An example is when you have deleted or updated most of the rows in a table and would like the table to physically shrink to occupy less disk space and allow faster table scans. VACUUM FULL will usually shrink the table more than a plain VACUUM would. maintenance.sgml Although VACUUM FULL can be used to shrink a table back to its minimum size and return the disk space to the operating system, there is not much point in this if the table will just grow again in the future. Thus, moderately-frequent standard VACUUM runs are a better approach than infrequent VACUUM FULL runs for maintaining heavily-updated tables. On the other hand, we use the term 'truncate' in the progress reporting of lazy vacuum (see documentation of pg_stat_progress_vacuum). So I'm concerned that if we use the term 'shrink', users will think that this option prevents VACUUM FULL from working. Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
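The option-precedence rule discussed in this thread — an explicit SHRINK_TABLE argument on the VACUUM command overrides the vacuum_shrink_enabled reloption, which in turn overrides the default of true — can be sketched as follows. This is a toy Python model of the proposed semantics, with hypothetical names; it is not the patch's actual C code:

```python
def resolve_shrink(command_option=None, reloption=None):
    """Resolve the effective shrink setting for one vacuum run.

    command_option: True/False if SHRINK_TABLE was given explicitly on the
                    VACUUM command (hypothetical syntax), else None.
    reloption:      True/False if vacuum_shrink_enabled (or
                    toast.vacuum_shrink_enabled) is set on the table,
                    else None.
    """
    if command_option is not None:   # explicit command option wins
        return command_option
    if reloption is not None:        # otherwise the reloption decides
        return reloption
    return True                      # default: vacuum may truncate

# VACUUM (SHRINK_TABLE false) overrides a reloption of true:
assert resolve_shrink(command_option=False, reloption=True) is False
# No command option: fall back to the reloption:
assert resolve_shrink(reloption=False) is False
# Neither set: shrinking stays enabled by default:
assert resolve_shrink() is True
```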
RE: Re: reloption to prevent VACUUM from truncating empty pages at the end of relation
From: Masahiko Sawada [mailto:sawada.m...@gmail.com] > "VACUUM" needs or "vacuum" is more appropriate here? Looking at the same file and some other files, "vacuum" looks appropriate because it represents the vacuum action, not the specific VACUUM command. > The format of the documentation of new option is a bit weird. Could it > be possible to adjust it around 80 characters per line like other > description? Ah, fixed. It's easy to overlook the style with the screen reader software... I've been wondering if there are good settings for editing .sgml in Emacs that, for example, put the appropriate number of spaces at the beginning of each line when Enter is pressed, automatically break long lines and add spaces, etc. From: Julien Rouhaud [mailto:rjuju...@gmail.com] > also, the documentation should point out that freeing is not > guaranteed. Something like > > + The default is true. If true, VACUUM will try to free empty > pages at the end of the table. That's nice. Done. > > I'm not sure the consensus we got here but we don't make the vacuum > > command option for this? > > I don't think here's a clear consensus, but my personal vote is to add > it, with SHRINK_TABLE = [ force_on | force_off | default ] (unless a > better proposal has been made already) IMO, which I mentioned earlier, I don't think the VACUUM option is necessary because: (1) this is a table property, determined based on the expected workload, not something people want to change per VACUUM operation; (2) if someone wants to change the behavior for a particular VACUUM operation, they can do it using ALTER TABLE SET. Anyway, we can add the VACUUM option separately if we want it by all means. I don't want it to be a blocker for this feature to be included in PG 12, because the vacuum truncation has been bothering us, as others said in this and other threads... Regards Takayuki Tsunakawa disable-vacuum-truncation_v6.patch
Re: Protect syscache from bloating with negative cache entries
Thank you for the comment. At Thu, 4 Apr 2019 15:44:35 -0400, Robert Haas wrote in > On Thu, Apr 4, 2019 at 8:53 AM Kyotaro HORIGUCHI > wrote: > > So it seems to me that the simplest "Full" version wins. The > > attached is the rebased version. dlist_move_head(entry) is removed as > > mentioned above in that patch. > > 1. I really don't think this patch has any business changing the > existing logic. You can't just assume that the dlist_move_head() > operation is unimportant for performance. OK, it didn't show a significant performance gain, so I removed that. > 2. This patch still seems to add a new LRU list that has to be > maintained. That's fairly puzzling. You seem to have concluded that > the version without the additional LRU wins, but the sent a new copy > of the version with the LRU version. Sorry, I attached the wrong one. The attached is the right one, which doesn't add the new dlist. > 3. I don't think adding an additional call to GetCurrentTimestamp() in > start_xact_command() is likely to be acceptable. There has got to be > a way to set this up so that the maximum number of new > GetCurrentTimestamp() is limited to once per N seconds, vs. the > current implementation that could do it many many many times per > second. GetCurrentTimestamp() is called only once, very early in the backend's life, in InitPostgres — not in start_xact_command(). What I did in that function is just copy stmtStartTimestamp, not call GetCurrentTimestamp(). > 4. The code in CatalogCacheCreateEntry seems clearly unacceptable. In > a pathological case where CatCacheCleanupOldEntries removes exactly > one element per cycle, it could be called on every new catcache > allocation. It could be a problem if just one entry were created over a period longer than catalog_cache_prune_min_age and the resize interval, or if all candidate entries except one were actually in use at the pruning moment. Is that realistic? > I think we need to punt this patch to next release.
> We're not converging on anything committable very fast. Yeah, maybe right. This patch sat silent for several months at a time, received comments and was modified to address them over more than two cycles, and finally got a death sentence (not literally — actually a postponement) very close to the end of this third cycle. I anticipate the same will continue in the next cycle. By the way, I found the reason for the wrong result of the previous benchmark. The test 3_0/1 needs to update catcacheclock in the midst of the loop. I'm going to fix it and rerun it. regards. -- Kyotaro Horiguchi NTT Open Source Software Center From 596d6b018e1b7ddd5828298bfaba3ee405eb2604 Mon Sep 17 00:00:00 2001 From: Kyotaro Horiguchi Date: Fri, 1 Mar 2019 13:32:51 +0900 Subject: [PATCH] Remove entries that haven't been used for a certain time Catcache entries happen to be left alone for several reasons. It is not desirable that such useless entries eat up memory. The catcache pruning feature removes entries that haven't been accessed for a certain time before enlarging the hash array. --- doc/src/sgml/config.sgml | 19 + src/backend/tcop/postgres.c | 2 + src/backend/utils/cache/catcache.c| 103 +- src/backend/utils/misc/guc.c | 12 +++ src/backend/utils/misc/postgresql.conf.sample | 1 + src/include/utils/catcache.h | 16 6 files changed, 150 insertions(+), 3 deletions(-) diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml index bc1d0f7bfa..819b252029 100644 --- a/doc/src/sgml/config.sgml +++ b/doc/src/sgml/config.sgml @@ -1677,6 +1677,25 @@ include_dir 'conf.d' + + catalog_cache_prune_min_age (integer) + + catalog_cache_prune_min_age configuration + parameter + + + + + Specifies the minimum amount of unused time in seconds at which a + system catalog cache entry is removed. -1 indicates that this feature + is disabled at all. The value defaults to 300 seconds (5 + minutes). The entries that are not used for the duration + can be removed to prevent catalog cache from bloating with useless + entries.
+ + + + max_stack_depth (integer) diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c index 44a59e1d4f..a0efac86bc 100644 --- a/src/backend/tcop/postgres.c +++ b/src/backend/tcop/postgres.c @@ -71,6 +71,7 @@ #include "tcop/pquery.h" #include "tcop/tcopprot.h" #include "tcop/utility.h" +#include "utils/catcache.h" #include "utils/lsyscache.h" #include "utils/memutils.h" #include "utils/ps_status.h" @@ -2577,6 +2578,7 @@ start_xact_command(void) * not desired, the timeout has to be disabled explicitly. */ enable_statement_timeout(); + SetCatCacheClock(GetCurrentStatementStartTimestamp()); } static void diff --git a/src/backend/utils/cache/catcache.c b/src/backend/utils/cache/catcache.c index d05930bc4c..03c2d8524c 100644 ---
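The pruning idea in the patch above — stamp each catcache entry with the clock at its last access, and discard entries that have gone unused for longer than catalog_cache_prune_min_age — can be sketched in miniature like this. This is a toy Python model for illustration only, not the patch's C implementation (the real code works on the catcache hash buckets and also checks reference counts before removing anything):

```python
PRUNE_MIN_AGE = 300  # seconds; -1 disables pruning, mirroring the GUC default

class ToyCatCache:
    """Minimal model of age-based cache pruning."""

    def __init__(self):
        self.entries = {}   # key -> (value, last_access_clock)

    def insert(self, key, value, clock):
        self.entries[key] = (value, clock)

    def lookup(self, key, clock):
        value, _ = self.entries[key]
        self.entries[key] = (value, clock)   # stamp the access time
        return value

    def prune(self, clock):
        """Drop entries not accessed within PRUNE_MIN_AGE seconds."""
        if PRUNE_MIN_AGE < 0:
            return 0
        stale = [k for k, (_, t) in self.entries.items()
                 if clock - t > PRUNE_MIN_AGE]
        for k in stale:
            del self.entries[k]
        return len(stale)

cache = ToyCatCache()
cache.insert("pg_class:foo", "...", clock=0)
cache.insert("pg_class:bar", "...", clock=0)
cache.lookup("pg_class:foo", clock=200)   # keeps foo fresh
removed = cache.prune(clock=400)          # bar is 400s old, foo only 200s
print(removed)  # 1
```

Note how this also illustrates Horiguchi's reply to point 3 above: pruning needs only a cached clock value (here the `clock` argument), not a fresh timestamp per lookup.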
Re: COPY FROM WHEN condition
Hi, On 2019-04-05 12:59:06 +1300, David Rowley wrote: > I read through the final commit to see what had changed and I noticed > that I had forgotten to remove nbuffers from CopyMultiInsertInfo when > changing from a hash table to a List. > > Patch to fix is attached. Pushed, thanks.
RE: Speed up transaction completion faster after many relations are accessed in a transaction
Hi Peter, Imai-san, From: Peter Eisentraut [mailto:peter.eisentr...@2ndquadrant.com] > I can't detect any performance improvement with the patch applied to > current master, using the test case from Yoshikazu Imai (2019-03-19). That's strange... Peter, Imai-san, can you compare your test procedures? Peter, can you check and see the performance improvement with David's method on March 26 instead? Regards Takayuki Tsunakawa
Re: COPY FROM WHEN condition
On Fri, 5 Apr 2019 at 12:32, Andres Freund wrote: > I've pushed this now. Besides those naming changes, I had to re-add the > zero initialization (I did end up seeing compiler warnings when compiling > with optimizations). Also some comment fixes. Thanks for pushing. > I added one more flush location, when inserting into a partition that > cannot use batching - there might e.g. be triggers looking at rows in > other partitions or such. Good catch. I read through the final commit to see what had changed and I noticed that I had forgotten to remove nbuffers from CopyMultiInsertInfo when changing from a hash table to a List. Patch to fix is attached. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services remove_unused_variable_in_copy_buffers.patch
Re: Changes to pg_dump/psql following collation "C" in the catalog
"Daniel Verite" writes: > I think psql and pg_dump need to adjust, just like the 3rd party tools > will, at least those that support collation-aware search in the catalog. > PFA a patch that implements the slight changes needed. > I'll add an entry for it in the next CF. Hm, if that's as much as we have to touch, I think there's a good argument for squeezing it into v12 rather than waiting. The point here is mostly to avoid a behavior change from pre-v12, but if we allow v12 to have a different behavior, it's questionable whether we'd want v13 to change it back. Just looking at the patch, I wonder whether it doesn't need some server-version checks. At the very least this would break with pre-9.1 servers, which lack COLLATE altogether. regards, tom lane
Re: COPY FROM WHEN condition
Hi, On 2019-04-04 12:04:28 -0700, Andres Freund wrote: > On 2019-04-03 22:20:00 -0700, Andres Freund wrote: > > On 2019-04-03 20:00:09 +1300, David Rowley wrote: > > > Oops, I forgot about that one. v4 attached. > > > > I'm pretty happy with this. I'm doing some minor changes (e.g. don't > > like the function comment formatting that much, the tableam callback > > needs docs, stuff like that), and then I'm going to push it tomorrow. > > After another read, I also think I'm going to rename the functions a > bit. CopyMultiInsertInfo_SetupBuffer, with its mix of camel-case and > underscores, just doesn't seem to mix well with copy.c and also just > generally other postgres code. I feel we already have too many different > naming conventions... I've pushed this now. Besides those naming changes, I had to re-add the zero initialization (I did end up seeing compiler warnings when compiling with optimizations). Also some comment fixes. I added one more flush location, when inserting into a partition that cannot use batching - there might e.g. be triggers looking at rows in other partitions or such. I found some pretty minor slowdown for COPYing narrow rows into an unlogged unpartitioned table. That got better by combining the loops in CopyMultiInsertBufferFlush() (previously there were stalls due to the indirect function calls for ExecClearTuple()). I think there's still a tiny slowdown left in that scenario, but given that it's unlogged and very narrow rows, I think that's ok. Greetings, Andres Freund
Re: Changes to pg_dump/psql following collation "C" in the catalog
>> "Daniel Verite" writes: >>> One consequence of using the "C" collation in the catalog versus >>> the db collation As an intrigued Person Following At Home, I was happy when I found this little three-message thread had more context in [1] and [2]. :) -Chap [1] https://postgr.es/m/15938.1544377...@sss.pgh.pa.us [2] https://postgr.es/m/5978.1544030...@sss.pgh.pa.us
Re: PostgreSQL Buildfarm Client Release 10
On Thu, Apr 4, 2019 at 6:18 PM Tom Lane wrote: > > Andrew Dunstan writes: > > The release can be downloaded from > > https://github.com/PGBuildFarm/client-code/archive/REL_10.tar.gz or > > https://buildfarm.postgresql.org/downloads/latest-client.tgz > > I don't actually see it on the buildfarm.postgresql.org server? > > It's there. I checked the link with wget. But I think the web cache might be having an issue. cheers andrew -- Andrew Dunstan https://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: PostgreSQL Buildfarm Client Release 10
Andrew Dunstan writes: > The release can be downloaded from > https://github.com/PGBuildFarm/client-code/archive/REL_10.tar.gz or > https://buildfarm.postgresql.org/downloads/latest-client.tgz I don't actually see it on the buildfarm.postgresql.org server? regards, tom lane
Re: Inadequate executor locking of indexes
On Fri, 5 Apr 2019 at 08:26, Tom Lane wrote: > Pushed with some minor tweaking, mostly comments. Thanks for tweaking and pushing this. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: speeding up planning with partitions
On Fri, 5 Apr 2019 at 07:33, Floris Van Nee wrote: > I had a question about the performance of pruning of functions like now() and > current_date. I know these are handled differently, as they cannot be > excluded during the first phases of planning. However, currently, this new > patch makes the performance difference between the static timestamp variant > and now() very obvious (even more than before). Consider > select * from partitioned_table where ts >= now() > or > select * from partitioned_table where ts >= '2019-04-04' > > The second plans in less than a millisecond, whereas the first takes +- 180ms > for a table with 1000 partitions. Both end up with the same plan. The patch here only aims to improve the performance of queries to partitioned tables when partitions can be pruned during planning. The now() version of the query is unable to do that since we don't know what that value will be during the execution of the query. In that version, you're most likely seeing "Subplans Removed: ". This means run-time pruning did some pruning and the planner generated subplans for what you see plus others. Since planning for all partitions is still slow, you're getting a larger performance difference than before, but only due to the fact that the other plan is now faster to generate. If you're never using prepared statements, i.e., always planning right before execution, then you might want to consider using "where ts >= 'today'::timestamp". This will evaluate to the current date during parse and make the value available to the planner. You'll need to be pretty careful with that though, as if you do prepare queries or change to do that in the future then the bugs in your application could be very subtle and only do the wrong thing just after midnight on some day when the current time progresses over your partition boundary. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
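The distinction David describes — a comparison value known at plan time lets the planner discard partitions outright, while a volatile expression like now() forces the planner to keep a subplan for every partition and defer pruning to run time — can be modeled with a toy sketch. This is illustrative Python only, not how the PostgreSQL planner is implemented:

```python
def plan_scan(partition_bounds, bound):
    """Toy model of planning "ts >= bound" over range partitions.

    partition_bounds: list of (lo, hi) half-open ranges, one per partition.
    bound: a concrete value (known at plan time), or a callable whose
           value is only known at execution time (like now()).
    Returns (kept_subplan_indexes, runtime_pruning_needed).
    """
    if callable(bound):
        # Value unknown at plan time: keep every partition's subplan and
        # rely on run-time pruning ("Subplans Removed" in EXPLAIN output).
        return list(range(len(partition_bounds))), True
    # Value known at plan time: keep only partitions that can contain
    # rows satisfying ts >= bound.
    kept = [i for i, (lo, hi) in enumerate(partition_bounds) if hi > bound]
    return kept, False

bounds = [(0, 10), (10, 20), (20, 30), (30, 40)]
print(plan_scan(bounds, 25))          # ([2, 3], False): pruned while planning
print(plan_scan(bounds, lambda: 25))  # all four subplans kept for run time
```

In both cases the rows ultimately scanned are the same; the difference is how much work the planner does up front, which is why the constant form plans so much faster over 1000 partitions.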
Re: Refactoring the checkpointer's fsync request queue
On Fri, Apr 5, 2019 at 10:53 AM Thomas Munro wrote: > Ok, here is a patch that adds a one-typedef header and uses > SegmentIndex to replace all cases of BlockNumber and int holding a > segment number (where as an "index" or a "count"). (sorry, I meant "SegmentNumber", not "SegmentIndex") -- Thomas Munro https://enterprisedb.com
PostgreSQL Buildfarm Client Release 10
Announcing Release 10 of the PostgreSQL Buildfarm client Principal feature: support for non-standard repositories: . support multi-element branch names, such as “dev/featurename” or “bug/ticket_number/branchname” . provide a get_branches() method in SCM module . support regular expression branches of interest. This is matched against the list of available branches . prune branches when doing git fetch. This feature and some server side changes will be explored in detail in my presentation at pgCon in Ottawa next month. The feature doesn’t affect owners of animals in our normal public Build Farm. However, the items below are of use to them. Other features/behaviour changes: . support for testing cross version upgrade extended back to 9.2 . support for core Postgres changes: . extended support for USE_MODULE_DB . new extra_float_digits regime . removal of user table oid support . removal of abstime and friends . changed log file locations . don’t search for valgrind messages unless valgrind is configured. . make detection of when NO_TEMP_INSTALL is allowed more bulletproof There are also various minor bug fixes and code improvements. The release can be downloaded from https://github.com/PGBuildFarm/client-code/archive/REL_10.tar.gz or https://buildfarm.postgresql.org/downloads/latest-client.tgz -- Andrew Dunstan https://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Refactoring the checkpointer's fsync request queue
On Fri, Apr 5, 2019 at 2:03 AM Alvaro Herrera wrote: > On 2019-Apr-04, Thomas Munro wrote: > > I don't think it's project policy to put a single typedef into its own > > header like that, and I'm not sure where else to put it. > > shrug. Looks fine to me. I suppose if we don't have it anywhere, it's > just because we haven't needed that particular trick yet. Creating a > file with a lone typedef seems better than using uint32 to me. It was commit 9fac5fd7 that gave me that idea. Ok, here is a patch that adds a one-typedef header and uses SegmentIndex to replace all cases of BlockNumber and int holding a segment number (where as an "index" or a "count"). -- Thomas Munro https://enterprisedb.com 0001-Introduce-SegmentNumber-typedef-for-relation-segment.patch Description: Binary data
Re: COPY FREEZE and setting PD_ALL_VISIBLE/visibility map bits
Hi, On 2019-04-04 12:23:08 -0700, Andres Freund wrote: > Also, how is this code even close to correct? > CheckAndSetPageAllVisible() modifies the buffer in a crucial way, and > there's no WAL logging? Without even a comment arguing why that's OK (I > don't think it is)? Peter Geoghegan just reminded me over IM that there's actually logging inside log_heap_visible(), called from visibilitymap_set(). Still lacks a critical section though. I still think this is the wrong architecture. Greetings, Andres Freund PS: We're going to have to revamp visibilitymap_set() soon-ish - the fact that it directly calls heap routines inside is bad, means that additional AMs e.g. zheap has to reimplement that routine.
Re: New vacuum option to do only freezing
On 4/4/19, 12:06 PM, "Robert Haas" wrote: > Committed with a little bit of documentation tweaking. Thanks! I noticed a very small typo in the new documentation. diff --git a/doc/src/sgml/ref/vacuum.sgml b/doc/src/sgml/ref/vacuum.sgml index fdd8151220..c652f8b0bc 100644 --- a/doc/src/sgml/ref/vacuum.sgml +++ b/doc/src/sgml/ref/vacuum.sgml @@ -199,7 +199,7 @@ VACUUM [ FULL ] [ FREEZE ] [ VERBOSE ] [ ANALYZE ] [
Re: [proposal] Add an option for returning SQLSTATE in psql error message
didier writes: > [ 0001-Add-sqlstate-output-mode-to-VERBOSITY_v1.patch ] Pushed with some mostly-cosmetic adjustments. The main non-cosmetic thing I did was to adjust the logic so that if no SQLSTATE is available, it acts like TERSE mode. Otherwise, we'd print nothing at all except "ERROR:", which seems both quite useless and contrary to our message style guidelines. Moreover, documenting it this way makes the behavior not inconsistent with what happens for libpq-generated errors and errors from protocol-version-2 servers. regards, tom lane
Re: query logging of prepared statements
On Thu, Apr 04, 2019 at 03:07:04PM -0300, Alvaro Herrera wrote: > which does not look good -- the statement is nowhere to be seen. The commit > message quotes this as desirable output: Good catch. Unnamed statements sent behind the scenes by PQexecParams weren't being logged. I specifically handled unnamed statements in my v1 patch, and tested that in 20190215145704.gw30...@telsasoft.com, but for some reason dropped that logic in v2, which was intended to only remove behavior conditional on log_error_verbosity. Previous patches also never logged PQprepare with named prepared statements (unnamed prepared statements were handled in v1 and SQL PREPARE was handled as a simple statement). On Thu, Apr 04, 2019 at 03:26:30PM -0300, Alvaro Herrera wrote: > With this patch (pretty much equivalent to reinstating the > errdetail_execute for that line), That means the text of the prepared statement is duplicated for each execute, which is what we're trying to avoid, no? The attached patch promotes the message to LOG in exec_parse_message. Parse is a protocol-layer message, and I think it's used by (only) PQprepare and PQexecParams.
testlibpq3 now shows:

|LOG: parse : SELECT * FROM test1 WHERE t = $1
|LOG: execute
|DETAIL: parameters: $1 = 'joe''s place'
|LOG: parse : SELECT * FROM test1 WHERE i = $1::int4
|LOG: execute
|DETAIL: parameters: $1 = '2'

Compare unpatched v11.2, where the text of the prepared statement was shown in the "parse" phase rather than in each execute:

|LOG: execute : SELECT * FROM test1 WHERE t = $1
|DETAIL: parameters: $1 = 'joe''s place'
|LOG: execute : SELECT * FROM test1 WHERE i = $1::int4
|DETAIL: parameters: $1 = '2'

Justin

>From 0826f7f90eb8f169244140a22db0bbbcb0d2b269 Mon Sep 17 00:00:00 2001
From: Justin Pryzby
Date: Sat, 9 Feb 2019 19:20:43 -0500
Subject: [PATCH v4] Avoid repetitive log of PREPARE during EXECUTE of prepared statements

---
 src/backend/tcop/postgres.c | 70 -
 1 file changed, 18 insertions(+), 52 deletions(-)

diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 44a59e1..348343c 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -182,7 +182,6 @@ static int ReadCommand(StringInfo inBuf);
 static void forbidden_in_wal_sender(char firstchar);
 static List *pg_rewrite_query(Query *query);
 static bool check_log_statement(List *stmt_list);
-static int errdetail_execute(List *raw_parsetree_list);
 static int errdetail_params(ParamListInfo params);
 static int errdetail_abort(void);
 static int errdetail_recovery_conflict(void);
@@ -1041,8 +1040,7 @@ exec_simple_query(const char *query_string)
 	{
 		ereport(LOG,
 				(errmsg("statement: %s", query_string),
-				 errhidestmt(true),
-				 errdetail_execute(parsetree_list)));
+				 errhidestmt(true)));
 		was_logged = true;
 	}
@@ -1292,8 +1290,7 @@ exec_simple_query(const char *query_string)
 			ereport(LOG,
 					(errmsg("duration: %s ms statement: %s",
 							msec_str, query_string),
-					 errhidestmt(true),
-					 errdetail_execute(parsetree_list)));
+					 errhidestmt(true)));
 			break;
 	}
@@ -1326,6 +1323,7 @@ exec_parse_message(const char *query_string, /* string to execute */
 	bool		is_named;
 	bool		save_log_statement_stats = log_statement_stats;
 	char		msec_str[32];
+	bool		was_logged = false;

 	/*
 	 * Report query to various monitoring facilities.
@@ -1339,11 +1337,6 @@ exec_parse_message(const char *query_string, /* string to execute */
 	if (save_log_statement_stats)
 		ResetUsage();

-	ereport(DEBUG2,
-			(errmsg("parse %s: %s",
-					*stmt_name ? stmt_name : "",
-					query_string)));
-
 	/*
 	 * Start up a transaction command so we can run parse analysis etc. (Note
 	 * that this will normally change current memory context.)  Nothing happens
@@ -1404,6 +1397,14 @@ exec_parse_message(const char *query_string, /* string to execute */
 		Query	   *query;
 		bool		snapshot_set = false;

+		if (check_log_statement(parsetree_list)) {
+			ereport( LOG, // else log as DEBUG2 ?
+				(errmsg("parse %s: %s",
+						*stmt_name ? stmt_name : "",
+						query_string)));
+			was_logged = true;
+		}
+
 		raw_parse_tree = linitial_node(RawStmt, parsetree_list);

 		/*
@@ -1544,7 +1545,7 @@ exec_parse_message(const char *query_string, /* string to execute */
 	/*
 	 * Emit duration logging if appropriate.
 	 */
-	switch (check_log_duration(msec_str, false))
+	switch (check_log_duration(msec_str, was_logged))
 	{
 		case 1:
 			ereport(LOG,
@@ -1920,12 +1921,11 @@ exec_bind_message(StringInfo input_message)
 			break;
 		case 2:
 			ereport(LOG,
-					(errmsg("duration: %s ms bind %s%s%s: %s",
+					(errmsg("duration: %s ms bind %s%s%s",
 							msec_str,
 							*stmt_name ? stmt_name : "",
 							*portal_name ? "/" : "",
-							*portal_name ? portal_name : "",
-							psrc->query_string),
+							*portal_name ? portal_name : ""),
 					 errhidestmt(true),
 					 errdetail_params(params)));
 			break;
Re: propagating replica identity to partitions
On 2019-Mar-29, Peter Eisentraut wrote: > On 2019-03-28 18:16, Simon Riggs wrote: > > SET TABLESPACE should not recurse because it copies the data, while > > holding long locks. If that was ever fixed so it happened concurrently, > > I would agree this could recurse by default. > > Since a partitioned table has no storage, what is the meaning of moving > a partitioned table to a different tablespace without recursing? Changing the tablespace of a partitioned table currently means to set a default tablespace for partitions created after the change. Robert proposed to change it so that it means to move every partition to that tablespace, unless ONLY is specified. Simon objects to that change. -- Álvaro Herrera    https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Speed up transaction completion faster after many relations are accessed in a transaction
On 2019-04-04 06:58, Amit Langote wrote: > Also, since the "speed up partition planning" patch went in (428b260f8), > it might be possible to see the performance boost even with the > partitioning example you cited upthread. I can't detect any performance improvement with the patch applied to current master, using the test case from Yoshikazu Imai (2019-03-19). -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: proposal: pg_restore --convert-to-text
I just pushed it with trivial changes: * rebased for cc8d41511721 * changed wording in the error message * added a new test for the condition in t/001_basic.pl * Added the "-" in the --help line of -f. Andrew G. never confirmed that this change is sufficient to appease users being confused by the previous behavior. I hope it is ... Thanks! -- Álvaro Herrera    https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] Block level parallel vacuum
On Thu, Apr 4, 2019 at 6:28 AM Masahiko Sawada wrote: > These patches conflict with the current HEAD. Attached the updated patches. They'll need another rebase. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: "WIP: Data at rest encryption" patch and, PostgreSQL 11-beta3
On Thu, Apr 4, 2019 at 11:52 AM Antonin Houska wrote: > Robert Haas wrote: > > I'm willing to put some effort into trying to get this into v13 if > > you're willing to keep hacking on it, but there's probably a fair > > amount to do and a year can go by in a hurry. > > That'd be appreciated, especially by my boss. It doesn't seem likely that in > this situation I'll be assigned many other tasks of higher priority. So yes, I > expect to have quite some time for this patch. Thanks. Great. I think it will be appreciated by my boss, too. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Protect syscache from bloating with negative cache entries
On Thu, Apr 4, 2019 at 8:53 AM Kyotaro HORIGUCHI wrote: > So it seems to me that the simplest "Full" version wins. The > attached is rebased version. dlist_move_head(entry) is removed as > mentioned above in that patch.

1. I really don't think this patch has any business changing the existing logic. You can't just assume that the dlist_move_head() operation is unimportant for performance.

2. This patch still seems to add a new LRU list that has to be maintained. That's fairly puzzling. You seem to have concluded that the version without the additional LRU wins, but then sent a new copy of the version with the LRU.

3. I don't think adding an additional call to GetCurrentTimestamp() in start_xact_command() is likely to be acceptable. There has got to be a way to set this up so that the maximum number of new GetCurrentTimestamp() calls is limited to once per N seconds, vs. the current implementation that could do it many many many times per second.

4. The code in CatalogCacheCreateEntry seems clearly unacceptable. In a pathological case where CatCacheCleanupOldEntries removes exactly one element per cycle, it could be called on every new catcache allocation.

I think we need to punt this patch to next release. We're not converging on anything committable very fast. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Inadequate executor locking of indexes
David Rowley writes: > Wrong patch. Here's what I meant to send. Pushed with some minor tweaking, mostly comments. Some notes: * We now have a general convention that queries always take the same lock type on indexes as on their parent tables, but that convention is not respected by index DDL. I'm not sure if there is any additional cleanup that should be done there. The DDL code intends to block concurrent execution of other DDL on the same index, in most cases, so it may be fine. At the very least, the different lock levels for those cases are clearly intentional, while I don't think that was true for DML. * I dropped the extra assertion you'd added to infer_arbiter_indexes, as it didn't seem necessary or appropriate. That function's just interested in inspecting the index, so it doesn't need to assume anything about how strong the lock is. I think the comment that was there was just trying to justify the shortcut of hard-wiring the lock level as RowExclusiveLock. * I went ahead and changed gin_clean_pending_list() to take RowExclusiveLock not AccessShareLock on its target index. I'm not quite sure that AccessShareLock is an actual bug there; it may be that GIN's internal conventions are such that that's safe. But it sure seemed darn peculiar, and at risk of causing future problems. regards, tom lane
Re: Changes to pg_dump/psql following collation "C" in the catalog
Tom Lane wrote: > "Daniel Verite" writes: > > One consequence of using the "C" collation in the catalog versus > > the db collation is that pg_dump -t with a regexp may not find > > the same tables as before. It happens when these conditions are > > all met: > > - the collation of the database is not "C" > > - the regexp has locale-dependant parts > > - the names to match include characters that are sensitive to > > locale-dependant matching > > Hm, interesting. > > > It seems that to fix that, we could qualify the references to columns such > > as "relname" and "schema_name" with COLLATE "default" clauses in the > > queries that use pattern-matching in client-side tools, AFAICS > > pg_dump and psql. > > Seems reasonable. I was initially worried that this might interfere with > query optimization, but some experimentation says that the planner > successfully derives prefix index clauses anyway (which is correct, > because matching a fixed regex prefix doesn't depend on locale). > > It might be better to attach the COLLATE clause to the pattern constant > instead of the column name; that'd be less likely to break if sent to > an older server. > > > Before going any further with this idea, is there agreement that it's an > > issue to address and does this look like the best way to do that? > > That is a question worth asking. We're going to be forcing people to get > used to this when working directly in SQL, so I don't know if masking it > in a subset of tools is really a win or not. I think psql and pg_dump need to adjust, just like the 3rd party tools will, at least those that support collation-aware search in the catalog. PFA a patch that implements the slight changes needed. I'll add an entry for it in the next CF. 
Best regards,

-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite

diff --git a/src/fe_utils/string_utils.c b/src/fe_utils/string_utils.c
index 5c1732a..69ac6f9 100644
--- a/src/fe_utils/string_utils.c
+++ b/src/fe_utils/string_utils.c
@@ -808,6 +808,8 @@ appendReloptionsArray(PQExpBuffer buffer, const char *reloptions,
  * to limit the set of objects returned. The WHERE clauses are appended
  * to the already-partially-constructed query in buf. Returns whether
  * any clause was added.
+ * The pattern matching uses the collation of the database through explicit
+ * COLLATE "default" clauses.
  *
  * conn: connection query will be sent to (consulted for escaping rules).
  * buf: output parameter.
@@ -971,17 +973,18 @@ processSQLNamePattern(PGconn *conn, PQExpBuffer buf, const char *pattern,
 				appendPQExpBuffer(buf, "(%s OPERATOR(pg_catalog.~) ", namevar);
 				appendStringLiteralConn(buf, namebuf.data, conn);
+				appendPQExpBufferStr(buf, " COLLATE \"default\"");
 				appendPQExpBuffer(buf, "\nOR %s OPERATOR(pg_catalog.~) ", altnamevar);
 				appendStringLiteralConn(buf, namebuf.data, conn);
-				appendPQExpBufferStr(buf, ")\n");
+				appendPQExpBufferStr(buf, " COLLATE \"default\" )\n");
 			}
 			else
 			{
 				appendPQExpBuffer(buf, "%s OPERATOR(pg_catalog.~) ", namevar);
 				appendStringLiteralConn(buf, namebuf.data, conn);
-				appendPQExpBufferChar(buf, '\n');
+				appendPQExpBufferStr(buf, " COLLATE \"default\"\n");
 			}
 		}
 	}
@@ -997,7 +1000,7 @@ processSQLNamePattern(PGconn *conn, PQExpBuffer buf, const char *pattern,
 			WHEREAND();
 			appendPQExpBuffer(buf, "%s OPERATOR(pg_catalog.~) ", schemavar);
 			appendStringLiteralConn(buf, schemabuf.data, conn);
-			appendPQExpBufferChar(buf, '\n');
+			appendPQExpBufferStr(buf, " COLLATE \"default\"\n");
 		}
 	}
 	else
Re: COPY FREEZE and setting PD_ALL_VISIBLE/visibility map bits
Hi, On 2019-04-04 16:15:54 -0300, Alvaro Herrera wrote: > On 2019-Apr-04, Andres Freund wrote: > > > I'm totally not OK with this from a layering > > POV. CheckAndSetAllVisibleBulkInsertState is entirely heap specific > > (without being named such), whereas all the heap specific bits are > > getting moved below tableam. > > This is a fair complaint, but on the other hand the COPY changes for > table AM are still being developed, so there's no ground on which to > rebase this patch. Do you have a timeline on getting the COPY one > committed? ~2h. Just pondering the naming of some functions etc. Don't think there's a large interdependency though. But even if tableam weren't committed, I'd still argue that it's structurally done wrong in the patch right now. FWIW, I actually think this whole approach isn't quite right - this shouldn't be done as a secondary action after we'd already inserted, with a separate lock-unlock cycle etc. Also, how is this code even close to correct? CheckAndSetPageAllVisible() modifies the buffer in a crucial way, and there's no WAL logging? Without even a comment arguing why that's OK (I don't think it is)? Greetings, Andres Freund
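For reference, the conventional shape of a WAL-logged page update that Andres is alluding to looks schematically like this — real PostgreSQL routine names, but arguments elided and error paths omitted; this is pseudocode, not the patch's code:

```
START_CRIT_SECTION();

PageSetAllVisible(page);        /* modify the buffer... */
MarkBufferDirty(heapBuf);
visibilitymap_set(rel, blkno, heapBuf, ...);   /* ...and WAL-log it (via log_heap_visible) */

END_CRIT_SECTION();
```

The critical section guarantees that a failure between dirtying the page and emitting the WAL record escalates to PANIC rather than leaving the buffer modified with no matching WAL — which is exactly the hole being pointed out in CheckAndSetPageAllVisible().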
Re: Row Level Security − leakproof-ness and performance implications
On 2019-03-18 20:08, Peter Eisentraut wrote: > I agree that it would be useful to document and discuss better which > built-in operators are leak-proof and which are not. But I don't think > the CREATE POLICY reference page is the place to do it. Note that the > leak-proofness mechanism was originally introduced for security-barrier > views (an early form of RLS if you will), so someone could also > reasonably expect a discussion there. It sounds like this will need significant additional work, so setting as returned with feedback for now. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: New vacuum option to do only freezing
On Wed, Apr 3, 2019 at 10:32 PM Masahiko Sawada wrote: > Attached the updated version patch. Committed with a little bit of documentation tweaking. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: query logging of prepared statements
On 2019-Apr-04, Alvaro Herrera wrote: > I think we could improve on this by setting a "logged" flag in the > portal; if the Parse logs the statement, then don't include the > statement in further lines, otherwise do. Also: I think such a flag could help the case of a query that takes long enough to execute to exceed log_min_duration_statement, but not long enough to parse. Right now, if I change the testlibpq3 query to be "SELECT * FROM test1 WHERE t = $1 and pg_sleep(1) is not null", log_min_duration_statement=500 shows:

2019-04-04 15:59:39 -03 [6353-1] LOG: duration: 2002.298 ms execute
2019-04-04 15:59:39 -03 [6353-2] DETAIL: parameters: $1 = 'joe''s place'

Also, if you parse once and bind/execute many times, IMO the statement should be logged exactly once. I think you could do that with the flag I propose. -- Álvaro Herrera    https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: speeding up planning with partitions
Hi all, First of all I would like to thank everyone involved in this patch for their hard work on this. This is a big step forward. I've done some performance and functionality testing with the patch that was committed to master and it looks very good. I had a question about the performance of pruning of functions like now() and current_date. I know these are handled differently, as they cannot be excluded during the first phases of planning. However, currently, this new patch makes the performance difference between the static timestamp variant and now() very obvious (even more than before). Consider

  select * from partitioned_table where ts >= now()

or

  select * from partitioned_table where ts >= '2019-04-04'

The second plans in less than a millisecond, whereas the first takes +- 180ms for a table with 1000 partitions. Both end up with the same plan. I'm not too familiar with the code that handles this, but is there a possibility for improvement in this area? Or is the stage at which exclusion for now()/current_date occurs already too far in the process to make any good improvements to this? My apologies if this is considered off-topic for this patch, but I ran into this issue specifically when I was testing this patch, so I thought I'd ask here about it. I do think a large number of use-cases for tables with a large number of partitions involve a timestamp for partition key, and naturally people will start writing queries for this that use functions such as now() and current_date. Thanks again for your work on this patch! -Floris From: Amit Langote Sent: Tuesday, April 2, 2019 7:50 AM To: Tom Lane Cc: David Rowley; Imai Yoshikazu; jesper.peder...@redhat.com; Imai, Yoshikazu; Amit Langote; Alvaro Herrera; Robert Haas; Justin Pryzby; Pg Hackers Subject: Re: speeding up planning with partitions [External] Thanks for taking a look. 
On 2019/04/02 2:34, Tom Lane wrote: > Amit Langote writes: >> On 2019/03/30 0:29, Tom Lane wrote: >>> That seems like probably an independent patch --- do you want to write it? > >> Here is that patch. >> It revises get_relation_constraints() such that the partition constraint >> is loaded in only the intended cases. > > So I see the problem you're trying to solve here, but I don't like this > patch a bit, because it depends on root->inhTargetKind which IMO is a > broken bit of junk that we need to get rid of. Here is an example of > why, with this patch applied: > > regression=# create table p (a int) partition by list (a); > CREATE TABLE > regression=# create table p1 partition of p for values in (1); > CREATE TABLE > regression=# set constraint_exclusion to on; > SET > regression=# explain select * from p1 where a = 2; > QUERY PLAN > -- > Result (cost=0.00..0.00 rows=0 width=0) >One-Time Filter: false > (2 rows) > > So far so good, but watch what happens when we include the same case > in an UPDATE on some other partitioned table: > > regression=# create table prtab (a int, b int) partition by list (a); > CREATE TABLE > regression=# create table prtab2 partition of prtab for values in (2); > CREATE TABLE > regression=# explain update prtab set b=b+1 from p1 where prtab.a=p1.a and > p1.a=2; > QUERY PLAN > --- > Update on prtab (cost=0.00..82.30 rows=143 width=20) >Update on prtab2 >-> Nested Loop (cost=0.00..82.30 rows=143 width=20) > -> Seq Scan on p1 (cost=0.00..41.88 rows=13 width=10) >Filter: (a = 2) > -> Materialize (cost=0.00..38.30 rows=11 width=14) >-> Seq Scan on prtab2 (cost=0.00..38.25 rows=11 width=14) > Filter: (a = 2) > (8 rows) > > No constraint exclusion, while in v10 you get > > Update on prtab (cost=0.00..0.00 rows=0 width=0) >-> Result (cost=0.00..0.00 rows=0 width=0) > One-Time Filter: false > > The reason is that this logic supposes that root->inhTargetKind describes > *all* partitioned tables in the query, which is obviously wrong. 
> > Now maybe we could make it work by doing something like > > if (rel->reloptkind == RELOPT_BASEREL && > (root->inhTargetKind == INHKIND_NONE || > rel->relid != root->parse->resultRelation)) Ah, you're right. inhTargetKind has to be checked in conjunction with checking whether the relation is the target relation. > but I find that pretty messy, plus it's violating the concept that we > shouldn't be allowing messiness from inheritance_planner to leak into > other places. I'm afraid that we'll have to live with this particular hack as long as we have inheritance_planner(), but we maybe could somewhat reduce the extent to which the hack is spread into other planner files. How about we move the part of get_relation_constraints() that loads the partition
Re: query logging of prepared statements
On 2019-Apr-04, Alvaro Herrera wrote: > However, turning duration logging off and using log_statement=all, this is > what I get: > > 2019-04-04 14:58:42.564 -03 [31685] LOG: statement: SET search_path = testlibpq3 ... With this patch (pretty much equivalent to reinstating the errdetail_execute for that line),

diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index dbc7b797c6e..fd73d5e9951 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -2056,7 +2056,6 @@ exec_execute_message(const char *portal_name, long max_rows)
 								prepStmtName,
 								*portal_name ? "/" : "",
 								*portal_name ? portal_name : ""),
-						errhidestmt(true),
 						errdetail_params(portalParams)));
 			was_logged = true;
 		}

I get what seems to be pretty much what is wanted for this case:

2019-04-04 15:18:16.817 -03 [4559] LOG: statement: SET search_path = testlibpq3
2019-04-04 15:18:16.819 -03 [4559] LOG: execute
2019-04-04 15:18:16.819 -03 [4559] DETAIL: parameters: $1 = 'joe''s place'
2019-04-04 15:18:16.819 -03 [4559] STATEMENT: SELECT * FROM test1 WHERE t = $1
2019-04-04 15:18:16.820 -03 [4559] LOG: execute
2019-04-04 15:18:16.820 -03 [4559] DETAIL: parameters: $1 = '2'
2019-04-04 15:18:16.820 -03 [4559] STATEMENT: SELECT * FROM test1 WHERE i = $1::int4

However, by setting both log_statement=all and log_min_duration_statement=0 and that patch (I also added %l to log_line_prefix), I get this:

2019-04-04 15:23:45 -03 [5208-1] LOG: statement: SET search_path = testlibpq3
2019-04-04 15:23:45 -03 [5208-2] LOG: duration: 0.441 ms
2019-04-04 15:23:45 -03 [5208-3] LOG: duration: 1.127 ms parse : SELECT * FROM test1 WHERE t = $1
2019-04-04 15:23:45 -03 [5208-4] LOG: duration: 0.789 ms bind
2019-04-04 15:23:45 -03 [5208-5] DETAIL: parameters: $1 = 'joe''s place'
2019-04-04 15:23:45 -03 [5208-6] LOG: execute
2019-04-04 15:23:45 -03 [5208-7] DETAIL: parameters: $1 = 'joe''s place'
2019-04-04 15:23:45 -03 [5208-8] STATEMENT: SELECT * FROM test1 WHERE t = $1
2019-04-04 15:23:45 -03 [5208-9] LOG: duration: 0.088 ms
2019-04-04 15:23:45 -03 [5208-10] LOG: duration: 0.363 ms parse : SELECT * FROM test1 WHERE i = $1::int4
2019-04-04 15:23:45 -03 [5208-11] LOG: duration: 0.206 ms bind
2019-04-04 15:23:45 -03 [5208-12] DETAIL: parameters: $1 = '2'
2019-04-04 15:23:45 -03 [5208-13] LOG: execute
2019-04-04 15:23:45 -03 [5208-14] DETAIL: parameters: $1 = '2'
2019-04-04 15:23:45 -03 [5208-15] STATEMENT: SELECT * FROM test1 WHERE i = $1::int4
2019-04-04 15:23:45 -03 [5208-16] LOG: duration: 0.053 ms

Note that line 5208-8 is duplicative of 5208-3. I think we could improve on this by setting a "logged" flag in the portal; if the Parse logs the statement, then don't include the statement in further lines, otherwise do. -- Álvaro Herrera    https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: query logging of prepared statements
On 2019-Mar-04, Justin Pryzby wrote: > Thanks for reviewing. I'm also interested in discussion about whether this > change is undesirable for someone else for some reason ? For me, the existing > output seems duplicative and "denormalized". :) Some digging turned up that the function you're removing was added by commit 893632be4e17. The commit message mentions output for testlibpq3, so I ran that against a patched server. With log_min_duration_statement=0 I get this, which looks good:

2019-04-04 14:59:15.529 -03 [31723] LOG: duration: 0.108 ms statement: SET search_path = testlibpq3
2019-04-04 14:59:15.530 -03 [31723] LOG: duration: 0.303 ms parse : SELECT * FROM test1 WHERE t = $1
2019-04-04 14:59:15.530 -03 [31723] LOG: duration: 0.231 ms bind
2019-04-04 14:59:15.530 -03 [31723] DETAIL: parameters: $1 = 'joe''s place'
2019-04-04 14:59:15.530 -03 [31723] LOG: duration: 0.016 ms execute
2019-04-04 14:59:15.530 -03 [31723] DETAIL: parameters: $1 = 'joe''s place'
2019-04-04 14:59:15.530 -03 [31723] LOG: duration: 0.096 ms parse : SELECT * FROM test1 WHERE i = $1::int4
2019-04-04 14:59:15.530 -03 [31723] LOG: duration: 0.053 ms bind
2019-04-04 14:59:15.530 -03 [31723] DETAIL: parameters: $1 = '2'
2019-04-04 14:59:15.530 -03 [31723] LOG: duration: 0.007 ms execute
2019-04-04 14:59:15.530 -03 [31723] DETAIL: parameters: $1 = '2'

An unpatched server emits this:

2019-04-04 15:03:01.176 -03 [1165] LOG: duration: 0.163 ms statement: SET search_path = testlibpq3
2019-04-04 15:03:01.176 -03 [1165] LOG: duration: 0.475 ms parse : SELECT * FROM test1 WHERE t = $1
2019-04-04 15:03:01.177 -03 [1165] LOG: duration: 0.403 ms bind : SELECT * FROM test1 WHERE t = $1
2019-04-04 15:03:01.177 -03 [1165] DETAIL: parameters: $1 = 'joe''s place'
2019-04-04 15:03:01.177 -03 [1165] LOG: duration: 0.028 ms execute : SELECT * FROM test1 WHERE t = $1
2019-04-04 15:03:01.177 -03 [1165] DETAIL: parameters: $1 = 'joe''s place'
2019-04-04 15:03:01.177 -03 [1165] LOG: duration: 0.177 ms parse : SELECT * FROM test1 WHERE i = $1::int4
2019-04-04 15:03:01.177 -03 [1165] LOG: duration: 0.096 ms bind : SELECT * FROM test1 WHERE i = $1::int4
2019-04-04 15:03:01.177 -03 [1165] DETAIL: parameters: $1 = '2'
2019-04-04 15:03:01.177 -03 [1165] LOG: duration: 0.014 ms execute : SELECT * FROM test1 WHERE i = $1::int4
2019-04-04 15:03:01.177 -03 [1165] DETAIL: parameters: $1 = '2'

Note that with your patch we no longer get the statement in the "bind" and "execute" lines, which seems good; that was far too noisy. However, turning duration logging off and using log_statement=all, this is what I get:

2019-04-04 14:58:42.564 -03 [31685] LOG: statement: SET search_path = testlibpq3
2019-04-04 14:58:42.565 -03 [31685] LOG: execute
2019-04-04 14:58:42.565 -03 [31685] DETAIL: parameters: $1 = 'joe''s place'
2019-04-04 14:58:42.565 -03 [31685] LOG: execute
2019-04-04 14:58:42.565 -03 [31685] DETAIL: parameters: $1 = '2'

which does not look good -- the statement is nowhere to be seen. The commit message quotes this as desirable output:

LOG: statement: execute : SELECT * FROM test1 WHERE t = $1
DETAIL: parameters: $1 = 'joe''s place'
LOG: statement: execute : SELECT * FROM test1 WHERE i = $1::int4
DETAIL: parameters: $1 = '2'

which is approximately what I get with an unpatched server:

2019-04-04 15:04:25.718 -03 [1235] LOG: statement: SET search_path = testlibpq3
2019-04-04 15:04:25.719 -03 [1235] LOG: execute : SELECT * FROM test1 WHERE t = $1
2019-04-04 15:04:25.719 -03 [1235] DETAIL: parameters: $1 = 'joe''s place'
2019-04-04 15:04:25.720 -03 [1235] LOG: execute : SELECT * FROM test1 WHERE i = $1::int4
2019-04-04 15:04:25.720 -03 [1235] DETAIL: parameters: $1 = '2'

So I think this needs a bit more work. -- Álvaro Herrera    https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [PATCH v20] GSSAPI encryption support
Tom Lane writes: > I wrote: >> Stephen Frost writes: >>> So I'm a bit surprised that it's taking 4 minutes for you. I wonder if >>> there might be an issue related to the KDC wanting to get some amount of >>> random data and the system you're on isn't producing random bytes very >>> fast..? > >> Not sure. This is my usual development box and it also does mail, DNS, >> etc for my household, so I'd expect it to have plenty of entropy. >> But it's running a pretty old kernel, and old Kerberos too, so maybe >> the explanation is in there somewhere. > > Same test on a laptop running Fedora 28 takes a shade under 5 seconds. > The laptop has a somewhat better geekbench rating than my workstation, > but certainly not 50x better. And I really doubt it's got more entropy > sources than the workstation. Gotta be something about the kernel. > > Watching the test logs, I see that essentially all the time on the RHEL6 > machine is consumed by the two > > # Running: /usr/sbin/kdb5_util create -s -P secret0 > > steps. Is there a case for merging the two scripts so we only have to > do that once? Maybe not, if nobody else sees this. I think that would be a good idea! Unfortunately I don't speak perl well enough to do that, so I'd just copied and modified. Thanks, --Robbie
Re: [PATCH v20] GSSAPI encryption support
Tom Lane writes: > Stephen Frost writes: >> * Tom Lane (t...@sss.pgh.pa.us) wrote: >>> Well, if the caller thinks what is being passed back is an int, >>> it will do a 32-to-64-bit widening, which is almost certainly >>> going to result in a corrupted pointer. > >> Oh, good point. Interesting that it still works then. > > There must be something about the x86_64 ABI that allows this to > accidentally work -- maybe integers are presumed to be sign-extended > to 64 bits by callee not caller? I added some logging and verified > that pgstat.c is seeing the correct string value, so it's working > somehow. > >> I've got a fix for the missing prototypes, I hadn't noticed the issue >> previously due to always building with SSL enabled as well. > > Yeah, I'd just come to the conclusion that it's because I didn't > include --with-openssl, and libpq-be.h's #ifdef nest doesn't expect > that. > > BTW, the kerberos test suite takes nearly 4 minutes for me, is > it supposed to be so slow? My guess is entropy problems as well. If available, configuring /dev/urandom passthrough from the host is a generally helpful thing to do. My (Fedora, Centos/RHEL 7+) krb5 builds use getrandom() for entropy, so they shouldn't be slow; I believe Debian also has started doing so recently as well. I don't know what other distros/OSs do for this. Thanks, --Robbie
Re: COPY FREEZE and setting PD_ALL_VISIBLE/visibility map bits
Hi, I've been looking at this patch for a while, and it seems pretty much RFC, so barring objections I'll take care of that once I do a bit more testing and review. Unless someone else wants to take care of that. FWIW I wonder if we should add the code for partitioned tables to CopyFrom, considering that's unsupported and so can't be tested etc. It's not a huge amount of code, of course. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [PATCH v20] GSSAPI encryption support
Stephen Frost writes: > * Tom Lane (t...@sss.pgh.pa.us) wrote: >> Watching the test logs, I see that essentially all the time on the RHEL6 >> machine is consumed by the two >> # Running: /usr/sbin/kdb5_util create -s -P secret0 >> steps. Is there a case for merging the two scripts so we only have to >> do that once? Maybe not, if nobody else sees this. > I do think that mergeing them would be a good idea and I can look into > that, though at least locally that step takes less than a second.. I > wonder if you might strace (or whatever is appropriate) that kdb5_util > and see what's taking so long. I seriously doubt it's the actual > kdb5_util code and strongly suspect it's some kernel call. "strace -r" pins the blame pretty firmly on /dev/random: 0.76 open("/dev/random", O_RDONLY) = 3 0.000227 fcntl(3, F_SETFD, FD_CLOEXEC) = 0 0.61 fstat(3, {st_mode=S_IFCHR|0666, st_rdev=makedev(1, 8), ...}) = 0 0.68 read(3, "\336&\301\310V\344q\217\264-\262\320w-", 64) = 14 0.91 read(3, "\326\353I\371$\361", 50) = 6 15.328306 read(3, "\214\301\313]I\325", 44) = 6 17.418929 read(3, "z\251\37\275\365\24", 38) = 6 13.366997 read(3, "6\257I\315f\3", 32) = 6 11.457994 read(3, "\370\275\2765\31(", 26) = 6 23.472194 read(3, "\226\r\314\373\2014", 20) = 6 11.746848 read(3, "\335\336BR\30\322", 14) = 6 20.823940 read(3, "\366\214\r\211\0267", 8) = 6 14.429214 read(3, ",g", 2) = 2 15.494835 close(3) = 0 There's no other part of the trace that takes more than ~ 0.1s. So this boils down to the old bugaboo about how much entropy there really is in /dev/random. regards, tom lane
Re: [GSoC 2019] Proposal: Develop Performance Farm Database and Website
Hi Ilaria, Edited for bottom posting. :) On Fri, Mar 29, 2019 at 03:01:05PM +0100, Ilaria wrote: > > Am 29.03.2019 um 13:52 schrieb Peter Eisentraut > > : > > > >> On 2019-03-29 13:04, Robert Haas wrote: > >>> On Tue, Mar 26, 2019 at 9:10 AM Ila B. wrote: > >>> I am Ilaria Battiston, an aspiring GSoC student, and I would love to have > >>> a feedback on the first draft of my Google Summer of Code proposal. The > >>> project is "Develop Performance Farm Database and Website”. You can find > >>> any other detail in the attached PDF file :) > >> > >> I think there's probably a very large amount of work to be done in > >> this area. Nobody is going to finish it in a summer. Still, there's > >> probably some useful things you could get done in a summer. I think a > >> lot will depend on finding a good mentor who is familiar with these > >> areas (which I am not). Has anyone expressed an interest? > > > > Moreover, I have a feeling that have been hearing about work on a > > performance farm for many years. Perhaps it should be investigated what > > became of that work and what the problems were getting it to a working > > state. > Hello, > > Thanks for the answer. This project is on the official PostgreSQL project > list of GSoC 2019, and potential mentors are stated there. > > I trust mentors’ judgement on outlining the work and the tasks to be done in > three months, and there is the previous student’s work to use as example if > needed. The project consists in building a database and a website on top of > it for users to browse performance data. > > Let me know whether there are any specific issues you’re concerned about. Hongyuan, our student last summer, put together a summary of his progress in a GitHub issue: https://github.com/PGPerfFarm/pgperffarm/issues/22 We have systems for proofing (from OSUOSL) and you can also see the prototype here: http://140.211.168.111/ For Phase 1, I'd recommend getting familiar with the database schema in place now. 
Perhaps it can use some tweaking, but I just mean to suggest that it might not be necessary to rebuild it from scratch. In Phase 2, we had some difficulty last year getting the authentication/authorization completely integrated. I think the main issue was how to integrate this app while using resources outside of the community infrastructure. We may have to continue working around that. Otherwise, I think the rest makes sense. Let us know if you have any more questions. Regards, Mark -- Mark Wong 2ndQuadrant - PostgreSQL Solutions for the Enterprise https://www.2ndQuadrant.com/
Re: [PATCH v20] GSSAPI encryption support
I wrote: > Stephen Frost writes: >> So I'm a bit surprised that it's taking 4 minutes for you. I wonder if >> there might be an issue related to the KDC wanting to get some amount of >> random data and the system you're on isn't producing random bytes very >> fast..? > Not sure. This is my usual development box and it also does mail, DNS, > etc for my household, so I'd expect it to have plenty of entropy. > But it's running a pretty old kernel, and old Kerberos too, so maybe > the explanation is in there somewhere. Same test on a laptop running Fedora 28 takes a shade under 5 seconds. The laptop has a somewhat better geekbench rating than my workstation, but certainly not 50x better. And I really doubt it's got more entropy sources than the workstation. Gotta be something about the kernel. Watching the test logs, I see that essentially all the time on the RHEL6 machine is consumed by the two # Running: /usr/sbin/kdb5_util create -s -P secret0 steps. Is there a case for merging the two scripts so we only have to do that once? Maybe not, if nobody else sees this. regards, tom lane
Re: "WIP: Data at rest encryption" patch and, PostgreSQL 11-beta3
Robert Haas wrote: > I'm willing to put some effort into trying to get this into v13 if > you're willing to keep hacking on it, but there's probably a fair > amount to do and a year can go by in a hurry. That'd be appreciated, especially by my boss. It doesn't seem likely that in this situation I'll be assigned many other tasks of higher priority. So yes, I expect to have quite some time for this patch. Thanks. -- Antonin Houska Web: https://www.cybertec-postgresql.com
Re: [PATCH v20] GSSAPI encryption support
Greetings, * Peter Eisentraut (peter.eisentr...@2ndquadrant.com) wrote: > Kerberos tests are now failing for me (macOS). I'm seeing > > psql: error: could not connect to server: Over-size error packet sent by > the server. > not ok 3 - GSS encryption without auth > > # Failed test 'GSS encryption without auth' > # at t/002_enc.pl line 170. > # got: '2' > # expected: '0' > > (and repeated for several other tests). Ok, it looks like there's a server-side error happening here, and it would be good to see what that is, so can you send the server logs? Thanks! Stephen
Re: [PATCH v20] GSSAPI encryption support
Stephen Frost writes: > * Tom Lane (t...@sss.pgh.pa.us) wrote: >> There must be something about the x86_64 ABI that allows this to >> accidentally work -- maybe integers are presumed to be sign-extended >> to 64 bits by callee not caller? I added some logging and verified >> that pgstat.c is seeing the correct string value, so it's working >> somehow. > Huh, I'm not sure. That's certainly interesting though. Oh, no, it's simpler than that: the pointer values that be_gssapi_get_princ() is returning just happen to be less than 2^31 on my system. I'd dismissed that as being unlikely, but it's the truth. > So I'm a bit surprised that it's taking 4 minutes for you. I wonder if > there might be an issue related to the KDC wanting to get some amount of > random data and the system you're on isn't producing random bytes very > fast..? Not sure. This is my usual development box and it also does mail, DNS, etc for my household, so I'd expect it to have plenty of entropy. But it's running a pretty old kernel, and old Kerberos too, so maybe the explanation is in there somewhere. regards, tom lane
Re: [PATCH v20] GSSAPI encryption support
Greetings, * Tom Lane (t...@sss.pgh.pa.us) wrote: > Stephen Frost writes: > > * Tom Lane (t...@sss.pgh.pa.us) wrote: > >> Well, if the caller thinks what is being passed back is an int, > >> it will do a 32-to-64-bit widening, which is almost certainly > >> going to result in a corrupted pointer. > > > Oh, good point. Interesting that it still works then. > > There must be something about the x86_64 ABI that allows this to > accidentally work -- maybe integers are presumed to be sign-extended > to 64 bits by callee not caller? I added some logging and verified > that pgstat.c is seeing the correct string value, so it's working > somehow. Huh, I'm not sure. That's certainly interesting though. > > I've got a fix for the missing prototypes, I hadn't noticed the issue > > previously due to always building with SSL enabled as well. > > Yeah, I'd just come to the conclusion that it's because I didn't > include --with-openssl, and libpq-be.h's #ifdef nest doesn't expect > that. Right, that should be fixed now with the commit I just pushed. > BTW, the kerberos test suite takes nearly 4 minutes for me, is > it supposed to be so slow? Unfortunately, the kerberos test suite requires building a KDC to get tickets from and that takes a bit of time. On my laptop it takes about 8s: make -s check 4.67s user 0.85s system 70% cpu 7.819 total So I'm a bit surprised that it's taking 4 minutes for you. I wonder if there might be an issue related to the KDC wanting to get some amount of random data and the system you're on isn't producing random bytes very fast..? Thanks! Stephen
Re: [PATCH v20] GSSAPI encryption support
On 2019-04-04 17:16, Tom Lane wrote: > BTW, the kerberos test suite takes nearly 4 minutes for me, is > it supposed to be so slow? I've seen this on some virtualized machines that didn't have a lot of entropy. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [PATCH v20] GSSAPI encryption support
Stephen Frost writes: > * Tom Lane (t...@sss.pgh.pa.us) wrote: >> Well, if the caller thinks what is being passed back is an int, >> it will do a 32-to-64-bit widening, which is almost certainly >> going to result in a corrupted pointer. > Oh, good point. Interesting that it still works then. There must be something about the x86_64 ABI that allows this to accidentally work -- maybe integers are presumed to be sign-extended to 64 bits by callee not caller? I added some logging and verified that pgstat.c is seeing the correct string value, so it's working somehow. > I've got a fix for the missing prototypes, I hadn't noticed the issue > previously due to always building with SSL enabled as well. Yeah, I'd just come to the conclusion that it's because I didn't include --with-openssl, and libpq-be.h's #ifdef nest doesn't expect that. BTW, the kerberos test suite takes nearly 4 minutes for me, is it supposed to be so slow? regards, tom lane
Re: GiST VACUUM
On 25/03/2019 15:20, Heikki Linnakangas wrote: On 24/03/2019 18:50, Andrey Borodin wrote: I was working on a new version of the gist check in amcheck and understood one more thing: /* Can this page be recycled yet? */ bool gistPageRecyclable(Page page) { return PageIsNew(page) || (GistPageIsDeleted(page) && TransactionIdPrecedes(GistPageGetDeleteXid(page), RecentGlobalXmin)); } Here RecentGlobalXmin can wrap around and the page will become unrecyclable for half of the xid cycle. Can we prevent it by resetting PageDeleteXid to InvalidTransactionId before doing RecordFreeIndexPage()? (Seems like the same applies to GIN...) True, and B-tree has the same issue. I thought I saw a comment somewhere in the B-tree code about that earlier, but now I can't find it. I must've imagined it. We could reset it, but that would require dirtying the page. That would be just extra I/O overhead, if the page gets reused before XID wraparound. We could avoid that if we stored the full XID+epoch, not just XID. I think we should do that in GiST, at least, where this is new. In the B-tree, it would require some extra code to deal with backwards-compatibility, but maybe it would be worth it even there. I suggest that we do the attached. It fixes this for GiST. The patch expands the "deletion XID" to 64 bits, and changes where it's stored. Instead of storing it in pd_prune_xid, it's stored in the page contents. Luckily, a deleted page has no real content. I think we should fix this in a similar manner in B-tree, too, but that can be done separately. For B-tree, we need to worry about backwards-compatibility, but that seems simple enough; we just need to continue to understand old deleted pages, where the deletion XID is stored in the page opaque field. - Heikki >From b7897577c83a81ec04394ce7113d1d8a47804086 Mon Sep 17 00:00:00 2001 From: Heikki Linnakangas Date: Thu, 4 Apr 2019 18:06:48 +0300 Subject: [PATCH 1/2] Refactor checks for deleted GiST pages.
The explicit check in gistScanPage() isn't currently really necessary, as a deleted page is always empty, so the loop would fall through without doing anything, anyway. But it's a marginal optimization, and it gives a nice place to attach a comment to explain how it works. --- src/backend/access/gist/gist.c| 40 --- src/backend/access/gist/gistget.c | 14 +++ 2 files changed, 29 insertions(+), 25 deletions(-) diff --git a/src/backend/access/gist/gist.c b/src/backend/access/gist/gist.c index 2db790c840..028b06b264 100644 --- a/src/backend/access/gist/gist.c +++ b/src/backend/access/gist/gist.c @@ -693,14 +693,15 @@ gistdoinsert(Relation r, IndexTuple itup, Size freespace, continue; } - if (stack->blkno != GIST_ROOT_BLKNO && - stack->parent->lsn < GistPageGetNSN(stack->page)) + if ((stack->blkno != GIST_ROOT_BLKNO && + stack->parent->lsn < GistPageGetNSN(stack->page)) || + GistPageIsDeleted(stack->page)) { /* - * Concurrent split detected. There's no guarantee that the - * downlink for this page is consistent with the tuple we're - * inserting anymore, so go back to parent and rechoose the best - * child. + * Concurrent split or page deletion detected. There's no + * guarantee that the downlink for this page is consistent with + * the tuple we're inserting anymore, so go back to parent and + * rechoose the best child. 
*/ UnlockReleaseBuffer(stack->buffer); xlocked = false; @@ -719,9 +720,6 @@ gistdoinsert(Relation r, IndexTuple itup, Size freespace, GISTInsertStack *item; OffsetNumber downlinkoffnum; - /* currently, internal pages are never deleted */ - Assert(!GistPageIsDeleted(stack->page)); - downlinkoffnum = gistchoose(state.r, stack->page, itup, giststate); iid = PageGetItemId(stack->page, downlinkoffnum); idxtuple = (IndexTuple) PageGetItem(stack->page, iid); @@ -842,12 +840,13 @@ gistdoinsert(Relation r, IndexTuple itup, Size freespace, * leaf/inner is enough to recognize split for root */ } -else if (GistFollowRight(stack->page) || - stack->parent->lsn < GistPageGetNSN(stack->page)) +else if ((GistFollowRight(stack->page) || + stack->parent->lsn < GistPageGetNSN(stack->page)) && + GistPageIsDeleted(stack->page)) { /* - * The page was split while we momentarily unlocked the - * page. Go back to parent. + * The page was split or deleted while we momentarily + * unlocked the page. Go back to parent. */ UnlockReleaseBuffer(stack->buffer); xlocked = false; @@ -856,18 +855,6 @@ gistdoinsert(Relation r, IndexTuple itup, Size freespace, } } - /* - * The page might have been deleted after we scanned the parent - * and saw the downlink. - */ - if (GistPageIsDeleted(stack->page)) - { -UnlockReleaseBuffer(stack->buffer); -xlocked = false; -state.stack = stack = stack->parent; -continue; -
Re: POC: GROUP BY optimization
> On Thu, Jan 31, 2019 at 12:24 PM Andres Freund wrote: > > As nothing has happened since, I'm marking this as returned with > feedback. This patch was on my radar for some time in the past and we've seen use cases where it could be pretty useful (probably even without the incremental sort patch). I would like to make some progress here and see if it's possible to continue its development. I've attached the rebased version with some small changes, e.g. I've created a separate patch with group by reordering tests to make it easy to see what changes were introduced, and after some experiments removed the part that seems to duplicate "group by" reordering to follow "order by". Also it looks like it's possible to make these patches independent by having a base patch with the isolated group_keys_reorder_by_pathkeys (they're connected via n_preordered), but I haven't done this yet. I went through the thread to summarize the objections that were mentioned so far. Most of them are related to the third patch in the series, where reordering based on "ndistinct" is implemented, and are about cost_sort (all the possible problems that could happen without proper cost estimation due to non-uniform distribution, different comparison costs and so on) and figuring out how to limit the number of possible combinations of pathkeys to compare. I haven't looked at the proposed backtracking approach, but taking into account that the suggested patch for cost_sort [1] is RWF, I wonder what would be the best strategy to proceed? [1]: https://commitfest.postgresql.org/21/1706/ v11-0001-Add-tests-for-group-by-optimization.patch Description: Binary data v11-0002-Reorder-to-match-ORDER-BY-or-index.patch Description: Binary data v11-0003-Reorder-by-values-distribution.patch Description: Binary data
Re: [PATCH v20] GSSAPI encryption support
Greetings, * Tom Lane (t...@sss.pgh.pa.us) wrote: > Stephen Frost writes: > > * Tom Lane (t...@sss.pgh.pa.us) wrote: > >> I'm not very sure why the integer/pointer confusion in pgstat_bestart > >> doesn't cause hard crashes when using gss auth --- or does > >> this suite not actually test that? > > > Isn't it just saying that because of the implicit declaration..? > > Once that's fixed, the integer/pointer warning will go away, but > > it's actually a pointer in either case, hence why it isn't crashing. > > Well, if the caller thinks what is being passed back is an int, > it will do a 32-to-64-bit widening, which is almost certainly > going to result in a corrupted pointer. Oh, good point. Interesting that it still works then. I've got a fix for the missing prototypes, I hadn't noticed the issue previously due to always building with SSL enabled as well. I'm testing with a non-SSL build and will push the fix shortly. Thanks! Stephen
Re: "WIP: Data at rest encryption" patch and, PostgreSQL 11-beta3
On Thu, Apr 4, 2019 at 9:57 AM Antonin Houska wrote: > I think I finally understand. Originally I thought the question is how to > compute correct page checksum while the hint bits can be changed w/o exclusive > lock on the buffer. Now I realize that it's more about *recovery*: if the hint > bit change is followed by a torn page write, the hint bit can get changed on > disk but the checksum might not get updated. The wrong checksum is detected > during recovery, but if XLOG does not contain the corresponding full page > image, we're not able to recover. > > And with encryption, the consequence is even worse because torn page write > causes not only wrong checksum of otherwise useful page, but really damaged > page. Correct. > I'll enforce the FPW in the next version of the patch. Cool. I'm willing to put some effort into trying to get this into v13 if you're willing to keep hacking on it, but there's probably a fair amount to do and a year can go by in a hurry. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [PATCH v20] GSSAPI encryption support
Stephen Frost writes: > * Tom Lane (t...@sss.pgh.pa.us) wrote: >> I'm not very sure why the integer/pointer confusion in pgstat_bestart >> doesn't cause hard crashes when using gss auth --- or does >> this suite not actually test that? > Isn't it just saying that because of the implicit declaration..? > Once that's fixed, the integer/pointer warning will go away, but > it's actually a pointer in either case, hence why it isn't crashing. Well, if the caller thinks what is being passed back is an int, it will do a 32-to-64-bit widening, which is almost certainly going to result in a corrupted pointer. > The test suite does test GSS authentication and GSS encryption. Hm. I'll poke at this more closely. regards, tom lane
Re: [HACKERS] Weaker shmem interlock w/o postmaster.pid
On Wed, Apr 03, 2019 at 07:05:43PM -0700, Noah Misch wrote: > On Mon, Apr 01, 2019 at 08:19:56AM +, Daniel Gustafsson wrote: > > On Monday, April 1, 2019 12:42 AM, Noah Misch wrote: > > > On Fri, Mar 29, 2019 at 09:53:51AM +, Daniel Gustafsson wrote: > > > > > > This seems like a case where it would be useful to log a shmdt() error > > > > or do > > > > an Assert() around the success of the operation perhaps? > > > > > > I'll add the same elog(LOG) we have at other shmdt() sites. I can't think > > > of > > > a site where we Assert() about the results of a system call. While shmdt() > > > might be a justified exception, elog(LOG) seems reasonable. > > > > Agreed, seems reasonable. > > Pushed, but that broke two buildfarm members: > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=idiacanthus=2019-04-04%2000%3A33%3A14 > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=komodoensis=2019-04-04%2000%3A33%3A13 > > I think the problem arose because these animals run on the same machine, and > their test execution was synchronized to the second. Two copies of the new > test ran concurrently. It doesn't tolerate that, owing to expectations about > which shared memory keys are in use. My initial thought is to fix this by > having a third postmaster that runs throughout the test and represents > ownership of a given port. If that postmaster gets something other than the > first shm key pertaining to its port, switch ports and try again. > > I'll also include fixes for the warnings Andres reported on the > pgsql-committers thread. This thread's 2019-04-03 patches still break buildfarm members in multiple ways. I plan to revert them. I'll wait a day or two before doing that, in case more failure types show up.
Re: pg_upgrade: Pass -j down to vacuumdb
Peter Eisentraut writes: > I think the real fix is to have pg_upgrade copy over the statistics. > They are right there after all, just take them. Well, it's not "just take them", we need to develop some infrastructure that will permit coping with cross-version differences in the stats storage. This isn't insoluble, but it will take some work. We had a prior discussion that trailed off without much result: https://www.postgresql.org/message-id/flat/724322880.K8vzik8zPz%40abook regards, tom lane
Re: [PATCH v20] GSSAPI encryption support
Greetings, * Tom Lane (t...@sss.pgh.pa.us) wrote: > Stephen Frost writes: > > On Thu, Apr 4, 2019 at 05:20 Peter Eisentraut < > > peter.eisentr...@2ndquadrant.com> wrote: > >> Kerberos tests are now failing for me (macOS). > > > Interesting, they work locally for me on Ubuntu. Unfortunately, I don’t > > have macOS. This only happens when encryption is being used, presumably? > > GSS authentication is still working fine? > > The kerberos test suite passes for me on RHEL6 (kerberos 1.10.3), > but I observe some compiler warnings that need to be dealt with: Interesting, I don't see those with my build. I'll have to figure out why not. Will fix them in any case. > $ ./configure --with-gssapi ... > $ time make -j8 -s > be-secure-gssapi.c:597: warning: no previous prototype for > 'be_gssapi_get_auth' > be-secure-gssapi.c:609: warning: no previous prototype for 'be_gssapi_get_enc' > be-secure-gssapi.c:621: warning: no previous prototype for > 'be_gssapi_get_princ' > pgstat.c: In function 'pgstat_bestart': > pgstat.c:2986: warning: implicit declaration of function 'be_gssapi_get_auth' > pgstat.c:2987: warning: implicit declaration of function 'be_gssapi_get_enc' > pgstat.c:2990: warning: implicit declaration of function 'be_gssapi_get_princ' > pgstat.c:2990: warning: passing argument 2 of 'strlcpy' makes pointer from > integer without a cast > ../../../src/include/port.h:429: note: expected 'const char *' but argument > is of type 'int' > All of PostgreSQL successfully made. Ready to install. > > I'm not very sure why the integer/pointer confusion in pgstat_bestart > doesn't cause hard crashes when using gss auth --- or does > this suite not actually test that? Isn't it just saying that because of the implicit declaration..? Once that's fixed, the integer/pointer warning will go away, but it's actually a pointer in either case, hence why it isn't crashing. The test suite does test GSS authentication and GSS encryption. 
Thanks, Stephen
Re: [PATCH v20] GSSAPI encryption support
Stephen Frost writes: > On Thu, Apr 4, 2019 at 05:20 Peter Eisentraut < > peter.eisentr...@2ndquadrant.com> wrote: >> Kerberos tests are now failing for me (macOS). > Interesting, they work locally for me on Ubuntu. Unfortunately, I don’t > have macOS. This only happens when encryption is being used, presumably? > GSS authentication is still working fine? The kerberos test suite passes for me on RHEL6 (kerberos 1.10.3), but I observe some compiler warnings that need to be dealt with: $ ./configure --with-gssapi ... $ time make -j8 -s be-secure-gssapi.c:597: warning: no previous prototype for 'be_gssapi_get_auth' be-secure-gssapi.c:609: warning: no previous prototype for 'be_gssapi_get_enc' be-secure-gssapi.c:621: warning: no previous prototype for 'be_gssapi_get_princ' pgstat.c: In function 'pgstat_bestart': pgstat.c:2986: warning: implicit declaration of function 'be_gssapi_get_auth' pgstat.c:2987: warning: implicit declaration of function 'be_gssapi_get_enc' pgstat.c:2990: warning: implicit declaration of function 'be_gssapi_get_princ' pgstat.c:2990: warning: passing argument 2 of 'strlcpy' makes pointer from integer without a cast ../../../src/include/port.h:429: note: expected 'const char *' but argument is of type 'int' All of PostgreSQL successfully made. Ready to install. I'm not very sure why the integer/pointer confusion in pgstat_bestart doesn't cause hard crashes when using gss auth --- or does this suite not actually test that? regards, tom lane
Re: [HACKERS] generated columns
On Thu, Apr 4, 2019 at 8:01 PM Peter Eisentraut wrote: > > On 2019-04-04 11:42, Amit Langote wrote: > > Hmm, I'm afraid we might get bug reports if we go with this. Why is it OK > > to get null in this case when a user explicitly asked for 'foo'? > > The way stored generated columns work on foreign tables is that the > to-be-stored value is computed, then given to the foreign table handler, > which then has to store it, and then return it later when queried. > However, since the backing store of a foreign table is typically > modifiable directly by the user via other channels, it's possible to > create situations where actually stored data does not satisfy the > generation expression. That is the case here. I don't know of a > principled way to prevent that. It's just one of the problems that can > happen when you store data in a foreign table: You have very little > control over what data actually ends up being stored. OK, thanks for explaining. We do allow DEFAULT to be specified on foreign tables, although locally-defined defaults have little meaning if the FDW doesn't allow inserts. Maybe the same thing applies to GENERATED AS columns. Would it make sense to clarify this on the CREATE FOREIGN TABLE page? Thanks, Amit
Re: pg_upgrade: Pass -j down to vacuumdb
On 2019-04-03 23:24, Justin Pryzby wrote: > I just did a test on one of our large-but-not-huge customers. With > stats_target=1, analyzing a 145GB partitioned table looks like it'll take > perhaps an hour; they have ~1TB data, so delaying services during ANALYZE > would > nullify the utility of pg_upgrade. I can restore the essential tables from > backup in 15-30 minutes. I think the real fix is to have pg_upgrade copy over the statistics. They are right there after all, just take them. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: "WIP: Data at rest encryption" patch and, PostgreSQL 11-beta3
Robert Haas wrote: > On Thu, Mar 14, 2019 at 9:26 AM Antonin Houska wrote: > > > Hint bits also rely on this principle. I thought there might be some > > > interaction between this work and wal_log_hints for this reason, but I see > > > nothing of that sort in the patch. > > > > I'm not sure if the hint bit is still a problem if we first copy the shared > > buffer to backend-local memory and encrypt it there. That's what the patch > > does. > > That has the same problem that I described in the previous paragraph, > except for the relation data files rather than the WAL itself. See > the manual's description of wal_log_hints: > > > When this parameter is on, the > PostgreSQL > server writes the entire content of each disk page to WAL during the > first modification of that page after a checkpoint, even for > non-critical modifications of so-called hint bits. > > > For encryption to work, you need that. A hint bit write might convert > the page to garbage, just as in the previous example, and this option > make sure that if that happens, there will be an FPW, allowing you to > recover... I think I finally understand. Originally I thought the question is how to compute correct page checksum while the hint bits can be changed w/o exclusive lock on the buffer. Now I realize that it's more about *recovery*: if the hint bit change is followed by a torn page write, the hint bit can get changed on disk but the checksum might not get updated. The wrong checksum is detected during recovery, but if XLOG does not contain the corresponding full page image, we're not able to recover. And with encryption, the consequence is even worse because torn page write causes not only wrong checksum of otherwise useful page, but really damaged page. I'll enforce the FPW in the next version of the patch. -- Antonin Houska Web: https://www.cybertec-postgresql.com
Re: COPY FROM WHEN condition
On Thu, 4 Apr 2019 at 18:20, Andres Freund wrote: > I'm pretty happy with this. I'm doing some minor changes (e.g. don't > like the function comment formatting that much, the tableam callback > needs docs, stuff like that), and then I'm going to push it tomorrow. > > I'm planning to attribute it to you, me, and Haribabu Kommi. Sounds good. > > Oh, btw, is there a reason you're memset(0)'ing multiInsertInfo? Seems > unnecessary, given that in all cases we're using it we're going to do > CopyMultiInsertInfo_Init(). And IME valgrind and the compiler are more > helpful when you don't just default initialize, because then they can > tell when you forgot to initialize a field (say like > CopyMultiInsertInfo_Init not initializing nbuffers). At some point when I was hacking at it, I was getting a warning about it being possibly uninitialised. I don't recall exactly how the code was then, but I just tried removing it and I don't get any warning now. So probably fine to remove. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: Retronym: s/magnetic disk/main data/
On Thu, Apr 04, 2019 at 05:49:36PM +1300, Thomas Munro wrote: > Hello, > > I propose that in v13 we redefine "MD" (from md.c) to mean "main data" > (instead of "magnetic disk"). That's the standard storage layout for > typical table and index AMs. As opposed to the proposed undo and SLRU > SMGRs that provide layouts specialised for different life cycles. Maybe we could use dd for "diamagnetic data" ;) Best, David. -- David Fetter http://fetter.org/ Phone: +1 415 235 3778 Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
Re: pg_upgrade version checking questions
On Wednesday, March 27, 2019 1:43 PM, Christoph Berg wrote: > Re: Daniel Gustafsson 2019-03-26 > pC-NMmh4vQLQP76YTwY4AuoD4OdNw9egikekyQpXFpgqmTlGjIZXOTd2W5RDZPpRski5N3ADRrLYgLk6QUuvmuT5fWC9acPAYyDU1AVxJcU=@yesql.se > > > 0003 - Make -B default to CWD and remove the exec_path check > > Defaulting to CWD for the new bindir has the side effect that the default > > sockdir is in the bin/ directory which may be less optimal. > > Hmm, I would have thought that the default for the new bindir is the > directory where pg_upgrade is located, not the CWD, which is likely to > be ~postgres or the like? Yes, thinking on it that's obviously better. The attached v2 repurposes the find_my_exec() check to make the current directory of pg_upgrade the default for new_cluster.bindir (the other two patches are left as they were). cheers ./daniel 0001-Only-allow-upgrades-by-the-same-exact-version-new-v2.patch Description: Binary data 0002-Check-all-used-executables-v2.patch Description: Binary data 0003-Default-new-bindir-to-exec_path-v2.patch Description: Binary data
Re: Inadequate executor locking of indexes
On Fri, 5 Apr 2019 at 02:22, David Rowley wrote: > v2 attached. Wrong patch. Here's what I meant to send. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services use_rellockmode_for_indexes_too_v3.patch Description: Binary data
Re: Inadequate executor locking of indexes
On Thu, 4 Apr 2019 at 15:35, Amit Langote wrote: > BTW, get_actual_variable_range is static to selfuncs.c and other public > functions that are entry point to get_actual_variable_range's > functionality appear to require having built a RelOptInfo first, which > means (even for third party code) having gone through get_relation_info > and opened the indexes. That may be too much wishful thinking though. I think we should do this. We have the CheckRelationLockedByMe Asserts to alert us if this ever gets broken. I think even if something added a get_relation_info_hook to alter the indexes somehow, say to remove one, then it should still be okay, as get_actual_variable_range() only looks at indexes that are in that list, and the other two functions are dealing with Paths, which couldn't exist for an index that was removed from RelOptInfo->indexlist. > >> Also, I noticed that there is infer_arbiter_indexes() too, which opens the > >> target table's indexes with RowExclusiveLock. I thought for a second > >> that's a index-locking site in the planner that you may have missed, but > >> turns out it might very well be the first time those indexes are locked in > >> a given insert query's processing, because query_planner doesn't need to > >> plan access to the result relation, so get_relation_info is not called. > > > > I skimmed over that and thought that the rellockmode would always be > > RowExclusiveLock anyway, so didn't change it. However, it would make > > sense to use rellockmode for consistency. We're already looking up the > > RangeTblEntry to get the relid, so getting the rellockmode is about > > free. > > Yeah, maybe it'd be a good idea, if only for consistency, to fetch the > rellockmode of the resultRelation RTE and use it for index_open. I've changed infer_arbiter_indexes() too, but decided to use rellockmode rather than NoLock since we're not using RelOptInfo->indexlist. 
If anything uses get_relation_info_hook to remove indexes and then closes the removed ones, releasing the lock, then we could end up with problems here. v2 attached. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services use_rellockmode_for_indexes_too_v2.patch Description: Binary data
Re: Re: reloption to prevent VACUUM from truncating empty pages at the end of relation
On Thu, Apr 4, 2019 at 1:23 PM Masahiko Sawada wrote: > > On Thu, Apr 4, 2019 at 1:26 PM Tsunakawa, Takayuki > wrote: > > > > From: Fujii Masao [mailto:masao.fu...@gmail.com] > > > reloption for TOAST is also required? > > > > # I've come back to the office earlier than planned... > > > > Hm, there's no reason to not provide toast.vacuum_shrink_enabled. Done > > with the attached patch. > > > > Thank you for updating the patch! +1! > +vacuum_shrink_enabled, > toast.vacuum_shrink_enabled > (boolean) > + > + > + Enables or disables shrinking the table when it's vacuumed. > + This also applies to autovacuum. > + The default is true. If true, VACUUM frees empty pages at the > end of the table. > "VACUUM" needs a <command> tag, or "vacuum" is more appropriate here? Also, the documentation should point out that freeing is not guaranteed. Something like + The default is true. If true, VACUUM will try to free empty pages at the end of the table. > I'm not sure the consensus we got here but we don't make the vacuum > command option for this? I don't think there's a clear consensus, but my personal vote is to add it, with SHRINK_TABLE = [ force_on | force_off | default ] (unless a better proposal has been made already)
Re: Refactoring the checkpointer's fsync request queue
On 2019-Apr-04, Thomas Munro wrote: > I don't think it's project policy to put a single typedef into its own > header like that, and I'm not sure where else to put it. shrug. Looks fine to me. I suppose if we don't have it anywhere, it's just because we haven't needed that particular trick yet. Creating a file with a lone typedef seems better than using uint32 to me. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services