Re: [HACKERS] Setting pd_lower in GIN metapage
On Fri, Nov 3, 2017 at 1:10 AM, Amit Kapila wrote: > On Fri, Nov 3, 2017 at 2:54 AM, Tom Lane wrote: >> I've marked the CF entry closed. However, I'm not sure if we're quite >> done with this thread. Weren't we going to adjust nbtree and hash to >> be more aggressive about labeling their metapages as REGBUF_STANDARD? > > I have already posted the patches [1] for the same in this thread and > those are reviewed [2][3] as well. I have adjusted the comments as per > latest commit. Please find updated patches attached. Confirmed. Setting those makes sense even if REGBUF_WILL_INIT is set, at least for page masking. -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pgbench - use enum for meta commands
[ pgbench-enum-meta-2.patch ] Pushed with some trivial cosmetic adjustments (pgindent changed it more than I did). Ok. Thanks. -- Fabien.
Re: [HACKERS] PATCH: pgbench - option to build using ppoll() for larger connection counts
Hello, Could you rebase the v11 patch? -- Fabien.
Re: [HACKERS] ArrayLists instead of List (for some things)
On 3 November 2017 at 03:26, Craig Ringer wrote: > On 2 November 2017 at 22:22, David Rowley > wrote: >> Maybe, but the new implementation is not going to do well with places >> where we perform lcons(). Probably many of those places could be >> changed to lappend(), but I bet there's plenty that need prepend. > > Yeah, and it's IMO significant that pretty much every nontrivial > system with ADTs offers multiple implementations of list data > structures, often wrapped with a common API. > > Java's Collections, the STL, you name it. I've never really looked at much Java, but I've done quite a bit of dotnet stuff in my time, and they have a common interface, ICollection, which is implemented by a bunch of classes, something like: ConcurrentDictionary (hash table) Dictionary (hash table) HashSet (hash table) LinkedList (some kinda linked list) List (arrays, probably) SortedDictionary (BST, I think) SortedList (no idea, perhaps a btree) SortedSet Bag (no idea, but does not care about any order) Probably there are more, but the point is that they obviously don't believe there's a one-size-fits-all type. I don't think we should do that either. However, I've proved nothing on the performance front yet, so that should probably be my next step. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
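The lcons()-vs-lappend() concern above can be sketched in plain C. This is a minimal, hypothetical dynamic array (not PostgreSQL's actual List implementation): appending just writes to the end in O(1) amortized time, while prepending has to shift every existing element first, which is the cost an array-backed List would pay at each lcons() call site.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct ArrayList
{
    int    *items;
    int     len;
    int     cap;
} ArrayList;

/* Double the capacity when full (error handling omitted for brevity). */
static void
al_grow(ArrayList *al)
{
    if (al->len == al->cap)
    {
        al->cap = al->cap ? al->cap * 2 : 8;
        al->items = realloc(al->items, al->cap * sizeof(int));
    }
}

/* lappend() analogue: O(1) amortized, writes past the last element. */
static void
al_append(ArrayList *al, int v)
{
    al_grow(al);
    al->items[al->len++] = v;
}

/* lcons() analogue: O(n), must shift the whole array right by one slot. */
static void
al_prepend(ArrayList *al, int v)
{
    al_grow(al);
    memmove(al->items + 1, al->items, al->len * sizeof(int));
    al->items[0] = v;
    al->len++;
}
```

This is why converting hot lcons() call sites to lappend() (or keeping a separate structure for prepend-heavy lists) matters if List moves to an array representation.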
Re: [HACKERS] SSL and Encryption
On Fri, Nov 3, 2017 at 3:19 AM, Craig Ringer wrote: > This is probably off topic for pgsql-hackers. > > For password crypto please go read the SCRAM thread and the PostgreSQL > 10 release notes. The SCRAM discussion is spread mainly across two threads with hundreds of emails, which may discourage even the bravest. Here are links to the important documentation: https://www.postgresql.org/docs/current/static/auth-methods.html#auth-password https://www.postgresql.org/docs/10/static/sasl-authentication.html And PostgreSQL implements SCRAM-SHA-256 following RFCs 7677 and 5802: https://tools.ietf.org/html/rfc5802 https://tools.ietf.org/html/rfc7677 -- Michael
Re: [HACKERS] path toward faster partition pruning
On 31 October 2017 at 21:43, Amit Langote wrote: > Attached updated version of the patches addressing some of your comments I've spent a bit of time looking at these but I'm out of time for now. So far I have noted down the following: 1. This comment seems wrong. /* * Since the clauses in rel->baserestrictinfo should all contain Const * operands, it should be possible to prune partitions right away. */ How about PARTITION BY RANGE (a) and SELECT * FROM parttable WHERE a > b; ? baserestrictinfo in this case will contain a single RestrictInfo with an OpExpr containing two Var args and it'll come right through that function too. 2. This code is way more complex than it needs to be. if (num_parts > 0) { int j; all_indexes = (int *) palloc(num_parts * sizeof(int)); j = 0; if (min_part_idx >= 0 && max_part_idx >= 0) { for (i = min_part_idx; i <= max_part_idx; i++) all_indexes[j++] = i; } if (!bms_is_empty(other_parts)) while ((i = bms_first_member(other_parts)) >= 0) all_indexes[j++] = i; if (j > 1) qsort((void *) all_indexes, j, sizeof(int), intcmp); } It looks like the min/max partition stuff is just complicating things here. If you need to build this array of all_indexes[] anyway, I don't quite understand the point of the min/max. It seems like min/max would probably work a bit nicer if you didn't need the other_parts BitmapSet, so I recommend just getting rid of min/max completely and just have a BitmapSet with bit set for each partition's index you need, you'd not need to go to the trouble of performing a qsort on an array and you could get rid of quite a chunk of code too. The entire function would then not be much more complex than: partindexes = get_partitions_from_clauses(parent, partclauses); while ((i = bms_first_member(partindexes)) >= 0) { AppendRelInfo *appinfo = rel->part_appinfos[i]; result = lappend(result, appinfo); } Then you can also get rid of your intcmp() function too. 3. 
Following code has the wrong size calculation: memset(keynullness, -1, PARTITION_MAX_KEYS * sizeof(NullTestType *)); should be PARTITION_MAX_KEYS * sizeof(NullTestType). It might have worked on your machine if you're compiling as 32 bit. I'll continue on with the review in the next few days. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
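The sizeof mix-up above is a classic C pitfall worth spelling out: on a typical LP64 build, sizeof(NullTestType *) is 8 while the enum itself is usually 4 bytes, so the buggy memset writes twice the array's size and runs past the end of keynullness. On a 32-bit build pointer and enum happen to be the same size, which is why it "worked". A minimal, self-contained illustration (the enum and the constant are stand-ins, not the actual patch code):

```c
#include <assert.h>
#include <string.h>

/* Stand-in for the real enum; the values themselves don't matter here. */
typedef enum NullTestType
{
    IS_NULL,
    IS_NOT_NULL
} NullTestType;

#define PARTITION_MAX_KEYS 32

/*
 * Clears the array with the CORRECT element size and returns how many
 * bytes the array actually occupies.  Using sizeof(NullTestType *) here
 * instead would ask memset to write PARTITION_MAX_KEYS * sizeof(void *)
 * bytes -- a buffer overrun whenever pointers are wider than the enum.
 */
static size_t
keynullness_array_size(void)
{
    NullTestType keynullness[PARTITION_MAX_KEYS];

    memset(keynullness, -1, PARTITION_MAX_KEYS * sizeof(NullTestType));
    return sizeof(keynullness);
}
```

The compiler accepts both spellings silently, which is what makes this bug easy to write and hard to spot in review.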
Re: [HACKERS] SSL and Encryption
On 3 November 2017 at 11:16, chiru r wrote: > Hi , > > Please suggest the best cipher suite to configure OpenSSL for PostgreSQL > Server and client?. > > How to use other than md5 encryption algorithm to encrypt the passwords in > PostgreSQL? This is probably off topic for pgsql-hackers. For password crypto please go read the SCRAM thread and the PostgreSQL 10 release notes. -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
[HACKERS] SSL and Encryption
Hi, Please suggest the best cipher suite to configure OpenSSL for the PostgreSQL server and client. Also, how can we use an encryption algorithm other than md5 to encrypt the passwords in PostgreSQL? Thanks, Chiru
Re: [HACKERS] path toward faster partition pruning
On 31 October 2017 at 21:43, Amit Langote wrote: > Attached updated version of the patches addressing some of your comments > above and fixing a bug that Rajkumar reported [1]. As mentioned there, > I'm including here a patch (the 0005 of the attached) to tweak the default > range partition constraint to be explicit about null values that it might > contain. So, there are 6 patches now and what used to be patch 0005 in > the previous set is patch 0006 in this version of the set. Hi Amit, I've been looking over this. I see the latest patches conflict with cf7ab13bf. Can you send patches rebased on current master? Thanks -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Small improvement to compactify_tuples
Sokolov Yura writes: > [ 0001-Improve-compactify_tuples.patch, v5 or thereabouts ] I started to review this patch. I spent a fair amount of time on beautifying the code, because I found it rather ugly and drastically undercommented. Once I had it to the point where it seemed readable, I went to check the shellsort algorithm against Wikipedia's entry, and found that this appears to be an incorrect implementation of shellsort: where pg_shell_sort_pass has for (_i = off; _i < _n; _i += off) \ it seems to me that we need to have for (_i = off; _i < _n; _i += 1) \ or maybe just _i++. As-is, this isn't h-sorting the whole file, but just the subset of entries that have multiple-of-h indexes (ie, the first of the h distinct subfiles that should get sorted). The bug is masked by the final pass of plain insertion sort, but we are not getting the benefit we should get from the earlier passes. However, I'm a bit dubious that it's worth fixing that; instead my inclination would be to rip out the shellsort implementation entirely. The code is only using it for the nitems <= 48 case (which makes the first three offset steps certainly no-ops) and I am really unconvinced that it's worth expending the code space for a shellsort rather than plain insertion sort in that case, especially when we have good reason to think that the input data is nearly sorted. BTW, the originally given test case shows no measurable improvement on my box. I was eventually able to convince myself by profiling that the patch makes us spend less time in compactify_tuples, but this test case isn't a very convincing one. So, quite aside from the bug, I'm not excited about committing the attached as-is. I think we should remove pg_shell_sort and just use pg_insertion_sort. If somebody can show a test case that provides a measurable speed improvement from the extra code, I could be persuaded to reconsider. I also wonder if the nitems <= 48 cutoff needs to be reconsidered in light of this. 
But since I can hardly measure any benefit from the patch at all, I'm not in the best position to test different values for that cutoff. Have not looked at the 0002 patch yet. regards, tom lane diff --git a/src/backend/storage/page/bufpage.c b/src/backend/storage/page/bufpage.c index 41642eb..1af1b85 100644 *** a/src/backend/storage/page/bufpage.c --- b/src/backend/storage/page/bufpage.c *** *** 18,23 --- 18,24 #include "access/itup.h" #include "access/xlog.h" #include "storage/checksum.h" + #include "utils/inline_sort.h" #include "utils/memdebug.h" #include "utils/memutils.h" *** typedef struct itemIdSortData *** 425,439 } itemIdSortData; typedef itemIdSortData *itemIdSort; ! static int itemoffcompare(const void *itemidp1, const void *itemidp2) { - /* Sort in decreasing itemoff order */ return ((itemIdSort) itemidp2)->itemoff - ((itemIdSort) itemidp1)->itemoff; } /* * After removing or marking some line pointers unused, move the tuples to * remove the gaps caused by the removed items. */ --- 426,542 } itemIdSortData; typedef itemIdSortData *itemIdSort; ! /* Comparator for sorting in decreasing itemoff order */ ! static inline int itemoffcompare(const void *itemidp1, const void *itemidp2) { return ((itemIdSort) itemidp2)->itemoff - ((itemIdSort) itemidp1)->itemoff; } /* + * Sort an array of itemIdSort's on itemoff, descending. + * + * This uses Shell sort. Given that array is small and itemoffcompare + * can be inlined, it is much faster than general-purpose qsort. + */ + static void + sort_itemIds_small(itemIdSort itemidbase, int nitems) + { + pg_shell_sort(itemIdSortData, itemidbase, nitems, itemoffcompare); + } + + /* + * Sort an array of itemIdSort's on itemoff, descending. 
+ * + * This uses bucket sort: + * - single pass of stable prefix sort on high 8 bits of itemoffs + * - then insertion sort on buckets larger than 1 element + */ + static void + sort_itemIds(itemIdSort itemidbase, int nitems) + { + /* number of buckets to use: */ + #define NSPLIT 256 + /* divisor to scale input values into 0..NSPLIT-1: */ + #define PREFDIV (BLCKSZ / NSPLIT) + /* per-bucket counts; we need two extra elements, see below */ + uint16 count[NSPLIT + 2]; + itemIdSortData copy[Max(MaxIndexTuplesPerPage, MaxHeapTuplesPerPage)]; + int i, + max, + total, + pos, + highbits; + + Assert(nitems <= lengthof(copy)); + + /* + * Count how many items in each bucket. We assume all itemoff values are + * less than BLCKSZ, therefore dividing by PREFDIV gives a value less than + * NSPLIT. + */ + memset(count, 0, sizeof(count)); + for (i = 0; i < nitems; i++) + { + highbits = itemidbase[i].itemoff / PREFDIV; + count[highbits]++; + } + + /* + * Now convert counts to bucket position info, placing the buckets in + *
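For reference, the textbook shellsort loop structure Tom describes — each h-pass must start an insertion from every index, advancing the outer loop by 1 rather than by h — looks like this in plain C. This is a generic sketch with an arbitrary gap sequence, not the patch's macro-based pg_shell_sort:

```c
#include <stddef.h>

/*
 * Shell sort: each pass h-sorts the WHOLE array.  The crucial detail is
 * that the outer loop advances i by 1 (visiting every element), not by h;
 * advancing by h would only sort the single subfile at indexes 0, h, 2h...
 */
static void
shell_sort(int *a, size_t n)
{
    static const size_t gaps[] = {57, 23, 9, 4, 1};

    for (size_t g = 0; g < sizeof(gaps) / sizeof(gaps[0]); g++)
    {
        size_t h = gaps[g];

        for (size_t i = h; i < n; i++)  /* i += 1, NOT i += h */
        {
            int     tmp = a[i];
            size_t  j = i;

            /* Insertion within the h-strided subfile containing index i. */
            while (j >= h && a[j - h] > tmp)
            {
                a[j] = a[j - h];
                j -= h;
            }
            a[j] = tmp;
        }
    }
}
```

Because the final gap is 1, the last pass is a plain insertion sort and the result is always correct even with the buggy stride — which is exactly why the bug Tom found was masked, costing only the speed benefit of the earlier passes.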
[HACKERS] Skip unneeded temp file in 'make html'
Folks, Please find attached a patch for $Subject. Best, David. -- David Fetter http://fetter.org/ Phone: +1 415 235 3778 Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate Don't make an unneeded temp file In passing, make a slight correction to the regex. --- doc/src/sgml/Makefile | 8 +++- 1 file changed, 3 insertions(+), 5 deletions(-) diff --git a/doc/src/sgml/Makefile b/doc/src/sgml/Makefile index 428eb569fc..f4ded1b3eb 100644 --- a/doc/src/sgml/Makefile +++ b/doc/src/sgml/Makefile @@ -147,14 +147,12 @@ INSTALL.xml: standalone-profile.xsl standalone-install.xml postgres.xml # if we try to do "make all" in a VPATH build without the explicit # $(srcdir) on the postgres.sgml dependency in this rule. GNU make bug? postgres.xml: $(srcdir)/postgres.sgml $(ALLSGML) - $(OSX) $(SPFLAGS) $(SGMLINCLUDE) -x lower $< >$@.tmp - $(call mangle-xml,book) + $(OSX) $(SPFLAGS) $(SGMLINCLUDE) -x lower $< | $(call mangle-xml,book) define mangle-xml -$(PERL) -p -e 's/\[(aacute|acirc|aelig|agrave|amp|aring|atilde|auml|bull|copy|eacute|egrave|gt|iacute|lt|mdash|nbsp|ntilde|oacute|ocirc|oslash|ouml|pi|quot|scaron|uuml) *\]/\&\1;/gi;' \ +$(PERL) -p -e 's/\[(aacute|acirc|aelig|agrave|amp|aring|atilde|auml|bull|copy|eacute|egrave|gt|iacute|lt|mdash|nbsp|ntilde|oacute|ocirc|oslash|ouml|pi|quot|scaron|uuml) *\]/\&$$1;/gio;' \ -e '$$_ .= qq{http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd;>\n} if $$. == 1;' \ - <$@.tmp > $@ -rm $@.tmp + > $@ endef -- 2.13.6
Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)
On Fri, Nov 3, 2017 at 2:24 PM, Peter Geoghegan wrote: > Thomas Munro wrote: >> That way you don't have to opt in to BufFile's >> double buffering and segmentation schemes just to get shared file >> clean-up, if for some reason you want direct file handles. > > Is that something that you really think is possible? It's pretty far fetched, but maybe shared temporary relation files accessed via smgr.c/md.c? Or maybe future things that don't want to read/write through a buffer but instead want to mmap it. -- Thomas Munro http://www.enterprisedb.com
Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)
Thomas Munro wrote: >> I'm going to make an item on my personal TODO list for that. No useful insights on that right now, though. > I decided to try that, but it didn't really work: fd.h gets included by front-end code, so I can't very well define a struct and declare functions that deal in dsm_segment and slock_t. On the other hand it does seem a bit better for these shared file sets to work in terms of File, not BufFile. Realistically, fd.h has a number of functions that are really owned by buffile.c already. This sounds fine. > That way you don't have to opt in to BufFile's double buffering and segmentation schemes just to get shared file clean-up, if for some reason you want direct file handles. Is that something that you really think is possible? -- Peter Geoghegan
Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)
On Wed, Nov 1, 2017 at 2:11 PM, Peter Geoghegan wrote: > On Tue, Oct 31, 2017 at 5:07 PM, Thomas Munro > wrote: >> Another complaint is that perhaps fd.c >> knows too much about buffile.c's business. For example, >> RemovePgTempFilesInDir() knows about the ".set" directories created by >> buffile.c, which might be called a layering violation. Perhaps the >> set/directory logic should move entirely into fd.c, so you'd call >> FileSetInit(FileSet *), not BufFileSetInit(BufFileSet *), and then >> BufFileOpenShared() would take a FileSet *, not a BufFileSet *. >> Thoughts? > > I'm going to make an item on my personal TODO list for that. No useful > insights on that right now, though. I decided to try that, but it didn't really work: fd.h gets included by front-end code, so I can't very well define a struct and declare functions that deal in dsm_segment and slock_t. On the other hand it does seem a bit better for these shared file sets to work in terms of File, not BufFile. That way you don't have to opt in to BufFile's double buffering and segmentation schemes just to get shared file clean-up, if for some reason you want direct file handles. So in the v24 parallel hash patch set I just posted over in the other thread I have moved it into its own translation unit sharedfileset.c and made it work with File objects. buffile.c knows how to use it as a source of segment files. I think that's better. > If the new standard is that you have temp file names that suggest the > purpose of each temp file, then that may be something that parallel > CREATE INDEX should buy into. Yeah, I guess that could be useful. -- Thomas Munro http://www.enterprisedb.com
Re: [HACKERS] Setting pd_lower in GIN metapage
On Fri, Nov 3, 2017 at 2:54 AM, Tom Lane wrote: > Amit Langote writes: >> On 2017/09/26 16:30, Michael Paquier wrote: >>> Cool, let's switch it back to a ready for committer status then. > >> Sure, thanks. > > Pushed with some cosmetic adjustments --- mostly, making the comments more > explicit about why we need the apparently-redundant assignments to > pd_lower. > > I've marked the CF entry closed. However, I'm not sure if we're quite > done with this thread. Weren't we going to adjust nbtree and hash to > be more aggressive about labeling their metapages as REGBUF_STANDARD? > I have already posted the patches [1] for the same in this thread and those are reviewed [2][3] as well. I have adjusted the comments as per latest commit. Please find updated patches attached. [1] - https://www.postgresql.org/message-id/CAA4eK1%2BTKg6ZK%3DmF14x_wf2KrmOxoMJ6z7YUK3-78acaYLwQ8Q%40mail.gmail.com [2] - https://www.postgresql.org/message-id/CAB7nPqTE4-GCaLtDh%3DJBcgUKR6B5WkvRLC-NpOqkgybi4FhHPw%40mail.gmail.com [3] - https://www.postgresql.org/message-id/CAB7nPqQBtDW43ABnWEdoHP6A2ToedzDFdpykbGjpO2wuZNiQnw%40mail.gmail.com -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com change_metapage_usage_btree-v3.patch change_metapage_usage_hash-v3.patch
Re: [HACKERS] Parallel Hash take II
On Mon, Oct 30, 2017 at 1:49 PM, Thomas Munro wrote: > A couple of stupid things outstanding: > > 1. EXPLAIN ANALYZE for Parallel Hash "actual" shows the complete row > count, which is interesting to know (or not? maybe I should show it > somewhere else?), but the costing shows the partial row estimate used > for costing purposes. Fixed. > 2. The BufFileSet's temporary directory gets created even if you > don't need it for batches. Duh. Fixed. I also refactored shared temporary files a bit while looking into this. The shared file ownership mechanism is now promoted to its own translation unit sharedfileset.c and it now works with fd.c files. buffile.c can still make use of it. That seems like a better division of labour. > 3. I don't have a good query rescan regression query yet. I wish I > could write my own query plans to test the executor. I found a query that rescans a parallel-aware hash join and added a couple of variants to the regression tests. I also decluttered the EXPLAIN ANALYZE output for enable_parallel_hash = off a bit. -- Thomas Munro http://www.enterprisedb.com parallel-hash-v24.patchset.tgz
Re: [HACKERS] Linking libpq statically to libssl
Peter, * Peter Eisentraut (peter.eisentr...@2ndquadrant.com) wrote: > On 10/27/17 08:24, Daniele Varrazzo wrote: > > I have a problem building binary packages for psycopg2. Binary > > packages ship with their own copies of libpq and libssl; > > Aside from the advice of "don't do that" ... > > > however if > > another python package links to libssl the library will be imported > > twice with conflicting symbols, likely resulting in a segfault (see > > https://github.com/psycopg/psycopg2/issues/543). This happens e.g. if > > a python script both connects to postgres and opens an https resource. > > ... the standard solutions to this problem are symbol versioning and > linker flags to avoid making all symbols globally available. libpq has > symbol versioning. Maybe the libssl you are using does not. Also, for > example, if you dlopen() with RTLD_LOCAL, the symbols will not be > globally available, so there should be no conflicts. Uh, libpq doesn't actually have symbol versioning, at least not on the installed Ubuntu packages for PG10, as shown by objdump -T : 00011d70 gDF .text 0054 Base PQconnectStart and we've certainly not spent effort that I've seen to try to actually make libpq work when multiple versions of libpq are linked into the same running backend. I addressed that in my comments also and I don't believe you could guarantee that things would operate correctly even with symbol versioning being used. Perhaps if you versioned your symbols with something wildly different from what's on the system and hope that what's on the system never ends up with that version then maybe it'd work, but that's quite far from the intended use-case of symbol versioning. > This needs cooperation from various different parties, and the details > will likely be platform dependent. But it's generally a solved problem. Right, it's solved when there's one source of truth for the library which is being maintained consistently and carefully. 
That allows multiple versions of libraries to be installed concurrently and linked into a running process at the same time, because the group maintaining those different versions of the library make sure that the symbols are versioned with appropriate versions and they make sure to not break ABI for those symbols unless they are also changing the version and putting out a new version. This was all solved with GLIBC a very long time ago, but it's a heck of a lot of work to get it right and correct, and that's when you've got a bunch of people who work on libraries all the time and carefully control all of the versions which are installed on the OS in coordination. Trying to do so when you can't control what's happening with the other library strikes me as highly likely to result in a whole lot of difficulties. Thanks! Stephen
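For what it's worth, the symbol versioning being discussed here is driven by a GNU ld version script. The sketch below is hypothetical — the version node name and symbol list are made up for illustration, and this is not libpq's actual export mechanism (libpq derives its exported-symbol list from exports.txt at build time):

```
/* libpq-custom.map -- hypothetical version script for a privately
 * bundled libpq; every listed symbol gets stamped with the custom
 * version node, and everything else is hidden from the dynamic
 * symbol table. */
LIBPQ_PSYCOPG_1.0 {
    global:
        PQconnectStart;
        PQconnectdb;
        PQfinish;
    local:
        *;
};
```

It would be applied at link time with something like `gcc -shared -Wl,--version-script=libpq-custom.map ...`. As Stephen notes, stamping a wild custom version this way only avoids clashes if no other copy of the library in the process ever uses the same version node — which is exactly the coordination problem he describes.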
Re: [HACKERS] [PATCH] A hook for session start
On Thu, Nov 2, 2017 at 5:42 AM, Aleksandr Parfenov < a.parfe...@postgrespro.ru> wrote: > > The following review has been posted through the commitfest application: > make installcheck-world: not tested > Implements feature: not tested > Spec compliant: not tested > Documentation:not tested > > Hi, > > Unfortunately, patches 0001 and 0002 don't apply to current master. > > The new status of this patch is: Waiting on Author > Thanks for your review. Rebased versions attached. Regards, -- Fabrízio de Royes Mello Consultoria/Coaching PostgreSQL >> Timbira: http://www.timbira.com.br >> Blog: http://fabriziomello.github.io >> Linkedin: http://br.linkedin.com/in/fabriziomello >> Twitter: http://twitter.com/fabriziomello >> Github: http://github.com/fabriziomello diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c index 2c7260e..24346fb 100644 --- a/src/backend/tcop/postgres.c +++ b/src/backend/tcop/postgres.c @@ -169,6 +169,9 @@ static ProcSignalReason RecoveryConflictReason; static MemoryContext row_description_context = NULL; static StringInfoData row_description_buf; +/* Hook for plugins to get control at start of session */ +session_start_hook_type session_start_hook = NULL; + /* * decls for routines only used in this file * @@ -3857,6 +3860,9 @@ PostgresMain(int argc, char *argv[], if (!IsUnderPostmaster) PgStartTime = GetCurrentTimestamp(); + if (session_start_hook) + (*session_start_hook) (dbname, username); + /* * POSTGRES main processing loop begins here * diff --git a/src/include/tcop/tcopprot.h b/src/include/tcop/tcopprot.h index f8c535c..d349592 100644 --- a/src/include/tcop/tcopprot.h +++ b/src/include/tcop/tcopprot.h @@ -35,6 +35,11 @@ extern PGDLLIMPORT const char *debug_query_string; extern int max_stack_depth; extern int PostAuthDelay; +/* Hook for plugins to get control at start of session */ +typedef void (*session_start_hook_type) (const char *dbname, + const char *username); +extern PGDLLIMPORT session_start_hook_type 
session_start_hook; + /* GUC-configurable parameters */ typedef enum diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c index 24346fb..0dbbd7b 100644 --- a/src/backend/tcop/postgres.c +++ b/src/backend/tcop/postgres.c @@ -169,9 +169,9 @@ static ProcSignalReason RecoveryConflictReason; static MemoryContext row_description_context = NULL; static StringInfoData row_description_buf; -/* Hook for plugins to get control at start of session */ +/* Hook for plugins to get control at start or end of session */ session_start_hook_type session_start_hook = NULL; - +session_end_hook_type session_end_hook = NULL; /* * decls for routines only used in this file * @@ -196,6 +196,7 @@ static void drop_unnamed_stmt(void); static void log_disconnections(int code, Datum arg); static void enable_statement_timeout(void); static void disable_statement_timeout(void); +static void do_session_end_hook(int code, Datum arg); /* @@ -3864,6 +3865,12 @@ PostgresMain(int argc, char *argv[], (*session_start_hook) (dbname, username); /* + * Setup handler to session end hook + */ + if (IsUnderPostmaster) + on_proc_exit(do_session_end_hook, 0); + + /* * POSTGRES main processing loop begins here * * If an exception is encountered, processing resumes here so we abort the @@ -4615,3 +4622,15 @@ disable_statement_timeout(void) stmt_timeout_active = false; } } + +/* + * on_proc_exit handler to call session end hook + */ +static void +do_session_end_hook(int code, Datum arg) +{ + Port *port = MyProcPort; + + if (session_end_hook) + (*session_end_hook) (port->database_name, port->user_name); +} \ No newline at end of file diff --git a/src/include/tcop/tcopprot.h b/src/include/tcop/tcopprot.h index d349592..b7fb8c3 100644 --- a/src/include/tcop/tcopprot.h +++ b/src/include/tcop/tcopprot.h @@ -35,10 +35,14 @@ extern PGDLLIMPORT const char *debug_query_string; extern int max_stack_depth; extern int PostAuthDelay; -/* Hook for plugins to get control at start of session */ +/* Hook for 
plugins to get control at start and end of session */ typedef void (*session_start_hook_type) (const char *dbname, const char *username); +typedef void (*session_end_hook_type) (const char *dbname, + const char *username); + extern PGDLLIMPORT session_start_hook_type session_start_hook; +extern PGDLLIMPORT session_end_hook_type session_end_hook; /* GUC-configurable parameters */ diff --git a/src/test/modules/Makefile b/src/test/modules/Makefile index b7ed0af..a3c8c1e 100644 --- a/src/test/modules/Makefile +++
Re: [HACKERS] SQL/JSON in PostgreSQL
On 2017-02-28 20:08, Oleg Bartunov wrote: > Attached patch is an implementation of SQL/JSON data model from SQL-2016 > standard (ISO/IEC 9075-2:2016(E)) I've faintly started looking into this. > We created repository for reviewing (ask for write access) - > https://github.com/postgrespro/sqljson/tree/sqljson > Examples of usage can be found in src/test/regress/sql/sql_json.sql > The whole documentation about json support should be reorganized and added, > and we plan to do this before release. We need help of community here. > The standard describes SQL/JSON path language, which used by SQL/JSON query > operators to query JSON. It defines path language as string literal. We > implemented the path language as JSONPATH data type, since other > approaches are not friendly to planner and executor. I was a bit sad to discover that I can't PREPARE jsq AS SELECT JSON_QUERY('{}', $1); I assume because of this part of the updated grammar: json_path_specification: Sconst { $$ = $1; } ; Would it make sense, fundamentally, to allow variables there? After Andrew Gierth's analysis of this grammar problem, I understand that it's not reasonable to expect JSON_TABLE() to support variable jsonpaths, but maybe it would be feasible for everything else? From Andrew's changes to the new grammar (see attached) it seems to me that at least that part is possible. Or should I forget about trying to implement the other part? 
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y index adfe9b1..f459996 100644 --- a/src/backend/parser/gram.y +++ b/src/backend/parser/gram.y @@ -624,6 +624,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query); json_table_plan_cross json_table_plan_primary json_table_default_plan + json_path_specification %type json_arguments json_passing_clause_opt @@ -635,8 +636,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query); %type json_returning_clause_opt -%type json_path_specification - json_table_column_path_specification_clause_opt +%type json_table_column_path_specification_clause_opt json_table_path_name json_as_path_name_clause_opt @@ -845,6 +845,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query); */ %nonassoc UNBOUNDED /* ideally should have same precedence as IDENT */ %nonassoc ERROR_P EMPTY_P DEFAULT ABSENT /* JSON error/empty behavior */ +%nonassoc COLUMNS FALSE_P KEEP OMIT PASSING TRUE_P UNKNOWN %nonassoc IDENT GENERATED NULL_P PARTITION RANGE ROWS PRECEDING FOLLOWING CUBE ROLLUP %left Op OPERATOR /* multi-character ops and user-defined operators */ %left '+' '-' @@ -14472,7 +14473,7 @@ json_context_item: ; json_path_specification: - Sconst { $$ = $1; } + a_expr { $$ = $1; } ; json_as_path_name_clause_opt: @@ -14802,7 +14803,7 @@ json_table_formatted_column_definition: ; json_table_nested_columns: - NESTED path_opt json_path_specification + NESTED path_opt Sconst json_as_path_name_clause_opt json_table_columns_clause {
Re: [HACKERS] MERGE SQL Statement for PG11
On Thu, Nov 02, 2017 at 03:25:48PM -0700, Peter Geoghegan wrote: > Nico Williams wrote: > >A MERGE mapped to a DML like this: > > This is a bad idea. An implementation like this is not at all > maintainable. Assuming the DELETE issue can be addressed, why would this not be maintainable? > >can handle concurrency via ON CONFLICT DO NOTHING in the INSERT CTE. > > That's not handling concurrency -- it's silently ignoring an error. Who > is to say that the conflict that IGNORE ignored is associated with a row > visible to the MVCC snapshot of the statement? IOW, why should the DELETE > affect any row? Ah, yes, we'd have to make sure the DELETE does not delete rows that could not be inserted. There's... no way to find out what those would have been -- RETURNING won't mention them, though it'd be a nice addition to UPSERT to have a way to do that, and it'd make this mapping feasible. > There are probably a great many reasons why you need a ModifyTable > executor node that keeps around state, and explicitly indicates that a > MERGE is a MERGE. For example, we'll probably want statement level > triggers to execute in a fixed order, regardless of the MERGE, RLS will > probably require explicit knowledge of MERGE semantics, and so on. Wouldn't those fire anyways in a statement like the one I mentioned? > FWIW, your example doesn't actually have a source (just a target), so it > isn't actually like MERGE. That can be added. I was trying to keep it pithy. Nico --
Re: [HACKERS] pgbench - use enum for meta commands
Fabien COELHO writes:
> [ pgbench-enum-meta-2.patch ]

Pushed with some trivial cosmetic adjustments (pgindent changed it more than I did).

regards, tom lane
Re: [HACKERS] GnuTLS support
On 09/18/2017 07:04 PM, Jeff Janes wrote:> You fixed the first issue, but I still get the second one: be-secure-gnutls.c: In function 'get_peer_certificate': be-secure-gnutls.c:667: error: 'GNUTLS_X509_CRT_LIST_SORT' undeclared (first use in this function) be-secure-gnutls.c:667: error: (Each undeclared identifier is reported only once be-secure-gnutls.c:667: error: for each function it appears in.) Thanks again for testing the code. I have now rebased the patch and fixed the second issue. I tested that it works on CentOS 6. Work which remains: - sslinfo - pgcrypto - Documentation - Decide if what I did with the config is a good idea Andreas diff --git a/configure b/configure index 4ecd2e1922..1ba34dfced 100755 --- a/configure +++ b/configure @@ -709,6 +709,7 @@ UUID_EXTRA_OBJS with_uuid with_systemd with_selinux +with_gnutls with_openssl krb_srvtab with_python @@ -837,6 +838,7 @@ with_bsd_auth with_ldap with_bonjour with_openssl +with_gnutls with_selinux with_systemd with_readline @@ -1531,6 +1533,7 @@ Optional Packages: --with-ldap build with LDAP support --with-bonjour build with Bonjour support --with-openssl build with OpenSSL support + --with-gnutls build with GnuTS support --with-selinux build with SELinux support --with-systemd build with systemd support --without-readline do not use GNU Readline nor BSD Libedit for editing @@ -5997,6 +6000,41 @@ fi $as_echo "$with_openssl" >&6; } +# +# GnuTLS +# +{ $as_echo "$as_me:${as_lineno-$LINENO}: checking whether to build with GnuTLS support" >&5 +$as_echo_n "checking whether to build with GnuTLS support... " >&6; } + + + +# Check whether --with-gnutls was given. +if test "${with_gnutls+set}" = set; then : + withval=$with_gnutls; + case $withval in +yes) + +$as_echo "#define USE_GNUTLS 1" >>confdefs.h + + ;; +no) + : + ;; +*) + as_fn_error $? 
"no argument expected for --with-gnutls option" "$LINENO" 5 + ;; + esac + +else + with_gnutls=no + +fi + + +{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $with_gnutls" >&5 +$as_echo "$with_gnutls" >&6; } + + # # SELinux # @@ -10164,6 +10202,94 @@ done fi +if test "$with_gnutls" = yes ; then + { $as_echo "$as_me:${as_lineno-$LINENO}: checking for library containing gnutls_init" >&5 +$as_echo_n "checking for library containing gnutls_init... " >&6; } +if ${ac_cv_search_gnutls_init+:} false; then : + $as_echo_n "(cached) " >&6 +else + ac_func_search_save_LIBS=$LIBS +cat confdefs.h - <<_ACEOF >conftest.$ac_ext +/* end confdefs.h. */ + +/* Override any GCC internal prototype to avoid an error. + Use char because int might match the return type of a GCC + builtin and then its argument prototype would still apply. */ +#ifdef __cplusplus +extern "C" +#endif +char gnutls_init (); +int +main () +{ +return gnutls_init (); + ; + return 0; +} +_ACEOF +for ac_lib in '' gnutls; do + if test -z "$ac_lib"; then +ac_res="none required" + else +ac_res=-l$ac_lib +LIBS="-l$ac_lib $ac_func_search_save_LIBS" + fi + if ac_fn_c_try_link "$LINENO"; then : + ac_cv_search_gnutls_init=$ac_res +fi +rm -f core conftest.err conftest.$ac_objext \ +conftest$ac_exeext + if ${ac_cv_search_gnutls_init+:} false; then : + break +fi +done +if ${ac_cv_search_gnutls_init+:} false; then : + +else + ac_cv_search_gnutls_init=no +fi +rm conftest.$ac_ext +LIBS=$ac_func_search_save_LIBS +fi +{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $ac_cv_search_gnutls_init" >&5 +$as_echo "$ac_cv_search_gnutls_init" >&6; } +ac_res=$ac_cv_search_gnutls_init +if test "$ac_res" != no; then : + test "$ac_res" = "none required" || LIBS="$ac_res $LIBS" + +else + as_fn_error $? "library 'gnutls' is required for GnuTLS" "$LINENO" 5 +fi + + # GnuTLS versions before 3.4.0 do not support sorting incorrectly sorted + # certificate chains. 
+ ac_fn_c_check_decl "$LINENO" "GNUTLS_X509_CRT_LIST_SORT" "ac_cv_have_decl_GNUTLS_X509_CRT_LIST_SORT" "#include +#include + +" +if test "x$ac_cv_have_decl_GNUTLS_X509_CRT_LIST_SORT" = xyes; then : + ac_have_decl=1 +else + ac_have_decl=0 +fi + +cat >>confdefs.h <<_ACEOF +#define HAVE_DECL_GNUTLS_X509_CRT_LIST_SORT $ac_have_decl +_ACEOF + + for ac_func in gnutls_pkcs11_set_pin_function +do : + ac_fn_c_check_func "$LINENO" "gnutls_pkcs11_set_pin_function" "ac_cv_func_gnutls_pkcs11_set_pin_function" +if test "x$ac_cv_func_gnutls_pkcs11_set_pin_function" = xyes; then : + cat >>confdefs.h <<_ACEOF +#define HAVE_GNUTLS_PKCS11_SET_PIN_FUNCTION 1 +_ACEOF + +fi +done + +fi + if test "$with_pam" = yes ; then { $as_echo "$as_me:${as_lineno-$LINENO}: checking for pam_start in -lpam" >&5 $as_echo_n "checking for pam_start in -lpam... " >&6; } @@ -10961,6 +11087,17 @@ else fi +fi + +if test "$with_gnutls" = yes ; then + ac_fn_c_check_header_mongrel "$LINENO" "gnutls/gnutls.h" "ac_cv_header_gnutls_gnutls_h" "$ac_includes_default" +if test
Re: [HACKERS] MERGE SQL Statement for PG11
Nico Williams wrote:
> A MERGE mapped to a DML like this:
>
>   WITH updated AS (
>       UPDATE SET ... WHERE
>       RETURNING
>   )
>   , inserted AS (
>       INSERT INTO
>       SELECT ... WHERE NOT IN (SELECT FROM updated) AND ..
>       ON CONFLICT DO NOTHING -- see below!
>       RETURNING
>   )
>   DELETE FROM
>   WHERE NOT IN (SELECT FROM updated) AND
>         NOT IN (SELECT FROM inserted) AND ...;

This is a bad idea. An implementation like this is not at all maintainable.

> can handle concurrency via ON CONFLICT DO NOTHING in the INSERT CTE.

That's not handling concurrency -- it's silently ignoring an error. Who is to say that the conflict that IGNORE ignored is associated with a row visible to the MVCC snapshot of the statement? IOW, why should the DELETE affect any row?

There are probably a great many reasons why you need a ModifyTable executor node that keeps around state, and explicitly indicates that a MERGE is a MERGE. For example, we'll probably want statement level triggers to execute in a fixed order, regardless of the MERGE, RLS will probably require explicit knowledge of MERGE semantics, and so on.

FWIW, your example doesn't actually have a source (just a target), so it isn't actually like MERGE.

-- Peter Geoghegan
Re: [HACKERS] MERGE SQL Statement for PG11
On Thu, Nov 02, 2017 at 02:05:19PM -0700, Peter Geoghegan wrote:
> Simon Riggs wrote:
> > So in your view we should make no attempt to avoid concurrent errors,
> > even when we have the capability to do so (in some cases) and doing so
> > would be perfectly compliant with the SQL Standard.
>
> Yes. That's what I believe. I believe this because I can't see a way to
> do this that isn't a mess, and because ON CONFLICT DO UPDATE exists and
> works well for the cases where we can do better in READ COMMITTED mode.

A MERGE mapped to a DML like this:

  WITH updated AS (
      UPDATE SET ... WHERE
      RETURNING
  )
  , inserted AS (
      INSERT INTO
      SELECT ... WHERE NOT IN (SELECT FROM updated) AND ..
      ON CONFLICT DO NOTHING -- see below!
      RETURNING
  )
  DELETE FROM
  WHERE NOT IN (SELECT FROM updated) AND
        NOT IN (SELECT FROM inserted) AND ...;

can handle concurrency via ON CONFLICT DO NOTHING in the INSERT CTE.

Now, one could write a MERGE that produces conflicts even without concurrency, so adding ON CONFLICT DO NOTHING by default as above... seems not-quite-correct. But presumably one wouldn't write MERGE statements that produce conflicts in the absence of concurrency, so this seems close enough to me.

Another thing is that MERGE itself could get an ON CONFLICT clause for the INSERT portion of the MERGE, allowing one to ignore some conflicts and not others, though there would be no need for DO UPDATE, only DO NOTHING for conflict resolution :) This seems better.

I do believe this mapping is correct, and could be implemented entirely in src/backend/parser/gram.y! Am I wrong about this?

Nico
--
Re: [HACKERS] Re: PANIC: invalid index offnum: 186 when processing BRIN indexes in VACUUM
Alvaro Herrera writes:
> Tom Lane wrote:
>> So what would happen if we just don't summarize partial ranges?

> Index scan would always have to read all the heap pages for that partial
> range. Maybe not a big issue, but when you finish loading a table, it'd
> be good to have a mechanism to summarize that partial range ...

I guess if the ranges are big this might not be nice.

> Rather than remove the capability, I'd be inclined to make
> brin_summarize_new_values summarize the final partial range, and have
> VACUUM not do it. Would that be too inconsistent?

That doesn't really get you out of the problem that this is an abuse of the relation extension lock, and is likely to cause issues when people try to optimize that locking mechanism. Why is it that the regular technique doesn't work, ie create a placeholder tuple and let it get added to by any insertions that happen?

regards, tom lane
Re: [HACKERS] Setting pd_lower in GIN metapage
Amit Langote writes:
> On 2017/09/26 16:30, Michael Paquier wrote:
>> Cool, let's switch it back to a ready for committer status then.

> Sure, thanks.

Pushed with some cosmetic adjustments --- mostly, making the comments more explicit about why we need the apparently-redundant assignments to pd_lower.

I've marked the CF entry closed. However, I'm not sure if we're quite done with this thread. Weren't we going to adjust nbtree and hash to be more aggressive about labeling their metapages as REGBUF_STANDARD?

regards, tom lane
Re: [HACKERS] MERGE SQL Statement for PG11
Simon Riggs wrote:
> I think people imagined you had worked out how to make MERGE run
> concurrently, I certainly did, but in fact you're just saying you don't
> believe it ever should.

I'm certain that they didn't think that at all. But I'll let them speak for themselves.

> That is strange since the SQL Standard specifically allows the
> implementation to decide upon concurrent behaviour.

And yet nobody else decided to do what you propose with this apparent leeway. (See the UPSERT wiki page for many references that confirm this.)

> So in your view we should make no attempt to avoid concurrent errors,
> even when we have the capability to do so (in some cases) and doing so
> would be perfectly compliant with the SQL Standard.

Yes. That's what I believe. I believe this because I can't see a way to do this that isn't a mess, and because ON CONFLICT DO UPDATE exists and works well for the cases where we can do better in READ COMMITTED mode.

Did you know that Oracle doesn't have an EPQ style mechanism at all? Instead, it rolls back the entire statement and retries it from scratch. That is not really any further from or closer to the standard than the EPQ stuff, because the standard doesn't say anything about what should happen as far as READ COMMITTED conflict handling goes. My point here is that all of the stuff I'm talking about is only relevant in READ COMMITTED mode, in areas where the standard never provides us with guidance.

If you can rely on SSI, then there is no difference between what you propose and what I propose anyway, except that what I propose is more general and will have better performance, especially for batch MERGEs. If READ COMMITTED didn't exist, implementing ON CONFLICT would have been more or less free of controversy.

> Yes, that certainly will make an easier patch for MERGE.

Indeed, it will.

-- Peter Geoghegan
Re: [HACKERS] Deadlock in ALTER SUBSCRIPTION REFRESH PUBLICATION
On 10/24/17 13:13, Konstantin Knizhnik wrote:
> The reason for this deadlock seems clear: ALTER SUBSCRIPTION starts a
> transaction at one node and tries to create a slot at the other node, which
> waits for completion of all active transactions while building a snapshot.
> Is there any way to avoid this deadlock?

I don't see a way to avoid it in general, unless we come up with a novel way of creating replication slots.

-- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] [PATCH] Add ALWAYS DEFERRED option for constraints
On Thu, Nov 02, 2017 at 04:20:19PM -0400, Peter Eisentraut wrote:
> I haven't really thought about this feature too hard; I just want to
> give you a couple of code comments.

Thanks!

> I think the catalog structure, and relatedly also the parser structures,
> could be made more compact. We currently have condeferrable and
> condeferred to represent three valid states (NOT DEFERRABLE, DEFERRABLE
> INITIALLY IMMEDIATE, DEFERRABLE INITIALLY DEFERRED). You are adding
> conalwaysdeferred, but you are adding only one additional state (ALWAYS
> DEFERRED). So we end up with three bool fields to represent four
> states. I think this should all be rolled into one char field with four
> states.

I thought about this. I couldn't see a way to make the two existing boolean columns have a combination meaning "ALWAYS DEFERRED" that might not break something else. Since (condeferred AND NOT condeferrable) is an impossible combination today, I could use it to mean ALWAYS DEFERRED. I'm not sure how safe that would be. And it does seem like a weird way to express ALWAYS DEFERRED, though it would work.

Replacing condeferred and condeferrable with a char column also occurred to me, and though I assume backwards-incompatible changes to pg_catalog tables are fair game, I assumed everyone would prefer avoiding such changes where possible. Also, a backwards-incompatible change to the table would significantly enlarge the patch, as more version checks would be needed, particularly regarding upgrades (which are otherwise trivial).

I felt adding a new column was probably safest. I'll make a backwards-incompatible change if requested, naturally, but I guess I'd want to get wider consensus on that, as I fear others may not agree. That fear may just be due to my ignorance of the community's preference as to pg_catalog backwards-compatibility vs. cleanliness.

Hmmm, must I do anything special about _downgrades_? Does PostgreSQL support downgrades?

> In psql and pg_dump, if you are querying new catalog fields, you need to
> have a version check to have a different query for >=PG11. (This would
> likely apply whether you adopt my suggestion above or not.)

Ah, yes, of course. I will add such a check.

> Maybe a test case in pg_dump would be useful.

Will do.

> Other than that, this looks like a pretty complete patch.

Thanks for the review! It's a small-ish patch, and my very first for PG. It was fun writing it. I greatly appreciate that PG source is easy to read.

Nico
--
Re: [HACKERS] Linking libpq statically to libssl
On 10/27/17 08:24, Daniele Varrazzo wrote: > I have a problem building binary packages for psycopg2. Binary > packages ship with their own copies of libpq and libssl; Aside from the advice of "don't do that" ... > however if > another python package links to libssl the library will be imported > twice with conflicting symbols, likely resulting in a segfault (see > https://github.com/psycopg/psycopg2/issues/543). This happens e.g. if > a python script both connects to postgres and opens an https resource. ... the standard solutions to this problem are symbol versioning and linker flags to avoid making all symbols globally available. libpq has symbol versioning. Maybe the libssl you are using does not. Also, for example, if you dlopen() with RTLD_LOCAL, the symbols will not be globally available, so there should be no conflicts. This needs cooperation from various different parties, and the details will likely be platform dependent. But it's generally a solved problem. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Dynamic result sets from procedures
Peter Eisentraut wrote: > CREATE PROCEDURE pdrstest1() > LANGUAGE SQL > AS $$ > DECLARE c1 CURSOR WITH RETURN FOR SELECT * FROM cp_test2; > DECLARE c2 CURSOR WITH RETURN FOR SELECT * FROM cp_test3; > $$; > > CALL pdrstest1(); > > and that returns those two result sets to the client. If applied to plpgsql, to return a dynamic result, the following does work: CREATE PROCEDURE test() LANGUAGE plpgsql AS $$ DECLARE query text:= 'SELECT 1 AS col1, 2 AS col2'; BEGIN EXECUTE 'DECLARE c CURSOR WITH RETURN FOR ' || query; END; $$; This method could be used, for instance, to build a pivot with dynamic columns in a single client-server round-trip, which is not possible today with the query-calling-functions interface. More generally, I guess this should help in the whole class of situations where the client needs polymorphic results, which is awesome. But instead of having procedures not return anything, couldn't they return whatever resultset(s) they want to ("no resultset" being just a particular case of "anything"), so that we could leave out cursors and simply write: CREATE PROCEDURE test() LANGUAGE plpgsql AS $$ RETURN QUERY EXECUTE 'SELECT 1 AS col1, 2 AS col2'; END; $$; Or is that not possible or not desirable? Similarly, for the SQL language, I wonder if the above example could be simplified to: CREATE PROCEDURE pdrstest1() LANGUAGE SQL AS $$ SELECT * FROM cp_test2; SELECT * FROM cp_test3; $$; by which the two result sets would go back to the client again without declaring explicit cursors. Currently, it does not error out and no result set is sent. Best regards, -- Daniel Vérité PostgreSQL-powered mailer: http://www.manitou-mail.org Twitter: @DanielVerite -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] MERGE SQL Statement for PG11
On 2 November 2017 at 19:16, Peter Geoghegan wrote:
> Simon Riggs wrote:
>> So if I understand you correctly, in your view MERGE should just fail
>> with an ERROR if it runs concurrently with other DML?
>
> That's certainly my opinion on the matter. It seems like that might be
> the consensus, too.

Given that I only just found out what you've been talking about, I don't believe that anybody else did either. I think people imagined you had worked out how to make MERGE run concurrently, I certainly did, but in fact you're just saying you don't believe it ever should.

That is strange since the SQL Standard specifically allows the implementation to decide upon concurrent behaviour.

> Without meaning to sound glib: we already did make it work for a
> special, restricted case that is important enough to justify introducing
> a couple of kludges -- ON CONFLICT DO UPDATE/upsert.
>
> I do agree that what I propose for MERGE will probably cause confusion;
> just look into Oracle's MERGE implementation for examples of this. We
> ought to go out of our way to make it clear that MERGE doesn't provide
> these guarantees.

So in your view we should make no attempt to avoid concurrent errors, even when we have the capability to do so (in some cases) and doing so would be perfectly compliant with the SQL Standard.

Yes, that certainly will make an easier patch for MERGE.

Or are you arguing against allowing any patch for MERGE?

Now we have more clarity, who else agrees with this?

-- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] MERGE SQL Statement for PG11
On Thu, Nov 02, 2017 at 12:51:45PM -0700, Peter Geoghegan wrote:
> Nico Williams wrote:
> > If you want to ignore conflicts arising from concurrency you could
> > always add an ON CONFLICT DO NOTHING to the INSERT DML in the mapping I
> > proposed earlier. Thus a MERGE CONCURRENTLY could just do that.
> >
> > Is there any reason not to map MERGE as I proposed?
>
> Performance, for one. MERGE generally has a join that can be optimized
> like an UPDATE FROM join.

Ah, right, I think my mapping was pessimal. How about this mapping instead then:

  WITH updated AS (
      UPDATE SET ... WHERE
      RETURNING
  )
  , inserted AS (
      INSERT INTO
      SELECT ... WHERE NOT IN (SELECT FROM updated) AND ..
      /*
       * Add ON CONFLICT DO NOTHING here to avoid conflicts in the face
       * of concurrency.
       */
      RETURNING
  )
  DELETE FROM
  WHERE NOT IN (SELECT FROM updated) AND
        NOT IN (SELECT FROM inserted) AND ...;

?

If a MERGE has no delete clause, then the mapping would be:

  WITH updated AS (
      UPDATE SET ... WHERE
      RETURNING
  )
  INSERT INTO
  SELECT ... WHERE NOT IN (SELECT FROM updated) AND ..
  /*
   * Add ON CONFLICT DO NOTHING here to avoid conflicts in the face
   * of concurrency.
   */
  ;

> I haven't studied this question in any detail, but FWIW I think that
> using CTEs for merging is morally equivalent to a traditional MERGE
> implementation. [...]

I agree. So why not do that initially? Optimize later.

Such a MERGE mapping could be implemented entirely within src/backend/parser/gram.y ... Talk about cheap to implement, review, and maintain! Also, this would be notionally very simple. Any optimizations to CTE query/DML execution would be generic and applicable to MERGE and other things besides. If mapping MERGE to CTE-using DMLs motivates such optimizations, all the better.

> [...]. It may actually be possible to map from CTEs to a MERGE
> statement, but I don't think that that's a good approach to implementing
> MERGE.

Surely not every DML with CTEs can map to MERGE. Maybe I misunderstood your comment?

> Most of the implementation time will probably be spent doing things like
> making sure MERGE behaves appropriately with triggers, RLS, updatable
> views, and so on. That will take quite a while, but isn't particularly
> technically challenging IMV.

Note that mapping to a DML with CTEs as above gets triggers, RLS, and updatable views right from the get-go, because DMLs with CTEs, and DMLs as CTEs, surely do as well.

Nico
--
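To make the shape of the proposed mapping concrete, here is a filled-in version; all table and column names (target, source, id, val) are invented for illustration and are not from the thread:

```sql
-- Hypothetical concrete form of the CTE mapping: update matched rows,
-- insert unmatched ones (absorbing races via ON CONFLICT DO NOTHING),
-- then delete target rows touched by neither branch.
WITH updated AS (
    UPDATE target t
       SET val = s.val
      FROM source s
     WHERE t.id = s.id
 RETURNING t.id
)
, inserted AS (
    INSERT INTO target (id, val)
    SELECT s.id, s.val
      FROM source s
     WHERE s.id NOT IN (SELECT id FROM updated)
    ON CONFLICT DO NOTHING
    RETURNING id
)
DELETE FROM target t
 WHERE t.id NOT IN (SELECT id FROM updated)
   AND t.id NOT IN (SELECT id FROM inserted);
```

Note that all three sub-statements run against the same snapshot, so the CTEs cannot see each other's effects except through the RETURNING sets, which is what the NOT IN filters rely on.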
Re: [HACKERS] [PATCH] Add ALWAYS DEFERRED option for constraints
I haven't really thought about this feature too hard; I just want to give you a couple of code comments.

I think the catalog structure, and relatedly also the parser structures, could be made more compact. We currently have condeferrable and condeferred to represent three valid states (NOT DEFERRABLE, DEFERRABLE INITIALLY IMMEDIATE, DEFERRABLE INITIALLY DEFERRED). You are adding conalwaysdeferred, but you are adding only one additional state (ALWAYS DEFERRED). So we end up with three bool fields to represent four states. I think this should all be rolled into one char field with four states.

In psql and pg_dump, if you are querying new catalog fields, you need to have a version check to have a different query for >=PG11. (This would likely apply whether you adopt my suggestion above or not.)

Maybe a test case in pg_dump would be useful.

Other than that, this looks like a pretty complete patch.

-- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
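For reference, the three states representable today and how they land in pg_constraint (this shows current behavior, on made-up tables, not the patch):

```sql
-- The three existing states map onto two booleans in pg_constraint:
CREATE TABLE a (x int UNIQUE NOT DEFERRABLE);
CREATE TABLE b (x int UNIQUE DEFERRABLE INITIALLY IMMEDIATE);
CREATE TABLE c (x int UNIQUE DEFERRABLE INITIALLY DEFERRED);

SELECT conrelid::regclass, condeferrable, condeferred
  FROM pg_constraint
 WHERE contype = 'u'
   AND conrelid IN ('a'::regclass, 'b'::regclass, 'c'::regclass);
-- a: (f, f)   b: (t, f)   c: (t, t)
```

The fourth combination, condeferrable = false with condeferred = true, is unused today, which is what makes both the "impossible combination" trick and the single-char-field redesign possible.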
Re: [HACKERS] Re: PANIC: invalid index offnum: 186 when processing BRIN indexes in VACUUM
Tom Lane wrote:
> Alvaro Herrera writes:
> > Tom Lane wrote:
> >> If VACUUM and brin_summarize_new_values both ignore the partial
> >> range, then what else would request this? Can't we just decree
> >> that we don't summarize the partial range, period?
>
> > brin_summarize_range() can do it.
>
> So what would happen if we just don't summarize partial ranges?

Index scan would always have to read all the heap pages for that partial range. Maybe not a big issue, but when you finish loading a table, it'd be good to have a mechanism to summarize that partial range ...

Rather than remove the capability, I'd be inclined to make brin_summarize_new_values summarize the final partial range, and have VACUUM not do it. Would that be too inconsistent?

-- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
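The functions under discussion are callable directly from SQL; a quick sketch (the table and index names are invented):

```sql
-- A BRIN index on a hypothetical append-only table.
CREATE TABLE events (ts timestamptz);
CREATE INDEX events_brin ON events USING brin (ts);

-- Summarize all as-yet-unsummarized ranges (VACUUM triggers this too):
SELECT brin_summarize_new_values('events_brin');

-- Summarize just the range containing a given heap block (PG 10+),
-- e.g. to pick up the final partial range after a bulk load:
SELECT brin_summarize_range('events_brin', 0);
```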
Re: [HACKERS] MERGE SQL Statement for PG11
Simon Riggs wrote:
> So if I understand you correctly, in your view MERGE should just fail
> with an ERROR if it runs concurrently with other DML?

That's certainly my opinion on the matter. It seems like that might be the consensus, too.

Obviously there are things that you as a user can do about this on your own, like opt to use a higher isolation level, or manually LOCK TABLE. For some use cases, including bulk loading for OLAP, users might just know that there isn't going to be concurrent activity because it's not an OLTP system.

If this still seems odd to you, then consider that exactly the same situation exists with UPDATE. A user could want their UPDATE to affect a row where no row version is actually visible to their MVCC snapshot, because they have an idea about reliably updating the latest row. UPDATE doesn't work like that, of course. Is this unacceptable because the user expects that it should work that way?

Bear in mind that ON CONFLICT DO UPDATE *can* actually update a row when there is no version of it visible to the snapshot. It can also update a row where there is a concurrent DELETE + INSERT, and the tuples with the relevant unique index values end up not even being part of the same update chain in each case (MVCC-snapshot-visible vs. latest). IOW, you may end up updating a completely different logical row to the row with the conflicting value that is visible to your MVCC snapshot!

> i.e. if a race condition between the query and an INSERT runs
> concurrently with another INSERT
>
> We have no interest in making that work?

Without meaning to sound glib: we already did make it work for a special, restricted case that is important enough to justify introducing a couple of kludges -- ON CONFLICT DO UPDATE/upsert.

I do agree that what I propose for MERGE will probably cause confusion; just look into Oracle's MERGE implementation for examples of this. We ought to go out of our way to make it clear that MERGE doesn't provide these guarantees.
-- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] MERGE SQL Statement for PG11
Nico Williams wrote:
> If you want to ignore conflicts arising from concurrency you could
> always add an ON CONFLICT DO NOTHING to the INSERT DML in the mapping I
> proposed earlier. Thus a MERGE CONCURRENTLY could just do that.
>
> Is there any reason not to map MERGE as I proposed?

Performance, for one. MERGE generally has a join that can be optimized like an UPDATE FROM join.

I haven't studied this question in any detail, but FWIW I think that using CTEs for merging is morally equivalent to a traditional MERGE implementation. It may actually be possible to map from CTEs to a MERGE statement, but I don't think that that's a good approach to implementing MERGE.

Most of the implementation time will probably be spent doing things like making sure MERGE behaves appropriately with triggers, RLS, updatable views, and so on. That will take quite a while, but isn't particularly technically challenging IMV.

-- Peter Geoghegan
Re: [HACKERS] proposal: schema variables
On Thu, Nov 02, 2017 at 11:48:44AM -0400, Tom Lane wrote:
> Nico Williams writes:
> > With access controls, GUCs could become schema variables, and settings
> > from postgresql.conf could move into the database itself (which I think
> > would be nice).
>
> People re-propose some variant of that every so often, but it never works,
> because it ignores the fact that some of the GUCs' values are needed
> before you can access system catalogs at all, or in places where relying
> on system catalog access would be a bad idea.

ISTM that it should be possible to break the chicken-and-egg issue by storing the config variables in such a way that knowing only the pgdata directory path suffices to find them. That's effectively the case already, in that postgresql.conf is found... there.

One could probably do this as a PoC entirely as a SQL-coded VIEW that reads and writes (via the adminpack module's pg_catalog.pg_file_write()) postgresql.conf (without preserving comments, or with some rules regarding comments so that they are effectively attached to params).

Nico
--
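Part of this shape already exists in stock PostgreSQL: ALTER SYSTEM stores settings in a file under the data directory (postgresql.auto.conf), which is read at startup before any catalog access, and pg_settings exposes where each value came from. It is not the schema-variable idea, but it shows SQL-driven config living under pgdata:

```sql
-- Existing machinery, no assumptions: write a setting into
-- $PGDATA/postgresql.auto.conf and reload it without a restart.
ALTER SYSTEM SET work_mem = '64MB';
SELECT pg_reload_conf();

-- Inspect which settings came from configuration files, and from where:
SELECT name, setting, source, sourcefile
  FROM pg_settings
 WHERE source = 'configuration file';
```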
Re: [HACKERS] VACUUM and ANALYZE disagreeing on what reltuples means
Haribabu Kommi writes:
> The changes are fine and now it reports proper live tuples in both
> pg_class and stats. The other issue of continuous vacuum operation
> leading to decrease of number of live tuples is not related to this
> patch and that can be handled separately.

I did not like this patch too much, because it did nothing to fix the underlying problem of confusion between "live tuples" and "total tuples"; in fact, it made that worse, because now the comment on LVRelStats.new_rel_tuples is a lie. And there's at least one usage of that field value where I think we do want total tuples not live tuples: where we pass it to index AM cleanup functions. Indexes don't really care whether heap entries are live or not, only that they're there. Plus the VACUUM VERBOSE log message that uses the field is supposed to be reporting total tuples not live tuples.

I hacked up the patch trying to make that better, finding along the way that contrib/pgstattuple shared in the confusion about what it was trying to count. Results attached.

However, I'm not sure we're there yet, because there remains a fairly nasty discrepancy even once we've gotten everyone onto the same page about reltuples counting just live tuples: VACUUM and ANALYZE have different definitions of what's "live". In particular they do not treat INSERT_IN_PROGRESS and DELETE_IN_PROGRESS tuples the same. Should we try to do something about that? If so, what? It looks like ANALYZE's behavior is pretty tightly constrained, judging by the comments in acquire_sample_rows.

Another problem is that it looks like CREATE INDEX will set reltuples to the total number of heap entries it chose to index, because that is what IndexBuildHeapScan counts. Maybe we should adjust that?
regards, tom lane diff --git a/contrib/pgstattuple/pgstatapprox.c b/contrib/pgstattuple/pgstatapprox.c index 5bf0613..68211c6 100644 *** a/contrib/pgstattuple/pgstatapprox.c --- b/contrib/pgstattuple/pgstatapprox.c *** statapprox_heap(Relation rel, output_typ *** 68,74 Buffer vmbuffer = InvalidBuffer; BufferAccessStrategy bstrategy; TransactionId OldestXmin; - uint64 misc_count = 0; OldestXmin = GetOldestXmin(rel, PROCARRAY_FLAGS_VACUUM); bstrategy = GetAccessStrategy(BAS_BULKREAD); --- 68,73 *** statapprox_heap(Relation rel, output_typ *** 153,177 tuple.t_tableOid = RelationGetRelid(rel); /* ! * We count live and dead tuples, but we also need to add up ! * others in order to feed vac_estimate_reltuples. */ switch (HeapTupleSatisfiesVacuum(, OldestXmin, buf)) { - case HEAPTUPLE_RECENTLY_DEAD: - misc_count++; - /* Fall through */ - case HEAPTUPLE_DEAD: - stat->dead_tuple_len += tuple.t_len; - stat->dead_tuple_count++; - break; case HEAPTUPLE_LIVE: stat->tuple_len += tuple.t_len; stat->tuple_count++; break; ! case HEAPTUPLE_INSERT_IN_PROGRESS: ! case HEAPTUPLE_DELETE_IN_PROGRESS: ! misc_count++; break; default: elog(ERROR, "unexpected HeapTupleSatisfiesVacuum result"); --- 152,172 tuple.t_tableOid = RelationGetRelid(rel); /* ! * We follow VACUUM's lead in counting INSERT_IN_PROGRESS and ! * DELETE_IN_PROGRESS tuples as "live". */ switch (HeapTupleSatisfiesVacuum(, OldestXmin, buf)) { case HEAPTUPLE_LIVE: + case HEAPTUPLE_INSERT_IN_PROGRESS: + case HEAPTUPLE_DELETE_IN_PROGRESS: stat->tuple_len += tuple.t_len; stat->tuple_count++; break; ! case HEAPTUPLE_DEAD: ! case HEAPTUPLE_RECENTLY_DEAD: ! stat->dead_tuple_len += tuple.t_len; ! stat->dead_tuple_count++; break; default: elog(ERROR, "unexpected HeapTupleSatisfiesVacuum result"); *** statapprox_heap(Relation rel, output_typ *** 184,191 stat->table_len = (uint64) nblocks * BLCKSZ; stat->tuple_count = vac_estimate_reltuples(rel, false, nblocks, scanned, ! 
stat->tuple_count + misc_count); /* * Calculate percentages if the relation has one or more pages. --- 179,190 stat->table_len = (uint64) nblocks * BLCKSZ; + /* + * Extrapolate the live-tuple count to the whole table in the same way + * that VACUUM does. (XXX shouldn't we also extrapolate tuple_len?) + */ stat->tuple_count = vac_estimate_reltuples(rel, false, nblocks, scanned, ! stat->tuple_count); /* * Calculate percentages if the relation has one or more pages. diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml index ef60a58..87bba31 100644 *** a/doc/src/sgml/catalogs.sgml --- b/doc/src/sgml/catalogs.sgml *** SCRAM-SHA-256$iteration *** 1739,1746 float4 !Number of rows in the table. This is only an estimate used by the !planner. It is updated by
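The live-versus-total disagreement discussed above can be observed from SQL by comparing the planner's estimate in pg_class with the stats collector's per-table counters after running VACUUM versus ANALYZE. The table name here is illustrative:

```sql
-- Compare pg_class.reltuples (planner estimate, updated by VACUUM,
-- ANALYZE, and CREATE INDEX) with the stats collector's live/dead
-- tuple counts; 'my_table' is an invented name.
SELECT c.reltuples AS pg_class_estimate,
       s.n_live_tup,
       s.n_dead_tup
FROM pg_class c
JOIN pg_stat_user_tables s ON s.relid = c.oid
WHERE c.relname = 'my_table';
```

Running VACUUM and then ANALYZE on a table with in-progress concurrent transactions can show the two numbers drifting apart, which is the discrepancy Tom describes.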
Re: [HACKERS] MERGE SQL Statement for PG11
On Thu, Nov 02, 2017 at 06:49:18PM +, Simon Riggs wrote: > On 1 November 2017 at 18:20, Peter Geoghegan wrote: > > In Postgres, you can avoid duplicate violations with MERGE by using a > > higher isolation level (these days, those are turned into a > > serialization error at higher isolation levels when no duplicate is > > visible to the xact's snapshot). > > So if I understand you correctly, in your view MERGE should just fail > with an ERROR if it runs concurrently with other DML? > > i.e. if a race condition between the query and an INSERT runs > concurrently with another INSERT > > We have no interest in making that work? If you map MERGE to a DML with RETURNING-DML CTEs as I suggested before, how would that interact with concurrent DMLs? The INSERT DML of the mapped statement could produce conflicts that abort the whole MERGE, correct? If you want to ignore conflicts arising from concurrency you could always add an ON CONFLICT DO NOTHING to the INSERT DML in the mapping I proposed earlier. Thus a MERGE CONCURRENTLY could just do that. Is there any reason not to map MERGE as I proposed? Such an implementation of MERGE wouldn't be online because CTEs are always implemented sequentially currently. That's probably reason enough to eventually produce a native implementation of MERGE, ... or to revamp the CTE machinery to allow such a mapping to be online. Nico --
Re: [HACKERS] Re: PANIC: invalid index offnum: 186 when processing BRIN indexes in VACUUM
Alvaro Herrera writes: > Tom Lane wrote: >> If VACUUM and brin_summarize_new_values both ignore the partial >> range, then what else would request this? Can't we just decree >> that we don't summarize the partial range, period? > brin_summarize_range() can do it. So what would happen if we just don't summarize partial ranges? regards, tom lane
Re: [HACKERS] Re: PANIC: invalid index offnum: 186 when processing BRIN indexes in VACUUM
Tom Lane wrote: > Alvaro Herrera writes: > > 2. when summarization is requested on the partial range at the end of a > > table, we acquire extension lock on the rel, then compute relation size > > and run summarization with the lock held. This guarantees that we don't > > miss any pages. This is bad for concurrency though, so it's only done > > in that specific scenario. > > Hm, I wonder how this will play with the active proposals around > reimplementing relation extension locks. All that work seems to be > assuming that the extension lock is only held for a short time and > nothing much beyond physical extension is done while holding it. > I'm afraid that you may be introducing a risk of e.g. deadlocks > if you do this. Ouch ... yeah, that could be a problem. Another idea I had was to just insert the placeholder tuple while holding the extension lock, then release the lock while the summarization is done. It would be a bit of a break of the current separation of concerns, but I'm not convinced that the current setup is perfect, so maybe that's okay. > If VACUUM and brin_summarize_new_values both ignore the partial > range, then what else would request this? Can't we just decree > that we don't summarize the partial range, period? brin_summarize_range() can do it. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] MERGE SQL Statement for PG11
If nothing else, anyone needing MERGE can port their MERGE statements to a DML with DML-containing CTEs... The generic mapping would be something like this, I think: WITH rows AS (SELECT FROM WHERE ) , updated AS ( UPDATE SET ... WHERE IN (SELECT FROM rows) /* matched */ RETURNING ) , inserted AS ( INSERT INTO SELECT ... WHERE NOT IN (SELECT FROM rows) /* not matched */ RETURNING ) DELETE FROM WHERE (...) AND NOT IN (SELECT FROM updated UNION SELECT FROM inserted); Nico --
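A concrete (and heavily simplified) instance of that generic mapping, for invented tables target(k, v) and src(k, v), might read as follows. All names are illustrative, and this ignores the concurrency caveats raised elsewhere in the thread:

```sql
-- Illustrative sketch of MERGE-as-CTEs; table/column names are invented.
WITH rows AS (
    SELECT t.k FROM target t JOIN src s USING (k)      -- matched keys
), updated AS (
    UPDATE target t
    SET v = s.v
    FROM src s
    WHERE t.k = s.k
      AND t.k IN (SELECT k FROM rows)                  -- WHEN MATCHED
    RETURNING t.k
), inserted AS (
    INSERT INTO target (k, v)
    SELECT s.k, s.v
    FROM src s
    WHERE s.k NOT IN (SELECT k FROM rows)              -- WHEN NOT MATCHED
    RETURNING k
)
DELETE FROM target
WHERE k NOT IN (SELECT k FROM updated
                UNION
                SELECT k FROM inserted);
```

Note that all the data-modifying sub-statements run against the same snapshot, which is part of why, as Nico concedes, such a mapping would not interact gracefully with concurrent DML the way a native implementation could.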
Re: [HACKERS] MERGE SQL Statement for PG11
On 1 November 2017 at 18:20, Peter Geoghegan wrote: > In Postgres, you can avoid duplicate violations with MERGE by using a > higher isolation level (these days, those are turned into a > serialization error at higher isolation levels when no duplicate is > visible to the xact's snapshot). So if I understand you correctly, in your view MERGE should just fail with an ERROR if it runs concurrently with other DML? i.e. if a race condition between the query and an INSERT runs concurrently with another INSERT We have no interest in making that work? -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
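For reference, the SQL-standard MERGE shape being debated here (not available in PostgreSQL at the time of this thread) looks roughly like this, with invented table names:

```sql
-- SQL-standard MERGE skeleton; 'target' and 'src' are illustrative.
MERGE INTO target t
USING src s
   ON t.k = s.k
WHEN MATCHED THEN
    UPDATE SET v = s.v
WHEN NOT MATCHED THEN
    INSERT (k, v) VALUES (s.k, s.v);
```

The concurrency question in this subthread is what should happen when, under READ COMMITTED, a concurrent INSERT adds a row that neither the MATCHED nor NOT MATCHED branch saw in the statement's snapshot.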
Re: [HACKERS] MERGE SQL Statement for PG11
Robert Haas wrote: And if, in the meantime, MERGE can only handle the cases where there is a unique index, then it can only handle the cases INSERT .. ON CONFLICT UPDATE can cover, which makes it, as far as I can see, syntactic sugar over what we already have. Maybe it's not entirely - you might be planning to make some minor functional enhancements - but it's not clear what those are, and I feel like whatever it is could be done with less work and more elegance by just extending the INSERT .. ON CONFLICT UPDATE syntax. +1 Marko Tiikkaja's INSERT ... ON CONFLICT SELECT patch, which is in the current CF [1], moves things in this direction. I myself have occasionally wondered if it was worth adding an alternative DO DELETE conflict_action. This could appear alongside DO UPDATE, and be applied using MERGE-style conditions. All of these things seem like small adjuncts to ON CONFLICT because they're all just an alternative way of modifying or projecting the tuple that is locked by ON CONFLICT. Everything new would have to happen after the novel ON CONFLICT handling has already completed. The only reason that I haven't pursued this is because it doesn't seem that compelling. I mention it now because it's worth acknowledging that ON CONFLICT could be pushed a bit further in this direction. Of course, this still falls far short of making ON CONFLICT entirely like MERGE. [1] https://commitfest.postgresql.org/15/1241/ -- Peter Geoghegan
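The existing capability being contrasted here is the ON CONFLICT clause, which requires an arbiter unique index. A minimal example of what it already covers (table and column names invented), plus the hypothetical DO DELETE variant mooted above:

```sql
-- What INSERT .. ON CONFLICT DO UPDATE already covers today; requires
-- a unique index on target(k). Names are invented.
INSERT INTO target (k, v) VALUES (1, 'new')
ON CONFLICT (k) DO UPDATE
    SET v = EXCLUDED.v
    WHERE target.v IS DISTINCT FROM EXCLUDED.v;  -- optional DO UPDATE condition

-- The DO DELETE conflict_action mentioned above is hypothetical syntax,
-- i.e. NOT valid SQL; it would look something like:
--   ON CONFLICT (k) DO DELETE WHERE target.v = 'stale';
```

As the surrounding discussion notes, anything expressible this way operates on the single tuple locked by the conflict arbiter, which is why it stops well short of full MERGE semantics.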
Re: [HACKERS] Re: PANIC: invalid index offnum: 186 when processing BRIN indexes in VACUUM
Alvaro Herrera writes: > 1. in VACUUM or brin_summarize_new_values, we only process fully loaded > ranges, and ignore the partial range at end of table. OK. > 2. when summarization is requested on the partial range at the end of a > table, we acquire extension lock on the rel, then compute relation size > and run summarization with the lock held. This guarantees that we don't > miss any pages. This is bad for concurrency though, so it's only done > in that specific scenario. Hm, I wonder how this will play with the active proposals around reimplementing relation extension locks. All that work seems to be assuming that the extension lock is only held for a short time and nothing much beyond physical extension is done while holding it. I'm afraid that you may be introducing a risk of e.g. deadlocks if you do this. If VACUUM and brin_summarize_new_values both ignore the partial range, then what else would request this? Can't we just decree that we don't summarize the partial range, period? regards, tom lane
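For context, these are the two user-facing entry points under discussion (the index name is illustrative):

```sql
-- Summarize all as-yet-unsummarized page ranges of a BRIN index
-- (the same work that VACUUM triggers):
SELECT brin_summarize_new_values('my_brin_idx'::regclass);

-- Summarize just the range containing a given heap block number;
-- this is the function that can still be pointed at the partial
-- range at the end of the table:
SELECT brin_summarize_range('my_brin_idx'::regclass, 0);
```

brin_summarize_range() is the reason the "just never summarize the partial range" decree doesn't quite work: a user can explicitly request summarization of that range.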
Re: [HACKERS] Re: PANIC: invalid index offnum: 186 when processing BRIN indexes in VACUUM
Tomas Vondra wrote: > Unfortunately, I think we still have a problem ... I've been wondering > if we end up producing correct indexes, so I've done a simple test. Here's a proposed patch that should fix this problem (and does, in my testing). Would you please give it a try? This patch changes two things: 1. in VACUUM or brin_summarize_new_values, we only process fully loaded ranges, and ignore the partial range at end of table. 2. when summarization is requested on the partial range at the end of a table, we acquire extension lock on the rel, then compute relation size and run summarization with the lock held. This guarantees that we don't miss any pages. This is bad for concurrency though, so it's only done in that specific scenario. -- Álvaro Herrerahttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services >From a760bfd44afeff8d1629599411ac7ce87acc7d26 Mon Sep 17 00:00:00 2001 From: Alvaro HerreraDate: Thu, 2 Nov 2017 18:35:10 +0100 Subject: [PATCH] Fix summarization concurrent with relation extension --- src/backend/access/brin/brin.c | 91 -- 1 file changed, 70 insertions(+), 21 deletions(-) diff --git a/src/backend/access/brin/brin.c b/src/backend/access/brin/brin.c index b3aa6d1ced..7696c0700c 100644 --- a/src/backend/access/brin/brin.c +++ b/src/backend/access/brin/brin.c @@ -29,6 +29,7 @@ #include "postmaster/autovacuum.h" #include "storage/bufmgr.h" #include "storage/freespace.h" +#include "storage/lmgr.h" #include "utils/builtins.h" #include "utils/index_selfuncs.h" #include "utils/memutils.h" @@ -68,6 +69,10 @@ static BrinBuildState *initialize_brin_buildstate(Relation idxRel, static void terminate_brin_buildstate(BrinBuildState *state); static void brinsummarize(Relation index, Relation heapRel, BlockNumber pageRange, double *numSummarized, double *numExisting); +static void determine_summarization_range(BlockNumber pageRange, + BlockNumber heapNumBlocks, + BlockNumber pagesPerRange, + BlockNumber *startBlk, 
BlockNumber *endBlk); static void form_and_insert_tuple(BrinBuildState *state); static void union_tuples(BrinDesc *bdesc, BrinMemTuple *a, BrinTuple *b); @@ -1253,30 +1258,16 @@ brinsummarize(Relation index, Relation heapRel, BlockNumber pageRange, BlockNumber startBlk; BlockNumber endBlk; - /* determine range of pages to process; nothing to do for an empty table */ - heapNumBlocks = RelationGetNumberOfBlocks(heapRel); - if (heapNumBlocks == 0) - return; - revmap = brinRevmapInitialize(index, , NULL); - if (pageRange == BRIN_ALL_BLOCKRANGES) + /* determine range of pages to process */ + heapNumBlocks = RelationGetNumberOfBlocks(heapRel); + determine_summarization_range(pageRange, heapNumBlocks, pagesPerRange, + , ); + if (startBlk > heapNumBlocks) { - startBlk = 0; - endBlk = heapNumBlocks; - } - else - { - startBlk = (pageRange / pagesPerRange) * pagesPerRange; - /* Nothing to do if start point is beyond end of table */ - if (startBlk > heapNumBlocks) - { - brinRevmapTerminate(revmap); - return; - } - endBlk = startBlk + pagesPerRange; - if (endBlk > heapNumBlocks) - endBlk = heapNumBlocks; + brinRevmapTerminate(revmap); + return; } /* @@ -1287,9 +1278,31 @@ brinsummarize(Relation index, Relation heapRel, BlockNumber pageRange, { BrinTuple *tup; OffsetNumber off; + boolext_lock_held = false; CHECK_FOR_INTERRUPTS(); + /* +* Whenever we're scanning a range that would go past what we know to +* be end-of-relation, we need to ensure we scan to the true end of the +* relation, or we risk missing tuples in recently added pages. To do +* this, we hold the relation extension lock from here till we're done +* summarizing, and compute the relation size afresh now. The logic in +* determine_summarization_range ensures that this is not done in the +* common cases of vacuum or brin_summarize_new_values(), but instead +* only when we're specifically asked to summarize the last range in +* the relation. +*/ + if (heapBlk + pagesPerRange >
Re: [HACKERS] initdb w/ restart
On 9/28/17 15:40, Jesper Pedersen wrote: > In a case of > > initdb /tmp/pgsql > > followed by > > pg_ctl -D /tmp/pgsql/ -l /tmp/logfile restart > > you'll get > > pg_ctl: PID file "/tmp/pgsql/postmaster.pid" does not exist > Is server running? > starting server anyway > pg_ctl: could not read file "/tmp/pgsql/postmaster.opts" > > The attached patch changes the message to "trying to start server > anyway" to highlight it is an attempt, not something that will happen. committed -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] proposal: schema variables
On Thu, Nov 2, 2017 at 9:05 PM, Nico Williams wrote: >> Overloading SET to handle both variables and GUCs seems likely to >> create problems, possibly including security problems. For example, >> maybe a security-definer function could leave behind variables to >> trick the calling code into failing to set GUCs that it intended to >> set. Or maybe creating a variable at the wrong time will just break >> things randomly. > > That's already true of GUCs, since there are no access controls on > set_config()/current_setting(). No, it isn't. Right now, SET always refers to a GUC, never a variable, so there's no possibility of getting confused about whether it's intending to change a GUC or an eponymous variable. Once you make SET able to change either one of two different kinds of objects, then that possibility does exist. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
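Nico's point about the lack of access control refers to the built-in GUC functions, which any session can call for custom (dotted-name) settings. The setting name below is illustrative:

```sql
-- No ACL applies here: any user can write and read a custom GUC.
-- Third argument false = session scope (true would be transaction-local).
SELECT set_config('myapp.current_user_id', '42', false);
SELECT current_setting('myapp.current_user_id');  -- returns the text '42'
```

Robert's counterpoint is orthogonal to this: today SET unambiguously names a GUC, and overloading it to also resolve schema variables is what would introduce the new ambiguity.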
[HACKERS] Parallel Plans and Cost of non-filter functions
I'm working on a custom aggregate that generates a serialized data format. The preparation of the geometry before being formatted is pretty intense, so it is probably a good thing for that work to be done in parallel, in partial aggregates. Here's an example SQL call: EXPLAIN analyze SELECT length(ST_AsMVT(a)) FROM ( SELECT ST_AsMVTGeom(p.geom, ::geometry_literal, 4096, 0, true), gid, fed_num FROM pts_10 p WHERE p.geom && ::geometry_literal AND p.geom IS NOT NULL ) a; The ST_AsMVTGeom() function can be comically expensive, so it's really good to have it run in partial aggregates. But the cost of the function seems to be ignored. (First note that, in order to consistently get parallel plans I have to brutally suppress parallel_tuple_cost, as described here http://blog.cleverelephant.ca/2017/10/parallel-postgis-2.html) Whether I get a parallel aggregate seems entirely determined by the number of rows, not the cost of preparing those rows. When changing the number of rows in the subquery, with a LIMIT, I can change from a seq scan to a parallel seq scan and finally to a parallel aggregate, as the number of rows goes up. An odd effect: when I have enough rows to get a parallel seq scan, I can flip it back to a seq scan by *increasing* the cost of ST_AsMVTGeom. That seems odd and backwards. Is there anywhere a guide or rough description to how costs are used in determining parallel plans? The empirical approach starts to wear one down after a while :) P.
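The two knobs referred to above are the planner's per-tuple transfer cost and the per-function cost stored in pg_proc.procost. A sketch of how they would be adjusted (the PostGIS function signature here is assumed, not verified):

```sql
-- Coax parallel plans by zeroing the cost of shipping tuples from
-- workers to the leader, per the blog post mentioned above:
SET parallel_tuple_cost = 0;

-- Tell the planner the per-call function is expensive. The exact
-- argument list of ST_AsMVTGeom is an assumption here; COST is a
-- multiplier of cpu_operator_cost.
ALTER FUNCTION st_asmvtgeom(geometry, geometry, integer, integer, boolean)
    COST 10000;
```

The puzzling behavior described is that raising COST this way apparently discouraged, rather than encouraged, the parallel plan.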
Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple
On Thu, Nov 2, 2017 at 10:26 PM, Peter Geoghegan wrote: > On Thu, Nov 2, 2017 at 9:44 AM, Robert Haas wrote: >> The second commit (22576734b805fb0952f9be841ca8f643694ee868) is where >> I think things get a lot more dangerous. The problem (as Andres >> pointed out to me this afternoon) is that it seems possible to end up >> with a situation where there should be two HOT chains on a page, and >> because of the weakened xmin/xmax checking rules, we end up thinking >> that the second one is a continuation of the first one, which will be >> all kinds of bad news. That would be particularly likely to happen in >> releases from before we invented HEAP_XMIN_FROZEN, when there's no >> xmin/xmax matching at all, but could happen on later releases if we >> are extraordinarily unlucky (i.e. xmin of the first tuple in the >> second chain happens to be the same as the pre-freeze xmax in the old >> chain, probably because the same XID was used to update the page in >> two consecutive epochs). Fortunately, that commit is (I think) not >> released anywhere. > > FWIW, if you look at the second commit > (22576734b805fb0952f9be841ca8f643694ee868) carefully, you'll realize > that it doesn't even treat those two cases differently. It was buggy > even on its own terms. The FrozenTransactionId test used an xmin from > HeapTupleHeaderGetXmin(), not HeapTupleHeaderGetRawXmin(). Oh, wow. You seem to be correct. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] MERGE SQL Statement for PG11
On Tue, Oct 31, 2017 at 5:14 PM, Simon Riggs wrote: > I've proposed a SQL Standard compliant implementation that would do > much more than be new syntax over what we already have. > > So these two claims aren't accurate: "radical difference" and "syntax > sugar over a capability we have". I think those claims are pretty accurate. The design of INSERT .. ON CONFLICT UPDATE and the supporting machinery only work if there is a unique index. It provides radically different behavior than what you get with any other DML statement, with completely novel concurrency behavior, and there is no reasonable way to emulate that behavior in cases where no relevant unique index exists. Therefore, if MERGE eventually uses INSERT .. ON CONFLICT UPDATE when a relevant unique index exists and does something else, such as your proposal of taking a strong lock, or Peter's proposal of doing this in a concurrency-oblivious manner, in other cases, then those two cases will behave very differently. And if, in the meantime, MERGE can only handle the cases where there is a unique index, then it can only handle the cases INSERT .. ON CONFLICT UPDATE can cover, which makes it, as far as I can see, syntactic sugar over what we already have. Maybe it's not entirely - you might be planning to make some minor functional enhancements - but it's not clear what those are, and I feel like whatever it is could be done with less work and more elegance by just extending the INSERT .. ON CONFLICT UPDATE syntax. And it does seem to be your intention to only handle the cases which the INSERT .. ON CONFLICT UPDATE infrastructure can cover, because upthread you wrote this: "I didn't say it but my intention was to just throw an ERROR if no single unique index can be identified." I don't think anybody's putting words into your mouth here. We're just reading what you wrote.
-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] pgbench - use enum for meta commands
On Thu, 2 Nov 2017 17:50:52 +0100 (CET) Fabien COELHO wrote: > Updated version attached. Here is the patch. Sorry for the noise. Everything is alright. Patch is ready for committer. -- Aleksandr Parfenov Postgres Professional: http://www.postgrespro.com Russian Postgres Company
Re: [HACKERS] [PATCH] Minor patch to correct symbol name in logs
On 9/20/17 01:56, Vaishnavi Prabakaran wrote: > Backend's lo_functions were renamed to avoid conflicting with libpq > prototypes in commit - 6fc547960dbe0b8bd6cefae5ab7ec3605a5c46fc. > > Logs inside those functions still refer to old symbol names, here is the > small patch to update the same. I think those debug messages refer to the SQL name of the function, so they look correct to me as is. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] [COMMITTERS] pgsql: passwordcheck: Add test suite
On 9/15/17 00:20, Michael Paquier wrote: > On Fri, Sep 15, 2017 at 12:02 PM, Michael Paquier wrote: >> On Fri, Sep 15, 2017 at 11:46 AM, Peter Eisentraut wrote: >>> passwordcheck: Add test suite >>> >>> Also improve one error message. >>> >>> Reviewed-by: David Steele >> >> Sorry for showing up late for this topic. >> +REGRESS_OPTS = --temp-config $(srcdir)/passwordcheck.conf >> +REGRESS = passwordcheck >> +# disabled because these tests require setting shared_preload_libraries >> +NO_INSTALLCHECK = 1 >> You could have avoided all that by just issuing "load >> 'passwordcheck';" at the beginning of the test. And you gain support >> for installcheck this way. > > In short you just need the attached ;) committed -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple
On Thu, Nov 2, 2017 at 9:44 AM, Robert Haas wrote: > The second commit (22576734b805fb0952f9be841ca8f643694ee868) is where > I think things get a lot more dangerous. The problem (as Andres > pointed out to me this afternoon) is that it seems possible to end up > with a situation where there should be two HOT chains on a page, and > because of the weakened xmin/xmax checking rules, we end up thinking > that the second one is a continuation of the first one, which will be > all kinds of bad news. That would be particularly likely to happen in > releases from before we invented HEAP_XMIN_FROZEN, when there's no > xmin/xmax matching at all, but could happen on later releases if we > are extraordinarily unlucky (i.e. xmin of the first tuple in the > second chain happens to be the same as the pre-freeze xmax in the old > chain, probably because the same XID was used to update the page in > two consecutive epochs). Fortunately, that commit is (I think) not > released anywhere. FWIW, if you look at the second commit (22576734b805fb0952f9be841ca8f643694ee868) carefully, you'll realize that it doesn't even treat those two cases differently. It was buggy even on its own terms. The FrozenTransactionId test used an xmin from HeapTupleHeaderGetXmin(), not HeapTupleHeaderGetRawXmin(). -- Peter Geoghegan
Re: [HACKERS] MERGE SQL Statement for PG11
On Thu, Nov 2, 2017 at 8:28 AM, Craig Ringer wrote: > On 1 November 2017 at 01:55, Peter Geoghegan wrote: >> The problem here is: Iff the first statement uses ON CONFLICT >> infrastructure, doesn't the absence of WHEN NOT MATCHED imply >> different semantics for the remaining updates and deletes in the >> second version of the query? You've removed what seems like a neat >> adjunct to the MERGE, but it actually changes everything else too when >> using READ COMMITTED. > > Would these concerns be alleviated by adding some kind of Pg-specific > decoration that constrained concurrency-safe MERGEs? > > So your first statement would be > > MERGE CONCURRENTLY ... > > and when you removed the WHEN NOT MATCHED clause it'd ERROR because > that's no longer able to be done with the same concurrency-safe > semantics? > > I don't know if this would be helpful TBH, or if it would negate > Simon's compatibility goals. Just another idea. Yes, that fixes the problem. Of course, it also turns MERGE CONCURRENTLY into syntactic sugar for INSERT ON CONFLICT UPDATE, which brings one back to the question of exactly what we're trying to achieve here. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] pgbench - use enum for meta commands
Updated version attached. . Here is the patch. Sorry for the noise. -- Fabien.diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c index 5d8a01c..6bd3e52 100644 --- a/src/bin/pgbench/pgbench.c +++ b/src/bin/pgbench/pgbench.c @@ -373,11 +373,21 @@ typedef enum QueryMode static QueryMode querymode = QUERY_SIMPLE; static const char *QUERYMODE[] = {"simple", "extended", "prepared"}; +typedef enum MetaCommand +{ + META_NONE, + META_SET, + META_SETSHELL, + META_SHELL, + META_SLEEP +} MetaCommand; + typedef struct { char *line; /* text of command line */ int command_num; /* unique index of this Command struct */ int type; /* command type (SQL_COMMAND or META_COMMAND) */ + MetaCommand meta; /* meta command identifier, if appropriate */ int argc; /* number of command words */ char *argv[MAX_ARGS]; /* command word list */ PgBenchExpr *expr; /* parsed expression, if needed */ @@ -1721,6 +1731,26 @@ evaluateExpr(TState *thread, CState *st, PgBenchExpr *expr, PgBenchValue *retval } } +/* return meta-command enum identifier */ +static MetaCommand +getMetaCommand(const char * cmd) +{ + MetaCommand mc; + if (cmd == NULL) + mc = META_NONE; + else if (pg_strcasecmp(cmd, "set") == 0) + mc = META_SET; + else if (pg_strcasecmp(cmd, "setshell") == 0) + mc = META_SETSHELL; + else if (pg_strcasecmp(cmd, "shell") == 0) + mc = META_SHELL; + else if (pg_strcasecmp(cmd, "sleep") == 0) + mc = META_SLEEP; + else + mc = META_NONE; + return mc; +} + /* * Run a shell command. The result is assigned to the variable if not NULL. * Return true if succeeded, or false on error. 
@@ -2214,7 +2244,7 @@ doCustom(TState *thread, CState *st, StatsData *agg) fprintf(stderr, "\n"); } - if (pg_strcasecmp(argv[0], "sleep") == 0) + if (command->meta == META_SLEEP) { /* * A \sleep doesn't execute anything, we just get the @@ -2240,7 +2270,7 @@ doCustom(TState *thread, CState *st, StatsData *agg) } else { - if (pg_strcasecmp(argv[0], "set") == 0) + if (command->meta == META_SET) { PgBenchExpr *expr = command->expr; PgBenchValue result; @@ -2259,7 +2289,7 @@ doCustom(TState *thread, CState *st, StatsData *agg) break; } } - else if (pg_strcasecmp(argv[0], "setshell") == 0) + else if (command->meta == META_SETSHELL) { bool ret = runShellCommand(st, argv[1], argv + 2, argc - 2); @@ -2279,7 +2309,7 @@ doCustom(TState *thread, CState *st, StatsData *agg) /* succeeded */ } } - else if (pg_strcasecmp(argv[0], "shell") == 0) + else if (command->meta == META_SHELL) { bool ret = runShellCommand(st, NULL, argv + 1, argc - 1); @@ -3023,6 +3053,7 @@ process_sql_command(PQExpBuffer buf, const char *source) my_command = (Command *) pg_malloc0(sizeof(Command)); my_command->command_num = num_commands++; my_command->type = SQL_COMMAND; + my_command->meta = META_NONE; initSimpleStats(_command->stats); /* @@ -3091,7 +3122,9 @@ process_backslash_command(PsqlScanState sstate, const char *source) my_command->argv[j++] = pg_strdup(word_buf.data); my_command->argc++; - if (pg_strcasecmp(my_command->argv[0], "set") == 0) + my_command->meta = getMetaCommand(my_command->argv[0]); + + if (my_command->meta == META_SET) { /* For \set, collect var name, then lex the expression. 
*/ yyscan_t yyscanner; @@ -3146,7 +3179,7 @@ process_backslash_command(PsqlScanState sstate, const char *source) expr_scanner_offset(sstate), true); - if (pg_strcasecmp(my_command->argv[0], "sleep") == 0) + if (my_command->meta == META_SLEEP) { if (my_command->argc < 2) syntax_error(source, lineno, my_command->line, my_command->argv[0], @@ -3187,13 +3220,13 @@ process_backslash_command(PsqlScanState sstate, const char *source) my_command->argv[2], offsets[2] - start_offset); } } - else if (pg_strcasecmp(my_command->argv[0], "setshell") == 0) + else if (my_command->meta == META_SETSHELL) { if (my_command->argc < 3) syntax_error(source, lineno, my_command->line, my_command->argv[0], "missing argument", NULL, -1); } - else if (pg_strcasecmp(my_command->argv[0], "shell") == 0) + else if (my_command->meta == META_SHELL) { if (my_command->argc < 2) syntax_error(source, lineno, my_command->line, my_command->argv[0], @@ -3201,6 +3234,7 @@ process_backslash_command(PsqlScanState sstate, const char *source) } else { + /* my_command->meta == META_NONE */ syntax_error(source, lineno, my_command->line, my_command->argv[0], "invalid command", NULL, -1); } -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pgbench - use enum for meta commands
> The only thing I'm not quite sure about is a comment "which meta
> command ...". Maybe it's better to write it without a question word,
> something like "meta command identifier..."?

Ok. I agree. Updated version attached. I also added a const on a function parameter.

Just a note about the motivation: I want to add the same "\if" syntax added to psql, but it requires looking at the meta command in a number of places to manage the automaton status, and the strcmp solution looked both ugly and inefficient. So this small refactoring is just a preliminary to the "\if" patch, some day, after this one gets committed, if it gets committed.

The new status of this patch is: Ready for Committer

Thanks for the review.

-- 
Fabien.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Removing wal_keep_segments as default configuration in PostgresNode.pm
On Thu, Nov 2, 2017 at 4:47 PM, Peter Eisentraut wrote:
> On 9/11/17 21:55, Michael Paquier wrote:
>> I tend to think that while all the other parameters make sense to
>> deploy instances that need few resources, wal_keep_segments may cause
>> up to 350MB of WAL segments to be kept in each pg_wal's instance,
>> while max_wal_size is set at 128MB. The only test in the code tree in
>> need of wal_keep_segments is actually pg_rewind, which enforces
>> checkpoints after the rewind to update the source's control file.
>>
>> So, thoughts about the attached that reworks this portion of PostgresNode.pm?
>
> Committed.
>
> Besides the resource usage, it would probably be bad if a
> wal_keep_segments setting papered over problems with replication slots
> for example.

Thanks! I almost forgot this patch.
-- 
Michael

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple
On Thu, Nov 2, 2017 at 4:20 AM, Andres Freund wrote:
> I think the problem is on the pruning, rather than the freezing side. We
> can't freeze a tuple if it has an alive predecessor - rather than
> weakening this, we should be fixing the pruning to not have the alive
> predecessor.

Excellent catch.

> If the update xmin is actually below the cutoff we can remove the tuple
> even if there's live lockers - the lockers will also be present in the
> newer version of the tuple. I verified that for me that fixes the
> problem. Obviously that'd require some comment work and more careful
> diagnosis.

I didn't even know that that was safe.

> I think a5736bf754c82d8b86674e199e232096c679201d might be dangerous in
> the face of previously corrupted tuple chains and pg_upgraded clusters -
> it can lead to tuples being considered related, even though they're
> from entirely independent hot chains. Especially when upgrading 9.3 post
> your fix, to current releases.

Frankly, I'm relieved that you got to this. I was highly suspicious of a5736bf754c82d8b86674e199e232096c679201d, even beyond my specific, actionable concern about how it failed to handle the 9.3/FrozenTransactionId xmin case as special. As I went into in the "heap/SLRU verification, relfrozenxid cut-off, and freeze-the-dead bug" thread, these commits left us with a situation where there didn't seem to be a reliable way of knowing whether or not it is safe to interrogate clog for a given heap tuple using a tool like amcheck. And, it wasn't obvious that you couldn't have a codepath that failed to account for pre-cutoff non-frozen tuples -- codepaths that call TransactionIdDidCommit() despite it actually being unsafe.

If I'm not mistaken, your proposed fix restores sanity there.
-- 
Peter Geoghegan

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Removing wal_keep_segments as default configuration in PostgresNode.pm
On 9/11/17 21:55, Michael Paquier wrote: > I tend to think that while all the other parameters make sense to > deploy instances that need few resources, wal_keep_segments may cause > up to 350MB of WAL segments to be kept in each pg_wal's instance, > while max_wal_size is set at 128MB. The only test in the code tree in > need of wal_keep_segments is actually pg_rewind, which enforces > checkpoints after the rewind to update the source's control file. > > So, thoughts about the attached that reworks this portion of PostgresNode.pm? Committed. Besides the resource usage, it would probably be bad if a wal_keep_segments setting papered over problems with replication slots for example. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple
On Thu, Nov 2, 2017 at 8:25 PM, Alvaro Herrera wrote:
> Pushed the reverts.
>
> I noticed while doing so that REL_10_STABLE contains the bogus commits.
> Does that change our opinion regarding what to do for people upgrading
> to a version containing the broken commits? I don't think so, because
>
> 1) we hope that not many people will trust their data to 10.0
> immediately after release
> 2) the bug is very low probability
> 3) it doesn't look like we can do a lot about it anyway.

Just to be clear, it looks like "Fix freezing of a dead HOT-updated tuple" (46c35116ae1acc8826705ef2a7b5d9110f9d6e84) went in before 10.0 was stamped, but "Fix traversal of half-frozen update chains" (22576734b805fb0952f9be841ca8f643694ee868) went in afterwards and is therefore unreleased at present.

Users of 10.0 who hit the code introduced by 46c35116ae1acc8826705ef2a7b5d9110f9d6e84 will have XIDs stored in the xmax fields of tuples that predate relfrozenxid. Those tuples will be hinted-committed. That's not good, but it might not really have much in the way of consequences. *IF* the next VACUUM doesn't get confused by the old XID, then it will prune the tuple then and I think we'll be OK. And I think it won't, because it should just call HeapTupleSatisfiesVacuum() and that should see that HEAP_XMAX_COMMITTED is set and not actually try to consult the old CLOG. If that hint bit can ever get lost - or fail to propagate to a standby - then we have more trouble, but the fact that it's set by a logged operation makes me hope that can't happen. Furthermore, that follow-on VACUUM should indeed arrive in due time, because we will not have marked the page all-visible -- HeapTupleSatisfiesVacuum() will NOT have returned HEAPTUPLE_LIVE when called from lazy_scan_heap(), and therefore we will have set all_visible = false.

The second commit (22576734b805fb0952f9be841ca8f643694ee868) is where I think things get a lot more dangerous.
The problem (as Andres pointed out to me this afternoon) is that it seems possible to end up with a situation where there should be two HOT chains on a page, and because of the weakened xmin/xmax checking rules, we end up thinking that the second one is a continuation of the first one, which will be all kinds of bad news. That would be particularly likely to happen in releases from before we invented HEAP_XMIN_FROZEN, when there's no xmin/xmax matching at all, but could happen on later releases if we are extraordinarily unlucky (i.e. xmin of the first tuple in the second chain happens to be the same as the pre-freeze xmax in the old chain, probably because the same XID was used to update the page in two consecutive epochs). Fortunately, that commit is (I think) not released anywhere. Personally, I think it would be best to push the release out a week. I think we understand this well enough now that we can fix it relatively easily, but haste makes bugs, and (I know you're all tired of hearing me say this) patches that implicate the on-disk format are scary. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: PANIC: invalid index offnum: 186 when processing BRIN indexes in VACUUM
Alvaro Herrera writes:
> Tom Lane wrote:
>> Where are we on this --- do you want me to push the brin_doupdate
>> fix I proposed, or were you intending to merge that into a
>> larger patch?
> Please push your fixes, I'll post my proposed patch for the other bug
> afterwards; they are unrelated problems after all.

OK, I was not sure if you considered them related or not. Will push.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: PANIC: invalid index offnum: 186 when processing BRIN indexes in VACUUM
Tom Lane wrote:
> Alvaro Herrera writes:
> >> Hmm, I'm pretty sure we stress-tested brin in pretty much the same way.
> >> But I see this misbehavior too. Looking ...
>
> > Turns out that this is related to concurrent growth of the table while
> > the summarization process is scanning -- so new pages have appeared at
> > the end of the table after the end point has been determined. It would
> > be a pain to determine number of blocks for each range, so I'm looking
> > for a simple way to fix it without imposing so much overhead.
>
> Where are we on this --- do you want me to push the brin_doupdate
> fix I proposed, or were you intending to merge that into a
> larger patch?

Please push your fixes, I'll post my proposed patch for the other bug afterwards; they are unrelated problems after all. If you prefer me to push your fixes, I can do that -- let me know.

> If I'm to do it, is there a reason not to back-patch to all branches
> with BRIN?

No, all these fixes should go back to 9.5.
-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: PANIC: invalid index offnum: 186 when processing BRIN indexes in VACUUM
Alvaro Herrera writes:
>> Hmm, I'm pretty sure we stress-tested brin in pretty much the same way.
>> But I see this misbehavior too. Looking ...

> Turns out that this is related to concurrent growth of the table while
> the summarization process is scanning -- so new pages have appeared at
> the end of the table after the end point has been determined. It would
> be a pain to determine number of blocks for each range, so I'm looking
> for a simple way to fix it without imposing so much overhead.

Where are we on this --- do you want me to push the brin_doupdate fix I proposed, or were you intending to merge that into a larger patch?

If I'm to do it, is there a reason not to back-patch to all branches with BRIN?

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Minor comment issue in receivelog.c
Please find a minor comment fix for receivelog.c, HandleCopyStream(). The comment talks about a START_STREAMING command, but I think START_REPLICATION is what's meant here.

Bernd

diff --git a/src/bin/pg_basebackup/receivelog.c b/src/bin/pg_basebackup/receivelog.c
index 888458f4a9..befbbb092b 100644
--- a/src/bin/pg_basebackup/receivelog.c
+++ b/src/bin/pg_basebackup/receivelog.c
@@ -763,7 +763,7 @@ ReadEndOfStreamingResult(PGresult *res, XLogRecPtr *startpos, uint32 *timeline)
 
 /*
  * The main loop of ReceiveXlogStream. Handles the COPY stream after
- * initiating streaming with the START_STREAMING command.
+ * initiating streaming with the START_REPLICATION command.
  *
  * If the COPY ends (not necessarily successfully) due a message from the
  * server, returns a PGresult and sets *stoppos to the last byte written.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Document pgstattuple privileges without ambiguity
On 8/21/17 03:47, Feike Steenbergen wrote: > When installing pgstattuple on 10, the documentation about its > privileges was unclear to me. (Does the pg_stat_scan_tables role get > EXECUTE privileges by default or not?). I agree that this has gotten a bit confusing after apparently being patched around a bit recently. I have rewritten it a bit to make it clearer. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] proposal: schema variables
2017-11-02 13:35 GMT+01:00 Robert Haas:
> On Thu, Oct 26, 2017 at 12:51 PM, Pavel Stehule wrote:
> > The variables can be modified by SQL command SET (this is taken from
> > standard, and it is natural)
> >
> > SET varname = expression;
>
> Overloading SET to handle both variables and GUCs seems likely to
> create problems, possibly including security problems. For example,
> maybe a security-definer function could leave behind variables to
> trick the calling code into failing to set GUCs that it intended to
> set. Or maybe creating a variable at the wrong time will just break
> things randomly.

The syntax CREATE OR REPLACE FUNCTION xxx $$ ... $$ SET GUC=, ... is always related only to GUCs, so there should not be any security risk there. It is another reason why GUCs and variables should be separated.

I know there is a risk of collision. There are two possibilities:

a) use a different keyword - but that is outside SQL/PSM and outside what other databases do.

b) detect a possible collision and raise an error when an assignment is ambiguous. I am thinking about a solution similar to the one used in plpgsql, where there is a possibility of collision between a SQL identifier and a plpgsql variable.

Regards

Pavel

> -- 
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
Re: [HACKERS] proposal: schema variables
Nico Williams writes:
> With access controls, GUCs could become schema variables, and settings
> from postgresql.conf could move into the database itself (which I think
> would be nice).

People re-propose some variant of that every so often, but it never works, because it ignores the fact that some of the GUCs' values are needed before you can access system catalogs at all, or in places where relying on system catalog access would be a bad idea. Sure, we could have two completely different configuration mechanisms so that some of the variables could be "inside the database", but that doesn't seem like a net improvement to me. The point of the Grand Unified Configuration mechanism was to be unified, after all.

I'm on board with having a totally different mechanism for session variables. The fact that people have been abusing GUC to store user-defined variables doesn't make it a good way to do that.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] ArrayLists instead of List (for some things)
On 2017-11-02 10:38:34 -0400, Tom Lane wrote:
> David Rowley writes:
> > On 3 November 2017 at 03:17, Tom Lane wrote:
> >> We've jacked up the List API and driven a new implementation underneath
> >> once before. Maybe it's time to do that again.
>
> > Maybe, but the new implementation is not going to do well with places
> > where we perform lcons(). Probably many of those places could be
> > changed to lappend(), but I bet there's plenty that need prepend.
>
> [ shrug... ] To me, that means this implementation isn't necessarily
> the right solution.
>
> It seems to me that a whole lot of the complaints about this could be
> resolved simply by improving the List infrastructure to allocate ListCells
> in batches. That would address the question of "too much palloc traffic"
> and greatly improve the it-accesses-all-over-memory situation too.

It'll however not solve the problem that pointer-chained data structures have very poor access patterns.

> Possibly there are more aggressive changes that could be implemented
> without breaking too many places, but I think it would be useful to
> start there and see what it buys us.

I'm suspicious that that will result in causing pain twice, once doing the compromise thing, and once doing it right.

FWIW, I'd started pursuing converting a lot of executor uses of lists with arrays. But everywhere I did, particularly the uses inside the expression machinery, I noticed that larger changes were needed. For the expression stuff we needed a much larger architectural change (hence the stuff in 10 I did with Tom's help). The other big user of lists at the execution phase are targetlists - I think there the architectural change is also bigger: Rebuilding/analyzing all the targetlists, tupledescs etc at execution time is the architectural problem, not the use of lists itself (although that's also bad).
The amount of cycles spent in ExecTypeFromTL() and friends is partially caused by the use of lists, but the real problem is that we're doing all that work in the first place. So I personally think most uses of lists at execution time are architecturally wrong, and therefore optimizing their implementation is just a bandaid.

Where I think the arguments for a more optimized implementation are better is the parser, where we really don't know ahead of time the size of the data we're dealing with.

Greetings,

Andres Freund

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] proposal: schema variables
2017-11-02 16:07 GMT+01:00 Craig Ringer:
> On 26 October 2017 at 15:21, Pavel Stehule wrote:
> > Hi,
> >
> > I propose a new database object - a variable.
>
> Didn't we have a pretty long discussion about this already in
>
> Yeah.
>
> https://www.postgresql.org/message-id/flat/CAMsr%2BYF0G8_FehQyFS8gSfnEer9OPsMOvpfniDJOVGQzJzHzsw%40mail.gmail.com#CAMsr+YF0G8_fehqyfs8gsfneer9opsmovpfnidjovgqzjzh...@mail.gmail.com
>
> It'd be nice if you summarised any outcomes from that and addressed
> it, rather than taking this as a new topic.

I am sorry. This thread follows on from the one mentioned, and I started it with a small recapitulation.

Regards

Pavel

> -- 
> Craig Ringer                  http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] proposal: schema variables
2017-11-02 16:35 GMT+01:00 Nico Williams:
> On Thu, Nov 02, 2017 at 06:05:54PM +0530, Robert Haas wrote:
> > On Thu, Oct 26, 2017 at 12:51 PM, Pavel Stehule wrote:
> > > The variables can be modified by SQL command SET (this is taken from
> > > standard, and it is natural)
> > >
> > > SET varname = expression;
> >
> > Overloading SET to handle both variables and GUCs seems likely to
> > create problems, possibly including security problems. For example,
> > maybe a security-definer function could leave behind variables to
> > trick the calling code into failing to set GUCs that it intended to
> > set. Or maybe creating a variable at the wrong time will just break
> > things randomly.
>
> That's already true of GUCs, since there are no access controls on
> set_config()/current_setting().
>
> Presumably "schema variables" would really just be GUC-like and not at
> all like lexically scoped variables. And also subject to access
> controls, thus an overall improvement on set_config()/current_setting().
>
> With access controls, GUCs could become schema variables, and settings
> from postgresql.conf could move into the database itself (which I think
> would be nice).

I am sorry, but I don't plan that. The behaviour of GUCs is too different from the behaviour of variables. But I am planning that system GUCs can be "moved" to pg_catalog, so that it is possible to specify any object exactly.

Regards

Pavel

> Nico
> --
Re: [HACKERS] proposal: schema variables
On Thu, Nov 02, 2017 at 06:05:54PM +0530, Robert Haas wrote:
> On Thu, Oct 26, 2017 at 12:51 PM, Pavel Stehule wrote:
> > The variables can be modified by SQL command SET (this is taken from
> > standard, and it is natural)
> >
> > SET varname = expression;
>
> Overloading SET to handle both variables and GUCs seems likely to
> create problems, possibly including security problems. For example,
> maybe a security-definer function could leave behind variables to
> trick the calling code into failing to set GUCs that it intended to
> set. Or maybe creating a variable at the wrong time will just break
> things randomly.

That's already true of GUCs, since there are no access controls on set_config()/current_setting().

Presumably "schema variables" would really just be GUC-like and not at all like lexically scoped variables. And also subject to access controls, thus an overall improvement on set_config()/current_setting().

With access controls, GUCs could become schema variables, and settings from postgresql.conf could move into the database itself (which I think would be nice).

Nico
--

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Client Connection redirection support for PostgreSQL
On 2 November 2017 at 14:02, Satyanarayana Narlapuram wrote:
> Proposal:
>
> Add the ability to the PostgreSQL server instance to route the traffic to a
> different server instance based on the rules defined in the server's pg_hba.conf
> configuration file. At a high level this enables offloading the user
> requests to a different server instance based on the rules defined in the
> pg_hba.conf configuration file.

pg_hba.conf is "host based access [control]". I'm not sure it's really the right place.

> Some of the interesting scenarios this
> enables include but are not limited to - rerouting traffic based on the client
> hosts, users, database, etc. specified, redirecting read-only query traffic
> to the hot standby replicas, and in multi-master scenarios.

When this has come up before, one of the issues has been determining what exactly should constitute "read only" vs "read write" for the purposes of redirecting work. There are a bunch of issues there.

If you're doing "read only" txns and then do something "read write" and get redirected, the destination doesn't have your prepared statements, any WITH HOLD cursors, temp tables, etc you were working with. Strangeness ensues.

But we now have session-intent stuff though. So we could possibly do it at session level. Backends used just for a redirect would be pretty expensive though.
-- 
Craig Ringer                  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] proposal: schema variables
On 26 October 2017 at 15:21, Pavel Stehule wrote:
> Hi,
>
> I propose a new database object - a variable.

Didn't we have a pretty long discussion about this already, in

https://www.postgresql.org/message-id/flat/CAMsr%2BYF0G8_FehQyFS8gSfnEer9OPsMOvpfniDJOVGQzJzHzsw%40mail.gmail.com#camsr+yf0g8_fehqyfs8gsfneer9opsmovpfnidjovgqzjzh...@mail.gmail.com

Yeah. It'd be nice if you summarised any outcomes from that and addressed it, rather than taking this as a new topic.
-- 
Craig Ringer                  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Custom compression methods
On 2 November 2017 at 17:41, Ildus Kurbangaliev wrote:
> In this patch compression methods are suitable for MAIN and EXTENDED
> storage, like in the current implementation in postgres. Just instead of
> only LZ4 you can specify any other compression method.

We've had this discussion before. Please read the "pluggable compression support" thread. See you in a few days ;) sorry, it's kinda long.

https://www.postgresql.org/message-id/flat/20130621000900.GA12425%40alap2.anarazel.de#20130621000900.ga12...@alap2.anarazel.de

IIRC there were some concerns about what happened with pg_upgrade, with consuming precious toast bits, and a few other things.
-- 
Craig Ringer                  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] ArrayLists instead of List (for some things)
On 3 November 2017 at 03:27, Stephen Frost wrote:
> * David Rowley (david.row...@2ndquadrant.com) wrote:
>> We could get around a few of these problems if Lists were implemented
>> internally as arrays, however, arrays are pretty bad if we want to
>> delete an item out the middle as we'd have to shuffle everything over
>> one spot. Linked lists are much better here... at least they are when
>> you already have the cell you want to remove... so we can't expect
>> that we can just rewrite List to use arrays instead.
>
> This actually depends on just how you delete the item out of the array,
> though there's a trade-off there. If you "delete" the item by marking
> it as "dead" then you don't need to re-shuffle everything, but, of
> course, then you have to make sure to 'skip' over those by running down
> the array for each list_nth() call- but here's the thing, with a small
> list that's all in a tightly packed array, that might not be too bad.
> Certainly far better than having to grab random memory on each step and
> I'm guessing you'd only actually store the "data" item for a given list
> member in the array if it's an integer/oid/etc list, not when it's
> actually a pointer being stored in the list, so you're always going to
> be working with pretty tightly packed arrays where scanning the list on
> the list_nth call really might be darn close to as fast as using an
> offset to the actual entry in the array, unless it's a pretty big list,
> but I don't think we've actually got lots of multi-hundred-deep lists.

I was hoping to have list_nth always be O(1) so that we'd never be tempted again to use arrays directly, like we are now for things like simple_rel_array[].

Probably it could be made to work quickly in most cases by tracking whether there are any deleted items and just doing the direct array lookups if nothing is deleted; if something is deleted, then visit index n minus bms_num_members() of some deleted bitmapset for the bits earlier than the Nth element.
bms_num_members() could be made faster with 64-bit bitmap words and __builtin_popcountl(), falling back on the number_of_ones[] array when that's not available. It would also require a bms_num_members_before() function.

In the implementation that I attached, I showed an example of how to delete items out of an array list, which I think is quite a bit cleaner than how we generally do it today with List (e.g. in reconsider_outer_join_clauses()). However, it'll still perform badly when just a single known item is being removed.
-- 
David Rowley                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple
Stephen Frost wrote:
> * Tom Lane (t...@sss.pgh.pa.us) wrote:
> > Andres Freund writes:
> > > Do we care about people upgrading to unreleased versions? We could do
> > > nothing, document it in the release notes, or ???
> >
> > Do nothing.
>
> Agreed. Not much we can do there.

Pushed the reverts.

I noticed while doing so that REL_10_STABLE contains the bogus commits. Does that change our opinion regarding what to do for people upgrading to a version containing the broken commits? I don't think so, because

1) we hope that not many people will trust their data to 10.0 immediately after release
2) the bug is very low probability
3) it doesn't look like we can do a lot about it anyway.

I'll experiment with Andres' proposed fix now.
-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] ArrayLists instead of List (for some things)
David Rowley writes:
>> It seems to me that a whole lot of the complaints about this could be
>> resolved simply by improving the List infrastructure to allocate ListCells
>> in batches. That would address the question of "too much palloc traffic"
>> and greatly improve the it-accesses-all-over-memory situation too.
>>
>> Possibly there are more aggressive changes that could be implemented
>> without breaking too many places, but I think it would be useful to
>> start there and see what it buys us.

> Yeah, certainly would be a good way of determining the worth of changing.

BTW, with just a little more work that could be made to address the list-nth-is-expensive problem too. I'm imagining a call that preallocates an empty list with room for N ListCells (or perhaps, if we need to preserve compatibility with NIL == NULL, creates a single-element list with room for N-1 more cells), plus some tracking in list.c of whether the list cells have been consumed in order. Given the typical use-case of building lists strictly with lappend, list_nth() could index directly into the ListCell slab as long as it saw the List header's "is_linear" flag was true.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] ArrayLists instead of List (for some things)
On 3 November 2017 at 03:38, Tom Lane wrote:
> David Rowley writes:
>> On 3 November 2017 at 03:17, Tom Lane wrote:
>>> We've jacked up the List API and driven a new implementation underneath
>>> once before. Maybe it's time to do that again.
>
>> Maybe, but the new implementation is not going to do well with places
>> where we perform lcons(). Probably many of those places could be
>> changed to lappend(), but I bet there's plenty that need prepend.
>
> [ shrug... ] To me, that means this implementation isn't necessarily
> the right solution.

True. Of course, it could be worked around, probably with some bit flags to record if lcons was called, and then consuming the elements in reverse. We might have to shuffle things along a bit for Lists that get both lcons and lappend calls.

The API break that I can't see a way around is:

/* does the list have 0 elements? */
if (list == NIL) ...

That would need to be rewritten as:

if (list_length(list) == 0) ...

> It seems to me that a whole lot of the complaints about this could be
> resolved simply by improving the List infrastructure to allocate ListCells
> in batches. That would address the question of "too much palloc traffic"
> and greatly improve the it-accesses-all-over-memory situation too.
>
> Possibly there are more aggressive changes that could be implemented
> without breaking too many places, but I think it would be useful to
> start there and see what it buys us.

Yeah, certainly would be a good way of determining the worth of changing.
-- 
David Rowley                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] ArrayLists instead of List (for some things)
David Rowley writes: > On 3 November 2017 at 03:17, Tom Lane wrote: >> We've jacked up the List API and driven a new implementation underneath >> once before. Maybe it's time to do that again. > Maybe, but the new implementation is not going to do well with places > where we perform lcons(). Probably many of those places could be > changed to lappend(), but I bet there's plenty that need prepend. [ shrug... ] To me, that means this implementation isn't necessarily the right solution. It seems to me that a whole lot of the complaints about this could be resolved simply by improving the List infrastructure to allocate ListCells in batches. That would address the question of "too much palloc traffic" and greatly improve the it-accesses-all-over-memory situation too. Possibly there are more aggressive changes that could be implemented without breaking too many places, but I think it would be useful to start there and see what it buys us. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pgbench - use enum for meta commands
The following review has been posted through the commitfest application: make installcheck-world: tested, passed Implements feature: tested, passed Spec compliant: tested, passed Documentation: tested, passed Hi, Looks good to me. The only thing I'm not quite sure about is a comment "which meta command ...". Maybe it's better to write it without a question word, something like "meta command identifier..."? The new status of this patch is: Ready for Committer -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] ArrayLists instead of List (for some things)
David, * David Rowley (david.row...@2ndquadrant.com) wrote: > Our List implementation internally uses linked lists which are > certainly good for some things, but pretty bad at other things. Linked > lists are pretty bad when you want to fetch the Nth element, as it > means looping over each previous element until you get to the Nth. > They're also pretty bad for a modern CPU due to their memory access > pattern. I can certainly understand this complaint. > We could get around a few of these problems if Lists were implemented > internally as arrays, however, arrays are pretty bad if we want to > delete an item out of the middle as we'd have to shuffle everything over > one spot. Linked lists are much better here... at least they are when > you already have the cell you want to remove... so we can't expect > that we can just rewrite List to use arrays instead. This actually depends on just how you delete the item out of the array, though there's a trade-off there. If you "delete" the item by marking it as "dead" then you don't need to re-shuffle everything, but, of course, then you have to make sure to 'skip' over those by running down the array for each list_nth() call - but here's the thing: with a small list that's all in a tightly packed array, that might not be too bad. Certainly far better than having to grab random memory on each step. I'm guessing you'd only actually store the "data" item for a given list member in the array if it's an integer/oid/etc list, not when it's actually a pointer being stored in the list, so you're always going to be working with pretty tightly packed arrays, where scanning the list on the list_nth call really might be darn close to as fast as using an offset to the actual entry in the array - unless it's a pretty big list, but I don't think we've actually got lots of multi-hundred-deep lists. Of course, you'd have to watch out for cases where there's a lot of deletes happening, since that would effectively make this list longer. 
Anyway, just something to consider. Having to actually store the "delete" flag would have some cost to it also, unless you happen to already be storing something per entry and have an extra bit to spare. Others who are more versed in CPU magics than I am are certainly welcome to comment on if they think this makes sense to consider, I'm no CPU architecture guru. > Anyway, please don't debate the usages of the new type here. As for > all the above plans, I admit to not having a full handle on them yet. +1. Thanks! Stephen
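The mark-dead-and-skip approach described above can be sketched in a few lines of C. This is a toy with invented names, not proposed code; it just shows the trade: deletion becomes O(1) with no memmove(), at the cost of an nth-element scan that has to step over tombstones in the (still tightly packed) array.

```c
#include <stdbool.h>
#include <assert.h>

#define VEC_MAX 8

/* Array list where deletion only sets a tombstone flag; nth-element
 * access scans the packed array and skips dead slots. */
typedef struct
{
	int			nused;			/* slots handed out, live or dead */
	int			ndead;			/* tombstoned slots */
	int			items[VEC_MAX];
	bool		dead[VEC_MAX];
} VecList;

static void
vec_append(VecList *v, int item)
{
	v->items[v->nused] = item;
	v->dead[v->nused] = false;
	v->nused++;
}

static void
vec_delete(VecList *v, int slot)
{
	/* no shuffling; just mark the slot dead */
	v->dead[slot] = true;
	v->ndead++;
}

static int
vec_nth(const VecList *v, int n)
{
	/* linear scan over a cache-friendly array, skipping tombstones */
	for (int i = 0; i < v->nused; i++)
		if (!v->dead[i] && n-- == 0)
			return v->items[i];
	return -1;					/* out of range */
}
```

As the email notes, heavy deletion makes the scan effectively longer, so such a structure would want occasional compaction once ndead grows.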
Re: [HACKERS] ArrayLists instead of List (for some things)
On 2 November 2017 at 22:22, David Rowley wrote: > On 3 November 2017 at 03:17, Tom Lane wrote: >> We've jacked up the List API and driven a new implementation underneath >> once before. Maybe it's time to do that again. > > Maybe, but the new implementation is not going to do well with places > where we perform lcons(). Probably many of those places could be > changed to lappend(), but I bet there's plenty that need prepend. Yeah, and it's IMO significant that pretty much every nontrivial system with ADTs offers multiple implementations of list data structures, often wrapped with a common API. Java's Collections, the STL, you name it. -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] ArrayLists instead of List (for some things)
On 2 November 2017 at 22:17, Tom Lane wrote: > David Rowley writes: >> Comments on the design are welcome, but I was too late to the >> commitfest, so there are other priorities. However, if you have a >> strong opinion, feel free to voice it. > > I do not like replacing Lists piecemeal; that's a recipe for ongoing > API breakage and back-patching pain. Plus we'll then have *four* > different linked-list implementations in the backend, which sure > seems like too many. > > We've jacked up the List API and driven a new implementation underneath > once before. Maybe it's time to do that again. I know some systems use hybrid linked array-lists, where linked list cells are multi-element. https://en.wikipedia.org/wiki/Unrolled_linked_list I don't have much experience with them myself. -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
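For reference, the unrolled-linked-list shape mentioned above looks roughly like this (a toy sketch with invented names, not code anyone has proposed here): each cell holds several elements, so an nth lookup hops whole chunks and touches far fewer cache lines than a one-datum-per-cell list, while appends and whole-chunk traversal stay cheap.

```c
#include <stdlib.h>
#include <assert.h>

#define CHUNK_CAP 4				/* elements per cell; real code would tune this */

typedef struct Chunk
{
	int			n;				/* elements used in this chunk */
	int			vals[CHUNK_CAP];
	struct Chunk *next;
} Chunk;

typedef struct
{
	Chunk	   *head;
	Chunk	   *tail;
	int			length;
} UList;

static void
ul_append(UList *l, int v)
{
	if (l->tail == NULL || l->tail->n == CHUNK_CAP)
	{
		Chunk	   *c = calloc(1, sizeof(Chunk));

		if (l->tail)
			l->tail->next = c;
		else
			l->head = c;
		l->tail = c;
	}
	l->tail->vals[l->tail->n++] = v;
	l->length++;
}

static int
ul_nth(const UList *l, int n)
{
	Chunk	   *c = l->head;

	/* skip whole chunks at a time; only the tail chunk is partial */
	while (n >= c->n)
	{
		n -= c->n;
		c = c->next;
	}
	return c->vals[n];
}
```

Mid-list insertion and deletion are where unrolled lists get fiddly (splitting and merging chunks), which is presumably why nobody is claiming they're a drop-in List replacement.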
Re: [HACKERS] ArrayLists instead of List (for some things)
On 3 November 2017 at 03:17, Tom Lane wrote: > We've jacked up the List API and driven a new implementation underneath > once before. Maybe it's time to do that again. Maybe, but the new implementation is not going to do well with places where we perform lcons(). Probably many of those places could be changed to lappend(), but I bet there's plenty that need prepend. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] ArrayLists instead of List (for some things)
David Rowley writes: > Comments on the design are welcome, but I was too late to the > commitfest, so there are other priorities. However, if you have a > strong opinion, feel free to voice it. I do not like replacing Lists piecemeal; that's a recipe for ongoing API breakage and back-patching pain. Plus we'll then have *four* different linked-list implementations in the backend, which sure seems like too many. We've jacked up the List API and driven a new implementation underneath once before. Maybe it's time to do that again. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple
* Tom Lane (t...@sss.pgh.pa.us) wrote: > Andres Freund writes: > > Do we care about people upgrading to unreleased versions? We could do > > nothing, document it in the release notes, or ??? > > Do nothing. Agreed. Not much we can do there. Thanks! Stephen
[HACKERS] ArrayLists instead of List (for some things)
Hackers, Our List implementation internally uses linked lists which are certainly good for some things, but pretty bad at other things. Linked lists are pretty bad when you want to fetch the Nth element, as it means looping over each previous element until you get to the Nth. They're also pretty bad for a modern CPU due to their memory access pattern. We could get around a few of these problems if Lists were implemented internally as arrays, however, arrays are pretty bad if we want to delete an item out of the middle as we'd have to shuffle everything over one spot. Linked lists are much better here... at least they are when you already have the cell you want to remove... so we can't expect that we can just rewrite List to use arrays instead. I've attached a first cut implementation of "AList". I was originally going to call it "ArrayList", but my fingers were getting tired, so I changed it. This first cut version replaces nothing in the code base to use ALists. This email (and the timing of it) is more of a marker that this is being worked on. I know in particular that Andres has complained at least once about List usages in the executor, which I agree is not great. Looking at the usage of list_nth() is quite scary! My plans for this include: * Make targetlists from createplan onwards use ALists. Every time I look at build_path_tlist() I gawk at the number of palloc calls that are going on inside those lappend() calls. We already know how many items will be in the list, so with AList we could just alist_premake(list_length(path->pathtarget->exprs)) and we'd never have to palloc() another element for that list again. We do the same again in various mutator functions in setrefs.c too! * list_nth usage in adjust_appendrel_attrs_mutator() to speed up translation between parent and child Vars * And also, umm, simple_rte_array and simple_rel_array. 
Well, we'd still need somewhere to store all the RelOptInfos, but the idea is that it would be rather nice if we could avoid expanding the entire inheritance hierarchy of a relation, and it would be rather nice if we could just add the partitions that we need, rather than adding them all and setting dummy paths for the ones we're not interested in. Anyway, please don't debate the usages of the new type here. As for all the above plans, I admit to not having a full handle on them yet. I wrote the Array List implementation so I could get a better idea on how possible each of those would be, so, for now, to prevent any duplicate work, here it is... (Including Andres because I know he's mentioned his dislike for List in the executor) Comments on the design are welcome, but I was too late to the commitfest, so there are other priorities. However, if you have a strong opinion, feel free to voice it. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services 0001-Basic-implementation-of-array-lists-AList.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
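To make the alist_premake() idea concrete, here is a minimal guess at what such an API might look like. This is not taken from the attached patch - every name and detail below is invented from the description in the email, and a real version would store Datums/pointers and grow with repalloc in a memory context rather than realloc.

```c
#include <stdlib.h>
#include <assert.h>

/* Guessed AList shape: a plain growable array, optionally preallocated
 * to its final size so that building a known-length list never
 * reallocates and never allocates per element. */
typedef struct AList
{
	int			length;
	int			capacity;
	int		   *elements;
} AList;

static AList *
alist_premake(int capacity)
{
	AList	   *a = malloc(sizeof(AList));

	a->length = 0;
	a->capacity = capacity;
	a->elements = malloc(capacity * sizeof(int));
	return a;
}

static void
alist_append(AList *a, int v)
{
	if (a->length == a->capacity)
	{
		/* only reached when the caller's size estimate was wrong */
		a->capacity *= 2;
		a->elements = realloc(a->elements, a->capacity * sizeof(int));
	}
	a->elements[a->length++] = v;
}

/* the whole point: nth element is a plain array index */
static int
alist_nth(const AList *a, int n)
{
	return a->elements[n];
}
```

Under this shape, the build_path_tlist() pattern - premake to list_length(path->pathtarget->exprs), then append in a loop - does exactly one allocation for the whole target list.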
Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple
Andres Freund writes: > Do we care about people upgrading to unreleased versions? We could do > nothing, document it in the release notes, or ??? Do nothing. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Client Connection redirection support for PostgreSQL
On Thu, Nov 02, 2017 at 06:02:43AM +, Satyanarayana Narlapuram wrote: > Proposal: > Add the ability to the PostgreSQL server instance to route the > traffic to a different server instance based on the rules defined in > the server's pg_hba.conf configuration file. At a high level this > enables offloading the user requests to a different server instance > based on the rules defined in the pg_hba.conf configuration file. > Some of the interesting scenarios this enables include, but are not > limited to: rerouting traffic based on the client hosts, users, > database, etc. specified, redirecting read-only query traffic to the > hot standby replicas, and multi-master scenarios. What advantages do you see in doing this in the backend over the current system where the concerns are separated, i.e. people use connection poolers like pgbouncer to do the routing? > The rules to route the traffic will be provided in the pg_hba.conf > file. The proposal is to add a new optional field 'RoutingList' to > the record format. The RoutingList contains a comma-separated list of > one or more servers that the traffic can be routed to. In the > absence of this new field there is no change to the current login > code path for both the server and the client. RoutingList can be > updated for each new connection to balance the load across multiple > server instances > RoutingList format: > server_address1:port, server_address2:port... Would it make sense also to include an optional routing algorithm or pointer to a routing function for each RoutingList, or do you see this as entirely the client's responsibility? > The message flow > > 1. Client connects to the server, and server accepts the connections How does this work with SSL? > 2. Client sends the startup message > 3. Server looks at the rules configured in the pg_hba.conf file and > * If the rule matches redirection >i. Send a special message with the RoutingList described above >ii. 
Server disconnects > > * If the rule doesn't have RoutingList defined > >i. Server proceeds in the existing code path and sends auth request > > 1. Client gets the list of addresses and attempts to connect to a > server in the list provided until the first successful connections > is established or the list is exhausted. If the client can't > connect to any server instance on the RoutingList, client reports > the login failure message. > > Backward compatibility: > There are a few ways to provide the backward compatibility, and each > approach has their own advantages and disadvantage and are listed > below > > 1. Bumping the protocol version - old server instances may not > understand the new client protocol This sounds more attractive, assuming that the feature is. > 2. Adding additional optional parameter routing_enabled, without > bumping the protocol version. In this approach, old Postgres > server instances may not understand this and fail the connections. Silently changing an API without bumping the protocol version seems like a bad plan, even when it's an additive change as you propose here. > 3. The current proposal - to keep it in the hba.conf and let the > server admin deal with the configuration by taking conscious > choice on the configuration of routing list based on the clients > connecting to the server instance. How would clients identify themselves as eligible without a protocol version bump? > Backward compatibility scenarios: > > * The feature is not usable for the existing clients, and the > new servers shouldn't set the routing list if they expect any > connections from the legacy clients. We should do either (1) or > (2) in the above list to achieve this. Otherwise need to rely on > the admin to take care of the settings. > * For the new client connecting to the old server, there is no > change in the message flow So to DoS the server, what's required is a flock of old clients? 
I presume there's a good reason to reroute rather than serve these requests. > * For the new clients to the new server, the message flow will be based > on the routing list field in the configuration. > This proposal is in a very early stage; comments and feedback are very much > appreciated. Comments and feedback have begun. Best, David. -- David Fetter http://fetter.org/ Phone: +1 415 235 3778 Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] Add two-arg for of current_setting(NAME, FALLBACK)
On 11/1/17 14:02, Nico Williams wrote: > There already _is_ a two-argument form of current_setting() that yours > somewhat conflicts with: > >current_setting(setting_name [, missing_ok ]) > > https://www.postgresql.org/docs/current/static/functions-admin.html > > I often use > > coalesce(current_setting(setting_name, true), default_value_here) > > as an implementation of current_setting() with a default value. That appears to address this use case then. Do we need the new proposed variant still? -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] list of credits for release notes
On 10/3/17 12:31, Dang Minh Huong wrote: > Sorry but please change "Huong Dangminh"(my name) to "Dang Minh Huong". done -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple
Hi, On 2017-11-02 13:49:47 +0100, Alvaro Herrera wrote: > Andres Freund wrote: > > I think the problem is on the pruning, rather than the freezing side. We > > can't freeze a tuple if it has an alive predecessor - rather than > > weakening this, we should be fixing the pruning to not have the alive > > predecessor. > > I gave a look at HTSV back then, but I didn't find what the right tweak > was, but then I only tried changing the return value to DEAD and > DELETE_IN_PROGRESS; the thought of selecting DEAD or RECENTLY_DEAD based > on OldestXmin didn't occur to me ... I was thinking that the fact that > there were live lockers meant that the tuple could not be removed, > obviously failing to notice that the subsequent versions of the tuple > would be good enough. I'll try to write up a commit based on that idea. I think there's some comment work needed too, Robert and I were both confused by a few things. I'm unfortunately travelling atm - it's evening here, and I'll be flying back to the US all Saturday. I'm fairly sure I'll be able to come up with a decent patch tomorrow, but I'll need review and testing by others. > > If the update xmin is actually below the cutoff we can remove the tuple > > even if there's live lockers - the lockers will also be present in the > > newer version of the tuple. I verified that for me that fixes the > > problem. Obviously that'd require some comment work and more careful > > diagnosis. > > Sounds good. > > I'll revert those commits then, keeping the test, and then you can > commit your change. OK? Generally that sounds good - but you can't keep the testcase in without the new fix - the buildfarm would immediately turn red. I guess the best thing would be to temporarily remove it from the schedule? Do we care about people upgrading to unreleased versions? We could do nothing, document it in the release notes, or ??? 
Greetings, Andres Freund -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple
Andres Freund wrote: > I spent some time discussing this with Robert today (with both of us > alternating between feeling the other and ourselves as stupid), and the > conclusion I think is that the problem is on the pruning, rather than > the freezing side. Thanks both for spending some more time on this. > I think the problem is on the pruning, rather than the freezing side. We > can't freeze a tuple if it has an alive predecessor - rather than > weakening this, we should be fixing the pruning to not have the alive > predecessor. I gave a look at HTSV back then, but I didn't find what the right tweak was, but then I only tried changing the return value to DEAD and DELETE_IN_PROGRESS; the thought of selecting DEAD or RECENTLY_DEAD based on OldestXmin didn't occur to me ... I was thinking that the fact that there were live lockers meant that the tuple could not be removed, obviously failing to notice that the subsequent versions of the tuple would be good enough. > If the update xmin is actually below the cutoff we can remove the tuple > even if there's live lockers - the lockers will also be present in the > newer version of the tuple. I verified that for me that fixes the > problem. Obviously that'd require some comment work and more careful > diagnosis. Sounds good. I'll revert those commits then, keeping the test, and then you can commit your change. OK? -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] proposal: schema variables
On Thu, Oct 26, 2017 at 12:51 PM, Pavel Stehule wrote: > The variables can be modified by SQL command SET (this is taken from > the standard, and it's natural) > > SET varname = expression; Overloading SET to handle both variables and GUCs seems likely to create problems, possibly including security problems. For example, maybe a security-definer function could leave behind variables to trick the calling code into failing to set GUCs that it intended to set. Or maybe creating a variable at the wrong time will just break things randomly. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple
On Thu, Nov 2, 2017 at 4:50 PM, Andres Freund wrote: > I think a5736bf754c82d8b86674e199e232096c679201d might be dangerous in > the face of previously corrupted tuple chains and pg_upgraded clusters - > it can lead to tuples being considered related, even though they're > from entirely independent hot chains. Especially when upgrading 9.3 post > your fix, to current releases. I think this is a key point. If the new behavior were merely not entirely correct, we could perhaps refine it later. But it's not only not correct - it actually has the potential to create new problems that didn't exist before those commits. And if we release without reverting those commits then we can't change our mind later. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Client Connection redirection support for PostgreSQL
Proposal: Add the ability to the PostgreSQL server instance to route the traffic to a different server instance based on the rules defined in the server's pg_hba.conf configuration file. At a high level this enables offloading the user requests to a different server instance based on the rules defined in the pg_hba.conf configuration file. Some of the interesting scenarios this enables include, but are not limited to: rerouting traffic based on the client hosts, users, database, etc. specified, redirecting read-only query traffic to the hot standby replicas, and multi-master scenarios. The rules to route the traffic will be provided in the pg_hba.conf file. The proposal is to add a new optional field 'RoutingList' to the record format. The RoutingList contains a comma-separated list of one or more servers that the traffic can be routed to. In the absence of this new field there is no change to the current login code path for both the server and the client. RoutingList can be updated for each new connection to balance the load across multiple server instances. RoutingList format: server_address1:port, server_address2:port... The message flow 1. Client connects to the server, and server accepts the connections 2. Client sends the startup message 3. Server looks at the rules configured in the pg_hba.conf file and * If the rule matches redirection i. Send a special message with the RoutingList described above ii. Server disconnects * If the rule doesn't have RoutingList defined i. Server proceeds in the existing code path and sends auth request 4. Client gets the list of addresses and attempts to connect to a server in the list provided until the first successful connection is established or the list is exhausted. If the client can't connect to any server instance on the RoutingList, the client reports the login failure message. 
Backward compatibility: There are a few ways to provide the backward compatibility, and each approach has its own advantages and disadvantages, listed below 1. Bumping the protocol version - old server instances may not understand the new client protocol 2. Adding an additional optional parameter routing_enabled, without bumping the protocol version. In this approach, old Postgres server instances may not understand this and fail the connections. 3. The current proposal - to keep it in the hba.conf and let the server admin deal with the configuration by making a conscious choice on the configuration of the routing list based on the clients connecting to the server instance. Backward compatibility scenarios: * The feature is not usable for the existing clients, and the new servers shouldn't set the routing list if they expect any connections from the legacy clients. We should do either (1) or (2) in the above list to achieve this. Otherwise we need to rely on the admin to take care of the settings. * For the new client connecting to the old server, there is no change in the message flow * For the new clients to the new server, the message flow will be based on the routing list field in the configuration. This proposal is in a very early stage; comments and feedback are very much appreciated. Thanks, Satya
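For illustration only, a pg_hba.conf line carrying the proposed optional RoutingList field might look like the following. The field name, placement, and syntax are all hypothetical - nothing like this exists in current PostgreSQL, and the host names shown are placeholders:

    # TYPE  DATABASE  USER     ADDRESS      METHOD  [ROUTINGLIST]
    host    all       ro_user  10.0.0.0/8   md5     standby1.example.com:5432,standby2.example.com:5432

Under the proposal, a startup packet matching such a line would receive the routing list and be disconnected, and the client would then retry against each listed server in order.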
Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple
Hi, On 2017-09-28 14:47:53 +, Alvaro Herrera wrote: > Fix freezing of a dead HOT-updated tuple > > Vacuum calls page-level HOT prune to remove dead HOT tuples before doing > liveness checks (HeapTupleSatisfiesVacuum) on the remaining tuples. But > concurrent transaction commit/abort may turn DEAD some of the HOT tuples > that survived the prune, before HeapTupleSatisfiesVacuum tests them. > This happens to activate the code that decides to freeze the tuple ... > which resuscitates it, duplicating data. > > (This is especially bad if there's any unique constraints, because those > are now internally violated due to the duplicate entries, though you > won't know until you try to REINDEX or dump/restore the table.) > > One possible fix would be to simply skip doing anything to the tuple, > and hope that the next HOT prune would remove it. But there is a > problem: if the tuple is older than freeze horizon, this would leave an > unfrozen XID behind, and if no HOT prune happens to clean it up before > the containing pg_clog segment is truncated away, it'd later cause an > error when the XID is looked up. > > Fix the problem by having the tuple freezing routines cope with the > situation: don't freeze the tuple (and keep it dead). In the cases that > the XID is older than the freeze age, set the HEAP_XMAX_COMMITTED flag > so that there is no need to look up the XID in pg_clog later on. I think this is the wrong fix - the assumption that ctid chains can be validated based on the prev-xmax = cur-xmin is fairly ingrained into the system, and we shouldn't just be breaking it. The need to later lobotomize the checks, in a5736bf754, is some evidence of that. I spent some time discussing this with Robert today (with both of us alternating between feeling the other and ourselves as stupid), and the conclusion I think is that the problem is on the pruning, rather than the freezing side. 
FWIW, I don't think the explanation in the commit message of how the problem triggers is actually correct - the testcase you added doesn't have any transactions concurrently committing / aborting when crashing. Rather the problem is that the liveness checks for freezing are different from the ones for pruning. HTSV considers xmin RECENTLY_DEAD when there's a multi xmax with at least one alive locker, whereas pruning thinks it has to do something because there's the member xid below the cutoff. No concurrent activity is needed to trigger that. I think the problem is on the pruning, rather than the freezing side. We can't freeze a tuple if it has an alive predecessor - rather than weakening this, we should be fixing the pruning to not have the alive predecessor. The relevant logic in HTSV is:

    if (tuple->t_infomask & HEAP_XMAX_IS_MULTI)
    {
        TransactionId xmax;

        if (MultiXactIdIsRunning(HeapTupleHeaderGetRawXmax(tuple), false))
        {
            /* already checked above */
            Assert(!HEAP_XMAX_IS_LOCKED_ONLY(tuple->t_infomask));

            xmax = HeapTupleGetUpdateXid(tuple);

            /* not LOCKED_ONLY, so it has to have an xmax */
            Assert(TransactionIdIsValid(xmax));

            if (TransactionIdIsInProgress(xmax))
                return HEAPTUPLE_DELETE_IN_PROGRESS;
            else if (TransactionIdDidCommit(xmax))
                /* there are still lockers around -- can't return DEAD here */
                return HEAPTUPLE_RECENTLY_DEAD;
            /* updating transaction aborted */
            return HEAPTUPLE_LIVE;

with the problematic branch being the TransactionIdDidCommit() case. Rather than unconditionally returning HEAPTUPLE_RECENTLY_DEAD, we should test the update xid against OldestXmin and return DEAD / RECENTLY_DEAD according to that. If the update xmin is actually below the cutoff we can remove the tuple even if there's live lockers - the lockers will also be present in the newer version of the tuple. I verified that for me that fixes the problem. Obviously that'd require some comment work and more careful diagnosis. 
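The proposed change - deciding DEAD vs RECENTLY_DEAD for a committed updater by comparing the update xid against OldestXmin - can be modeled in isolation like this. This is a toy with stand-in types, not the actual HeapTupleSatisfiesVacuum() code; in particular the real TransactionIdPrecedes() handles xid wraparound, which this stub ignores.

```c
#include <stdbool.h>
#include <assert.h>

/* Stand-ins for backend infrastructure (transam.h in the real tree). */
typedef unsigned int TransactionId;

static bool
TransactionIdPrecedes(TransactionId a, TransactionId b)
{
	return a < b;				/* ignores wraparound, unlike the real one */
}

typedef enum
{
	HEAPTUPLE_DEAD,
	HEAPTUPLE_RECENTLY_DEAD
} HTSV_Result;

/*
 * Model of the proposed replacement for the TransactionIdDidCommit(xmax)
 * branch quoted above: instead of unconditionally returning
 * RECENTLY_DEAD because lockers are still alive, compare the committed
 * update xid against OldestXmin.  If the updater is old enough, the
 * tuple version can be pruned - the live lockers are also present in
 * the newer version of the tuple.
 */
static HTSV_Result
committed_updater_result(TransactionId update_xid, TransactionId OldestXmin)
{
	if (TransactionIdPrecedes(update_xid, OldestXmin))
		return HEAPTUPLE_DEAD;
	return HEAPTUPLE_RECENTLY_DEAD;
}
```

This captures only the decision rule; the real fix also needs the comment work and careful diagnosis mentioned above.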
I think a5736bf754c82d8b86674e199e232096c679201d might be dangerous in the face of previously corrupted tuple chains and pg_upgraded clusters - it can lead to tuples being considered related, even though they're from entirely independent hot chains. Especially when upgrading 9.3 post your fix, to current releases. In short, I think the two commits should be reverted, and replaced with a fix along what I'm outlining above. There'll be some trouble for people that upgraded to an unreleased version, but I don't really see what we could do about that. I could be entirely wrong - I've been travelling for the last two weeks and my brain is somewhat more fried than usual. Regards, Andres -- Sent via pgsql-hackers mailing list