Re: [HACKERS] path toward faster partition pruning
On Mon, Nov 6, 2017 at 3:31 PM, Amit Langote wrote:
> Attached updated set of patches, including the fix to make the new pruning
> code handle Boolean partitioning.

Hi Amit,

I have tried pruning with different values of the constraint_exclusion GUC.
I am not sure exactly how it should behave, but I can see that with a DELETE
statement pruning is not happening when constraint_exclusion is off, while
SELECT works as expected. Is this expected behaviour?

create table lp (c1 int, c2 text) partition by list(c1);
create table lp1 partition of lp for values in (1,2);
create table lp2 partition of lp for values in (3,4);
create table lp3 partition of lp for values in (5,6);
insert into lp values (1,'p1'),(2,'p1'),(3,'p2'),(4,'p2'),(5,'p3');

show constraint_exclusion ;
 constraint_exclusion
----------------------
 partition
(1 row)

explain select c1 from lp where c1 >= 1 and c1 < 2;
                        QUERY PLAN
----------------------------------------------------------
 Append  (cost=0.00..29.05 rows=6 width=4)
   ->  Seq Scan on lp1  (cost=0.00..29.05 rows=6 width=4)
         Filter: ((c1 >= 1) AND (c1 < 2))
(3 rows)

explain delete from lp where c1 >= 1 and c1 < 2;
                        QUERY PLAN
----------------------------------------------------------
 Delete on lp  (cost=0.00..29.05 rows=6 width=6)
   Delete on lp1
   ->  Seq Scan on lp1  (cost=0.00..29.05 rows=6 width=6)
         Filter: ((c1 >= 1) AND (c1 < 2))
(4 rows)

set constraint_exclusion = off;

explain select c1 from lp where c1 >= 1 and c1 < 2;
                        QUERY PLAN
----------------------------------------------------------
 Append  (cost=0.00..29.05 rows=6 width=4)
   ->  Seq Scan on lp1  (cost=0.00..29.05 rows=6 width=4)
         Filter: ((c1 >= 1) AND (c1 < 2))
(3 rows)

explain delete from lp where c1 >= 1 and c1 < 2;
                        QUERY PLAN
----------------------------------------------------------
 Delete on lp  (cost=0.00..87.15 rows=18 width=6)
   Delete on lp1
   Delete on lp2
   Delete on lp3
   ->  Seq Scan on lp1  (cost=0.00..29.05 rows=6 width=6)
         Filter: ((c1 >= 1) AND (c1 < 2))
   ->  Seq Scan on lp2  (cost=0.00..29.05 rows=6 width=6)
         Filter: ((c1 >= 1) AND (c1 < 2))
   ->  Seq Scan on lp3  (cost=0.00..29.05 rows=6 width=6)
         Filter: ((c1 >= 1) AND (c1 < 2))
(10 rows)

Thanks & Regards,
Rajkumar Raghuwanshi
QMG, EnterpriseDB Corporation
Re: [HACKERS] MERGE SQL Statement for PG11
Ah, there is one reason not to use a mapping to CTEs to implement MERGE: it might be faster to use a single query that is a FULL OUTER JOIN of the source and target to drive the update/insert/delete operations. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
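To illustrate the idea with a hand-written sketch (this is not from any patch; the `target`/`source` tables, `id` column, and `action` labels are invented for the example), the null-ness of each side of a single FULL OUTER JOIN is enough to classify every row into the action a MERGE would take:

```sql
-- Hypothetical sketch: matched rows (both sides non-NULL) become
-- UPDATEs, source-only rows become INSERTs, and target-only rows
-- could drive a delete branch. One join drives all three actions.
SELECT t.id AS target_id,
       s.id AS source_id,
       CASE
           WHEN t.id IS NOT NULL AND s.id IS NOT NULL THEN 'update'
           WHEN t.id IS NULL THEN 'insert'
           ELSE 'delete'
       END AS action
FROM target t
FULL OUTER JOIN source s ON s.id = t.id;
```

The point of driving MERGE this way, rather than mapping it to CTEs, is that the join is planned and executed once over both inputs.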
Re: [HACKERS] UPDATE of partition key
On Wed, Nov 8, 2017 at 5:57 PM, Amit Khandekar wrote:
> Thomas, can you please try the attached incremental patch
> regress_locale_changes.patch and check if the test passes? The patch
> is to be applied on the main v22 patch. If the test passes, I will
> include these changes (also for list_parted) in the upcoming v23 patch.

That looks good. Thanks.

--
Thomas Munro
http://www.enterprisedb.com
Re: [Sender Address Forgery]Re: [HACKERS] path toward faster partition pruning
Hi David. Thanks for the review. (..also looking at the comments you sent earlier today.) On 2017/11/07 11:14, David Rowley wrote: > On 7 November 2017 at 01:52, David Rowley> wrote: >> Thanks. I'll look over it all again starting my Tuesday morning. (UTC+13) > > I have a little more review to share: > > 1. Missing "in" in comment. Should be "mentioned in" > > * get_append_rel_partitions > * Return the list of partitions of rel that pass the clauses mentioned > * rel->baserestrictinfo > > 2. Variable should be declared in inner scope with the following fragment: > > void > set_basic_child_rel_properties(PlannerInfo *root, >RelOptInfo *rel, >RelOptInfo *childrel, >AppendRelInfo *appinfo) > { > AttrNumber attno; > > if (rel->part_scheme) > { > > which makes the code the same as where you moved it from. It seems like you included the above changes in your attached C file, which I will incorporate into my repository. > 3. Normally lfirst() is assigned to a variable at the start of a > foreach() loop. You have code which does not follow this. > > foreach(lc, clauses) > { > Expr *clause; > int i; > > if (IsA(lfirst(lc), RestrictInfo)) > { > RestrictInfo *rinfo = lfirst(lc); > > You could assign this to a Node * since the type is unknown to you at > the start of the loop. Will make the suggested changes to match_clauses_to_partkey(). > 4. > /* > * Useless if what we're thinking of as a constant is actually > * a Var coming from this relation. > */ > if (bms_is_member(rel->relid, constrelids)) > continue; > > should this be moved to just above the op_strict() test? This one seems > cheaper. Agreed, will do. Also makes sense to move the PartCollMatchesExprColl() test together. > 5. Typo "paritions": /* No clauses to prune paritions, so scan all > partitions. */ > > But thinking about it more the comment should something more along the > lines of /* No useful clauses for partition pruning. Scan all > partitions. */ You fixed it. 
:)

> The key difference is that there might be clauses, just without Consts.
>
> Actually, the more I look at get_append_rel_partitions() I think it
> would be better if you re-shaped that if/else if test so that it only
> performs the loop over the partindexes if it's been set.
>
> I ended up with the attached version of the function after moving
> things around a little bit.

Thanks a lot for that. Looks much better now.

> I'm still reviewing but thought I'd share this part so far.

As mentioned at the top, I'm looking at your latest comments and they
all seem to be good points to me, so I will address those in the next
version.

Thanks,
Amit
Re: [HACKERS] UPDATE of partition key
On 8 November 2017 at 07:55, Thomas Munro wrote:
> On Tue, Nov 7, 2017 at 8:03 AM, Robert Haas wrote:
>> The changes to trigger.c still make me super-nervous. Hey THOMAS
>> MUNRO, any chance you could review that part?
>
> Looking, but here's one silly thing that jumped out at me while
> getting started with this patch. I cannot seem to convince my macOS
> system to agree with the expected sort order from :show_data, where
> underscores precede numbers:
>
>  part_a_10_a_20 | a | 10 | 200 |  1 |
>  part_a_1_a_10  | a |  1 |   1 |  1 |
> - part_d_1_15   | b | 15 | 146 |  1 |
> - part_d_1_15   | b | 16 | 147 |  2 |
>  part_d_15_20   | b | 17 | 155 | 16 |
>  part_d_15_20   | b | 19 | 155 | 19 |
> + part_d_1_15   | b | 15 | 146 |  1 |
> + part_d_1_15   | b | 16 | 147 |  2 |
>
> It seems that macOS (like older BSDs) just doesn't know how to sort
> Unicode and falls back to sorting the bits. I expect that means that
> the test will also fail on any other OS with "make check
> LC_COLLATE=C". I believe our regression tests are supposed to pass
> with a wide range of collations including C, so I wonder if this means
> we should stick a leading zero on those single digit numbers, or
> something, to stabilise the output.

I'd prefer to retain the partition names. I have now added a COLLATE "C"
clause for partname, like this:

-\set show_data 'select tableoid::regclass::text partname, * from range_parted order by 1, 2, 3, 4, 5, 6'
+\set show_data 'select tableoid::regclass::text COLLATE "C" partname, * from range_parted order by 1, 2, 3, 4, 5, 6'

Thomas, can you please try the attached incremental patch
regress_locale_changes.patch and check if the test passes? The patch
is to be applied on the main v22 patch. If the test passes, I will
include these changes (also for list_parted) in the upcoming v23 patch.
Thanks
-Amit Khandekar

regress_locale_changes.patch
Description: Binary data
Re: [HACKERS] path toward faster partition pruning
On 7 November 2017 at 01:52, David Rowley wrote:
> Thanks. I'll look over it all again starting my Tuesday morning. (UTC+13)

Hi Amit,

I had another look over this today. Apologies if any of the review
seems petty. Here goes:

1. The if test seems intended to check for a child that's a partitioned
table, but it actually tests for a non-NULL part_scheme:

/*
 * If childrel is itself partitioned, add it and its partitioned
 * children to the list being propagated up to the root rel.
 */
if (childrel->part_scheme && rel->part_scheme)

Should this code use IS_PARTITIONED_REL() instead? It seems a bit
strange to test for a NULL part_scheme.

2. There are a couple of mistakes in my bms_add_range() code. I've
attached bms_add_range_fix.patch. Can you apply this to your tree?

3. This assert seems to be asserting the same thing twice:

Assert(rel->live_partitioned_rels != NIL &&
	   list_length(rel->live_partitioned_rels) > 0);

A List with length == 0 is always NIL.

4. In get_partitions_from_clauses(), can you comment on why you perform
the list_concat() there? I believe this is there so that the partition
bound from the parent is passed down to the child, so that we can
properly eliminate all child partitions when the 2nd level of
partitioning is using the same partition key as the 1st level. I think
this deserves a paragraph of comment to explain it.

5. Please add a comment to explain what's going on here in
classify_partition_bounding_keys():

if (partattno == 0)
{
	partexpr = lfirst(partexprs_item);
	partexprs_item = lnext(partexprs_item);
}

It looks like, similar to index expressions, partition expressions use
attno 0 to mark that the next expression should be consumed from the
list. Does this need validation that there are enough partexprs_item
items, like what is done in get_range_key_properties()? Or is this
validated somewhere else?

6. The comment claims the if test will test something which it does not
seem to test for:

/*
 * Redundant key elimination using btree-semantics based tricks.
 *
 * Only list and range partitioning use btree operator semantics, so
 * skip otherwise. Also, if there are expressions whose value is yet
 * unknown, skip this step, because we need to compare actual values
 * below.
 */
memset(keyclauses, 0, PARTITION_MAX_KEYS * sizeof(List *));
if (partkey->strategy == PARTITION_STRATEGY_LIST ||
	partkey->strategy == PARTITION_STRATEGY_RANGE)

I was expecting this to be skipped when the clauses contained a
non-const, but it does not seem to be.

7. Should be "compare them":

/*
 * If the leftarg and rightarg clauses' constants are both of the type
 * expected by "op" clause's operator, then compare then using the
 * latter's comparison function.
 */

But if I look at the code, "compare then using the latter's comparison
function" is not true either: it seems to use op's comparison function,
not rightarg's. In most of the calls op and rightarg are the same, but
not in all of them. The function shouldn't make that assumption even if
the args' op were always the same as rightarg's.

8. remove_redundant_clauses() needs an overview comment describing what
the function does.

9. The comment should explain what we would do in the case of
key < 3 AND key <= 2, using some examples:

/* try to keep only one of <, <= */

10. Wondering why this loop runs backward?

for (s = BTMaxStrategyNumber; --s >= 0;)

Why not just:

for (s = 0; s < BTMaxStrategyNumber; s++)

I can't see a special reason for it to run backward. It seems unusual,
but if there's a good reason that I've failed to realise then it's
maybe worth a comment.

11. Please comment on why *constfalse = true is set here:

if (!chk || s == (BTEqualStrategyNumber - 1))
	continue;

if (partition_cmp_args(partopfamily, partopcintype,
					   chk, eq, chk,
					   _result))
{
	if (!test_result)
	{
		*constfalse = true;
		return;
	}
	/* discard the redundant key. */
	xform[s] = NULL;
}

Looks like we'd hit this in a case such as: WHERE key = 1 AND key > 1.
Also, please add a comment when discarding the redundant key; maybe
explain that equality is more useful than the other strategies when
there's an overlap.

--
David Rowley
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

bms_add_range_fix.patch
Description: Binary data
Re: [HACKERS] Restricting maximum keep segments by repslots
Hello,

At Mon, 6 Nov 2017 05:20:50 -0800, Andres Freund wrote in
<20171106132050.6apzynxrqrzgh...@alap3.anarazel.de>
> Hi,
>
> On 2017-10-31 18:43:10 +0900, Kyotaro HORIGUCHI wrote:
> > - distance:
> >   how many bytes LSN can advance before the margin defined by
> >   max_slot_wal_keep_size (and wal_keep_segments) is exhausted,
> >   or how many bytes this slot has lost xlog from restart_lsn.
>
> I don't think 'distance' is a good metric - that's going to continually
> change. Why not store the LSN that's available and provide a function
> that computes this? Or just rely on the lsn - lsn operator?

That seems reasonable. The 'secured minimum LSN' is common among all
slots, so showing it in the view may look a bit stupid, but I don't
find another suitable place for it.

distance = 0 meant the state where the slot is living but insecured in
the previous patch, and that information is lost by changing 'distance'
to 'min_secure_lsn'. Thus I changed the 'live' column to 'status' and
show that status in text representation:

status: secured | insecured | broken

So this looks like the following (max_slot_wal_keep_size = 8MB, which
is half of the default segment size):

-- slots whose required WAL is surely available
select restart_lsn, status, min_secure_lsn, pg_current_wal_lsn() from pg_replication_slots;
 restart_lsn | status  | min_secure_lsn | pg_current_wal_lsn
-------------+---------+----------------+--------------------
 0/1A60      | secured | 0/1A00         | 0/1B42BC78

-- slots whose required WAL is still available but insecured
 restart_lsn | status    | min_secure_lsn | pg_current_wal_lsn
-------------+-----------+----------------+--------------------
 0/1A60      | insecured | 0/1C00         | 0/1D76C948

-- slots whose required WAL is lost
# We should have seen the log 'Some replication slots have lost...'
 restart_lsn | status | min_secure_lsn | pg_current_wal_lsn
-------------+--------+----------------+--------------------
 0/1A60      | broken | 0/1C00         | 0/1D76C9F0

I noticed that I had abandoned the segment fragment of
max_slot_wal_keep_size in the calculations in these routines. The
current patch honors the fragment part of max_slot_wal_keep_size.
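For illustration only, the three states described above can be restated as a tiny classifier (this is a sketch, not code from the patch; the function name and the use of plain integers in place of real 64-bit LSNs are invented for the example):

```python
def slot_status(restart_lsn, min_secure_lsn, oldest_kept_lsn):
    """Hypothetical classifier mirroring the proposed 'status' column."""
    if restart_lsn >= min_secure_lsn:
        return "secured"    # required WAL is inside the keep margin
    if restart_lsn >= oldest_kept_lsn:
        return "insecured"  # WAL still on disk, but no longer guaranteed
    return "broken"         # required WAL has already been removed

# Rough analogue of the three psql examples above (numbers are made up):
print(slot_status(0x1A60, 0x1A00, 0x0000))  # secured
print(slot_status(0x1A60, 0x1C00, 0x1000))  # insecured
print(slot_status(0x1A60, 0x1C00, 0x1B00))  # broken
```

The distinction between insecured and broken needs one extra input (whether the segment holding restart_lsn still exists), which is why min_secure_lsn alone cannot express all three states.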
regards, -- Kyotaro Horiguchi NTT Open Source Software Center >From 109f056e257aba70dddc8d466767ed0a1db371e2 Mon Sep 17 00:00:00 2001 From: Kyotaro Horiguchi Date: Tue, 28 Feb 2017 11:39:48 +0900 Subject: [PATCH 1/2] Add WAL releaf vent for replication slots Adds a capability to limit the number of segments kept by replication slots by a GUC variable. --- src/backend/access/transam/xlog.c | 39 +++ src/backend/utils/misc/guc.c | 11 src/backend/utils/misc/postgresql.conf.sample | 1 + src/include/access/xlog.h | 1 + 4 files changed, 52 insertions(+) diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c index dd028a1..cfdae39 100644 --- a/src/backend/access/transam/xlog.c +++ b/src/backend/access/transam/xlog.c @@ -105,6 +105,7 @@ int wal_level = WAL_LEVEL_MINIMAL; int CommitDelay = 0; /* precommit delay in microseconds */ int CommitSiblings = 5; /* # concurrent xacts needed to sleep */ int wal_retrieve_retry_interval = 5000; +int max_slot_wal_keep_size_mb = 0; #ifdef WAL_DEBUG bool XLOG_DEBUG = false; @@ -9432,9 +9433,47 @@ KeepLogSeg(XLogRecPtr recptr, XLogSegNo *logSegNo) if (max_replication_slots > 0 && keep != InvalidXLogRecPtr) { XLogSegNo slotSegNo; + int slotlimitsegs; + uint64 recptroff; + uint64 slotlimitbytes; + uint64 slotlimitfragment; + + recptroff = XLogSegmentOffset(recptr, wal_segment_size); + slotlimitbytes = 1024 * 1024 * max_slot_wal_keep_size_mb; + slotlimitfragment = XLogSegmentOffset(slotlimitbytes, + wal_segment_size); + + /* calculate segments to keep by max_slot_wal_keep_size_mb */ + slotlimitsegs = ConvertToXSegs(max_slot_wal_keep_size_mb, + wal_segment_size); + /* honor the fragment */ + if (recptroff < slotlimitfragment) + slotlimitsegs++; XLByteToSeg(keep, slotSegNo, wal_segment_size); + /* + * ignore slots if too many wal segments are kept. + * max_slot_wal_keep_size is just accumulated on wal_keep_segments. 
+ */ + if (max_slot_wal_keep_size_mb > 0 && slotSegNo + slotlimitsegs < segno) + { + segno = segno - slotlimitsegs; /* must be positive */ + + /* + * warn only if the checkpoint flushes the required segment. + * we assume here that *logSegNo is calculated keep location. + */ + if (slotSegNo < *logSegNo) +ereport(WARNING, + (errmsg ("restart LSN of replication slots is ignored by checkpoint"), + errdetail("Some replication slots have lost required WAL segnents to continue by up to %ld segments.", + (segno < *logSegNo ? segno : *logSegNo) - slotSegNo))); + + /* emergency vent */ + slotSegNo = segno; + } + if (slotSegNo <=
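If I read the patch above correctly, the fragment handling amounts to the following (a Python restatement for clarity, not code from the patch; sizes are in bytes):

```python
def slot_limit_segments(max_slot_wal_keep_size_mb, wal_segment_size, recptr):
    # Whole segments covered by the MB budget ...
    slotlimitbytes = 1024 * 1024 * max_slot_wal_keep_size_mb
    slotlimitsegs = slotlimitbytes // wal_segment_size
    # ... plus one more when the current insert position's offset sits
    # inside the leftover fragment, so the budget is not rounded down.
    slotlimitfragment = slotlimitbytes % wal_segment_size
    recptroff = recptr % wal_segment_size
    if recptroff < slotlimitfragment:
        slotlimitsegs += 1
    return slotlimitsegs

# With 16MB segments and an 8MB budget there are no whole segments,
# but the 8MB fragment keeps one extra segment while the insert
# offset is below 8MB.
seg = 16 * 1024 * 1024
print(slot_limit_segments(8, seg, 4 * 1024 * 1024))
print(slot_limit_segments(8, seg, 12 * 1024 * 1024))
```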
Re: [HACKERS] Parallel Hash take II
Hi,

> * avoids wasting memory on duplicated hash tables
> * avoids wasting disk space on duplicated batch files
> * avoids wasting CPU executing duplicate subplans

What's the last one referring to?

+static void
+MultiExecParallelHash(HashState *node)
+{

+	switch (BarrierPhase(build_barrier))
+	{
+		case PHJ_BUILD_ALLOCATING:
+
+			/*
+			 * Either I just allocated the initial hash table in
+			 * ExecHashTableCreate(), or someone else is doing that. Either
+			 * way, wait for everyone to arrive here so we can proceed, and
+			 * then fall through.
+			 */
+			BarrierArriveAndWait(build_barrier, WAIT_EVENT_HASH_BUILD_ALLOCATING);

Can you add a /* fallthrough */ comment here? Gcc warns if you don't.
While we currently have lotsa other places not having the annotation,
it seems reasonable to have it in new code.

+		case PHJ_BUILD_HASHING_INNER:
+
+			/*
+			 * It's time to begin hashing, or if we just arrived here then
+			 * hashing is already underway, so join in that effort. While
+			 * hashing we have to be prepared to help increase the number of
+			 * batches or buckets at any time, and if we arrived here when
+			 * that was already underway we'll have to help complete that work
+			 * immediately so that it's safe to access batches and buckets
+			 * below.
+			 */
+			if (PHJ_GROW_BATCHES_PHASE(BarrierAttach(>grow_batches_barrier)) !=
+				PHJ_GROW_BATCHES_ELECTING)
+				ExecParallelHashIncreaseNumBatches(hashtable);
+			if (PHJ_GROW_BUCKETS_PHASE(BarrierAttach(>grow_buckets_barrier)) !=
+				PHJ_GROW_BUCKETS_ELECTING)
+				ExecParallelHashIncreaseNumBuckets(hashtable);
+			ExecParallelHashEnsureBatchAccessors(hashtable);

"accessors" sounds a bit weird for a bunch of pointers, but maybe
that's just my ESL senses tingling wrongly.
/* @@ -240,12 +427,15 @@ ExecEndHash(HashState *node) * */ HashJoinTable -ExecHashTableCreate(Hash *node, List *hashOperators, bool keepNulls) +ExecHashTableCreate(HashState *state, List *hashOperators, bool keepNulls) + /* +* Parallel Hash tries to use the combined work_mem of all workers to +* avoid the need to batch. If that won't work, it falls back to work_mem +* per worker and tries to process batches in parallel. +*/ One day we're going to need a better approach to this. I have no idea how, but this per-node, and now per_node * max_parallelism, approach has only implementation simplicity as its benefit. +static HashJoinTuple +ExecParallelHashLoadTuple(HashJoinTable hashtable, MinimalTuple tuple, + dsa_pointer *shared) +{ +static void +ExecParallelHashJoinSetUpBatches(HashJoinTable hashtable, int nbatch) +{ +/* + * Get the first tuple in a given bucket identified by number. + */ +static HashJoinTuple +ExecHashFirstTupleInBucket(HashJoinTable hashtable, int bucketno) +{ + if (hashtable->parallel_state) + { + dsa_pointer p = + dsa_pointer_atomic_read(>buckets.shared[bucketno]); Can you make this, and possibly a few other places, more readable by introducing a temporary variable? +/* + * Insert a tuple at the front of a chain of tuples in DSA memory atomically. + */ +static void +ExecParallelHashPushTuple(dsa_pointer_atomic *head, + HashJoinTuple tuple, + dsa_pointer tuple_shared) +{ + do + { + tuple->next.shared = dsa_pointer_atomic_read(head); + } while (!dsa_pointer_atomic_compare_exchange(head, + >next.shared, + tuple_shared)); +} This is hard to read. + * While in the phase PHJ_BUILD_HASHING_INNER a separate pair of barriers may + * be used repeatedly as required to coordinate expansions in the number of + * batches or buckets. Their phases are as follows: + * + * PHJ_GROW_BATCHES_ELECTING -- initial state + * PHJ_GROW_BATCHES_ALLOCATING -- one allocates new batches + * PHJ_GROW_BATCHES_REPARTITIONING -- all rep s/rep/repartition/? 
#include "access/htup_details.h" +#include "access/parallel.h" #include
Re: [HACKERS] Fix bloom WAL tap test
On Wed, Nov 8, 2017 at 11:20 AM, Michael Paquier wrote:
> On Wed, Nov 8, 2017 at 1:58 AM, Alexander Korotkov wrote:
>> On Tue, Nov 7, 2017 at 4:34 PM, Masahiko Sawada wrote:
>>> I understood the necessity of this patch and reviewed two patches.
>>
>> Good, thank you.
>
> That's clearly a bug fix.
>
>>> diff --git a/contrib/bloom/Makefile b/contrib/bloom/Makefile
>>> index 13bd397..c1566d4 100644
>>> --- a/contrib/bloom/Makefile
>>> +++ b/contrib/bloom/Makefile
>>> @@ -20,5 +20,7 @@ include $(top_builddir)/src/Makefile.global
>>> include $(top_srcdir)/contrib/contrib-global.mk
>>> endif
>>>
>>> +check: wal-check
>>> +
>>> wal-check: temp-install
>>> 	$(prove_check)
>>
>> I've tried this version of the Makefile. And I've seen only one
>> difference: when TAP tests are enabled, this version of the Makefile
>> runs the TAP tests before the regression tests, while my version runs
>> them after the regression tests. That seems like more desirable
>> behavior to me. But it would be still nice to simplify the Makefile.
>
> Why do you care about the order of actions? There is no dependency
> between each test and it seems to me that it should remain so to keep
> a maximum flexibility. This does not sacrifice coverage as well. In
> short, Sawada-san's suggestion looks like a thing to me. One piece
> that it is missing though is that installcheck triggers only the
> SQL-based tests, and it should also trigger the TAP tests.

+1

> So I think that you should instead do something like that:
>
> --- a/contrib/bloom/Makefile
> +++ b/contrib/bloom/Makefile
> @@ -20,5 +20,12 @@ include $(top_builddir)/src/Makefile.global
>  include $(top_srcdir)/contrib/contrib-global.mk
>  endif
>
> +installcheck: wal-installcheck
> +
> +check: wal-check
> +
> +wal-installcheck:
> +	$(prove_installcheck)
> +
>  wal-check: temp-install
>  	$(prove_check)
>
> This works for me as I would expect it should.

Looks good to me too.
Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
Re: [HACKERS] [PATCH] A hook for session start
On Tue, Nov 7, 2017 at 9:58 PM, Fabrízio de Royes Mello wrote:
> On Tue, Nov 7, 2017 at 1:15 AM, Michael Paquier wrote:
>> On Sun, Nov 5, 2017 at 3:14 AM, Fabrízio de Royes Mello wrote:
>> I was going to hack something like that. That's interesting for the
>> use case Robert has mentioned.
>>
>> Well, in the case of the end session hook, those variables are passed
>> to the hook by being directly taken from the context from MyProcPort:
>> + (*session_end_hook) (MyProcPort->database_name,
>> MyProcPort->user_name);
>> In the case of the start hook, those are directly taken from the
>> command outer caller, but similarly MyProcPort is already set, so
>> those strings are available (your patch does so in the end session
>> hook)... Variables in hooks are useful if those are not available
>> within the memory context and refer to a specific state where the
>> hook is called. For example, take the password hook: this uses the
>> user name and the password because those values are not available
>> within the session context. The same stands for other hooks as well.
>> Keeping the interface minimal helps readability and maintenance.
>> See the attached example that can be applied on top of 0003, which
>> makes use of the session context; the set of regression tests does not
>> pass, but this shows how I think those hooks had better be shaped.
>
> Makes sense... fixed.

Thanks for considering it.

>> + (*session_end_hook) (MyProcPort->database_name,
>> MyProcPort->user_name);
>> +
>> + /* Make don't leave any active transactions and/or locks behind */
>> + AbortOutOfAnyTransaction();
>> + LockReleaseAll(USER_LOCKMETHOD, true);
>> Let's leave this work to people actually implementing the hook contents.
>
> Fixed.
>
> Attached new version. I unified all three patches into just one because
> this is a small patch and most of it is test code.

Thanks. This makes sense to me.
+ /* Hook just normal backends */ + if (session_end_hook && MyBackendId != InvalidBackendId) + (*session_end_hook) (); I have been wondering about the necessity of this restriction. Couldn't it be useful to just let the people implementing the hook do any decision-making? Tracking some non-backend shutdowns may be useful as well for logging purposes. The patch is beginning to take shape (I really like the test module you are implementing here!), still needs a bit more work. +CREATE ROLE session_hook_usr1 LOGIN; +CREATE ROLE session_hook_usr2 LOGIN; Roles created during regression tests should be prefixed with regress_. + if (prev_session_start_hook) + prev_session_start_hook(); + + if (session_start_hook_enabled) + (void) register_session_hook("START"); Shouldn't both be reversed? +/* sample sessoin end hook function */ Typo here. +CREATE DATABASE session_hook_db; [...] + if (!strcmp(dbname, "session_hook_db")) + { Creating a database is usually avoided in sql files as those can be run on existing servers using installcheck. I am getting that this restriction is in place so as it is possible to create an initial connection to the server so as the base table to log the activity is created. That's a fine assumption to me. Still, I think that the following changes should be done: - Let's restrict the logging to a role name instead of a database name, and let's parametrize it with a setting in the temporary configuration file. Let's not bother about multiple role support with a list, for the sake of tests and simplicity only defining one role looks fine to me. Comments in the code should be clear about the dependency. - The GUCs test_session_hooks.start_enabled and test_session_hooks.end_enabled are actually redundant with the database restriction (well, role restriction per previous comment), so let's remove them. Please let's also avoid ALTER SYSTEM calls in tests as it would impact existing installations with installcheck. 
It's impossible to tell how a committer would feel about this test
suite, but my opinion is to keep it, as it's a good template example of
what can be done with those two hooks.

--
Michael
Re: [HACKERS] why not parallel seq scan for slow functions
On Wed, Nov 8, 2017 at 2:51 AM, Robert Haas wrote:
> On Mon, Nov 6, 2017 at 9:57 PM, Amit Kapila wrote:
>>> Also, even if inheritance is used, we might still be the
>>> topmost scan/join target.
>>
>> Sure, but in that case, it won't generate the gather path here (due to
>> this part of check "!root->append_rel_list"). I am not sure whether I
>> have understood the second part of your question, so if my answer
>> appears inadequate, then can you provide more details on what you are
>> concerned about?
>
> I don't know why the question of why root->append_rel_list is empty is
> the relevant thing here for deciding whether to generate gather paths
> at this point.

This is required to prohibit generating a gather path for the top rel
in the case of inheritance (Append node) at this place (we want to
generate it later when the scan/join target is available). For such a
case, the reloptkind will be RELOPT_BASEREL and simple_rel_array_size
will be greater than two as it includes child rels as well. So, the
check for root->append_rel_list will prohibit generating a gather path
for such a rel. Now, for all the child rels of Append, the reloptkind
will be RELOPT_OTHER_MEMBER_REL, so it won't generate a gather path
here because of the first part of the check
(rel->reloptkind == RELOPT_BASEREL).

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] UPDATE of partition key
On Tue, Nov 7, 2017 at 8:03 AM, Robert Haas wrote:
> The changes to trigger.c still make me super-nervous. Hey THOMAS
> MUNRO, any chance you could review that part?

Looking, but here's one silly thing that jumped out at me while
getting started with this patch. I cannot seem to convince my macOS
system to agree with the expected sort order from :show_data, where
underscores precede numbers:

 part_a_10_a_20 | a | 10 | 200 |  1 |
 part_a_1_a_10  | a |  1 |   1 |  1 |
- part_d_1_15   | b | 15 | 146 |  1 |
- part_d_1_15   | b | 16 | 147 |  2 |
 part_d_15_20   | b | 17 | 155 | 16 |
 part_d_15_20   | b | 19 | 155 | 19 |
+ part_d_1_15   | b | 15 | 146 |  1 |
+ part_d_1_15   | b | 16 | 147 |  2 |

It seems that macOS (like older BSDs) just doesn't know how to sort
Unicode and falls back to sorting the bits. I expect that means that
the test will also fail on any other OS with "make check
LC_COLLATE=C". I believe our regression tests are supposed to pass
with a wide range of collations including C, so I wonder if this means
we should stick a leading zero on those single digit numbers, or
something, to stabilise the output.

--
Thomas Munro
http://www.enterprisedb.com
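The byte-wise fallback is easy to reproduce outside the server. This standalone Python sketch (names taken from the regression diff above; not part of any patch) shows why a C-style collation swaps the part_d rows: in ASCII, digits (0x30-0x39) sort before '_' (0x5F), while en_US-style collations compare letters and digits first and treat punctuation as secondary.

```python
# Byte-wise (C locale) comparison puts "part_d_15_20" before
# "part_d_1_15" because '5' (0x35) < '_' (0x5F) at the point the two
# names first differ -- exactly the reordering in the failing output.
names = ["part_a_1_a_10", "part_a_10_a_20", "part_d_1_15", "part_d_15_20"]

bytewise = sorted(names)  # what ORDER BY under the C collation produces
for n in bytewise:
    print(n)
```

Pinning the column to COLLATE "C" (or zero-padding the digits) makes the order identical regardless of the OS collation.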
Re: [HACKERS] [COMMITTERS] pgsql: Expand empty end tag
On Wed, Nov 8, 2017 at 11:15 AM, Peter Eisentraut wrote:
> Expand empty end tag

Perhaps you missed this patch?
https://www.postgresql.org/message-id/CAJrrPGdkL8TFk+-VivrW637js0v_KM=ub4pBFy=nf0bpafb...@mail.gmail.com
It seems to me that the information within brackets should not be
inside the filename markup :)

--
Michael
Re: [HACKERS] Fix bloom WAL tap test
On Wed, Nov 8, 2017 at 1:58 AM, Alexander Korotkov wrote:
> On Tue, Nov 7, 2017 at 4:34 PM, Masahiko Sawada wrote:
>> I understood the necessity of this patch and reviewed two patches.
>
> Good, thank you.

That's clearly a bug fix.

>> diff --git a/contrib/bloom/Makefile b/contrib/bloom/Makefile
>> index 13bd397..c1566d4 100644
>> --- a/contrib/bloom/Makefile
>> +++ b/contrib/bloom/Makefile
>> @@ -20,5 +20,7 @@ include $(top_builddir)/src/Makefile.global
>> include $(top_srcdir)/contrib/contrib-global.mk
>> endif
>>
>> +check: wal-check
>> +
>> wal-check: temp-install
>> 	$(prove_check)
>
> I've tried this version of the Makefile. And I've seen only one
> difference: when TAP tests are enabled, this version of the Makefile
> runs the TAP tests before the regression tests, while my version runs
> them after the regression tests. That seems like more desirable
> behavior to me. But it would be still nice to simplify the Makefile.

Why do you care about the order of actions? There is no dependency
between each test and it seems to me that it should remain so to keep
a maximum flexibility. This does not sacrifice coverage as well. In
short, Sawada-san's suggestion looks like a thing to me. One piece
that it is missing though is that installcheck triggers only the
SQL-based tests, and it should also trigger the TAP tests. So I think
that you should instead do something like that:

--- a/contrib/bloom/Makefile
+++ b/contrib/bloom/Makefile
@@ -20,5 +20,12 @@ include $(top_builddir)/src/Makefile.global
 include $(top_srcdir)/contrib/contrib-global.mk
 endif

+installcheck: wal-installcheck
+
+check: wal-check
+
+wal-installcheck:
+	$(prove_installcheck)
+
 wal-check: temp-install
 	$(prove_check)

This works for me as I would expect it should. Could you update the
patch? That's way simpler than having to replicate again rules like
regresscheck and regressioncheck-install. I am switching the patch back
to "waiting on author" for now.
I can see that src/test/modules/commit_ts is missing the shot as well, and I think that's the case as well of other test modules like pg_dump's suite..
--
Michael
Re: [HACKERS] Fix bloom WAL tap test
On Wed, Nov 8, 2017 at 1:49 AM, Alexander Korotkov wrote:
> On Tue, Nov 7, 2017 at 4:26 PM, Fabrízio Mello wrote:
>> The patch doesn't apply against master:
>>
>> fabrizio@macanudo:/d/postgresql (master)
>> $ git apply /tmp/wal-check-on-bloom-check.patch
>> error: contrib/bloom/Makefile: already exists in working directory
>
> Apparently I sent patches which are OK for "patch -p1", but not OK for "git apply".
> Sorry for that. I resubmit both of them.

I would not worry about that as long as there is a way to apply the patches. git apply fails too easily, to be honest, while patch -p1 is more permissive. I use the latter extensively, by the way, because it has the merit of not being a PITA.
--
Michael
Re: [HACKERS] Remove duplicate setting in test/recovery/Makefile
On Wed, Nov 8, 2017 at 10:38 AM, Masahiko Sawada wrote:
> Hi,
>
> I found that EXTRA_INSTALL is doubly set at both the top and bottom of
> src/test/recovery/Makefile. Is it necessary?
>
> Attached patch fixes this.

Indeed, there is some bad overlap between d851bef and 1148e22a. Better to CC Simon, who committed both things.
--
Michael
Re: [HACKERS] Fix a typo in dsm_impl.c
On Wed, Nov 8, 2017 at 6:36 AM, Robert Haas wrote:
> On Mon, Nov 6, 2017 at 11:22 PM, Masahiko Sawada wrote:
>> Attached the patch for $subject.
>
> Committed.

Thank you!

Regards,
--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
[HACKERS] Remove duplicate setting in test/recovery/Makefile
Hi,

I found that EXTRA_INSTALL is doubly set at both the top and bottom of src/test/recovery/Makefile. Is it necessary?

Attached patch fixes this.

Regards,
--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

remove_duplicate_setting.patch
Description: Binary data
[HACKERS] OpenTemporaryFile() vs resowner.c
Hi hackers,

Andres, Robert and Peter G rightly complained[1] that my shared temporary file patch opens a file, then calls ResourceOwnerEnlargeFiles() which can fail due to lack of memory, and then registers the file handle to make sure we don't leak it. Doh. The whole point of the separate ResourceOwnerEnlargeXXX() interface is to be able to put it before resource acquisition. The existing OpenTemporaryFile() coding has the same mistake. Please see attached.

[1] https://www.postgresql.org/message-id/20171107210155.kuksdd324kgz5oev%40alap3.anarazel.de

--
Thomas Munro
http://www.enterprisedb.com

fix-file-leak.patch
Description: Binary data
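The ordering bug described above can be sketched outside PostgreSQL. The following is a toy illustration (the names are invented stand-ins, not the real fd.c/resowner.c API): the reservation step can fail, so it must run before the resource is acquired, leaving the final registration step with no failure mode that could leak the resource.

```c
#include <stddef.h>

/* Toy stand-ins for resowner bookkeeping -- illustrative only. */
#define MAX_TRACKED 8

static int tracked[MAX_TRACKED];
static int ntracked = 0;
static int reserved = 0;

/*
 * Step 1: reserve tracking space.  This can fail (in PostgreSQL,
 * ResourceOwnerEnlargeFiles() can fail on allocation failure), so it
 * must run BEFORE anything is acquired.
 */
static int
reserve_slot(void)
{
    if (reserved >= MAX_TRACKED)
        return 0;               /* simulated out-of-memory */
    reserved++;
    return 1;
}

/* Step 2: acquire the resource (stand-in for opening a temp file). */
static int
acquire_resource(void)
{
    return 42;                  /* pretend file descriptor */
}

/*
 * Step 3: register the resource for cleanup.  Because space was
 * reserved up front, this cannot fail, so there is no window in which
 * an acquired-but-unregistered resource can leak.
 */
static void
remember_resource(int fd)
{
    tracked[ntracked++] = fd;
}
```

The buggy ordering was acquire-then-reserve: if the reservation failed after the open, the descriptor leaked with nothing tracking it. The fix hoists the reservation above the acquisition, which is what the attached patch does for OpenTemporaryFile().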
Re: [HACKERS] Exclude pg_internal.init from base backup
On Wed, Nov 8, 2017 at 9:50 AM, Haribabu Kommi wrote:
> Thanks for the correction. I was not much aware of SGML markup usage.
> While building the documentation, it raises a warning message of "empty
> end-tag", so I just added the end tag. Attached the updated patch with the
> suggested correction.

Ah, I can see the warning as well. Using empty tags is forbidden since c29c5789, which is really recent. Sorry for missing it. Simon got trapped by that as well visibly. Your patch looks good to me.
--
Michael
Re: [HACKERS] Exclude pg_internal.init from base backup
On Wed, Nov 8, 2017 at 11:11 AM, Michael Paquier wrote:
> On Wed, Nov 8, 2017 at 9:04 AM, Haribabu Kommi wrote:
> > The commit 98267e missed to check the empty SGML tag, attached patch
> > fixes the same.
> >
> > - <filename>pg_internal.init (found in multiple directories)</>
> > + <filename>pg_internal.init (found in multiple directories)</filename>
>
> What has been committed in 98267ee and what is proposed here both look
> incorrect to me. The markup filename ought to be used only with file
> names, so "(found in multiple directories)" should not be within it.
> Simon's commit is not wrong with the markup usage by the way.

Thanks for the correction. I was not much aware of SGML markup usage. While building the documentation, it raises a warning message of "empty end-tag", so I just added the end tag. Attached the updated patch with the suggested correction.

Regards,
Hari Babu
Fujitsu Australia

sgml_empty_tag_fix_v2.patch
Description: Binary data
Re: [HACKERS] MERGE SQL Statement for PG11
On Tue, Nov 07, 2017 at 03:31:22PM -0800, Peter Geoghegan wrote:
> On Tue, Nov 7, 2017 at 3:29 PM, Nico Williams wrote:
> > On Thu, Nov 02, 2017 at 03:25:48PM -0700, Peter Geoghegan wrote:
> >> Nico Williams wrote:
> >> >A MERGE mapped to a DML like this:
> >
> > I needed to spend more time reading MERGE docs from other RDBMSes.
>
> Please don't hijack this thread. It's about the basic question of
> semantics, and is already hard enough for others to follow as-is.

I'm absolutely not. If you'd like a pithy summary devoid of detail, it is this:

I'm making the argument that using ON CONFLICT to implement MERGE cannot produce a complete implementation [you seem to agree], but there is at least one light-weight way to implement MERGE with _existing_ machinery in PG: CTEs. It's perfectly fine to implement an executor for MERGE, but I think that's a bit silly, and I explain why.

Further, I explored your question regarding order of events, which you (and I) think is a very important semantics question. You thought order of execution / trigger firing should be defined, whereas I think it should not be, because MERGE explicitly says so -- at least MSFT's! MSFT's MERGE documentation says:

| For every insert, update, or delete action specified in the MERGE
| statement, SQL Server fires any corresponding AFTER triggers defined
| on the target table, but does not guarantee on which action to fire
| triggers first or last. Triggers defined for the same action honor the
| order you specify.

Impliedly (though not stated explicitly), the actual updates, inserts, and deletes can happen in any order, just as the triggers can fire in any order. As usual in the world of programming language design, leaving order of execution undefined as much as possible increases the opportunities to parallelize. Presumably MSFT is leaving the door open to parallelizing MERGE, if they haven't already. Impliedly, CTEs that have no dependencies on each other are also ripe for parallelization.
This is important too, for one of my goals is to improve CTE performance. If implementing MERGE as a mapping to CTEs leads to improvements in CTEs, so much the better. But also this *is* a simple implementation of MERGE, and simplicity seems like a good thing.

Nico
--
Re: [HACKERS] Exclude pg_internal.init from base backup
On Wed, Nov 8, 2017 at 9:04 AM, Haribabu Kommi wrote:
> The commit 98267e missed to check the empty SGML tag, attached patch
> fixes the same.

- <filename>pg_internal.init (found in multiple directories)</>
+ <filename>pg_internal.init (found in multiple directories)</filename>

What has been committed in 98267ee and what is proposed here both look incorrect to me. The markup filename ought to be used only with file names, so "(found in multiple directories)" should not be within it. Simon's commit is not wrong with the markup usage by the way.
--
Michael
Re: [HACKERS] Exclude pg_internal.init from base backup
On Wed, Nov 8, 2017 at 3:03 AM, Simon Riggs wrote:
> On 5 November 2017 at 11:55, Magnus Hagander wrote:
> > On Sat, Nov 4, 2017 at 4:04 AM, Michael Paquier <michael.paqu...@gmail.com> wrote:
> >>
> >> On Fri, Nov 3, 2017 at 4:04 PM, Petr Jelinek wrote:
> >> > Not specific problem to this patch, but I wonder if it should be made
> >> > more clear that those files (there are couple above of what you added)
> >> > are skipped no matter which directory they reside in.
> >>
> >> Agreed, it is a good idea to tell in the docs how this behaves. We
> >> could always change things so as the comparison is based on the full
> >> path like what is done for pg_control, but that does not seem worth
> >> complicating the code.
> >
> > pg_internal.init can, and does, appear in multiple different directories.
> > pg_control is always in the same place. So they're not the same thing.
> >
> > So +1 for documenting the difference in how these are handled, as this is
> > important to know for somebody writing an external tool for it.
>
> Changes made, moving to commit the attached patch.

The commit 98267e missed to check the empty SGML tag; attached patch fixes the same.

Regards,
Hari Babu
Fujitsu Australia

sgml_empty_tag_fix.patch
Description: Binary data
Re: [HACKERS] Parallel Hash take II
Hi Peter,

See responses to a couple of points below. I'll respond to the other points separately (ie with code/comment changes).

On Wed, Nov 8, 2017 at 10:32 AM, Peter Geoghegan wrote:
> On Tue, Nov 7, 2017 at 1:01 PM, Andres Freund wrote:
>> +/*
>> + * Delete a BufFile that was created by BufFileCreateShared in the given
>> + * SharedFileSet using the given name.
>> + *
>> + * It is not necessary to delete files explicitly with this function. It is
>> + * provided only as a way to delete files proactively, rather than waiting for
>> + * the SharedFileSet to be cleaned up.
>> + *
>> + * Only one backend should attempt to delete a given name, and should know
>> + * that it exists and has been exported or closed.
>> + */
>
> This part is new to me. We now want one backend to delete a given
> filename. What changed? Please provide a Message-Id reference if that
> will help me to understand.
>
> For now, I'm going to guess that this development had something to do
> with the need to deal with virtual FDs that do a close() on an FD to
> keep under backend limits. Do I have that right?

No -- this is simply an option available to client code that wants to clean up individual temporary files earlier. Such client code is responsible for meeting the synchronisation requirements described in the comment, but it's entirely optional. For example: SharedTuplestore is backed by multiple files and it could take the opportunity to delete individual files after they've been scanned, if you told it you were going to scan only once (SHARED_TUPLESTORE_SINGLE_PASS) and if it could prove that no other backend could still need to read from it, as mentioned in a comment in sts_end_parallel_scan(). However, since I changed to the page/chunk based model (spinlock while advancing a block counter, mimicking Parallel Seq Scan) instead of the earlier Fisher Price version (LWLock while reading each tuple, with a shared read head maintained with tell/seek), I don't actually do that. Hitting the end of the file no longer means that no one else is reading from the file (someone else might still be reading from an earlier chunk even though you've finished reading the final chunk in the file, and the vfd system means that they must be free to close and reopen the file at any time).

In the current patch version the files are cleaned up wholesale at two times: SharedFileSet cleanup triggered by DSM destruction, and SharedFileSet reset triggered by rescan. Practically, it's always the former case. It's vanishingly rare that you'd actually want to rescan a Parallel Hash that spills to disk, but in that case we delete the old files and recreate them, and that case is tested in the regression tests. If it bothers you that I have an API there that I'm not actually using yet, I will remove it.

>> + if (vfdP->fdstate & FD_TEMP_FILE_LIMIT)
>> + {
>> +     /* Subtract its size from current usage (do first in case of error) */
>> +     temporary_files_size -= vfdP->fileSize;
>> +     vfdP->fileSize = 0;
>> + }
>>
>> So, is it right to do so unconditionally and without regard for errors?
>> If the file isn't deleted, it shouldn't be subtracted from fileSize. I
>> guess you're managing that through the flag, but that's not entirely
>> obvious.
>
> I think that the problem here is that the accounting is expected to
> always work. It's not like there is a resowner style error path in
> which temporary_files_size gets reset to 0.

But there is a resowner error path, in which File handles get automatically closed and temporary_files_size gets adjusted. The counter goes up when you create, and goes down when you close or resowner closes for you. Eventually you either close and the bookkeeping is consistent, or you crash and it doesn't matter. And some kind of freak multiple-close attempt is guarded against by setting the file size to 0, so we can't double-subtract. Do you see a bug?

None of this has any impact on whether files are leaked: either SharedFileSet removes the files, or you crash (or take a filesystem snapshot, etc) and RemovePgTempFiles() mops them up at the next clean startup.

--
Thomas Munro
http://www.enterprisedb.com
Re: [HACKERS] Refactor handling of database attributes between pg_dump and pg_dumpall
On Wed, Nov 8, 2017 at 8:48 AM, Robert Haas wrote:
> On Tue, Nov 7, 2017 at 4:35 AM, Haribabu Kommi wrote:
> > The newly added option is not recommended to be used in normal cases and
> > it is used only for upgrade utilities.
>
> I don't know why it couldn't be used in normal cases. That seems like
> a totally legitimate thing for somebody to want. Maybe nobody does,
> but I see no reason to worry if they do.

OK. Removed the documentation changes saying that it cannot be used for normal scenarios, and also added a Note section explaining the problem of using the dump with the pg_restore command with the --clean and --create options.

Regards,
Hari Babu
Fujitsu Australia

pg_dump-and-pg_dumpall-database-handling-refactoring_v10.patch
Description: Binary data
Re: [HACKERS] MERGE SQL Statement for PG11
On Tue, Nov 7, 2017 at 3:29 PM, Nico Williams wrote:
> On Thu, Nov 02, 2017 at 03:25:48PM -0700, Peter Geoghegan wrote:
>> Nico Williams wrote:
>> >A MERGE mapped to a DML like this:
>
> I needed to spend more time reading MERGE docs from other RDBMSes.

Please don't hijack this thread. It's about the basic question of semantics, and is already hard enough for others to follow as-is.
--
Peter Geoghegan
Re: [HACKERS] [PATCH] Add ALWAYS DEFERRED option for constraints
On Mon, Nov 06, 2017 at 05:50:21PM +1300, Thomas Munro wrote:
> On Fri, Oct 20, 2017 at 9:05 AM, Nico Williams wrote:
> > Rebased (there were conflicts in the SGML files).
>
> Hi Nico
>
> FYI that version has some stray absolute paths in constraints.source:
>
> -COPY COPY_TBL FROM '@abs_srcdir@/data/constro.data';
> +COPY COPY_TBL FROM '/home/nico/ws/postgres/src/test/regress/data/constro.data';
>
> -COPY COPY_TBL FROM '@abs_srcdir@/data/constrf.data';
> +COPY COPY_TBL FROM '/home/nico/ws/postgres/src/test/regress/data/constrf.data';

Oops! Thanks for catching that!
Re: [HACKERS] MERGE SQL Statement for PG11
On Thu, Nov 02, 2017 at 03:25:48PM -0700, Peter Geoghegan wrote:
> Nico Williams wrote:
> >A MERGE mapped to a DML like this:

I needed to spend more time reading MERGE docs from other RDBMSes. The best MERGE so far is MS SQL Server's, which looks like:

  MERGE INTO <target> USING <source> ON (<join condition>)
  -- optional:
  WHEN MATCHED THEN UPDATE SET ...
  -- optional:
  WHEN NOT MATCHED [ BY TARGET ] THEN INSERT ...
  -- optional:
  WHEN NOT MATCHED BY SOURCE THEN DELETE
  -- optional:
  OUTPUT ...
  ;

(The other MERGEs are harder to use because they lack a WHEN NOT MATCHED BY SOURCE THEN DELETE, instead having a DELETE clause on the UPDATE, which is then difficult to use.)

This is *trivial* to map to a CTE, and, in fact, I and my colleagues have resorted to hand-coded CTEs like this precisely because PG lacks MERGE (though we ourselves didn't know about MERGE -- it's new to us).

If <source> is a query, then we start with a CTE for that, else if it's a view or table, then we don't set up a CTE for it. Each of the UPDATE, INSERT, and/or DELETE can be its own CTE. If there's an OUTPUT clause, that can be a final SELECT that queries from the CTEs that ran the DMLs with RETURNING. If there's no OUTPUT then none of the DMLs need to have RETURNING, and one of them will be the main statement, rather than a CTE.

The pattern is:

  WITH
  -- IFF <source> is a query:
  <source> AS (<source query>),
  -- IFF there's a WHEN MATCHED THEN UPDATE
  updates AS (
      UPDATE <target> AS <target alias> SET ...
      FROM <source>
      WHERE <join condition>
      -- IFF there's an OUTPUT clause, then:
      RETURNING 'update' as "@action", ...
  ),
  inserts AS (
      INSERT INTO <target> (<columns>)
      SELECT ... FROM <source> LEFT JOIN <target> ON <join condition>
      WHERE <target row> IS NOT DISTINCT FROM NULL
      -- IFF there's a CONCURRENTLY clause:
      ON CONFLICT DO NOTHING
      -- IFF there's an OUTPUT clause, then:
      RETURNING 'insert' as "@action", ...
  ),
  deletes AS (
      DELETE FROM <target>
      WHERE NOT EXISTS (SELECT * FROM <source> WHERE <join condition>)
      -- IFF there's an OUTPUT clause, then:
      RETURNING 'delete' as "@action", ...
  )
  -- IFF there's an OUTPUT clause
  SELECT * FROM updates UNION
  SELECT * FROM inserts UNION
  SELECT * FROM deletes;

If there's no OUTPUT clause then one of the DMLs has to be the main statement:

  WITH ...
  DELETE ...; -- or UPDATE, or INSERT

Note that if the source is a view or table and there's no OUTPUT clause, then it's one DML with up to (but not referring to) two CTEs, and in all cases the CTEs do not refer to each other. This means that the executor can parallelize all of the DMLs. If the source is a query, then that could be made a temp view to avoid having to run the query first. The CTE executor needs to learn to sometimes do this anyway, so this is good.

The deletes CTE can be equivalently written without a NOT EXISTS:

  to_be_deleted AS (
      SELECT <target key> FROM <target> LEFT JOIN <source> ON (<join condition>)
      WHERE <source row> IS NOT DISTINCT FROM NULL
  ),
  deletes AS (
      DELETE FROM <target> USING to_be_deleted tbd
      WHERE <target key> = tbd.<key>
  )

if that were to run faster (probably not, since PG today would first run the to_be_deleted CTE, then the deletes CTE). I mention it only because it's nice to see the symmetry of LEFT JOINs for the two WHEN NOT MATCHED cases. (Here <target alias> is the alias for <target> if one was given.)

***

This mapping triggers triggers as one would expect (at least FOR EACH ROW; I expect the DMLs in CTEs should also trigger FOR EACH STATEMENT triggers, and if they don't I consider that a bug).

> This is a bad idea. An implementation like this is not at all
> maintainable.

I beg to differ. First of all, not having to add an executor for MERGE is a win: much, much less code to maintain. The code to map MERGE to CTEs can easily be contained entirely in src/backend/parser/gram.y, which is a maintainability win: any changes to how CTEs are compiled will fail to compile if they break the MERGE mapping to CTEs.

> >can handle concurrency via ON CONFLICT DO NOTHING in the INSERT CTE.
>
> That's not handling concurrency -- it's silently ignoring an error.
> Who is to say that the conflict that IGNORE ignored is associated with a row
> visible to the MVCC snapshot of the statement? IOW, why should the DELETE
> affect any row?

That was me misunderstanding MERGE. The DELETE is independent of the INSERT -- if an INSERT does nothing because of an ON CONFLICT DO NOTHING clause, then that won't cause that row to be deleted -- the inserts and deletes CTEs are independent in the latest mapping (see above). I believe adding ON CONFLICT DO NOTHING to the INSERT in this mapping is all that's needed to support concurrency.

> There are probably a great many reasons why you need a ModifyTable
> executor node that keeps around state, and explicitly indicates that a
> MERGE is a MERGE. For example, we'll probably want statement level
> triggers to execute in a fixed order, regardless of the MERGE, RLS will
> probably require explicit knowledge of MERGE
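Nico's mapping can be made concrete with a toy schema. The following is a hypothetical worked example (the table and column names are invented for illustration, not taken from the thread) of a three-way merge -- update matched rows, insert unmatched source rows, delete unmatched target rows -- written with plain CTEs that run on stock PostgreSQL:

```sql
-- Invented tables standing in for <target> and <source>.
CREATE TABLE target (id int PRIMARY KEY, val text);
CREATE TABLE source (id int PRIMARY KEY, val text);

INSERT INTO target VALUES (1, 'old'), (2, 'stale');
INSERT INTO source VALUES (1, 'new'), (3, 'fresh');

WITH
updates AS (
    -- WHEN MATCHED THEN UPDATE
    UPDATE target t SET val = s.val
    FROM source s
    WHERE t.id = s.id
    RETURNING 'update' AS "@action", t.id
),
inserts AS (
    -- WHEN NOT MATCHED [ BY TARGET ] THEN INSERT
    INSERT INTO target (id, val)
    SELECT s.id, s.val
    FROM source s LEFT JOIN target t ON t.id = s.id
    WHERE t.id IS NULL
    ON CONFLICT DO NOTHING   -- the proposed concurrency escape hatch
    RETURNING 'insert' AS "@action", id
),
deletes AS (
    -- WHEN NOT MATCHED BY SOURCE THEN DELETE
    DELETE FROM target t
    WHERE NOT EXISTS (SELECT 1 FROM source s WHERE s.id = t.id)
    RETURNING 'delete' AS "@action", id
)
-- OUTPUT
SELECT * FROM updates
UNION ALL SELECT * FROM inserts
UNION ALL SELECT * FROM deletes;
```

Run against the sample data, this updates id 1, inserts id 3, and deletes id 2, reporting each action in the final SELECT. Note that all three DML CTEs see the same pre-statement snapshot, which is why they can be written independently of one another, per the discussion above.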
Re: [HACKERS] Exclude pg_internal.init from base backup
On Wed, Nov 8, 2017 at 1:42 AM, David Steele wrote:
> On 11/7/17 11:03 AM, Simon Riggs wrote:
>> On 5 November 2017 at 11:55, Magnus Hagander wrote:
>>>
>>> So +1 for documenting the difference in how these are handled, as this is
>>> important to know for somebody writing an external tool for it.
>>
>> Changes made, moving to commit the attached patch.
>
> Thanks, Simon! This was on my to do list today -- glad I checked my
> email first.

+pg_internal.init files can be omitted from the
+backup whenever a file of that name is found. These files contain
+relation cache data that is always rebuilt when recovering.
+

Do we want to mention in the docs that the same decision-making is done for *all* files with matching names, aka the fact that if a listed file is found in a sub-folder it is skipped? postmaster.opts or similar friends are unlikely to be found in anything but the root of the data folder, but still, the upthread argument for documenting precisely what basebackup.c does sounded rather convincing to me.
--
Michael
Re: [HACKERS] Remove secondary checkpoint
On Wed, Nov 8, 2017 at 2:23 AM, Simon Riggs wrote:
> On 31 October 2017 at 12:01, Michael Paquier wrote:
>> While the mention about a manual checkpoint happening after a timed
>> one will cause a full range of WAL segments to be recycled, it is not
>> actually true that segments of the prior's prior checkpoint are not
>> needed, because with your patch the segments of the prior checkpoint
>> are getting recycled. So it seems to me that based on that the formula
>> ought to use 1.0 instead of 2.0...
>
> I think the argument in the comment is right, in that
> CheckPointDistanceEstimate is better if we use multiple checkpoint
> cycles.

Yes, the theory behind it is correct. No argument about that.

> But the implementation of that is bogus and multiplying by 2.0
> wouldn't make it better if CheckPointDistanceEstimate is wrong.

Yes, this is wrong. My apologies if my words looked confusing. Reading your message, I can see that our thoughts are on the same page.
--
Michael
Re: [HACKERS] Small improvement to compactify_tuples
On Tue, Nov 7, 2017 at 2:40 PM, Юрий Соколов wrote:
>> The same is true of unique indexes vs. non-unique.
>
> offtopic: recently I had a look at setting LP_DEAD in indexes.
> I didn't find a huge difference between unique and non-unique indexes.
> There is a codepath that works only for unique indexes, but it is called
> less frequently than the common codepath that also sets LP_DEAD.

I meant to say that this is only important with UPDATEs + contention. The extra LP_DEAD setting within _bt_check_unique() makes quite a noticeable difference, at least in terms of index bloat (though less so in terms of raw TPS).
--
Peter Geoghegan
Re: [HACKERS] Small improvement to compactify_tuples
On Tue, Nov 7, 2017 at 2:36 PM, Tom Lane wrote:
> Peter Geoghegan writes:
>> My point is only that it's worth considering that this factor affects
>> how representative your sympathetic case is. It's not clear how many
>> PageIndexMultiDelete() calls are from opportunistic calls to
>> _bt_vacuum_one_page(), how important that subset of calls is, and so
>> on. Maybe it doesn't matter at all.
>
> According to the perf measurements I took earlier, essentially all the
> compactify_tuples calls in this test case are from PageRepairFragmentation
> (from heap_page_prune), not PageIndexMultiDelete.

For a workload with high contention (e.g., lots of updates that follow a Zipfian distribution), lots of important cleanup has to occur within _bt_vacuum_one_page(), with an exclusive buffer lock held. It may be that making PageIndexMultiDelete() faster pays off disproportionately well there, but I'd only expect to see that in higher client count workloads with lots of contention -- workloads that we still do quite badly on (note that we have never done well here, even prior to commit 2ed5b87f9 -- Yura showed this at one point). It's possible that this work influenced Yura in some way.

When Postgres Pro did some benchmarking of this at my request, we saw that the bloat got really bad past a certain client count. IIRC there was a clear point at around 32 or 64 clients where TPS nosedived, presumably because cleanup could not keep up. This was a 128 core box, or something like that, so you'll probably have difficulty recreating it with what's at hand.
--
Peter Geoghegan
Re: [HACKERS] [PATCH] Assert that the correct locks are held when calling PageGetLSN()
On Wed, Nov 8, 2017 at 2:26 AM, Jacob Champion wrote:
> On Mon, Nov 6, 2017 at 6:18 PM, Michael Paquier wrote:
>> Did you really test WAL replay?
>
> Is there a way to test this other than installcheck-world? The only
> failure we've run into at the moment is in the snapshot-too-old tests.
> Maybe we're not configuring with all the tests enabled?

Not automatically. The simplest method I use in this case is installcheck with a standby replaying the changes behind it.

>>> The assertion added caught at least one code path where TestForOldSnapshot
>>> calls PageGetLSN without holding any lock. The snapshot_too_old test in
>>> "check-world" failed due to the assertion failure. This needs to be fixed,
>>> see the open question in the opening mail on this thread.
>>
>> Good point. I am looping in Kevin Grittner here for his input, as the
>> author and committer of old_snapshot_threshold. Things can be
>> addressed with a separate patch by roughly moving the check on
>> PageGetLSN() to TestForOldSnapshot_impl() and using BufferGetLSNAtomic()
>> instead.
>
> It still doesn't pass the tests, as not all code paths ensure that a
> content lock is held before calling TestForOldSnapshot.
> BufferGetLSNAtomic only adds the spinlock.

I would prefer waiting for more input from Kevin here. This may prove to be a more invasive change. There are no objections to fixing the existing callers in index AMs, though.
--
Michael
Re: [HACKERS] Small improvement to compactify_tuples
2017-11-08 1:11 GMT+03:00 Peter Geoghegan:
>
> The same is true of unique indexes vs. non-unique.

offtopic: recently I had a look at setting LP_DEAD in indexes. I didn't find a huge difference between unique and non-unique indexes. There is a codepath that works only for unique indexes, but it is called less frequently than the common codepath that also sets LP_DEAD.
Re: [HACKERS] Small improvement to compactify_tuples
Peter Geoghegan writes:
> My point is only that it's worth considering that this factor affects
> how representative your sympathetic case is. It's not clear how many
> PageIndexMultiDelete() calls are from opportunistic calls to
> _bt_vacuum_one_page(), how important that subset of calls is, and so
> on. Maybe it doesn't matter at all.

According to the perf measurements I took earlier, essentially all the compactify_tuples calls in this test case are from PageRepairFragmentation (from heap_page_prune), not PageIndexMultiDelete.

I'd be the first to agree that I doubt that test case is really representative. I'd been whacking around Yura's original case to try to get PageRepairFragmentation's runtime up to some measurable fraction of the total, and while I eventually succeeded, I'm not sure that too many real workloads will look like that. However, if we can make it smaller as well as faster, that seems like a win even if it's not a measurable fraction of most workloads.

regards, tom lane
Re: [HACKERS] [PATCH] Assert that the correct locks are held when calling PageGetLSN()
On Tue, Nov 7, 2017 at 9:26 AM, Jacob Champion wrote:
> On Mon, Nov 6, 2017 at 6:18 PM, Michael Paquier wrote:
>> It seems to me that 0001 is good for a committer lookup, that will get
>> rid of all existing bugs. For 0002, what you are proposing is still
>> not a good idea for anything using page copies.
>
> I think there is still significant confusion here. To be clear: this
> patch is intended to add no changes for local page copies.

Maybe a better way to put this is: we see no failures with the pageinspect regression tests, which exercise PageGetLSN() via the page_header() function. Are there other tests we should be paying attention to that might show a problem with our local-copy logic?

--Jacob
Re: [HACKERS] Small improvement to compactify_tuples
On Tue, Nov 7, 2017 at 1:39 PM, Tom Lane wrote:
> So I think we should seriously consider the attached, but it'd be a
> good idea to benchmark it on a wider variety of platforms and test
> cases.
>
> create unlogged table test3 (
>     id integer PRIMARY KEY with (fillfactor=85),
>     val text
> ) WITH (fillfactor=85);

Passing observation: unlogged table B-Tree indexes have a much greater tendency for LP_DEAD setting/kill_prior_tuple() working out following commit 2ed5b87f9 [1], because unlogged tables were unaffected by that commit. (I've been meaning to follow up with my analysis of that regression, actually.) The same is true of unique indexes vs. non-unique.

There are workloads where the opportunistic LP_DEAD setting performed by _bt_check_unique() is really important (it calls ItemIdMarkDead()). Think high contention workloads, like when Postgres is used to implement a queue table.

My point is only that it's worth considering that this factor affects how representative your sympathetic case is. It's not clear how many PageIndexMultiDelete() calls are from opportunistic calls to _bt_vacuum_one_page(), how important that subset of calls is, and so on. Maybe it doesn't matter at all.

[1] https://postgr.es/m/cah2-wzmyry7mnjf0gw5wtk3cszh3gqfhhoxvsyuno5pk8cu...@mail.gmail.com
--
Peter Geoghegan
Re: [HACKERS] Parallel Hash take II
On Wed, Nov 8, 2017 at 10:32 AM, Robert Haas wrote: > On Tue, Nov 7, 2017 at 4:01 PM, Andres Freund wrote: >> diff --git a/src/backend/utils/resowner/resowner.c >> b/src/backend/utils/resowner/resowner.c >> index 4c35ccf65eb..8b91d5a6ebe 100644 >> --- a/src/backend/utils/resowner/resowner.c >> +++ b/src/backend/utils/resowner/resowner.c >> @@ -528,16 +528,6 @@ ResourceOwnerReleaseInternal(ResourceOwner owner, >> PrintRelCacheLeakWarning(res); >> RelationClose(res); >> } >> - >> - /* Ditto for dynamic shared memory segments */ >> - while (ResourceArrayGetAny(&(owner->dsmarr), &foundres)) >> - { >> - dsm_segment *res = (dsm_segment *) >> DatumGetPointer(foundres); >> - >> - if (isCommit) >> - PrintDSMLeakWarning(res); >> - dsm_detach(res); >> - } >> } >> else if (phase == RESOURCE_RELEASE_LOCKS) >> { >> @@ -654,6 +644,16 @@ ResourceOwnerReleaseInternal(ResourceOwner owner, >> PrintFileLeakWarning(res); >> FileClose(res); >> } >> + >> + /* Ditto for dynamic shared memory segments */ >> + while (ResourceArrayGetAny(&(owner->dsmarr), &foundres)) >> + { >> + dsm_segment *res = (dsm_segment *) >> DatumGetPointer(foundres); >> + >> + if (isCommit) >> + PrintDSMLeakWarning(res); >> + dsm_detach(res); >> + } >> } >> >> Is that entirely unproblematic? Are there any DSM callbacks that rely on >> locks still being held? Please split this part into a separate commit >> with such analysis. > > FWIW, I think this change is a really good idea (I recommended it to > Thomas at some stage, I think). Yeah, it was Robert's suggestion; I thought I needed *something* like this but was hesitant for the niggling reason that Andres mentions: what if someone somewhere (including code outside our source tree) depends on this ordering because of unlocking etc? At that time I thought that my clean-up logic wasn't going to work on Windows without this reordering, because we were potentially closing file handles after unlinking the files, and I was under the impression that Windows wouldn't like that. 
Since then I've learned that Windows does actually allow it, but only if all file handles were opened with the FILE_SHARE_DELETE flag. We always do that (see src/port/open.c), so in fact this change is probably not needed for my patch set (theory not tested). I will put it in a separate patch as requested by Andres, because it's generally a good idea anyway for the reasons that Robert explained (ie you probably always want to clean up memory last, since it might contain the meta-data/locks/control objects/whatever you'll need to clean up anything else). -- Thomas Munro http://www.enterprisedb.com
Re: [HACKERS] Refactor handling of database attributes between pg_dump and pg_dumpall
On Tue, Nov 7, 2017 at 4:35 AM, Haribabu Kommi wrote: > The newly added option is not recommended to be used in normal cases and > it is used only for upgrade utilities. I don't know why it couldn't be used in normal cases. That seems like a totally legitimate thing for somebody to want. Maybe nobody does, but I see no reason to worry if they do. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] [PATCH] Overestimated filter cost and its mitigation
Robert Haas writes: > I think it would be a good idea, as Thomas says, to order the qual > clauses at an earlier stage and then remember our decision. However, > we have to think about whether that's going to increase planning time > in a noticeable way. I wonder why we currently postpone this until > such a late phase of planning. Because (1) up to now there's been no need to consider the qual ordering till later, and (2) re-doing that sort for each path seemed unduly expensive. If we're to try to estimate whether later quals will be reached, then sure the ordering becomes important. I'm still concerned about (2) though. If there were a way to pre-sort the quals once for all paths, the problem goes away, but I don't think that works --- indexscans may segregate some quals, and in join cases different paths will actually have different sets of quals they need to check depending on the join order. So the bottom line here is how much is it going to cost us to add this additional refinement in cost estimation, and is it worth it given our extremely poor level of accuracy in expression cost estimation. Color me dubious. regards, tom lane
Re: [HACKERS] pg_stat_wal_write statistics view
On Tue, Nov 7, 2017 at 4:31 AM, Haribabu Kommi wrote: >> Updated patch attached. > Patch rebased. I think the earlier concerns about the performance impact of this are probably very valid concerns, and I don't see how the new version of the patch gets us much closer to solving them. I am also not sure I understand how the backend_write_blocks column is intended to work. The only call to pgstat_send_walwrites() is in WalWriterMain, so where do the other backends report anything? Also, if there's only ever one global set of counters (as opposed to one per table, say) then why use the stats collector machinery for this at all, vs. having a structure in shared memory that can be updated directly? It seems like adding a lot of overhead for no functional benefit. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Small improvement to compactify_tuples
I've been getting less and less excited about this patch, because I still couldn't measure any above-the-noise performance improvement without artificial exaggerations, and some cases seemed actually slower. However, this morning I had an epiphany: why are we sorting at all? There is no requirement that these functions preserve the physical ordering of the tuples' data areas, only that the line-pointer ordering be preserved. Indeed, reorganizing the data areas into an ordering matching the line pointers is probably a good thing, because it should improve locality of access in future scans of the page. This is trivial to implement if we copy the data into a workspace area and back again, as I was already proposing to do to avoid memmove. Moreover, at that point there's little value in a separate compactify function at all: we can integrate the data-copying logic into the line pointer scan loops in PageRepairFragmentation and PageIndexMultiDelete, and get rid of the costs of constructing the intermediate itemIdSortData arrays. That led me to the attached patch, which is the first version of any of this work that produces an above-the-noise performance win for me. I'm seeing 10-20% gains on this modified version of Yura's original example: psql -f test3setup.sql pgbench -M prepared -c 3 -s 1000 -T 300 -P 3 -n -f test3.sql (sql scripts also attached below; I'm using 1GB shared_buffers and fsync off, other parameters stock.) However, there are a couple of objections that could be raised to this patch: 1. It's trading off per-byte work, in the form of an extra memcpy, to save sorting work that has per-tuple costs. Therefore, the relatively narrow tuples used in Yura's example offer a best-case scenario; with wider tuples the performance might be worse. 2. On a platform with memmove not so much worse than memcpy as I'm seeing on my RHEL6 server, trading memmove for memcpy might not be such a win. 
To address point 1, I tried some measurements on the standard pgbench scenario, which uses significantly wider tuples. In hopes of addressing point 2, I also ran the measurements on a laptop running Fedora 25 (gcc 6.4.1, glibc 2.24); I haven't actually checked memmove vs memcpy on that machine, but at least it's a reasonably late-model glibc. What I'm getting from the standard pgbench measurements, on both machines, is that this patch might be a couple percent slower than HEAD, but that is barely above the noise floor so I'm not too sure about it. So I think we should seriously consider the attached, but it'd be a good idea to benchmark it on a wider variety of platforms and test cases. regards, tom lane drop table if exists test3; create unlogged table test3 ( id integer PRIMARY KEY with (fillfactor=85), val text ) WITH (fillfactor=85); insert into test3 select i, '!'||i from generate_series(1, 1000) as i; vacuum analyze; checkpoint; create or replace function dotest3(n int, scale float8) returns void language plpgsql as $$ begin for i in 1..n loop declare id1 int := random() * scale; id2 int := random() * scale; begin perform * from test3 where id = id1; update test3 set val = '!'|| id2 where id = id1; end; end loop; end $$; select dotest3(100, :scale); diff --git a/src/backend/storage/page/bufpage.c b/src/backend/storage/page/bufpage.c index b6aa2af..73b73de 100644 *** a/src/backend/storage/page/bufpage.c --- b/src/backend/storage/page/bufpage.c *** PageRestoreTempPage(Page tempPage, Page *** 415,471 } /* - * sorting support for PageRepairFragmentation and PageIndexMultiDelete - */ - typedef struct itemIdSortData - { - uint16 offsetindex; /* linp array index */ - int16 itemoff; /* page offset of item data */ - uint16 alignedlen; /* MAXALIGN(item data len) */ - } itemIdSortData; - typedef itemIdSortData *itemIdSort; - - static int - itemoffcompare(const void *itemidp1, const void *itemidp2) - { - /* Sort in decreasing itemoff order */ - return ((itemIdSort) 
itemidp2)->itemoff - - ((itemIdSort) itemidp1)->itemoff; - } - - /* - * After removing or marking some line pointers unused, move the tuples to - * remove the gaps caused by the removed items. - */ - static void - compactify_tuples(itemIdSort itemidbase, int nitems, Page page) - { - PageHeader phdr = (PageHeader) page; - Offset upper; - int i; - - /* sort itemIdSortData array into decreasing itemoff order */ - qsort((char *) itemidbase, nitems, sizeof(itemIdSortData), - itemoffcompare); - - upper = phdr->pd_special; - for (i = 0; i < nitems; i++) - { - itemIdSort itemidptr = &itemidbase[i]; - ItemId lp; - - lp = PageGetItemId(page, itemidptr->offsetindex + 1); - upper -= itemidptr->alignedlen; - memmove((char *) page + upper, - (char *) page + itemidptr->itemoff, - itemidptr->alignedlen); - lp->lp_off = upper; - } - - phdr->pd_upper = upper; - } - - /* * PageRepairFragmentation * *
Re: [HACKERS] Fix a typo in dsm_impl.c
On Mon, Nov 6, 2017 at 11:22 PM, Masahiko Sawada wrote: > Attached the patch for $subject. Committed. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Parallel Hash take II
On Tue, Nov 7, 2017 at 1:01 PM, Andres Freund wrote: > +/* > + * Build the name for a given segment of a given BufFile. > + */ > +static void > +MakeSharedSegmentName(char *name, const char *buffile_name, int segment) > +{ > + snprintf(name, MAXPGPATH, "%s.%d", buffile_name, segment); > +} > > Not a fan of this name - you're not "making" a filename here (as in > allocating or such). I think I'd just remove the Make prefix. +1 Can we document the theory behind file naming here, if that isn't in the latest version? This is a routine private to parallel hash join (or shared tuplestore), not BufFile. Maybe BufFile should have some opinion on this, though. Just as a matter of style. > +/* > + * Delete a BufFile that was created by BufFileCreateShared in the given > + * SharedFileSet using the given name. > + * > + * It is not necessary to delete files explicitly with this function. It is > + * provided only as a way to delete files proactively, rather than waiting > for > + * the SharedFileSet to be cleaned up. > + * > + * Only one backend should attempt to delete a given name, and should know > + * that it exists and has been exported or closed. > + */ This part is new to me. We now want one backend to delete a given filename. What changed? Please provide a Message-Id reference if that will help me to understand. For now, I'm going to guess that this development had something to do with the need to deal with virtual FDs that do a close() on an FD to keep under backend limits. Do I have that right? > + /* > +* We don't set FD_DELETE_AT_CLOSE for files opened this way, but we > still > +* want to make sure they get closed at end of xact. > +*/ > + ResourceOwnerEnlargeFiles(CurrentResourceOwner); > + ResourceOwnerRememberFile(CurrentResourceOwner, file); > + VfdCache[file].resowner = CurrentResourceOwner; > > So maybe I'm being pedantic here, but wouldn't the right order be to do > ResourceOwnerEnlargeFiles() *before* creating the file? 
It's a memory > allocating operation, so it can fail, which'd leak the file. I remember going to pains to get this right with my own unifiable BufFile concept. I'm going to wait for an answer to my question about file deletion + resowners before commenting further, though. > + if (vfdP->fdstate & FD_TEMP_FILE_LIMIT) > + { > + /* Subtract its size from current usage (do first in case of > error) */ > + temporary_files_size -= vfdP->fileSize; > + vfdP->fileSize = 0; > + } > > So, is it right to do so unconditionally and without regard for errors? > If the file isn't deleted, it shouldn't be subtracted from fileSize. I > guess you're managing that through the flag, but that's not entirely > obvious. I think that the problem here is that the accounting is expected to always work. It's not like there is a resowner style error path in which temporary_files_size gets reset to 0. > Is that entirely unproblematic? Are there any DSM callbacks that rely on > locks still being held? Please split this part into a separate commit > with such analysis. I was always confused about the proper use of DSM callbacks myself. They seemed generally underdocumented. > + /* Create the output buffer. */ > + if (accessor->write_chunk != NULL) > + pfree(accessor->write_chunk); > + accessor->write_chunk = (SharedTuplestoreChunk *) > + palloc0(accessor->write_pages * BLCKSZ); > > Are we guaranteed to be in a long-lived memory context here? I imagine that Thomas looked to tuplestore_begin_heap() + interXact as a kind of precedent here. See comments above that function. -- Peter Geoghegan
Re: [HACKERS] Parallel Hash take II
On Tue, Nov 7, 2017 at 4:01 PM, Andres Freund wrote: > + ResourceOwnerEnlargeFiles(CurrentResourceOwner); > + ResourceOwnerRememberFile(CurrentResourceOwner, file); > + VfdCache[file].resowner = CurrentResourceOwner; > > So maybe I'm being pedantic here, but wouldn't the right order be to do > ResourceOwnerEnlargeFiles() *before* creating the file? It's a memory > allocating operation, so it can fail, which'd leak the file. That's not pedantic ... that's a very sound criticism. IMHO, anyway. > diff --git a/src/backend/utils/resowner/resowner.c > b/src/backend/utils/resowner/resowner.c > index 4c35ccf65eb..8b91d5a6ebe 100644 > --- a/src/backend/utils/resowner/resowner.c > +++ b/src/backend/utils/resowner/resowner.c > @@ -528,16 +528,6 @@ ResourceOwnerReleaseInternal(ResourceOwner owner, > PrintRelCacheLeakWarning(res); > RelationClose(res); > } > - > - /* Ditto for dynamic shared memory segments */ > - while (ResourceArrayGetAny(&(owner->dsmarr), &foundres)) > - { > - dsm_segment *res = (dsm_segment *) > DatumGetPointer(foundres); > - > - if (isCommit) > - PrintDSMLeakWarning(res); > - dsm_detach(res); > - } > } > else if (phase == RESOURCE_RELEASE_LOCKS) > { > @@ -654,6 +644,16 @@ ResourceOwnerReleaseInternal(ResourceOwner owner, > PrintFileLeakWarning(res); > FileClose(res); > } > + > + /* Ditto for dynamic shared memory segments */ > + while (ResourceArrayGetAny(&(owner->dsmarr), &foundres)) > + { > + dsm_segment *res = (dsm_segment *) > DatumGetPointer(foundres); > + > + if (isCommit) > + PrintDSMLeakWarning(res); > + dsm_detach(res); > + } > } > > Is that entirely unproblematic? Are there any DSM callbacks that rely on > locks still being held? Please split this part into a separate commit > with such analysis. FWIW, I think this change is a really good idea (I recommended it to Thomas at some stage, I think). 
The current positioning was decided by me at a very early stage of parallel query development where I reasoned as follows: (1) the first thing we're going to implement is going to be parallel quicksort, (2) that's going to allocate a huge amount of DSM, (3) therefore we should try to free it as early as possible. However, I now think that was wrongheaded, and not just because parallel quicksort didn't turn out to be the first thing we developed. Memory is the very last resource we should release when aborting a transaction, because any other resource we have is tracked using data structures that are stored in memory. Throwing the memory away before the end therefore makes life very difficult. That's why, for backend-private memory, we clean up most everything else in AbortTransaction() and then only destroy memory contexts in CleanupTransaction(). This change doesn't go as far, but it's in the same general direction, and I think rightly so. My error was in thinking that the primary use of memory would be for storing data, but really it's about where you put your control structures. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] why not parallel seq scan for slow functions
On Mon, Nov 6, 2017 at 9:57 PM, Amit Kapila wrote: >> Well, I suppose that test will fire for a baserel when the total >> number of baserels is at least 3 and there's no inheritance involved. >> But if there are 2 baserels, we're still not the topmost scan/join >> target. > > No, if there are 2 baserels, then simple_rel_array_size will be 3. > The simple_rel_array_size is always the "number of relations" plus > "one". See setup_simple_rel_arrays. Oh, wow. That's surprising. >> Also, even if inheritance is used, we might still be the >> topmost scan/join target. > > Sure, but in that case, it won't generate the gather path here (due to > this part of check "!root->append_rel_list"). I am not sure whether I > have understood the second part of your question, so if my answer > appears inadequate, then can you provide more details on what you are > concerned about? I don't know why the question of why root->append_rel_list is empty is the relevant thing here for deciding whether to generate gather paths at this point. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Parallel Hash take II
Hi, Here's a review of v24 +set min_parallel_table_scan_size = 0; +set parallel_setup_cost = 0; +-- Make a simple relation with well distributed keys and correctly +-- estimated size. +create table simple as + select generate_series(1, 2) AS id, 'aa'; +alter table simple set (parallel_workers = 2); +analyze simple; +-- Make a relation whose size we will under-estimate. We want stats +-- to say 1000 rows, but actually there are 20,000 rows. +create table bigger_than_it_looks as + select generate_series(1, 2) as id, 'aa'; +alter table bigger_than_it_looks set (autovacuum_enabled = 'false'); +alter table bigger_than_it_looks set (parallel_workers = 2); +delete from bigger_than_it_looks where id <= 19000; +vacuum bigger_than_it_looks; +analyze bigger_than_it_looks; +insert into bigger_than_it_looks + select generate_series(1, 19000) as id, 'aa'; It seems kinda easier to just manipulate ndistinct and reltuples... +set max_parallel_workers_per_gather = 0; +set work_mem = '4MB'; I hope there's a fair amount of slop here - with different archs you're going to see quite some size differences. +-- The "good" case: batches required, but we plan the right number; we +-- plan for 16 batches, and we stick to that number, and peak memory +-- usage says within our work_mem budget +-- non-parallel +set max_parallel_workers_per_gather = 0; +set work_mem = '128kB'; So how do we know that's actually the case we're testing rather than something arbitrarily different? There's IIRC tests somewhere that just filter the json explain output to the right parts... +/* + * Build the name for a given segment of a given BufFile. + */ +static void +MakeSharedSegmentName(char *name, const char *buffile_name, int segment) +{ + snprintf(name, MAXPGPATH, "%s.%d", buffile_name, segment); +} Not a fan of this name - you're not "making" a filename here (as in allocating or such). I think I'd just remove the Make prefix. 
+/* + * Open a file that was previously created in another backend with + * BufFileCreateShared in the same SharedFileSet using the same name. The + * backend that created the file must have called BufFileClose() or + * BufFileExport() to make sure that it is ready to be opened by other + * backends and render it read-only. + */ Is it actually guaranteed that it's another backend / do we rely on that? +BufFile * +BufFileOpenShared(SharedFileSet *fileset, const char *name) +{ + /* +* If we didn't find any files at all, then no BufFile exists with this +* tag. +*/ + if (nfiles == 0) + return NULL; s/tag/name/? +/* + * Delete a BufFile that was created by BufFileCreateShared in the given + * SharedFileSet using the given name. + * + * It is not necessary to delete files explicitly with this function. It is + * provided only as a way to delete files proactively, rather than waiting for + * the SharedFileSet to be cleaned up. + * + * Only one backend should attempt to delete a given name, and should know + * that it exists and has been exported or closed. + */ +void +BufFileDeleteShared(SharedFileSet *fileset, const char *name) +{ + char segment_name[MAXPGPATH]; + int segment = 0; + bool found = false; + + /* +* We don't know how many segments the file has. We'll keep deleting +* until we run out. If we don't manage to find even an initial segment, +* raise an error. +*/ + for (;;) + { + MakeSharedSegmentName(segment_name, name, segment); + if (!SharedFileSetDelete(fileset, segment_name, true)) + break; + found = true; + ++segment; + } Hm. Do we properly delete all the files via the resowner mechanism if this fails midway? I.e. if there are no leading segments? Also wonder if this doesn't need a CFI check. +void +PathNameCreateTemporaryDir(const char *basedir, const char *directory) +{ + if (mkdir(directory, S_IRWXU) < 0) + { + if (errno == EEXIST) + return; + + /* +* Failed. Try to create basedir first in case it's missing. 
Tolerate +* ENOENT to close a race against another process following the same +* algorithm. +*/ + if (mkdir(basedir, S_IRWXU) < 0 && errno != ENOENT) + elog(ERROR, "cannot create temporary directory \"%s\": %m", +basedir); ENOENT or EEXIST? +File +PathNameCreateTemporaryFile(const char *path, bool error_on_failure) +{ + Filefile; + + /* +* Open the file. Note: we don't use O_EXCL, in case there is an orphaned +* temp file that can be reused. +*/ + file = PathNameOpenFile(path,
Re: [HACKERS] [PATCH] Overestimated filter cost and its mitigation
On Mon, Nov 6, 2017 at 5:19 AM, Ashutosh Bapat wrote: > IIRC, only thing that changes between plan time quals and execution > time quals is constraint folding of constant parameters. But I don't > think we change the selectivity estimates when that's done. At the > same time, I don't think we should make a lot of effort to make sure > that the order used during the estimation is same as the order at the > execution; we are anyway estimating. There can always be some > difference between what's estimated and what actually happens. I don't think that's a good justification for allowing the order to vary. It's true that, at execution time, we may find row counts and selectivities to be different than what we predicted, but that is a case of the real data being different from our idealized data -- which is difficult to avoid in general. Allowing our actual decisions to be different from the decisions we thought we would make seems like programmer sloppiness. It would also be very confusing if the planner uses one ordering to estimate the cost and then a different order at execution time and in the EXPLAIN ANALYZE output. How could anybody understand what had happened in such a case? I think it would be a good idea, as Thomas says, to order the qual clauses at an earlier stage and then remember our decision. However, we have to think about whether that's going to increase planning time in a noticeable way. I wonder why we currently postpone this until such a late phase of planning. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager
On Mon, Nov 6, 2017 at 4:42 AM, Masahiko Sawada wrote: >>> I suggest that a good thing to do more or less immediately, regardless >>> of when this patch ends up being ready, would be to insert an >>> assertion that LockAcquire() is never called while holding a lock of >>> one of these types. If that assertion ever fails, then the whole >>> theory that these lock types don't need deadlock detection is wrong, >>> and we'd like to find out about that sooner or later. >> >> I understood. I'll check that first. > > I've checked whether LockAcquire is called while holding a lock of one > of four types: LOCKTAG_RELATION_EXTENSION, LOCKTAG_PAGE, > LOCKTAG_TUPLE, and LOCKTAG_SPECULATIVE_TOKEN. To summarize, I think that > we cannot move these four lock types together out of heavy-weight > lock, but can move only the relation extension lock with tricks. > > Here is detail of the survey. Thanks for these details, but I'm not sure I fully understand. > * LOCKTAG_RELATION_EXTENSION > There is a path that LockRelationForExtension() could be called while > holding another relation extension lock. In brin_getinsertbuffer(), we > acquire a relation extension lock for a index relation and could > initialize a new buffer (brin_initialize_empty_new_buffer()). During > initializing a new buffer, we call RecordPageWithFreeSpace() which > eventually can call fsm_readbuf(rel, addr, true) where the third > argument is "extend". We can process this problem by having the list > (or local hash) of acquired locks and skip acquiring the lock if > already had. For other call paths calling LockRelationForExtension, I > don't see any problem. Does calling fsm_readbuf(rel,addr,true) take some heavyweight lock? Basically, what matters here in the end is whether we can articulate a deadlock-proof rule around the order in which these locks are acquired. The simplest such rule would be "you can only acquire one lock of any of these types at a time, and you can't subsequently acquire a heavyweight lock". 
But a more complicated rule would be OK too, e.g. "you can acquire as many heavyweight locks as you want, and after that you can optionally acquire one page, tuple, or speculative token lock, and after that you can acquire a relation extension lock". The latter rule, although more complex, is still deadlock-proof, because the heavyweight locks still use the deadlock detector, and the rest has a consistent order of lock acquisition that precludes one backend taking A then B while another backend takes B then A. I'm not entirely clear whether your survey leads us to a place where we can articulate such a deadlock-proof rule. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] [POC] Faster processing at Gather node
Hi, On 2017-11-06 10:56:43 +0530, Amit Kapila wrote: > On Sun, Nov 5, 2017 at 6:54 AM, Andres Freund wrote: > > On 2017-11-05 01:05:59 +0100, Robert Haas wrote: > >> skip-gather-project-v1.patch does what it says on the tin. I still > >> don't have a test case for this, and I didn't find that it helped very > >> much, > > I am also wondering in which case it can help and I can't think of the > case. I'm confused? Isn't it fairly obvious that unnecessarily projecting at the gather node is wasteful? Obviously depending on the query you'll see smaller / bigger gains, but that there are benefits should be fairly obvious? > Basically, as part of projection in the gather, I think we are just > deforming the tuple which we anyway need to perform before sending the > tuple to the client (printtup) or probably at the upper level of the > node. But in most cases you're not going to print millions of tuples, instead you're going to apply some further operators ontop (e.g. the OFFSET/LIMIT in my example). > >> and you said this looked like a big bottleneck in your > >> testing, so here you go. > Is it possible that it shows the bottleneck only for 'explain analyze' > statement as we don't deform the tuple for that at a later stage? Doesn't matter, there's a OFFSET/LIMIT ontop of the query. Could just as well be a sort node or something. > > The query where that showed a big benefit was > > > > SELECT * FROM lineitem WHERE l_suppkey > '5012' OFFSET 10 LIMIT 1; > > > > (i.e a not very selective filter, and then just throwing the results away) > > > > still shows quite massive benefits: > > Do you see the benefit if the query is executed without using Explain Analyze? Yes. Before: tpch_5[11878][1]=# SELECT * FROM lineitem WHERE l_suppkey > '5012' OFFSET 10 LIMIT 1; ... 
Time: 7590.196 ms (00:07.590) After: Time: 3862.955 ms (00:03.863) Greetings, Andres Freund
Re: [HACKERS] [PATCH] Assert that the correct locks are held when calling PageGetLSN()
Hi Michael, On Mon, Nov 6, 2017 at 6:18 PM, Michael Paquier wrote: > Did you really test WAL replay? Is there a way to test this other than installcheck-world? The only failure we've run into at the moment is in the snapshot-too-old tests. Maybe we're not configuring with all the tests enabled? >> The assertion added caught at least one code path where TestForOldSnapshot >> calls PageGetLSN without holding any lock. The snapshot_too_old test in >> "check-world" failed due to the assertion failure. This needs to be fixed, >> see the open question in the opening mail on this thread. > > Good point. I am looping Kevin Grittner here for his input, as the > author and committer of old_snapshot_threshold. Things can be > addressed with a separate patch by roughly moving the check on > PageGetLSN() to TestForOldSnapshot_impl() and use BufferGetLSNAtomic() > instead. It still doesn't pass the tests, as not all code paths ensure that a content lock is held before calling TestForOldSnapshot. BufferGetLSNAtomic only adds the spinlock. > The commit fest has lost view of this entry, and so did I. So I have > added a new entry: > https://commitfest.postgresql.org/16/1371/ Thank you! > BufferGetLSNAtomic() could really use LWLockHeldByMe(). Could you > consider it with an extra patch on top of 0001? The LWLockHeldByMe() calls are added to BufferGetLSNAtomic in patch 0002 (because it does all its work through PageGetLSN). > It seems to me that 0001 is good for a committer lookup, that will get > rid of all existing bugs. For 0002, what you are proposing is still > not a good idea for anything using page copies. I think there is still significant confusion here. To be clear: this patch is intended to add no changes for local page copies. As I've tried to say (three or four times now): to our understanding, local page copies are not allocated in the shared BufferBlocks region and are therefore not subject to the new assertions. 
Am I missing a corner case, or completely misunderstanding your point? I never got a direct response to my earlier questions on this. --Jacob -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Remove secondary checkpoint
On 31 October 2017 at 12:01, Michael Paquier wrote:
> On Tue, Oct 31, 2017 at 2:00 PM, Simon Riggs wrote:
>> On 30 October 2017 at 18:58, Simon Riggs wrote:
>>> On 30 October 2017 at 15:22, Simon Riggs wrote:
>
> You forgot to update this formula in xlog.c:
>
>     distance = (2.0 + CheckPointCompletionTarget) *
>         CheckPointDistanceEstimate;
>     /* add 10% for good measure. */
>     distance *= 1.10;
>
> You can guess easily where the mistake is.

Doh - fixed that before posting, so I must have sent the wrong patch. It's the 10%, right? ;-)

>>> So, there are 2 locations that mention 2.0 in xlog.c
>>>
>>> I had already fixed one, which is why I remembered editing it.
>>>
>>> The other one you mention has a detailed comment above it explaining
>>> why it should be 2.0 rather than 1.0, so I left it.
>>>
>>> Let me know if you still think it should be removed? I can see the
>>> argument both ways.
>>> Or maybe we need another patch to account for manual checkpoints.
>>
>> Revised patch to implement this

> Here is the comment and the formula:
>
>      * The reason this calculation is done from the prior checkpoint, not the
>      * one that just finished, is that this behaves better if some checkpoint
>      * cycles are abnormally short, like if you perform a manual checkpoint
>      * right after a timed one. The manual checkpoint will make almost a full
>      * cycle's worth of WAL segments available for recycling, because the
>      * segments from the prior's prior, fully-sized checkpoint cycle are no
>      * longer needed. However, the next checkpoint will make only few segments
>      * available for recycling, the ones generated between the timed
>      * checkpoint and the manual one right after that. If at the manual
>      * checkpoint we only retained enough segments to get us to the next timed
>      * one, and removed the rest, then at the next checkpoint we would not
>      * have enough segments around for recycling, to get us to the checkpoint
>      * after that. Basing the calculations on the distance from the prior redo
>      * pointer largely fixes that problem.
>      */
>     distance = (2.0 + CheckPointCompletionTarget) *
>         CheckPointDistanceEstimate;
>
> While the mention about a manual checkpoint happening after a timed
> one will cause a full range of WAL segments to be recycled, it is not
> actually true that segments of the prior's prior checkpoint are not
> needed, because with your patch the segments of the prior checkpoint
> are getting recycled. So it seems to me that based on that the formula
> ought to use 1.0 instead of 2.0...

I think the argument in the comment is right, in that CheckPointDistanceEstimate is better if we use multiple checkpoint cycles. But the implementation of that is bogus, and multiplying by 2.0 wouldn't make it better if CheckPointDistanceEstimate is wrong.

So I will change to 1.0 as you say and rewrite the comment.

Thanks for your review.

--
Simon Riggs
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] Fix bloom WAL tap test
Hi!

On Tue, Nov 7, 2017 at 4:34 PM, Masahiko Sawada wrote:
> I understood the necessity of this patch and reviewed two patches.

Good, thank you.

> For /fix-bloom-wal-check.patch, it looks good to me. I found no
> problem. But for wal-check-on-bloom-check.patch, if you want to run
> wal-check even when running 'make check' or 'make check-world', we can
> just add wal-check test as follows?
>
> diff --git a/contrib/bloom/Makefile b/contrib/bloom/Makefile
> index 13bd397..c1566d4 100644
> --- a/contrib/bloom/Makefile
> +++ b/contrib/bloom/Makefile
> @@ -20,5 +20,7 @@ include $(top_builddir)/src/Makefile.global
>  include $(top_srcdir)/contrib/contrib-global.mk
>  endif
>
> +check: wal-check
> +
>  wal-check: temp-install
>  	$(prove_check)

I've tried this version of the Makefile, and I've seen only one difference: when TAP tests are enabled, this version runs the TAP tests before the regression tests, while my version runs them after the regression tests. That seems like the more desirable behavior to me. But it would still be nice to simplify the Makefile.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Re: [HACKERS] Fix bloom WAL tap test
On Tue, Nov 7, 2017 at 4:26 PM, Fabrízio Mello wrote:
> The following review has been posted through the commitfest application:
> make installcheck-world: not tested
> Implements feature: not tested
> Spec compliant: not tested
> Documentation: not tested
>
> The patch doesn't apply against master:
>
> fabrizio@macanudo:/d/postgresql (master)
> $ git apply /tmp/wal-check-on-bloom-check.patch
> error: contrib/bloom/Makefile: already exists in working directory

Apparently I sent patches that are OK for "patch -p1", but not OK for "git apply". Sorry for that. I resubmit both of them.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

fix-bloom-wal-check-2.patch
Description: Binary data

wal-check-on-bloom-check-2.patch
Description: Binary data
Re: [HACKERS] Exclude pg_internal.init from base backup
On 11/7/17 11:03 AM, Simon Riggs wrote:
> On 5 November 2017 at 11:55, Magnus Hagander wrote:
>>
>> So +1 for documenting the difference in how these are handled, as this is
>> important to know for somebody writing an external tool for it.
>
> Changes made, moving to commit the attached patch.

Thanks, Simon! This was on my to do list today -- glad I checked my email first.

--
-David
da...@pgmasters.net
Re: [HACKERS] MERGE SQL Statement for PG11
On 6 November 2017 at 17:35, Simon Riggs wrote:
> I read that step 3 in Approach2 is some kind of problem in MVCC
> semantics. My understanding is that SQL Standard allows us to define
> what the semantics of the statement are in relation to concurrency, so
> any semantic issue can be handled by defining it to work the way we
> want. The semantics are:
> a) when a unique index is available we avoid errors by using semantics
> of INSERT .. ON CONFLICT UPDATE.
> b) when a unique index is not available we use other semantics.

I'm obviously being obtuse. If a unique index is not available, then surely there won't _be_ a failure? The INSERT (or indeed UPDATE) that results in two similar records will simply happen, and you will end up with two records the same. That's OK, based on the semantics of MERGE, no? At transaction start, INSERT was the correct thing to do.

Geoff
Re: [HACKERS] Exclude pg_internal.init from base backup
On 5 November 2017 at 11:55, Magnus Hagander wrote:
> On Sat, Nov 4, 2017 at 4:04 AM, Michael Paquier wrote:
>>
>> On Fri, Nov 3, 2017 at 4:04 PM, Petr Jelinek wrote:
>> > Not a problem specific to this patch, but I wonder if it should be made
>> > more clear that those files (there are a couple above of what you added)
>> > are skipped no matter which directory they reside in.
>>
>> Agreed, it is a good idea to tell in the docs how this behaves. We
>> could always change things so as the comparison is based on the full
>> path like what is done for pg_control, but that does not seem worth
>> complicating the code.
>
> pg_internal.init can, and does, appear in multiple different directories.
> pg_control is always in the same place. So they're not the same thing.
>
> So +1 for documenting the difference in how these are handled, as this is
> important to know for somebody writing an external tool for it.

Changes made, moving to commit the attached patch.

--
Simon Riggs
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

pg_basebackup-exclusion-v2.patch
Description: Binary data
Re: [HACKERS] Additional logging for VACUUM and ANALYZE
On Tue, Nov 7, 2017 at 1:35 PM, Bossart, Nathan wrote:
>
> On 11/7/17, 9:13 AM, "Fabrízio Mello" wrote:
> >> 	int			save_nestlevel;
> >> +	bool		rel_lock;
> >
> > Just remove the additional tab indentation before the rel_lock variable.
>
> I've removed the extra tab in v4.

Great. Changed status to ready for committer.

Regards,

--
Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL
>> Timbira: http://www.timbira.com.br
>> Blog: http://fabriziomello.github.io
>> Linkedin: http://br.linkedin.com/in/fabriziomello
>> Twitter: http://twitter.com/fabriziomello
>> Github: http://github.com/fabriziomello
Re: [HACKERS] Small improvement to compactify_tuples
On 2017-11-07 12:12:02 -0300, Claudio Freire wrote:
> If you need it. I'm not particularly fond of writing code before it's needed.

+1

> Otherwise, if it's a rarely-encountered corner case, I'd recommend
> simply calling the stdlib's qsort.

FWIW, we always map qsort onto our own implementation:

#define qsort(a,b,c,d) pg_qsort(a,b,c,d)

Greetings,

Andres Freund
Re: [HACKERS] Small improvement to compactify_tuples
On Tue, Nov 7, 2017 at 11:42 AM, Юрий Соколов wrote:
>
> 2017-11-07 17:15 GMT+03:00 Claudio Freire:
>> Aside from requiring all that include magic, if you place specialized
>> sort functions in a reusable header, using it is as simple as
>> including the type-specific header (or declaring the type macros and
>> including the template), and using them as regular functions. There's
>> no runtime overhead involved, especially if you declare the comparison
>> function as a macro or a static inline function. The sort itself can
>> be declared static inline as well, and the compiler will decide
>> whether it's worth inlining.
>
> Ok, if no one will complain against another one qsort implementation,
> I will add template header for qsort. Since qsort needs insertion sort,
> it will be in a same file.
> Do you approve of this?
>
> With regards,
> Sokolov Yura

If you need it. I'm not particularly fond of writing code before it's needed. If you can measure the impact for that particular case where qsort might be needed, and it's a real-world case, then by all means. Otherwise, if it's a rarely-encountered corner case, I'd recommend simply calling the stdlib's qsort.
Re: [HACKERS] Additional logging for VACUUM and ANALYZE
The following review has been posted through the commitfest application:
make installcheck-world: not tested
Implements feature: tested, passed
Spec compliant: tested, passed
Documentation: tested, passed

> 	int			save_nestlevel;
> +	bool		rel_lock;

Just remove the additional tab indentation before the rel_lock variable. The rest looks good to me.

Regards,
Fabrízio Mello

The new status of this patch is: Waiting on Author
Re: [HACKERS] Small improvement to compactify_tuples
2017-11-07 17:15 GMT+03:00 Claudio Freire: > > On Mon, Nov 6, 2017 at 9:08 PM, Юрий Соколов wrote: > > 2017-11-07 1:14 GMT+03:00 Claudio Freire : > >> > >> I haven't seen this trick used in postgres, nor do I know whether it > >> would be well received, so this is more like throwing an idea to see > >> if it sticks... > >> > >> But a way to do this without macros is to have an includable > >> "template" algorithm that simply doesn't define the comparison > >> function/type, it rather assumes it: > >> > >> qsort_template.h > >> > >> #define QSORT_NAME qsort_ ## QSORT_SUFFIX > >> > >> static void QSORT_NAME(ELEM_TYPE arr, size_t num_elems) > >> { > >> ... if (ELEM_LESS(arr[a], arr[b])) > >> ... > >> } > >> > >> #undef QSORT_NAME > >> > >> Then, in "offset_qsort.h": > >> > >> #define QSORT_SUFFIX offset > >> #define ELEM_TYPE offset > >> #define ELEM_LESS(a,b) ((a) < (b)) > >> > >> #include "qsort_template.h" > >> > >> #undef QSORT_SUFFIX > >> #undef ELEM_TYPE > >> #undef ELEM_LESS > >> > >> Now, I realize this may have its cons, but it does simplify > >> maintainance of type-specific or parameterized variants of > >> performance-critical functions. > >> > >> > I can do specialized qsort for this case. But it will be larger bunch of > >> > code, than > >> > shell sort. > >> > > >> >> And I'd recommend doing that when there is a need, and I don't think > >> >> this patch really needs it, since bucket sort handles most cases > >> >> anyway. > >> > > >> > And it still needs insertion sort for buckets. > >> > I can agree to get rid of shell sort. But insertion sort is necessary. > >> > >> I didn't suggest getting rid of insertion sort. But the trick above is > >> equally applicable to insertion sort. > > > > This trick is used in simplehash.h . I agree, it could be useful for qsort. > > This will not make qsort inlineable, but will reduce overhead much. > > > > This trick is too heavy-weight for insertion sort alone, though. 
> > Without shellsort, insertion sort could be expressed in 14 line macros (8 lines
> > without curly braces). But if insertion sort will be defined together with
> > qsort (because qsort still needs it), then it is justifiable.
>
> What do you mean by heavy-weight?

I mean, I've already made a reusable sort implementation with macros that is called like a function (with a type parameter). If we are talking only about insertion sort, such macros look much prettier than an included file. But qsort is better implemented with an included template header.

BTW, there is an example of defining many functions with a call to a template macro instead of including a template header:
https://github.com/attractivechaos/klib/blob/master/khash.h
But it looks ugly.

> Aside from requiring all that include magic, if you place specialized
> sort functions in a reusable header, using it is as simple as
> including the type-specific header (or declaring the type macros and
> including the template), and using them as regular functions. There's
> no runtime overhead involved, especially if you declare the comparison
> function as a macro or a static inline function. The sort itself can
> be declared static inline as well, and the compiler will decide
> whether it's worth inlining.

Ok, if no one complains against yet another qsort implementation, I will add a template header for qsort. Since qsort needs insertion sort, it will be in the same file. Do you approve of this?

With regards,
Sokolov Yura
Re: [HACKERS] Small improvement to compactify_tuples
On Mon, Nov 6, 2017 at 9:08 PM, Юрий Соколовwrote: > 2017-11-07 1:14 GMT+03:00 Claudio Freire : >> >> I haven't seen this trick used in postgres, nor do I know whether it >> would be well received, so this is more like throwing an idea to see >> if it sticks... >> >> But a way to do this without macros is to have an includable >> "template" algorithm that simply doesn't define the comparison >> function/type, it rather assumes it: >> >> qsort_template.h >> >> #define QSORT_NAME qsort_ ## QSORT_SUFFIX >> >> static void QSORT_NAME(ELEM_TYPE arr, size_t num_elems) >> { >> ... if (ELEM_LESS(arr[a], arr[b])) >> ... >> } >> >> #undef QSORT_NAME >> >> Then, in "offset_qsort.h": >> >> #define QSORT_SUFFIX offset >> #define ELEM_TYPE offset >> #define ELEM_LESS(a,b) ((a) < (b)) >> >> #include "qsort_template.h" >> >> #undef QSORT_SUFFIX >> #undef ELEM_TYPE >> #undef ELEM_LESS >> >> Now, I realize this may have its cons, but it does simplify >> maintainance of type-specific or parameterized variants of >> performance-critical functions. >> >> > I can do specialized qsort for this case. But it will be larger bunch of >> > code, than >> > shell sort. >> > >> >> And I'd recommend doing that when there is a need, and I don't think >> >> this patch really needs it, since bucket sort handles most cases >> >> anyway. >> > >> > And it still needs insertion sort for buckets. >> > I can agree to get rid of shell sort. But insertion sort is necessary. >> >> I didn't suggest getting rid of insertion sort. But the trick above is >> equally applicable to insertion sort. > > This trick is used in simplehash.h . I agree, it could be useful for qsort. > This will not make qsort inlineable, but will reduce overhead much. > > This trick is too heavy-weight for insertion sort alone, though. Without > shellsort, insertion sort could be expressed in 14 line macros ( 8 lines > without curly braces). 
> But if insertion sort will be defined together with
> qsort (because qsort still needs it), then it is justifiable.

What do you mean by heavy-weight?

Aside from requiring all that include magic, if you place specialized sort functions in a reusable header, using it is as simple as including the type-specific header (or declaring the type macros and including the template), and using them as regular functions. There's no runtime overhead involved, especially if you declare the comparison function as a macro or a static inline function. The sort itself can be declared static inline as well, and the compiler will decide whether it's worth inlining.
Re: [HACKERS] Additional logging for VACUUM and ANALYZE
The following review has been posted through the commitfest application:
make installcheck-world: not tested
Implements feature: not tested
Spec compliant: not tested
Documentation: not tested

The patch doesn't apply against master anymore:

fabrizio@macanudo:/d/postgresql (master)
$ git apply /tmp/new_vacuum_log_messages_v2.patch
error: patch failed: doc/src/sgml/config.sgml:5899
error: doc/src/sgml/config.sgml: patch does not apply

Regards,
Fabrízio Mello

The new status of this patch is: Waiting on Author
Re: [HACKERS] Fix bloom WAL tap test
On Fri, Sep 29, 2017 at 10:32 PM, Alexander Korotkov wrote:
> On Wed, Sep 6, 2017 at 5:06 PM, Alexander Korotkov wrote:
>>
>> On Wed, Sep 6, 2017 at 4:08 PM, Alexander Korotkov wrote:
>>>
>>> I just realized that these lines of contrib/bloom/t/001_wal.pl don't
>>> check that queries give same results on master and standby. They just check
>>> that *return codes* of psql are equal.
>>>
>>> # Run test queries and compare their result
>>> my $master_result = $node_master->psql("postgres", $queries);
>>> my $standby_result = $node_standby->psql("postgres", $queries);
>>> is($master_result, $standby_result, "$test_name: query result matches");
>>>
>>> Attached patch fixes this problem by using safe_psql() which returns psql
>>> output instead of return code. For safety, this patch replaces psql() with
>>> safe_psql() in other places too.
>>>
>>> I think this should be backpatched to 9.6 as bugfix.
>>
>> Also, it would be nice to call wal-check on bloom check (now wal-check
>> isn't called even on check-world).
>> See attached patch. My changes to Makefile could be cumbersome. Sorry
>> for that, I don't have much experience with them...
>
> This topic didn't receive any attention yet. Apparently, it's because of
> in-progress commitfest and upcoming release.
> I've registered it on upcoming commitfest as bug fix.
> https://commitfest.postgresql.org/15/1313/

I understood the necessity of this patch and reviewed the two patches.

For fix-bloom-wal-check.patch, it looks good to me; I found no problem. But for wal-check-on-bloom-check.patch, if you want to run wal-check even when running 'make check' or 'make check-world', can we just add a wal-check dependency as follows?

diff --git a/contrib/bloom/Makefile b/contrib/bloom/Makefile
index 13bd397..c1566d4 100644
--- a/contrib/bloom/Makefile
+++ b/contrib/bloom/Makefile
@@ -20,5 +20,7 @@ include $(top_builddir)/src/Makefile.global
 include $(top_srcdir)/contrib/contrib-global.mk
 endif
 
+check: wal-check
+
 wal-check: temp-install
 	$(prove_check)

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
Re: [HACKERS] Fix bloom WAL tap test
The following review has been posted through the commitfest application:
make installcheck-world: not tested
Implements feature: not tested
Spec compliant: not tested
Documentation: not tested

The patch doesn't apply against master:

fabrizio@macanudo:/d/postgresql (master)
$ git apply /tmp/wal-check-on-bloom-check.patch
error: contrib/bloom/Makefile: already exists in working directory

Regards,
Fabrízio Mello

The new status of this patch is: Waiting on Author
Re: [HACKERS] [PATCH] A hook for session start
On Tue, Nov 7, 2017 at 1:15 AM, Michael Paquier wrote:
>
> On Sun, Nov 5, 2017 at 3:14 AM, Fabrízio de Royes Mello wrote:
> > On Sat, Nov 4, 2017 at 1:23 AM, Michael Paquier wrote:
> >> On Fri, Nov 3, 2017 at 1:55 PM, Fabrízio de Royes Mello wrote:
> >> >> Passing the database name and user name does not look much useful to
> >> >> me. You can have access to this data already with CurrentUserId and
> >> >> MyDatabaseId.
> >> >
> >> > This way we don't need to convert oid to names inside hook code.
> >>
> >> Well, arguments of hooks are useful if they are used. Now if I look at
> >> the two examples mainly proposed in this thread, be it in your set of
> >> patches or the possibility to do some SQL transaction, I see nothing
> >> using them. So I'd vote for keeping an interface minimal.
> >
> > Maybe the attached patch with improved test module can illustrate better the
> > feature.
>
> I was going to hack something like that. That's interesting for the
> use case Robert has mentioned.
>
> Well, in the case of the end session hook, those variables are passed
> to the hook by being directly taken from the context from MyProcPort:
> + (*session_end_hook) (MyProcPort->database_name,
> MyProcPort->user_name);
> In the case of the start hook, those are directly taken from the
> command outer caller, but similarly MyProcPort is already set, so
> those are strings available (your patch does so in the end session
> hook)... Variables in hooks are useful if those are not available
> within the memory context, and refer to a specific state where the
> hook is called. For example, take the password hook, this uses the
> user name and the password because those values are not available
> within the session context. The same stands for other hooks as well.
> Keeping the interface minimal helps in readability and maintenance.
> See for the attached example that can be applied on top of 0003, which
> makes use of the session context, the set of regression tests does not
> pass but this shows how I think those hooks had better be shaped.

Makes sense... fixed.

> + (*session_end_hook) (MyProcPort->database_name, MyProcPort->user_name);
> +
> + /* Make sure we don't leave any active transactions and/or locks behind */
> + AbortOutOfAnyTransaction();
> + LockReleaseAll(USER_LOCKMETHOD, true);
>
> Let's leave this work to people actually implementing the hook contents.

Fixed.

Attached new version. I unified all three patches into one, because this is a small patch and most of it is test code.

Regards,

--
Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL
>> Timbira: http://www.timbira.com.br
>> Blog: http://fabriziomello.github.io
>> Linkedin: http://br.linkedin.com/in/fabriziomello
>> Twitter: http://twitter.com/fabriziomello
>> Github: http://github.com/fabriziomello

diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 2c7260e..52a9641 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -169,6 +169,9 @@ static ProcSignalReason RecoveryConflictReason;
 static MemoryContext row_description_context = NULL;
 static StringInfoData row_description_buf;
 
+/* Hook for plugins to get control at start of session */
+session_start_hook_type session_start_hook = NULL;
+
 /*
  * decls for routines only used in this file
  *
@@ -3857,6 +3860,9 @@ PostgresMain(int argc, char *argv[],
 	if (!IsUnderPostmaster)
 		PgStartTime = GetCurrentTimestamp();
 
+	if (session_start_hook)
+		(*session_start_hook) ();
+
 	/*
 	 * POSTGRES main processing loop begins here
 	 *
diff --git a/src/backend/utils/init/postinit.c b/src/backend/utils/init/postinit.c
index 20f1d27..ffaf51f 100644
--- a/src/backend/utils/init/postinit.c
+++ b/src/backend/utils/init/postinit.c
@@ -76,6 +76,8 @@ static bool ThereIsAtLeastOneRole(void);
 static void process_startup_options(Port *port, bool am_superuser);
 static void process_settings(Oid databaseid, Oid roleid);
 
+/* Hook for plugins to get control at end of session */
+session_end_hook_type session_end_hook = NULL;
 
 /*** InitPostgres support ***/
 
@@ -1154,6 +1156,10 @@ ShutdownPostgres(int code, Datum arg)
 	 * them explicitly.
 	 */
 	LockReleaseAll(USER_LOCKMETHOD, true);
+
+	/* Hook just normal backends */
+	if (session_end_hook && MyBackendId != InvalidBackendId)
+		(*session_end_hook) ();
 }
 
diff --git a/src/include/tcop/tcopprot.h b/src/include/tcop/tcopprot.h
index f8c535c..9f05bfb 100644
--- a/src/include/tcop/tcopprot.h
+++ b/src/include/tcop/tcopprot.h
@@ -35,6 +35,13 @@ extern PGDLLIMPORT const char *debug_query_string;
 extern int	max_stack_depth;
 extern int	PostAuthDelay;
 
+/* Hook for plugins to get control at start and end of session */
Re: [HACKERS] [PATCH] Improve geometric types
Hello, thanks for the new patch.

0004 failed to apply on top of the preceding patches.

At Sun, 5 Nov 2017 15:54:19 +0100, Emre Hasegeli wrote in
> > I am not sure how useful NaNs are in geometric types context, but we
> > allow them, so inconsistent hypot() would be a problem. I will change
> > my patches to keep pg_hypot().
>
> New versions of the patches are attached with 2 additional ones. The
> new versions leave pg_hypot() in place. One of the new patches
> improves the test coverage. The line coverage of geo_ops.c increases
> from 55% to 81%. The other one fixes -0 values to 0 on float
> operators. I am not sure about performance implication of this, so
> kept it separate. It may be a better idea to check this only on the
> platforms that have a tendency to produce -0.
>
> While working on the tests, I found some unreachable code and removed
> it. I also found that the lseg ## lseg operator returns wrong results.
> It is defined as "closest point to first segment on the second
> segment", but:
>
> > # select '[(1,2),(3,4)]'::lseg ## '[(0,0),(6,6)]'::lseg;
> >  ?column?
> > ----------
> >  (1,2)
> > (1 row)
>
> I appended the fix to the patches. This is also affecting the lseg ## box
> operator.

Mmm.. It returns (1.5,1.5) with the 0004 patch. It is surely a point on the second operand, but I'm not sure it's right that the operator returns a specific point for two parallel segments.

> I also changed the recently band-aided point ## lseg operator to return
> the point instead of NULL when it cannot find the correct result, to
> avoid crashing the operators that depend on this one.

I'd like to put comments on 0001 and 0004 only for now:

- Adding [LR]DELIM_L looks good, but they should be described in the comment just above.

- Renaming float8_slope to slope seems to be no problem.

- I'm not sure the change of box_construct is good, but currently all callers fit the new interface (giving two points, not four coordinates).

- close_lseg seems to forget the case where the two given segments are crossing (and parallel). For example,

  select '(-8,-8),(1,1)'::lseg ## '(-10,0),(2,0)'::lseg;

  is expected to return (0,0), which is the crossing point of the two segments, but it returns (1,0) (and returned (1,1) before the patch), which is the point on l2 closest to the closer end of l1 to l2.

  As mentioned above, it is difficult to decide what the proper result is for parallel segments. I suppose that the following operation should return an invalid value as a point:

  select '(-1,0),(1,0)'::lseg ## '(-1,1),(1,1)'::lseg;

  close_lseg does the following operations in the else branch of 'if (float8_...)'. If I'm not missing something, the result of the following operation is always the closer end of l2. In other words, it seems a waste of cycles.

  | point = DirectFunctionCall2(close_ps,
  |                             PointPGetDatum(&l2->p[closer_end2]),
  |                             LsegPGetDatum(l1));
  | return DirectFunctionCall2(close_ps, point, LsegPGetDatum(l2));

- make_bound_box operates directly on poly->boundbox. I'm afraid that such coding hinders compilers from using registers.

This is a bit apart from this patch, but it would be better if we could eliminate internal calls using DirectFunctionCall.

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center
Re: [HACKERS] Flexible configuration for full-text search
On Tue, 31 Oct 2017 09:47:57 +0100 Emre Hasegeli wrote:
> > If we want to save this behavior, we should somehow pass a stopword
> > to tsvector composition function (parsetext in ts_parse.c) for
> > counter increment or increment it in another way. Currently, an
> > empty lexemes array is passed as a result of LexizeExec.
> >
> > One of possible way to do so is something like:
> > CASE polish_stopword
> > WHEN MATCH THEN KEEP -- stopword counting
> > ELSE polish_isspell
> > END
>
> This would mean keeping the stopwords. What we want is
>
> CASE polish_stopword -- stopword counting
> WHEN NO MATCH THEN polish_isspell
> END
>
> Do you think it is possible?

Hi Emre,

I thought about how it can be implemented. The way I see it is to increment the word counter whenever any checked dictionary matched the word, even without returning a lexeme. The main drawback is that the counter increment is implicit.

--
Aleksandr Parfenov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company
Re: [HACKERS] parallelize queries containing initplans
On Mon, Oct 30, 2017 at 9:00 AM, Amit Kapila wrote:
> On Wed, Oct 11, 2017 at 9:24 PM, Robert Haas wrote:
>> On Mon, Oct 9, 2017 at 5:56 AM, Amit Kapila wrote:
>>> How about always returning false for PARAM_EXTERN?
>>
>> Yeah, I think that's what we should do. Let's do that first as a
>> separate patch, which we might even want to back-patch, then return to
>> this.
>
> Now that the PARAM_EXTERN issue is fixed, I have rebased this patch.
> This patch had been switched to Ready For Committer in the last CF, then
> Robert had comments which I have addressed, so I think the status
> should be switched back to Ready For Committer.

As mentioned, changed the status of the patch in the CF app.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] Custom compression methods
On Thu, 2 Nov 2017 23:02:34 +0800 Craig Ringer wrote:
> On 2 November 2017 at 17:41, Ildus Kurbangaliev wrote:
>
> > In this patch compression methods are suitable for MAIN and EXTENDED
> > storage, as in the current implementation in Postgres. Just instead
> > of only LZ4 you can specify any other compression method.
>
> We've had this discussion before.
>
> Please read the "pluggable compression support" thread. See you in a
> few days ;) sorry, it's kinda long.
>
> https://www.postgresql.org/message-id/flat/20130621000900.GA12425%40alap2.anarazel.de#20130621000900.ga12...@alap2.anarazel.de
>
> IIRC there were some concerns about what happened with pg_upgrade,
> with consuming precious toast bits, and a few other things.

Thank you for the link; I didn't see that thread when I looked over the mailing lists. I read it briefly, and I can address a few things relating to my patch. Most of the concerns were about legal issues, which is actually the reason I did not include any new compression algorithms in my patch. Unlike that patch, mine only provides syntax: it is just a way to let users plug in their own compression algorithms and deal with any legal issues themselves. I use only one unused bit in the header (there's still one free ;)), which is enough to determine whether the data is compressed or not. I did find out that pg_upgrade doesn't work properly with my patch; I will send a fix for it soon.

--
Ildus Kurbangaliev
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company
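For context, a rough sketch of the kind of syntax such a patch introduces. The exact grammar is illustrative only and may differ from the posted patch; the method and handler names here are made up:

```sql
-- Hypothetical example: register a handler function as a compression
-- method, then attach that method to a varlena column so its TOASTed
-- values are compressed with it instead of the built-in algorithm.
CREATE COMPRESSION METHOD my_lz4 HANDLER my_lz4_handler;

CREATE TABLE docs (
    id   serial PRIMARY KEY,
    body text COMPRESSED my_lz4
);
```

The single spare bit in the varlena header mentioned above would flag that a value was compressed by a custom method, so decompression can be routed to the right handler.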
Re: [HACKERS] Refactor handling of database attributes between pg_dump and pg_dumpall
On Thu, Oct 26, 2017 at 10:01 PM, Robert Haas wrote:
> On Mon, Oct 23, 2017 at 7:36 AM, Haribabu Kommi wrote:
> > Apologies for not providing much detail.
> >
> > pg_dumpall is used to produce the following statements for a database:
> >
> > "CREATE DATABASE" (other than the default database) or
> > "ALTER DATABASE SET TABLESPACE" for the default database (if required)
> >
> > ACL queries related to the database
> > ALTER DATABASE config
> > ALTER DATABASE role config
> >
> > whereas pg_dump used to produce only the CREATE DATABASE statement.
>
> How about adding a new flag --set-db-properties that doesn't produce
> CREATE DATABASE but does dump the other stuff? -C would dump both
> CREATE DATABASE *and* the other stuff. Then you could dump built-in
> databases with --set-db-properties and others with -C.

Thanks for the idea. Attached is a patch that implements it. The newly added option is not recommended for normal use; it is intended only for upgrade utilities. If the user issues pg_dump with the --set-db-properties option along with the --create or --clean options, an error is raised. Currently there is no way to throw an error when a dump generated with --set-db-properties is restored with the --clean option. To handle that case we would need to add additional details to the archive handler, but is that really needed?

Regards,
Hari Babu
Fujitsu Australia

pg_dump-and-pg_dumpall-database-handling-refactoring_v9.patch
Description: Binary data
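To make the division of labor concrete, usage might look like the following. The --set-db-properties flag comes from this proposal and is not part of released pg_dump; the invocations are illustrative:

```shell
# Built-in databases: dump only the ACLs and ALTER DATABASE ... SET
# properties, without emitting CREATE DATABASE.
pg_dump --set-db-properties --file=postgres-props.sql postgres

# Other databases: -C/--create emits CREATE DATABASE plus the same
# properties. Combining --set-db-properties with --create or --clean
# is rejected with an error.
pg_dump --create --file=mydb.sql mydb
```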
Re: [HACKERS] pg_stat_wal_write statistics view
On Wed, Sep 27, 2017 at 6:58 PM, Haribabu Kommi wrote:
>
> Updated patch attached.
>

Patch rebased.

Regards,
Hari Babu
Fujitsu Australia

pg_stat_walwrites-statistics-view_v10.patch
Description: Binary data