Re: [HACKERS] Declarative partitioning
> On 2016/04/15 18:46, Ashutosh Bapat wrote:
> > 3. PartitionKeyData contains KeyTypeCollInfo, whose contents can be
> > obtained by calling functions exprType, exprTypemod on partexprs. Why do we
> > need to store that information as a separate member?
>
> There was no KeyTypeCollInfo in early days of the patch and then I found
> myself doing a lot of:
>
>     partexprs_item = list_head(key->partexprs);
>     for (attr in key->partattrs)
>     {
>         if (attr->attnum != 0)
>         {
>             // simple column reference, get type from attr
>         }
>         else
>         {
>             // expression, get type using exprType, etc.
>             partexprs_item = lnext(partexprs_item);
>         }
>     }

At least the two loops can be flattened to a single loop if we keep only an
expressions list, with simple columns represented as just Var nodes.
exprType() etc. would then work seamlessly.

> That ended up being quite a few places (though I managed to reduce the
> number of places over time). So, I created this struct which is
> initialized when the partition key is built (on first open of the
> partitioned table).

Hmm, I am just afraid that we might end up with some code using the cached
information and some using exprType, exprTypmod etc.

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
Re: [HACKERS] Declarative partitioning
> >> Regarding 6, it seems to me that because Append does not have an
> >> associated relid (like scan nodes have with scanrelid). Maybe we need
> >> to either fix Append or create some enhanced version of Append which
> >> would also support dynamic pruning.
> >
> > Right, I think, Append might store the relid of the parent table as
> > well as partitioning information at that level along-with the subplans.
>
> For the time being, I will leave this as yet unaddressed (I am thinking
> about what is reasonable to do for this, also considering Robert's
> comment). Is that OK?

Right now EXPLAIN of "select * from t1", where t1 is a partitioned table,
shows

    Append
      ->  Seq Scan on t1
      ->  Seq Scan on partition 1
      ->  Seq Scan on partition 2
      ...

which shows t1 as well as all the partitions on the same level. Users might
have accepted that for an inheritance hierarchy, but for partitioning that
can be confusing, esp. when all the error messages and documentation
indicate that t1 is an empty (shell?) table. Instead, showing it like

    Append for t1   (essentially show that this is an Append for
                     partitioned table t1; exact text might vary)
      ->  Seq Scan on partition 1
      ...

would be more readable. Similarly, if we are going to collapse the whole
Append hierarchy, it might get even more confusing. Listing all the
intermediate partitioned tables as Seq Scans on them would be confusing for
the reasons mentioned above, and not listing them might make the user
wonder about the reasons for their disappearance. I am not sure what the
solution there is.

> > Some more comments
> >
> > Would it be better to declare PartitionDescData as
> > {
> >     int nparts;
> >     PartitionInfo *partinfo;  /* array of partition information structures */
> > }
>
> I think that might be better. Will do it the way you suggest.
>
> > The new syntax allows CREATE TABLE to be specified as partition of an
> > already partitioned table. Is it possible to do the same for CREATE
> > FOREIGN TABLE? Or that's material for v2? Similarly for ATTACH PARTITION.
>
> OK, I will address this in the next version. One question though: should
> foreign tables only be allowed to be leaf partitions (ie, no PARTITION BY
> clause in CREATE FOREIGN TABLE ... PARTITION OF)?

That seems a better way. Otherwise users might wonder whether we keep the
partitions of a foreign table on the foreign server, which won't be true.
But then we allow foreign tables to have local tables as children in
inheritance, so somebody from that background might find it incompatible.
I think we shouldn't let the connection between partitioning and
inheritance linger longer than necessary.

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
Re: [HACKERS] Declarative partitioning
On Fri, Apr 15, 2016 at 10:30 PM, Robert Haas wrote:
> On Fri, Apr 15, 2016 at 5:46 AM, Ashutosh Bapat wrote:
> > Retaining the partition hierarchy would help to push down joins across
> > the partition hierarchy effectively.
>
> -1.  You don't get to insert cruft into the final plan for the
> convenience of the optimizer.  I think the AppendPath needs to be
> annotated with sufficient information to do whatever query planning
> optimizations we want, and some or all of that may need to carry over
> to the Append plan to allow run-time partition pruning.  But I think
> that flattening nests of Appends is a good optimization and we should
> preserve it.  If that makes the additional information that any given
> Append needs to carry a bit more complex, so be it.
>
> I also think it's very good that Amit has kept the query planner
> unchanged in this initial patch.  Let's leave that work to phase two.
> What I suggest we do when the time comes is invent new nodes
> RangePartitionMap, ListPartitionMap, HashPartitionMap.  Each contains
> minimal metadata needed for tuple routing or planner transformation.
> For example, RangePartitionMap can contain an array of partition
> boundaries - represented as Datums - and an array of mappings, each a
> Node *.  The associated value can be another PartitionMap object if
> there is subpartitioning in use, or an OID.

range table index instead of OID to make it easy to lookup the

> This can be used both for
> matching up partitions for join pushdown, and also for fast tuple
> routing and runtime partition pruning.

I was thinking about join pushdown (in the partitioning hierarchy) of
multiple tables which have a similar partitioning structure for a few upper
levels, but whose entire partitioning hierarchies do not match. Pushing the
join down as far as possible into the partition hierarchy will help. We
might end up with joins between a plain table and an Append relation at the
leaf level. For such join push-down, it looks like we will have to
construct a corresponding Append hierarchy, push the joins down, and then
(maybe) collapse it to just collect the results of the joins at the leaf
level. But preparing for that kind of optimization need not be part of
this work.

> > 2. The new syntax allows CREATE TABLE to be specified as partition of an
> > already partitioned table. Is it possible to do the same for CREATE
> > FOREIGN TABLE? Or that's material for v2? Similarly for ATTACH PARTITION.
>
> +1 for making CREATE FOREIGN TABLE support that also, and in version
> 1.  And same for ATTACH PARTITION.

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
Re: [HACKERS] checkpoint_flush_after documentation inconsistency
On Fri, Apr 15, 2016 at 6:56 PM, Magnus Hagander wrote:
> The documentation says that the default value is 128Kb on Linux, but the
> code says it's 256Kb.
>
> Not sure which one is correct, but the other one should be updated :) I'm
> guessing it's a copy/paste mistake in the docs, but since I'm not sure I'm
> not just pushing a fix.

I think you're right. I also found several other small problems regarding
the XXX_flush_after parameters.

(1) checkpoint_flush_after, backend_flush_after and bgwriter_flush_after
don't exist in postgresql.conf.sample. If there is no special reason for
that, it's better to add them to postgresql.conf.sample.

(2)
> /* see bufmgr.h: 16 on Linux, 0 otherwise */
The source code comment says that the default value of bgwriter_flush_after
is 16 on Linux, but it's actually 64. The parameter whose default is 16 is
backend_flush_after. This source comment needs to be updated.

(3) The config groups assigned to them look strange. The group of
checkpoint_flush_after is RESOURCES_ASYNCHRONOUS, but it's documented under
the section "Checkpoints". IMO, it's better to change the group to
WAL_CHECKPOINTS. The group of bgwriter_flush_after is WAL_CHECKPOINTS, but
it's documented under the section "Background Writer". IMO, it's better to
change the group to RESOURCES_BGWRITER.

(4)
> This parameter can only be set in the postgresql.conf file or on
> the server command line.
The above description should be used for a PGC_SIGHUP parameter. But it's
used for backend_flush_after, which is a PGC_USERSET parameter.

(5) bgwriter_flush_after (int), backend_flush_after (int),
checkpoint_flush_after (int): in the docs, "int" should be "integer" for
the sake of consistency.

Regards,

--
Fujii Masao

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Default Roles
On Mon, Apr 18, 2016 at 12:05 PM, Noah Misch wrote:
> On Sun, Apr 17, 2016 at 08:04:03PM -0400, Noah Misch wrote:
>> On Thu, Apr 07, 2016 at 03:50:47PM -0400, Stephen Frost wrote:
>> > I'm planning to continue going over the patch tomorrow morning with
>> > plans to push this before the feature freeze deadline.
>>
>> > --- a/src/test/regress/expected/rolenames.out
>> > +++ b/src/test/regress/expected/rolenames.out
>>
>> > +GRANT testrol0 TO pg_abc; -- error
>> > +ERROR:  role "pg_abc" is reserved
>> > +DETAIL:  Cannot GRANT roles to a reserved role.
>>
>> The server still accepts "ALTER ROLE testrol0 USER pg_signal_backend".
>> It should block this ALTER ROLE if it blocks the corresponding GRANT.

Following this logic, shouldn't CREATE ROLE ... USER be forbidden as well?

    =# create role toto1 user pg_signal_backend;
    CREATE ROLE
    =# create role toto2;
    CREATE ROLE
    =# alter role toto2 user pg_signal_backend;
    ALTER ROLE
    =# \dgS+ pg_signal_backend
                            List of roles
         Role name     |  Attributes  |   Member of   | Description
    -------------------+--------------+---------------+-------------
     pg_signal_backend | Cannot login | {toto1,toto2} |

In short, a reserved role should never be a member of another role/group,
as per the attached.

--
Michael

[Attachment: catalog-acl-group.patch]
Re: [HACKERS] Pgbench with -f and -S
>> For the future 9.6, script options are cumulative, so -f & -S can be
>> combined. Indeed, for <= 9.5 it seems that some options are silently
>> ignored: when combining -S/-f, only the last one is kept; it should be
>> warned about.
>
> Thanks Fabien, for confirming about the missing warning. Also, by
> 'combined' I think you mean that both (built-in SELECTs & custom script)
> run, although the dev docs don't (yet) say anything about that.

Hmmm... I think it does, implicitly, with "add" and "and" in the following
from http://www.postgresql.org/docs/devel/static/pgbench.html:

    -b scriptname[@weight]
    Add the specified builtin script to the list of executed scripts. [...]

Idem -f. And later, at the beginning of the Notes:

    pgbench executes test scripts chosen randomly from a specified list.
    They include built-in scripts with -b and user-provided custom scripts
    with -f. [...]

--
Fabien.
Re: [HACKERS] Declarative partitioning
Hi,

On 2016/04/15 18:46, Ashutosh Bapat wrote:
> 3. PartitionKeyData contains KeyTypeCollInfo, whose contents can be
> obtained by calling functions exprType, exprTypemod on partexprs. Why do we
> need to store that information as a separate member?

There was no KeyTypeCollInfo in early days of the patch and then I found
myself doing a lot of:

    partexprs_item = list_head(key->partexprs);
    for (attr in key->partattrs)
    {
        if (attr->attnum != 0)
        {
            // simple column reference, get type from attr
        }
        else
        {
            // expression, get type using exprType, etc.
            partexprs_item = lnext(partexprs_item);
        }
    }

That ended up being quite a few places (though I managed to reduce the
number of places over time). So, I created this struct, which is
initialized when the partition key is built (on first open of the
partitioned table).

Thanks,
Amit
Re: [HACKERS] Support for N synchronous standby servers - take 2
At Fri, 15 Apr 2016 17:36:57 +0900, Masahiko Sawada wrote:
> >> How about if we do all the parsing stuff in a temporary context and
> >> then copy the results using TopMemoryContext? I don't think it will be
> >> a leak in TopMemoryContext, because next time we try to check/assign
> >> s_s_names, it will free the previous result.
> >
> > I agree with you. A temporary context for the parser seems
> > reasonable. TopMemoryContext is created very early in main() so
> > palloc on it is effectively the same as malloc.
> > One problem is that only the top memory block is assumed to be
> > free()'d, not pfree()'d, by guc_set_extra. It makes this quite
> > ugly..
> >
> > Maybe we shouldn't use the extra for this purpose.
> >
> > Thoughts?
>
> How about if check_hook just parses the parameter in
> CurrentMemoryContext (i.e., T_AllocSetContext), and then the
> assign_hook copies syncrep_parse_result to TopMemoryContext?
> Because syncrep_parse_result is a global variable, these hooks can see it.

Hmm. Somewhat uneasy, but it should work. The attached patch does it.

> Here are some comments.
>
> -SyncRepUpdateConfig(void)
> +SyncRepFreeConfig(SyncRepConfigData *config, bool itself, MemoryContext cxt)
>
> Sorry, it's my bad. The itself variable is no longer needed because
> SyncRepFreeConfig is called by only one function.
>
> -void
> -SyncRepFreeConfig(SyncRepConfigData *config)
> +SyncRepConfigData *
> +SyncRepCopyConfig(SyncRepConfigData *oldconfig, MemoryContext targetcxt)
>
> I'm not sure the targetcxt argument is necessary.

Yes, these are just for signalling, so removing them doesn't harm.
regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

diff --git a/src/backend/replication/syncrep.c b/src/backend/replication/syncrep.c
index 3c9142e..3d68fb5 100644
--- a/src/backend/replication/syncrep.c
+++ b/src/backend/replication/syncrep.c
@@ -68,6 +68,7 @@
 #include "storage/proc.h"
 #include "tcop/tcopprot.h"
 #include "utils/builtins.h"
+#include "utils/memutils.h"
 #include "utils/ps_status.h"
 
 /* User-settable parameters for sync rep */
@@ -361,11 +362,6 @@ SyncRepInitConfig(void)
 {
 	int			priority;
 
-	/* Update the config data of synchronous replication */
-	SyncRepFreeConfig(SyncRepConfig);
-	SyncRepConfig = NULL;
-	SyncRepUpdateConfig();
-
 	/*
 	 * Determine if we are a potential sync standby and remember the result
 	 * for handling replies from standby.
@@ -868,47 +864,50 @@ SyncRepUpdateSyncStandbysDefined(void)
 }
 
 /*
- * Parse synchronous_standby_names and update the config data
- * of synchronous standbys.
+ * Free a previously-allocated config data of synchronous replication.
  */
 void
-SyncRepUpdateConfig(void)
+SyncRepFreeConfig(SyncRepConfigData *config)
 {
-	int			parse_rc;
-
-	if (!SyncStandbysDefined())
+	if (!config)
 		return;
 
-	/*
-	 * check_synchronous_standby_names() verifies the setting value of
-	 * synchronous_standby_names before this function is called. So
-	 * syncrep_yyparse() must not cause an error here.
-	 */
-	syncrep_scanner_init(SyncRepStandbyNames);
-	parse_rc = syncrep_yyparse();
-	syncrep_scanner_finish();
-
-	if (parse_rc != 0)
-		ereport(ERROR,
-				(errcode(ERRCODE_SYNTAX_ERROR),
-				 errmsg_internal("synchronous_standby_names parser returned %d",
-								 parse_rc)));
-
-	SyncRepConfig = syncrep_parse_result;
-	syncrep_parse_result = NULL;
+	list_free_deep(config->members);
+	pfree(config);
 }
 
 /*
- * Free a previously-allocated config data of synchronous replication.
+ * Returns a copy of a replication config data into the TopMemoryContext.
  */
-void
-SyncRepFreeConfig(SyncRepConfigData *config)
+SyncRepConfigData *
+SyncRepCopyConfig(SyncRepConfigData *oldconfig)
 {
-	if (!config)
-		return;
+	MemoryContext oldcxt;
+	SyncRepConfigData *newconfig;
+	ListCell   *lc;
 
-	list_free_deep(config->members);
-	pfree(config);
+	if (!oldconfig)
+		return NULL;
+
+	oldcxt = MemoryContextSwitchTo(TopMemoryContext);
+
+	newconfig = (SyncRepConfigData *) palloc(sizeof(SyncRepConfigData));
+	newconfig->num_sync = oldconfig->num_sync;
+	newconfig->members = list_copy(oldconfig->members);
+
+	/*
+	 * The new members list is a combination of list cells on the new context
+	 * and data pointed from each cell on the old context. So we explicitly
+	 * copy the data.
+	 */
+	foreach (lc, newconfig->members)
+	{
+		lfirst(lc) = pstrdup((char *) lfirst(lc));
+	}
+
+	MemoryContextSwitchTo(oldcxt);
+
+	return newconfig;
 }
 
 #ifdef USE_ASSERT_CHECKING
@@ -957,6 +956,8 @@ check_synchronous_standby_names(char **newval, void **extra, GucSource source)
 {
 	int			parse_rc;
 
+	Assert(syncrep_parse_result == NULL);
+
 	if (*newval != NULL && (*newval)[0] != '\0')
 	{
 		syncrep_scanner_init(*newval);
@@ -965,6 +966,7 @@ check_synchronous_standby_names(char **newval, void **extra, GucSource source)
 
 		if (parse_rc != 0)
 		{
+			syncrep_parse_result = NULL;
 			GUC_check_errcode(ERRCODE_SYNTAX_ERROR);
 			GUC_check_errdetail("synchronous_standby_names parser returned %d",
Re: [HACKERS] Declarative partitioning
Hi Ashutosh,

On 2016/04/15 18:48, Ashutosh Bapat wrote:
> With the new set of patches, I am getting following warnings but no
> compilation failures. Patches apply smoothly.
>
> partition.c:1216:21: warning: variable ‘form’ set but not used
> [-Wunused-but-set-variable]
> partition.c:1637:20: warning: variable ‘form’ set but not used
> [-Wunused-but-set-variable]

Ah, will fix.

> For creating a partition, the documentation seems to suggest the syntax
>
>     create table t1_p1 partition of t1 for values {start {0} end {100} exclusive;
>
> but the following syntax works:
>
>     create table t1_p1 partition of t1 for values start (0) end (100) exclusive;

Hmm, I see the following in the docs:

    ... and partition_bound_spec is:

    FOR VALUES { list_spec | range_spec }

    list_spec in FOR VALUES is:

    { IN ( expression [, ...] ) }

    range_spec in FOR VALUES is:

    { START (lower-bound) [ INCLUSIVE | EXCLUSIVE ] |
      END (upper-bound) [ INCLUSIVE | EXCLUSIVE ] |
      START (lower-bound) [ INCLUSIVE | EXCLUSIVE ]
        END (upper-bound) [ INCLUSIVE | EXCLUSIVE ] }

By the way, I see that you are always specifying "exclusive" for the end
bound in your examples. That specification is optional, if that wasn't
apparent. The default (ie, without explicit specification) is [start, end).

> > I got rid of all the optimizer changes in the new version (except a line
> > or two). I did that by switching to storing parent-child relationships in
> > pg_inherits so that all the existing optimizer code for inheritance sets
> > works unchanged. Also, the implementation detail that required to put
> > only leaf partitions in the append list is also gone. Previously no
> > storage was allocated for partitioned tables (either root or any of the
> > internal partitions), so it was being done that way.
>
> With this set of patches we are still flattening all the partitions.
>
> postgres=# create table t1 (a int, b int) partition by range(a);
> CREATE TABLE
> postgres=# create table t1_p1 partition of t1 for values start (0) end
> (100) exclusive partition by range(b);
> CREATE TABLE
> postgres=# create table t1_p1_p1 partition of t1_p1 for values start (0)
> end (100) exclusive;
> CREATE TABLE
> postgres=# explain verbose select * from t1;
>                                QUERY PLAN
> -------------------------------------------------------------------------
>  Append  (cost=0.00..32.60 rows=2262 width=8)
>    ->  Seq Scan on public.t1  (cost=0.00..0.00 rows=1 width=8)
>          Output: t1.a, t1.b
>    ->  Seq Scan on public.t1_p1  (cost=0.00..0.00 rows=1 width=8)
>          Output: t1_p1.a, t1_p1.b
>    ->  Seq Scan on public.t1_p1_p1  (cost=0.00..32.60 rows=2260 width=8)
>          Output: t1_p1_p1.a, t1_p1_p1.b
> (7 rows)
>
> Retaining the partition hierarchy would help to push down joins across
> the partition hierarchy effectively.

Yes, it's still a flattened append (basically what the optimizer currently
creates for arbitrary inheritance trees). I didn't modify any of that yet.

> > Regarding 6, it seems to me that because Append does not have an
> > associated relid (like scan nodes have with scanrelid). Maybe we need to
> > either fix Append or create some enhanced version of Append which would
> > also support dynamic pruning.
>
> Right, I think, Append might store the relid of the parent table as well
> as partitioning information at that level along-with the subplans.

For the time being, I will leave this as yet unaddressed (I am thinking
about what is reasonable to do for this, also considering Robert's
comment). Is that OK?

> Some more comments
>
> Would it be better to declare PartitionDescData as
> {
>     int nparts;
>     PartitionInfo *partinfo;  /* array of partition information structures */
> }

I think that might be better. Will do it the way you suggest.

> The new syntax allows CREATE TABLE to be specified as partition of an
> already partitioned table. Is it possible to do the same for CREATE
> FOREIGN TABLE? Or that's material for v2? Similarly for ATTACH PARTITION.

OK, I will address this in the next version. One question though: should
foreign tables only be allowed to be leaf partitions (ie, no PARTITION BY
clause in CREATE FOREIGN TABLE ... PARTITION OF)?

By the way, ATTACH PARTITION only allows leaf partitions anyway.

Thanks a lot for the review!

Thanks,
Amit
Re: [HACKERS] Confusing comment in pg_upgrade in regards to VACUUM FREEZE
On 18 April 2016 at 12:11, David Rowley wrote:
> this is not helping me much as I don't really understand why
> pg_statistic need to be frozen?

Yeah, in particular it's not clear to me why pg_upgrade goes to such
efforts to freeze everything when it copies pg_clog over in
copy_clog_xlog_xid(). Is it mainly a defense against the multixact format
change?

--
Craig Ringer    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Re: [COMMITTERS] pgsql: Add trigonometric functions that work in degrees.
On Mon, Apr 18, 2016 at 12:31 PM, Peter Eisentraut wrote:
> I don't know if it's worth tracking this as an open item and thus
> kind-of release blocker if no one else has the problem.  But I
> definitely still have it.  My initial suspicion was that this had
> something to do with a partial upgrade to gcc 6 (not yet released), in
> other words a messed up system.  But I was able to reproduce it in a
> freshly installed chroot.  It only happens with various versions of gcc,
> but not with clang.  So I'm going to have to keep digging.

gcc is moving slowly but surely to have 6.0 in a released state, btw:
https://gcc.gnu.org/ml/gcc/2016-04/msg00103.html

--
Michael
Re: [HACKERS] Patch: fix typo, duplicated word in indexam.sgml
On Tue, Apr 5, 2016 at 12:11 AM, Artur Zakirov wrote:
> Hello,
>
> There is a duplicated word in indexam.sgml. The patch is attached.

Applied. Thanks!

Regards,

--
Fujii Masao
Re: [HACKERS] Support for N synchronous standby servers - take 2
At Sat, 16 Apr 2016 12:50:30 +0530, Amit Kapila wrote:
> On Fri, Apr 15, 2016 at 11:30 AM, Kyotaro HORIGUCHI
> <horiguchi.kyot...@lab.ntt.co.jp> wrote:
> > At Fri, 15 Apr 2016 08:52:56 +0530, Amit Kapila wrote:
> > > How about if we do all the parsing stuff in a temporary context and
> > > then copy the results using TopMemoryContext? I don't think it will
> > > be a leak in TopMemoryContext, because next time we try to
> > > check/assign s_s_names, it will free the previous result.
> >
> > I agree with you. A temporary context for the parser seems
> > reasonable. TopMemoryContext is created very early in main() so
> > palloc on it is effectively the same as malloc.
> >
> > One problem is that only the top memory block is assumed to be
> > free()'d, not pfree()'d, by guc_set_extra. It makes this quite
> > ugly..
>
> + newconfig = (SyncRepConfigData *) malloc(sizeof(SyncRepConfigData));
>
> Is there a reason to use malloc here; can't we use palloc directly?

The reason is that the memory block is to be released using free() in
guc_extra_field (not guc_set_extra). Even if we allocate and deallocate it
using palloc/pfree, the 'extra' pointer to the block in gconf cannot be
NULLed there, and guc_extra_field then tries freeing it again using
free(): bang.

> Also for both the functions SyncRepCopyConfig() and SyncRepFreeConfig(),
> if we directly use TopMemoryContext inside the function (if required)
> rather than taking it as an argument, then it will simplify the code a
> lot.

Either is fine. I placed the parameter in order to emphasize where the
memory block is placed, other than the current memory context or the bare
heap, rather than for some practical reason.

> +SyncRepFreeConfig(SyncRepConfigData *config, bool itself, MemoryContext cxt)
>
> Do we really need the 'bool itself' parameter in the above function?

> > + if (cxt)
> > +     oldcxt = MemoryContextSwitchTo(cxt);
> > + list_free_deep(config->members);
> > +
> > + if (oldcxt)
> > +     MemoryContextSwitchTo(oldcxt);
>
> Why do you need MemoryContextSwitchTo for freeing members?

Ah, sorry. It's just a slip of my fingers.

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center
[HACKERS] Confusing comment in pg_upgrade in regards to VACUUM FREEZE
I'd been wondering why a VACUUM FREEZE is performed during pg_upgrade, so
I opened up the code to see if that would help me determine why this is.
I'm confronted with a comment, which states:

    /*
     * We do freeze after analyze so pg_statistic is also frozen. template0 is
     * not frozen here, but data rows were frozen by initdb, and we set its
     * datfrozenxid, relfrozenxids, and relminmxid later to match the new xid
     * counter later.
     */

This is not helping me much, as I don't really understand why pg_statistic
needs to be frozen.

--
David Rowley    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
[HACKERS] Fix of doc for synchronous_standby_names.
Hello. The synchronous_standby_names setting can now be used to require
more than one synchronous standby, but its documentation still seems to
assume only one synchronous standby:

> There is no mechanism to enforce uniqueness. In case of
> duplicates one of the matching standbys will be considered as
> higher priority, though exactly which one is indeterminate.

The attached patch edits the above to the following:

> There is no mechanism to enforce uniqueness. In case of
> duplicates some of the matching standbys will be considered as
> higher priority, though they are chosen in an indeterminate way.

Does this make sense?

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index f9ba148..7b2edbe 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -3028,9 +3028,9 @@ include_dir 'conf.d'
         The name of a standby server for this purpose is the
         application_name setting of the standby, as set in the
         primary_conninfo of the standby's WAL receiver.  There is
-        no mechanism to enforce uniqueness. In case of duplicates one of the
-        matching standbys will be considered as higher priority, though
-        exactly which one is indeterminate.
+        no mechanism to enforce uniqueness. In case of duplicates some of the
+        matching standbys will be considered as higher priority, though they
+        are chosen in an indeterminate way.
         The special entry * matches any application_name, including
         the default application name of walreceiver.
Re: [HACKERS] snapshot too old, configured by time
Tom Lane wrote:
> After struggling with back-patching a GIN bug fix, I wish to offer up the
> considered opinion that this was an impressively bad idea.  It's inserted
> 450 or so pain points for back-patching, which we will have to deal with
> for the next five years.  Moreover, I do not believe that it will do a
> damn thing for ensuring that future calls of BufferGetPage think about
> what to do; they'll most likely be copied-and-pasted from nearby calls,
> just as people have always done.  With luck, the nearby calls will have
> the right semantics, but this change won't help very much at all if they
> don't.

I disagree.  A developer that sees an unadorned BufferGetPage() call
doesn't stop to think twice about whether they need to add a snapshot
test.  Many reviewers will miss the necessary addition also.  A developer
that sees BufferGetPage(NO_SNAPSHOT_TEST) will at least consider the idea
that the flag might be right; if that developer doesn't think about it,
some reviewer may notice a new call with the flag and consider the idea
that the flag may be wrong.

I understand the back-patching pain argument, but my opinion was the
contrary of yours even so.

--
Álvaro Herrera    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] Re: [COMMITTERS] pgsql: Add trigonometric functions that work in degrees.
On 04/15/2016 02:04 PM, Robert Haas wrote:
> Peter, are you going to look into this further?  This is on the open
> items list, but there seems to be nothing that can be done about it by
> anyone other than, maybe, you.

I don't know if it's worth tracking this as an open item and thus a
kind-of release blocker if no one else has the problem.  But I definitely
still have it.  My initial suspicion was that this had something to do
with a partial upgrade to gcc 6 (not yet released), in other words a
messed-up system.  But I was able to reproduce it in a freshly installed
chroot.  It only happens with various versions of gcc, but not with
clang.  So I'm going to have to keep digging.
Re: [HACKERS] Default Roles
On Sun, Apr 17, 2016 at 08:04:03PM -0400, Noah Misch wrote:
> On Thu, Apr 07, 2016 at 03:50:47PM -0400, Stephen Frost wrote:
> > I'm planning to continue going over the patch tomorrow morning with
> > plans to push this before the feature freeze deadline.
>
> > --- a/src/test/regress/expected/rolenames.out
> > +++ b/src/test/regress/expected/rolenames.out
>
> > +GRANT testrol0 TO pg_abc; -- error
> > +ERROR:  role "pg_abc" is reserved
> > +DETAIL:  Cannot GRANT roles to a reserved role.
>
> The server still accepts "ALTER ROLE testrol0 USER pg_signal_backend".
> It should block this ALTER ROLE if it blocks the corresponding GRANT.

One more thing:

> --- a/src/bin/pg_dump/pg_dumpall.c
> +++ b/src/bin/pg_dump/pg_dumpall.c
> @@ -665,7 +665,7 @@ dumpRoles(PGconn *conn)
>  	int			i;
>  
>  	/* note: rolconfig is dumped later */
> -	if (server_version >= 90500)
> +	if (server_version >= 90600)

This needs distinct branches for 9.5 and for 9.6+. Today's code would
treat a 9.5 cluster like a 9.1 cluster and fail to dump rolbypassrls
attributes.

>  		printfPQExpBuffer(buf,
>  						  "SELECT oid, rolname, rolsuper, rolinherit, "
>  						  "rolcreaterole, rolcreatedb, "
> @@ -674,6 +674,7 @@ dumpRoles(PGconn *conn)
>  						  "pg_catalog.shobj_description(oid, 'pg_authid') as rolcomment, "
>  						  "rolname = current_user AS is_current_user "
>  						  "FROM pg_authid "
> +						  "WHERE rolname !~ '^pg_' "
>  						  "ORDER BY 2");
>  	else if (server_version >= 90100)
>  		printfPQExpBuffer(buf,
Re: [HACKERS] pg_dump dump catalog ACLs
On Tue, Apr 05, 2016 at 05:50:18PM -0400, Stephen Frost wrote:
> I'll be doing more testing, review and clean-up (there are some
> whitespace only changes in the later patches that really should be
> included in the earlier patches where the code was actually changed)
> tonight with plans to push this tomorrow night.

(1) I ran into trouble reconciling the scope of dumpable ACLs. The commit message indicated this scope:

> Subject: [PATCH 4/5] In pg_dump, include pg_catalog and extension ACLs, if
>  changed
>
> Now that all of the infrastructure exists, add in the ability to
> dump out the ACLs of the objects inside of pg_catalog or the ACLs
> for objects which are members of extensions, but only if they have
> been changed from their original values.

I wrote the attached test script to verify which types of ACLs dump/reload covers. Based on the commit message, I expected these results:

Dumped: type, class, attribute, proc, largeobject_metadata, foreign_data_wrapper, foreign_server, language(in-extension), namespace(in-extension)
Not dumped: database, tablespace, language(from-initdb), namespace(from-initdb)

Did I interpret the commit message correctly? The script gave these results:

Dumped: type, class, attribute, namespace(from-initdb)
Not dumped: database, tablespace, proc, language(from-initdb)
Not tested: largeobject_metadata, foreign_data_wrapper, foreign_server, language(in-extension), namespace(in-extension)

I didn't dig into why that happened; the choice of object type may not even be the root cause in each case. Which of those results, if any, surprise you?

(2) pg_dump queries:

> @@ -14187,18 +14869,65 @@ dumpTable(Archive *fout, TableInfo *tbinfo)
> +						  "FROM pg_catalog.pg_attribute at "
> +						  "JOIN pg_catalog.pg_class c ON (at.attrelid = c.oid) "
> +						  "LEFT JOIN pg_init_privs pip ON "

Since pg_attribute and pg_class require schema qualification here, so does pg_init_privs. Likewise elsewhere in the patch's pg_dump changes.
(3) pg_dumpall became much slower around the time of these commits. On one machine (POWER7 3.55 GHz), a pg_dumpall just after initdb slowed from 0.25s at commit 6c268df^ to 4.0s at commit 7a54270. On a slower machine (Opteron 1210), pg_dumpall now takes 19s against such a fresh cluster.

I doubt I'll review this patch, but those things have come to my attention.

SELECT proacl FROM pg_proc WHERE proname = 'pg_sleep';
\dT+ money
\z pg_catalog.pg_database
\z information_schema.domains
\dL+ sql
\dn+ public
\l+ template1
\db+ pg_default

REVOKE EXECUTE ON FUNCTION pg_sleep(float8) FROM PUBLIC;
REVOKE USAGE ON TYPE money FROM PUBLIC;
REVOKE SELECT ON pg_database FROM PUBLIC;
GRANT SELECT (oid) ON pg_database TO PUBLIC;
REVOKE SELECT ON information_schema.domains FROM PUBLIC;
GRANT SELECT (domain_name) ON information_schema.domains TO PUBLIC;
REVOKE USAGE ON LANGUAGE sql FROM PUBLIC;
-- TODO pg_largeobject_metadata.lomacl
REVOKE USAGE ON SCHEMA public FROM PUBLIC;
REVOKE CONNECT ON DATABASE template1 FROM PUBLIC;
GRANT CREATE ON TABLESPACE pg_default TO PUBLIC;
-- SKIP pltemplate: deprecated; no DDL for it
-- TODO pg_foreign_data_wrapper.fdwacl
-- TODO pg_foreign_server.srvacl

SELECT proacl FROM pg_proc WHERE proname = 'pg_sleep';
\dT+ money
\z pg_catalog.pg_database
\z information_schema.domains
\dL+ sql
\dn+ public
\l+ template1
\db+ pg_default

/* RESULT:
$ pg_dumpall | grep -E '(GRANT|REVOKE)'
GRANT ALL ON DATABASE template1 TO nm;
REVOKE ALL ON SCHEMA public FROM PUBLIC;
GRANT CREATE ON SCHEMA public TO PUBLIC;
REVOKE ALL ON TYPE money FROM PUBLIC;
REVOKE SELECT ON TABLE pg_database FROM PUBLIC;
GRANT SELECT(oid) ON TABLE pg_database TO PUBLIC;
*/
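Point (2) in the message above — every catalog referenced in generated SQL must carry an explicit pg_catalog. prefix — can be illustrated with a tiny sketch. The query fragment and the helper are illustrative only, not the actual pg_dump code:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Illustrative builder for the attribute-ACL query fragment: with
 * search_path locked down, pg_init_privs must be qualified just like
 * pg_attribute and pg_class.  The trailing "..." stands for the rest
 * of the real join condition.
 */
static void
build_attr_acl_query(char *buf, size_t buflen)
{
	snprintf(buf, buflen,
			 "FROM pg_catalog.pg_attribute at "
			 "JOIN pg_catalog.pg_class c ON (at.attrelid = c.oid) "
			 "LEFT JOIN pg_catalog.pg_init_privs pip ON ...");
}
```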
Re: [HACKERS] snapshot too old, configured by time
On Mon, Apr 18, 2016 at 9:52 AM, Kevin Grittner wrote:
> On Sun, Apr 17, 2016 at 5:15 PM, Tom Lane wrote:
>> I think we should revert BufferGetPage to be what it was before (with
>> no snapshot test) and invent BufferGetPageExtended or similar to be
>> used in the small number of places that need a snapshot test.
>
> I'm not sure what BufferGetPageExtended() buys us over simply
> inserting TestForOldSnapshot() where it is needed. Other than that
> question, I have no objections to the course outlined, but figure I
> should not jump on it without allowing at least a couple days for
> discussion. That also may give me time to perform the benchmarks I
> wanted -- VPN issues have blocked me from the big test machines so
> far. I think I see where the time may be going when the feature is
> disabled, and if I'm right I have a fix; but without a big NUMA
> machine there is no way to confirm it.

TBH, BufferGetPageExtended() still looks like a good idea to me. Back-patching those code paths is going to make the maintenance far harder, on top of forcing recompilation of extensions for perhaps no good reason. Even if this is a low-level change, if this feature goes in with 9.6, it would be really good to mention as well that callers of BufferGetPage should update their calls accordingly if they care about the checks with the old snapshot. This is a routine used a lot in many plugins and extensions. Usually such low-level things are not mentioned in the release notes, but this time I think it's really important to say it loudly.
-- 
Michael
Re: [HACKERS] snapshot too old, configured by time
On Sun, Apr 17, 2016 at 5:15 PM, Tom Lane wrote:
> Kevin Grittner writes:
>> On Wed, Mar 30, 2016 at 3:26 PM, Alvaro Herrera wrote:
>>> Kevin Grittner wrote:
>>>> On Wed, Mar 30, 2016 at 2:22 PM, Alvaro Herrera wrote:
>>> I said that we should change BufferGetPage into having the snapshot
>>> check built-in, except in the cases where a flag is passed; and the flag
>>> would be passed in all cases except those 30-something you identified.
>>> In other words, the behavior in all the current callsites would be
>>> identical to what's there today; we could have a macro do the first
>>> check so that we don't introduce the overhead of a function call in the
>>> 450 cases where it's not needed.
>
>> Attached is what I think you're talking about for the first patch.
>> AFAICS this should generate identical executable code to unpatched.
>> Then the patch to actually implement the feature would, instead
>> of adding 30-some lines with TestForOldSnapshot() would implement
>> that as the behavior for the other enum value, and alter those
>> 30-some BufferGetPage() calls.
>
>> Álvaro and Michael, is this what you were looking for?
>
>> Is everyone else OK with this approach?
>
> After struggling with back-patching a GIN bug fix, I wish to offer up the
> considered opinion that this was an impressively bad idea. It's inserted
> 450 or so pain points for back-patching, which we will have to deal with
> for the next five years. Moreover, I do not believe that it will do a
> damn thing for ensuring that future calls of BufferGetPage think about
> what to do; they'll most likely be copied-and-pasted from nearby calls,
> just as people have always done. With luck, the nearby calls will have
> the right semantics, but this change won't help very much at all if they
> don't.
>
> I think we should revert BufferGetPage to be what it was before (with
> no snapshot test) and invent BufferGetPageExtended or similar to be
> used in the small number of places that need a snapshot test.
I'm not sure what BufferGetPageExtended() buys us over simply inserting TestForOldSnapshot() where it is needed. Other than that question, I have no objections to the course outlined, but figure I should not jump on it without allowing at least a couple days for discussion. That also may give me time to perform the benchmarks I wanted -- VPN issues have blocked me from the big test machines so far. I think I see where the time may be going when the feature is disabled, and if I'm right I have a fix; but without a big NUMA machine there is no way to confirm it.
-- 
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Default Roles
On Thu, Apr 07, 2016 at 03:50:47PM -0400, Stephen Frost wrote:
> I'm planning to continue going over the patch tomorrow morning with
> plans to push this before the feature freeze deadline.

> --- a/src/test/regress/expected/rolenames.out
> +++ b/src/test/regress/expected/rolenames.out
> +GRANT testrol0 TO pg_abc; -- error
> +ERROR: role "pg_abc" is reserved
> +DETAIL: Cannot GRANT roles to a reserved role.

The server still accepts "ALTER ROLE testrol0 USER pg_signal_backend". It should block this ALTER ROLE if it blocks the corresponding GRANT.
Re: [HACKERS] Pgbench with -f and -S
On 17 April 2016 at 21:24, Fabien COELHO wrote: > For the future 9.6, scripts options are cumulatives, so -f & -S can be > combined. > > Indeed, for the <= 9.5 it seems that some options are silently ignores, > when combining -S/-f, only the last one is kept, it should be warned about. > Thanks Fabien, for confirming about the missing warning. Also, by 'combined' I think you mean that both (built-in SELECTs & Custom Script) run, although the dev docs don't (yet) say anything about that. - robins
Re: [HACKERS] snapshot too old, configured by time
Kevin Grittner writes:
> On Wed, Mar 30, 2016 at 3:26 PM, Alvaro Herrera wrote:
>> Kevin Grittner wrote:
>>> On Wed, Mar 30, 2016 at 2:22 PM, Alvaro Herrera wrote:
>>> I said that we should change BufferGetPage into having the snapshot
>>> check built-in, except in the cases where a flag is passed; and the flag
>>> would be passed in all cases except those 30-something you identified.
>>> In other words, the behavior in all the current callsites would be
>>> identical to what's there today; we could have a macro do the first
>>> check so that we don't introduce the overhead of a function call in the
>>> 450 cases where it's not needed.

> Attached is what I think you're talking about for the first patch.
> AFAICS this should generate identical executable code to unpatched.
> Then the patch to actually implement the feature would, instead
> of adding 30-some lines with TestForOldSnapshot() would implement
> that as the behavior for the other enum value, and alter those
> 30-some BufferGetPage() calls.

> Álvaro and Michael, is this what you were looking for?

> Is everyone else OK with this approach?

After struggling with back-patching a GIN bug fix, I wish to offer up the considered opinion that this was an impressively bad idea. It's inserted 450 or so pain points for back-patching, which we will have to deal with for the next five years. Moreover, I do not believe that it will do a damn thing for ensuring that future calls of BufferGetPage think about what to do; they'll most likely be copied-and-pasted from nearby calls, just as people have always done. With luck, the nearby calls will have the right semantics, but this change won't help very much at all if they don't.

I think we should revert BufferGetPage to be what it was before (with no snapshot test) and invent BufferGetPageExtended or similar to be used in the small number of places that need a snapshot test.
regards, tom lane
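For readers following the design argument, here is a minimal stand-in sketch of the shape Tom proposes: a plain BufferGetPage() with no snapshot test, plus an explicit BufferGetPageExtended() for the small number of call sites that need one. All types here are stubs, not the real bufmgr API; the counter stands in for TestForOldSnapshot().

```c
#include <assert.h>

/* Stub types standing in for bufmgr's Buffer and Page. */
typedef int Buffer;
typedef struct { int dummy; } PageData;
typedef PageData *Page;

static PageData pages[8];
static int	snapshot_tests_run = 0;	/* counts stand-in snapshot checks */

/* The common case: no snapshot test, identical to the pre-patch macro. */
static Page
BufferGetPage(Buffer buf)
{
	return &pages[buf];
}

/* The explicit variant for the ~30 places that need the check. */
static Page
BufferGetPageExtended(Buffer buf)
{
	snapshot_tests_run++;		/* stand-in for TestForOldSnapshot() */
	return &pages[buf];
}
```

The attraction is that the 450-odd ordinary call sites keep their old one-argument form, so back-patched code needs no churn; only the handful of scan-time callers opt in to the check.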
Re: [HACKERS] SSL certificate location
## Terence Ferraro (terencejferr...@gmail.com):

> At the moment, if a user has multiple applications on a single machine
> connecting with different SSL certificates, each process must be launched
> by a different logical user and the certificates placed within that user's
> home directory (and this is just for *nix, forget about Windows). The
> current method is not scalable, either.

That is incorrect.

http://www.postgresql.org/docs/current/static/libpq-ssl.html
http://www.postgresql.org/docs/current/static/libpq-connect.html#LIBPQ-PARAMKEYWORDS
http://www.postgresql.org/docs/current/static/libpq-envars.html

Connection parameters are "sslcert" and "sslkey", environment variables "PGSSLCERT" and "PGSSLKEY". You can also specify parameters in your .pg_service.conf.

Regards,
Christoph
-- 
Spare Space
[HACKERS] SSL certificate location
I'm not sure if this may be of any utility value to anyone else, but, the attached patch enables an environment variable to be provided to libpq to specify where to find the SSL certificate/key files used for a secure connection.

At the moment, if a user has multiple applications on a single machine connecting with different SSL certificates, each process must be launched by a different logical user and the certificates placed within that user's home directory (and this is just for *nix, forget about Windows). The current method is not scalable, either.

With the attached patch, the user just sets the environment variable, e.g.

PGSQL_SSL_PATH=/home/test/cert_directory/app_1/ /usr/local/pgsql/bin/psql -U postgres -h 127.0.0.1 -p 5432
PGSQL_SSL_PATH=/home/test/cert_directory/app_2/ /usr/local/pgsql/bin/psql -U postgres -h 127.0.0.1 -p 5433

It follows the same existing conventions by looking for the actual certificates within the .postgresql sub-directory of the provided path.

*Terence J. Ferraro*

--- a/postgresql-9.5.2/src/interfaces/libpq/fe-secure-openssl.c	2016-03-28 16:07:39.0 -0400
+++ b/postgresql-9.5.2/src/interfaces/libpq/fe-secure-openssl.c	2016-04-15 23:12:17.493355856 -0400
@@ -35,6 +35,7 @@
 #else
 #include
 #include
+#include
 #include
 #include
 #ifdef HAVE_NETINET_TCP_H
@@ -936,7 +937,14 @@
 	bool		have_homedir;
 	bool		have_cert;
 	EVP_PKEY   *pkey = NULL;
-
+	char	   *custom_homedir;
+	bool		have_custom_homedir;
+
+	custom_homedir = getenv("PGSQL_SSL_PATH");
+
+	if (custom_homedir == NULL) { have_custom_homedir = false; }
+	else { have_custom_homedir = true; }
+
 	/*
 	 * We'll need the home directory if any of the relevant parameters are
 	 * defaulted. If pqGetHomeDirectory fails, act as though none of the
@@ -953,6 +961,9 @@
 	/* Read the client certificate file */
 	if (conn->sslcert && strlen(conn->sslcert) > 0)
 		strlcpy(fnbuf, conn->sslcert, sizeof(fnbuf));
+	/* ENV variable specified, load that certificate file */
+	else if (have_custom_homedir)
+		snprintf(fnbuf, sizeof(fnbuf), "%s/%s", custom_homedir, USER_CERT_FILE);
 	else if (have_homedir)
 		snprintf(fnbuf, sizeof(fnbuf), "%s/%s", homedir, USER_CERT_FILE);
 	else
@@ -1146,6 +1157,11 @@
 			strlcpy(fnbuf, conn->sslkey, sizeof(fnbuf));
 		}
 	}
+	else if (have_custom_homedir)
+	{
+		/* ENV variable specified, load that file */
+		snprintf(fnbuf, sizeof(fnbuf), "%s/%s", custom_homedir, USER_KEY_FILE);
+	}
 	else if (have_homedir)
 	{
 		/* No PGSSLKEY specified, load default file */
@@ -1207,6 +1223,8 @@
 	 */
 	if (conn->sslrootcert && strlen(conn->sslrootcert) > 0)
 		strlcpy(fnbuf, conn->sslrootcert, sizeof(fnbuf));
+	else if (have_custom_homedir)
+		snprintf(fnbuf, sizeof(fnbuf), "%s/%s", custom_homedir, ROOT_CERT_FILE);
 	else if (have_homedir)
 		snprintf(fnbuf, sizeof(fnbuf), "%s/%s", homedir, ROOT_CERT_FILE);
 	else
@@ -1245,6 +1263,8 @@
 	{
 		if (conn->sslcrl && strlen(conn->sslcrl) > 0)
 			strlcpy(fnbuf, conn->sslcrl, sizeof(fnbuf));
+		else if (have_custom_homedir)
+			snprintf(fnbuf, sizeof(fnbuf), "%s/%s", custom_homedir, ROOT_CRL_FILE);
 		else if (have_homedir)
 			snprintf(fnbuf, sizeof(fnbuf), "%s/%s", homedir, ROOT_CRL_FILE);
 		else
Re: [HACKERS] GIN data corruption bug(s) in 9.6devel
On Tue, Apr 12, 2016 at 9:53 AM, Teodor Sigaev wrote:
> With pending cleanup patch backend will try to get lock on metapage with
> ConditionalLockPage. Will it interrupt autovacuum worker?

Correct, ConditionalLockPage should not interrupt the autovacuum worker.

>> Alvaro's recommendation, to let the cleaner off the hook once it
>> passes the page which was the tail page at the time it started, would
>> prevent any process from getting pinned down indefinitely, but would
>> not prevent the size of the list from increasing without bound. I
>> think that would probably be good enough, because the current
>> throttling behavior is purely accidental and doesn't *guarantee* a
>> limit on the size of the pending list.
>
> Added, see attached patch (based on v3.1)

With this applied, I am getting a couple of errors I have not seen before after extensive crash recovery testing:

ERROR: attempted to delete invisible tuple
ERROR: unexpected chunk number 1 (expected 2) for toast value 100338365 in pg_toast_16425

I've restarted the test harness with intentional crashes turned off, to see if the problems are related to crash recovery or are more generic than that. I've never seen these particular problems before, so don't have much insight into what might be going on or how to debug it.

Cheers,

Jeff
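As a side note on why ConditionalLockPage cannot interrupt autovacuum: the semantics are simply try-lock-or-skip, so a backend that loses the race just skips pending-list cleanup instead of waiting on (or cancelling) the worker that holds the lock. A stand-in sketch, not the real GIN or lock-manager code:

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in for the metapage lock state; the real thing is an LWLock
 * style page lock taken through ConditionalLockPage(). */
static bool metapage_locked = false;

/*
 * Try to take the metapage lock without blocking.  Returns true if we
 * got it (caller should clean the pending list), false if someone else
 * -- e.g. autovacuum -- already holds it (caller just moves on).
 */
static bool
conditional_lock_metapage(void)
{
	if (metapage_locked)
		return false;			/* busy: skip cleanup, never wait */
	metapage_locked = true;
	return true;
}
```

Because the failing backend never sleeps on the lock, there is no lock wait for deadlock/cancel machinery to act on, which is the behaviour Teodor was asking about.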
[HACKERS] Can we improve this error message?
Here's an interesting scenario I happened across recently. If you have a single line in pg_hba.conf:

hostssl all all 0.0.0.0/0 md5

Attempting to log in with an incorrect password results in an error message about there not being a pg_hba.conf entry for the user. Reading carefully, the error message states that there's no pg_hba.conf entry for the user with **ssl off**. What I believe is happening is that the pg connection libs first try to connect via ssl and get a password failed error, then fall back to trying to connect without ssl, and get a "no pg_hba.conf entry" error. The problem is that the second error masks the first one, hiding the real cause of the connection failure, and causing a lot of confusion. If we could keep both errors and report them both, I feel like it would be an improvement to our client library behavior.

-- 
Bill Moran
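A minimal sketch of the "keep both errors" idea might accumulate each attempt's message instead of overwriting it, so the password failure from the SSL attempt stays visible after the non-SSL retry fails too. The helper name and buffer handling are hypothetical, not the current libpq code:

```c
#include <stdio.h>
#include <string.h>

#define ERRBUF_LEN 512

/*
 * Hypothetical: append a failed connection attempt's message to the
 * error buffer rather than replacing whatever is already there.
 */
static void
append_conn_error(char *errbuf, const char *attempt, const char *msg)
{
	size_t	used = strlen(errbuf);

	snprintf(errbuf + used, ERRBUF_LEN - used, "%s: %s\n", attempt, msg);
}
```

With that shape, the final report would show both "SSL connection: password authentication failed" and "non-SSL connection: no pg_hba.conf entry", pointing the user at the real cause.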
Re: [HACKERS] Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold
> Second, I don't think it will improve and become performant without
> exposure to a wider audience.

Huh? The issue is a relatively simple-to-spot architectural issue (taking a single exclusive lock during snapshot acquisition, which otherwise needs only shared locks) - I don't see how any wider input is needed. And for that matter, I don't see why such a lock got through review.
Re: [HACKERS] Move PinBuffer and UnpinBuffer to atomics
On Fri, Apr 15, 2016 at 1:59 AM, Alexander Korotkov <a.korot...@postgrespro.ru> wrote:
> On Thu, Apr 14, 2016 at 5:35 AM, Andres Freund wrote:
>> On 2016-04-14 07:59:07 +0530, Amit Kapila wrote:
>>> What you want to see by prewarming?
>>
>> Prewarming appears to greatly reduce the per-run variance on that
>> machine, making it a lot easier to get meaningful results. Thus it'd
>> make it easier to compare pre/post padding numbers.
>>
>>> Will it have safe effect, if the tests are run for 10 or 15 mins
>>> rather than 5 mins?
>>
>> s/safe/same/? If so, no, it doesn't look that way. The order of buffers
>> appears to play a large role; and it won't change in a memory resident
>> workload over one run.
>
> I've tried to run a read-only benchmark of pad_pgxact_v1.patch on a 4x18
> Intel machine. The results are following.
>
> clients  no-pad   pad
> 1        12997    13381
> 2        26728    25645
> 4        52539    51738
> 8        103785   102337
> 10       132606   126174
> 20       255844   252143
> 30       371359   357629
> 40       450429   443053
> 50       497705   488250
> 60       564385   564877
> 70       718860   725149
> 80       934170   931218
> 90       1152961  1146498
> 100      1240055  1300369
> 110      1207727  1375706
> 120      1167681  1417032
> 130      1120891  1448408
> 140      1085904  1449027
> 150      1022160  1437545
> 160      976487   1441720
> 170      978120   1435848
> 180      953843   1414925
>
> The snapshot_too_old patch was reverted in both cases.
> At high client counts, padding gives a very significant benefit.

These results indicate that the patch is a win. Are these results the median of 3 runs or single-run data? By the way, can you share the output of the lscpu command on this machine?

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] Move PinBuffer and UnpinBuffer to atomics
On Thu, Apr 14, 2016 at 8:05 AM, Andres Freund wrote:
> On 2016-04-14 07:59:07 +0530, Amit Kapila wrote:
>> What you want to see by prewarming?
>
> Prewarming appears to greatly reduce the per-run variance on that
> machine, making it a lot easier to get meaningful results.

I think you are referring to the tests done by Robert on the POWER-8 machine, but the performance results I reported were on Intel x86. In the last two days, I have spent quite some effort on performance testing of this patch with pre-warming, using the same query [1] as Robert used in his tests. The tests are done such that it first starts the server, pre-warms the relations, runs the read-only test, stops the server, and then repeats this for the next test. I have observed that the variance in run-to-run performance still occurs, especially at higher client counts (128). Below are results for 128 clients, both when the tests ran first with the patch and then with HEAD, and vice versa.

Test-1: client count 128 (-c 128 -j 128), tests ran first with patch and then with HEAD

Run    HEAD (commit 70715e6a)  Patch
Run-1  156748                  174640
Run-2  151352                  150115
Run-3  177940                  165269

Test-2: client count 128 (-c 128 -j 128), tests ran first with HEAD and then with patch

Run    HEAD (commit 70715e6a)  Patch
Run-1  173063                  151282
Run-2  173187                  140676
Run-3  177046                  166726

I think this patch (padding pgxact) is certainly beneficial, as reported on the above thread. At very high client counts some variation in performance is observed with and without the patch, but I feel in general it is a win.

[1] - psql -c "select sum(x.x) from (select pg_prewarm(oid) as x from pg_class where relkind in ('i', 'r') order by oid) x;"

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] Pgbench with -f and -S
> Pgbench allows -f and -S combinations together where the doc says that -S
> effectively uses the internal select-only script. Is it okay to assume
> that -f is disregarded here? Or are they run in round-robin fashion
> (although then, how does it know which read-only part of my script to
> run?) or something else like that? Effectively, I think it should raise a
> NOTICE that -f is useless here.

For the future 9.6, script options are cumulative, so -f & -S can be combined.

Indeed, for <= 9.5 it seems that some options are silently ignored: when combining -S/-f, only the last one is kept; it should be warned about.

-- 
Fabien.
Re: [HACKERS] Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold
On 4/16/16 4:44 PM, Noah Misch wrote:
> The key judgment to finalize here is whether it's okay to release this feature
> given its current effect[1], when enabled, on performance. That is more
> controversial than the potential ~2% regression for old_snapshot_threshold=-1.
> Alvaro[2] and Robert[3] are okay releasing that way, and Andres[4] is not. If
> anyone else wants to weigh in, now is the time.

I'm in favor of releasing the feature even with the performance regression when enabled. First, there are use cases where a feature like this is absolutely critical. Second, I don't think it will improve and become performant without exposure to a wider audience.

I think it's entirely within the PostgreSQL philosophy to release a feature that has warts and doesn't perform as well as we'd like as long as it is stable and does not corrupt data. In my opinion this feature meets these criteria and it is an important capability to add to PostgreSQL.

-- 
-David
da...@pgmasters.net
Re: [HACKERS] EXPLAIN VERBOSE with parallel Aggregate
David Rowley writes:
> On 16 April 2016 at 04:27, Tom Lane wrote:
>> +1 for the latter, if we can do it conveniently. I think exposing
>> the names of the aggregate implementation functions would be very
>> user-unfriendly, as nobody but us hackers knows what those are.

> It does not really seem all that convenient to do this. It also seems
> a bit strange to me to have a parent node report a column which does
> not exist in any nodes descending from it. Remember that the combine
> Aggref does not have the same ->args as its corresponding partial
> Aggref. It's not all that clear to me if there is any nice way to
> have this work the way you'd like. If we were to just call
> get_rule_expr() on the first arg of the combine aggregate node, it
> would re-print the PARTIAL keyword again.

This suggests to me that the parsetree representation for partial aggregates was badly chosen. If EXPLAIN can't recognize them, then neither can anything else, and we may soon find needs for telling the difference that are not just cosmetic. Maybe we need more fields in Aggref.

> Note that I've done nothing for the weird schema prefixing problem I
> mentioned. I think I'd need input on the best way to solve this. If
> it's actually a problem.

It sure looks like a problem to me. Maybe it's only cosmetic, but it's going to cause confusion and bug reports. EXPLAIN output for parallel queries is going to be confusing enough anyway; we need to avoid having artifacts like this.

regards, tom lane
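The "insert the word PARTIAL" option being +1'd above could be sketched as a deparse step keyed off a partial-aggregate marker. FakeAggref and its aggpartial flag are stand-ins for illustration, not the current Aggref node:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Stand-in for an Aggref with a hypothetical partial-aggregate marker. */
typedef struct
{
	const char *aggname;		/* e.g. "sum" */
	const char *args;			/* deparsed argument list, e.g. "num" */
	int			aggpartial;		/* nonzero for the partial step */
} FakeAggref;

/*
 * Deparse as "PARTIAL sum(num)" for the partial step and "sum(num)"
 * for the finalize step, instead of exposing transfn/combinefn names.
 */
static void
deparse_aggref(const FakeAggref *agg, char *buf, size_t buflen)
{
	snprintf(buf, buflen, "%s%s(%s)",
			 agg->aggpartial ? "PARTIAL " : "", agg->aggname, agg->args);
}
```

This is also why a marker field matters: without something like aggpartial on the node itself, the deparser (and anything else walking the tree) has no way to tell the two steps apart, which is Tom's point about the representation.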
Re: [HACKERS] Timeline following for logical slots
Hi all

I made an unfortunate oversight in the logical decoding timeline following code: it doesn't work for logical decoding from the walsender, because I left the required glue code out of the final cut of the patch.

The reason the new src/test/recovery/ tests don't notice this is that using pg_recvlogical from the TAP tests is currently pretty awkward. pg_recvlogical has no way to specify a stop-point or to return when there's no immediately pending data, like the SQL interface does. So you have to read from it until you figure it's not going to return anything more, then kill it and look at what it did return, and hope you don't lose anything in buffering. I don't much like relying on timeouts as part of normal successful results since they can upset some of the older and slower buildfarm members. I'd rather be able to pass a --stoppos= or a --n-xacts= option, but it was a bit too late to add those.

Per the separate mail I sent to hackers, xlogreader is currently pretty much timeline-agnostic. The timeline tracking code needs to know when the xlogreader does a random read and do a fresh lookup, so its state is part of the XLogReaderState struct. But the walsender's xlogreader page read callback doesn't use the xlogreader's state, and it can't, because it's also used for physical replication, which communicates the timeline to read from to the page read function via globals. So for now, I just set the globals before calling the page read function:

+	XLogReadDetermineTimeline(state);
+	sendTimeLineIsHistoric = state->currTLI == ThisTimeLineID;
+	sendTimeLine = state->currTLI;
+	sendTimeLineValidUpto = state->currTLIValidUntil;
+	sendTimeLineNextTLI = state->nextTLI;

Per separate mail, I'd quite like to tackle the level of duplication in timeline following logic in 9.7, and maybe see if the _three_ separate read-xlog-page functions can be unified at the same time. But that sure isn't 9.6 material.
From c56a32b5ace8a48908da366e5f778fa98a125740 Mon Sep 17 00:00:00 2001
From: Craig Ringer
Date: Thu, 14 Apr 2016 15:43:20 +0800
Subject: [PATCH] Enable logical timeline following in the walsender

---
 src/backend/access/transam/xlogutils.c | 7 +++
 src/backend/replication/walsender.c    | 11 +++
 src/include/access/xlogreader.h        | 3 +++
 src/include/access/xlogutils.h         | 2 ++
 4 files changed, 15 insertions(+), 8 deletions(-)

diff --git a/src/backend/access/transam/xlogutils.c b/src/backend/access/transam/xlogutils.c
index c3213ac..c3b0e5c 100644
--- a/src/backend/access/transam/xlogutils.c
+++ b/src/backend/access/transam/xlogutils.c
@@ -773,7 +773,7 @@ XLogRead(char *buf, TimeLineID tli, XLogRecPtr startptr, Size count)
  * might be reading from historical timeline data on a segment that's been
  * copied to a new timeline.
  */
-static void
+void
 XLogReadDetermineTimeline(XLogReaderState *state)
 {
 	/* Read the history on first time through */
@@ -856,12 +856,11 @@ XLogReadDetermineTimeline(XLogReaderState *state)
 		state->currTLIValidUntil == InvalidXLogRecPtr)
 	{
 		XLogRecPtr	tliSwitch;
-		TimeLineID	nextTLI;

 		CHECK_FOR_INTERRUPTS();

 		tliSwitch = tliSwitchPoint(state->currTLI, state->timelineHistory,
-								   &nextTLI);
+								   &state->nextTLI);

 		/* round ValidUntil down to start of seg containing the switch */
 		state->currTLIValidUntil =
@@ -875,7 +874,7 @@ XLogReadDetermineTimeline(XLogReaderState *state)
 		 *
 		 * If that's the current TLI we'll stop searching.
 		 */
-		state->currTLI = nextTLI;
+		state->currTLI = state->nextTLI;
 		state->currTLIValidUntil = InvalidXLogRecPtr;
 	}
 }
diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index e4a0119..495bff2 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -47,6 +47,7 @@
 #include "access/transam.h"
 #include "access/xact.h"
 #include "access/xlog_internal.h"
+#include "access/xlogutils.h"
 #include "catalog/pg_type.h"
 #include "commands/dbcommands.h"
@@ -756,6 +757,12 @@ logical_read_xlog_page(XLogReaderState *state, XLogRecPtr targetPagePtr, int req
 	XLogRecPtr	flushptr;
 	int			count;

+	XLogReadDetermineTimeline(state);
+	sendTimeLineIsHistoric = state->currTLI == ThisTimeLineID;
+	sendTimeLine = state->currTLI;
+	sendTimeLineValidUpto = state->currTLIValidUntil;
+	sendTimeLineNextTLI = state->nextTLI;
+
 	/* make sure we have enough WAL available */
 	flushptr = WalSndWaitForWal(targetPagePtr + reqLen);
@@ -984,10 +991,6 @@ StartLogicalReplication(StartReplicationCmd *cmd)
 	pq_endmessage(&buf);
 	pq_flush();

-	/* setup state for XLogReadPage */
-	sendTimeLineIsHistoric = false;
-	sendTimeLine = ThisTimeLineID;
-
 	/*
 	 * Initialize position to the last ack'ed one, then the xlog records begin
 	 * to be shipped from that position.
diff --git a/src/include/access/xlogreader.h b/src/include/access/xlogreader.h
index 300747d..bbee552 100644
--- a/src
[HACKERS] Timeline following is a bit tangled...
Hi all As part of the work I did on timeline following for logical decoding I mapped out the various code paths relating to timeline following in Pg. https://wiki.postgresql.org/wiki/TimelineFollowing97 It's surprisingly complex (to me), with lots of completely separate logic for each different path. Redo has one way to decide when to switch timelines and which WAL segment to read from. The walsender has two, one for physical replication and one (with a small overlap) for logical replication. Logical replication over the SQL interface now has another, which overlaps mostly but not entirely with the logical walsender one. One thing that makes it very hard to follow the code (IMO) is that the xlogreader is totally timeline agnostic. The xlogreader's callers decide which timeline to read from and when to switch timelines. The actual WAL segment to read from is determined by the read page callback that the xlogreader invokes. The callback figures out the timeline by looking "around" the xlogreader at global state in xlog.c (for redo) or walsender.c (for phys/logical walsender). Each place has its own logic for things like the early timeline switch required to ensure that we read from a segment that's actually locally present, since older timelines of the same segment won't be present or will be renamed .partial . I'd like to reduce the duplication here and try to make it a bit easier to follow. If doing so doesn't seem worth the (undeniable) risks when messing with redo then I'll just leave it untouched, I don't feel so strongly about it as all that. Because physical rep doesn't use the xlogreader it doesn't make sense to just add timeline following to the xlogreader directly. It has to be separate, usable by physical rep and the xlogreader. 
I think it should be reasonable to have them both use the same state struct and function, though: each would call a function before reading a page to update the timeline to read from if needed, then have its page-read callback use that timeline. The struct can keep track of the next timeline, the TLI switchpoint, whether the timeline became historical since the last page was read, a copy of the latest timeline history, etc., and should probably live in timeline.c.

I'm not especially thrilled with the code I wrote for logical decoding timeline following there, and I'm actually more inclined to base the logic on the physical walsender's code, which is IMO the clearest we currently have: extract it, move its state globals into a struct, and generalize it.

Sound not completely insane?

-- 
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
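To make the proposal above a bit more concrete, here is a minimal, self-contained sketch of what a shared timeline-following state struct and a per-page update function might look like. All names and types here are hypothetical stand-ins for illustration, not PostgreSQL's actual definitions:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for PostgreSQL's TimeLineID / XLogRecPtr. */
typedef uint32_t TimeLineID;
typedef uint64_t XLogRecPtr;
#define InvalidXLogRecPtr ((XLogRecPtr) 0)

/*
 * Hypothetical shared timeline-following state, usable by both the
 * xlogreader's page-read callbacks and the physical walsender.
 */
typedef struct TimeLineFollowState
{
	TimeLineID	currTLI;			/* timeline we're currently reading */
	TimeLineID	nextTLI;			/* timeline that follows currTLI, if any */
	XLogRecPtr	currTLIValidUntil;	/* switchpoint; Invalid if currTLI is latest */
} TimeLineFollowState;

/*
 * Called before each page read: if the read position has reached the
 * switchpoint of a historic timeline, advance to the next timeline.
 */
static void
UpdateReadTimeline(TimeLineFollowState *state, XLogRecPtr readPtr)
{
	if (state->currTLIValidUntil != InvalidXLogRecPtr &&
		readPtr >= state->currTLIValidUntil)
	{
		state->currTLI = state->nextTLI;

		/*
		 * In a real implementation we'd consult the cached timeline
		 * history here to find the next switchpoint; this sketch
		 * assumes the new timeline is the latest one.
		 */
		state->currTLIValidUntil = InvalidXLogRecPtr;
	}
}
```

The page-read callback would then simply read from state->currTLI, instead of each caller reimplementing the switch logic by peeking at globals in xlog.c or walsender.c.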
Re: [HACKERS] EXPLAIN VERBOSE with parallel Aggregate
On 16 April 2016 at 04:27, Tom Lane wrote:
> Robert Haas writes:
>> I definitely agree that the current output is messed up, but I'm not
>> sure your proposed output is much better.  I wonder if it shouldn't
>> say something like:
>>   Output: serialfn(transfn(args))
>> for the partial aggregate and
>>   Output: finalfn(combinefn(deserialfn(args)))
>> for the finalize aggregate step.
>
>> Or maybe just insert the word PARTIAL before each partial aggregate
>> step, like PARTIAL sum(num) for the partial step and then just
>> sum(num) for the final step.
>
> +1 for the latter, if we can do it conveniently.  I think exposing
> the names of the aggregate implementation functions would be very
> user-unfriendly, as nobody but us hackers knows what those are.

It does not really seem all that convenient to do this. It also seems a bit strange to me to have a parent node report a column which does not exist in any of the nodes descending from it. Remember that the combine Aggref does not have the same ->args as its corresponding partial Aggref.

It's not at all clear to me whether there is any nice way to have this work the way you'd like. If we were to just call get_rule_expr() on the first arg of the combine aggregate node, it would re-print the PARTIAL keyword again.
In the attached I have it displaying as:

postgres=# explain verbose select count(*),max(a),avg(a) filter (where a = 0) from t;
                                              QUERY PLAN
------------------------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=14739.56..14739.57 rows=1 width=44)
   Output: pg_catalog.count(*), max((max(a))), pg_catalog.avg((PARTIAL avg(a) FILTER (WHERE (a = 0))))
   ->  Gather  (cost=14739.33..14739.54 rows=2 width=44)
         Output: (count(*)), (max(a)), (PARTIAL avg(a) FILTER (WHERE (a = 0)))
         Workers Planned: 2
         ->  Partial Aggregate  (cost=13739.33..13739.34 rows=1 width=44)
               Output: count(*), max(a), PARTIAL avg(a) FILTER (WHERE (a = 0))
               ->  Parallel Seq Scan on public.t  (cost=0.00..9572.67 rows=416667 width=4)
                     Output: a
(9 rows)

I like this much better, as there's no fudging of any function arguments to print them as something they're not. Note that max() does not get tagged with PARTIAL, since it has no finalfn.

Obtaining this information does require a syscache lookup in get_agg_expr(), but I've managed to make that happen only when aggpartial is true.

>> I think ending up with sum(sum(num)) is right out.  It doesn't look
>> so bad for that case but avg(avg(num)) would certainly imply
>> something that's not the actual behavior.
>
> Agreed.

Note that I've done nothing about the weird schema-prefixing problem I mentioned. I think I'd need input on the best way to solve this, if it's actually a problem.

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

parallel_agg_explain_verbose_v2.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
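To illustrate why avg(avg(num)) would imply the wrong behavior, here is a minimal, self-contained sketch (hypothetical names, not PostgreSQL's actual aggregate support functions) of how a partial avg() actually composes: the partial state is a (sum, count) pair, the combine step merges states, and only the final step divides. Averaging the per-worker averages instead would weight every worker equally regardless of how many rows each one saw:

```c
#include <assert.h>

/* Hypothetical transition state for avg(): (sum, count), not an average. */
typedef struct AvgState
{
	double	sum;
	long	count;
} AvgState;

/* Per-row transition: fold one value into the state. */
static AvgState
avg_transfn(AvgState s, double x)
{
	s.sum += x;
	s.count += 1;
	return s;
}

/* Combine step: merge two partial states (what Gather's parent does). */
static AvgState
avg_combinefn(AvgState a, AvgState b)
{
	return (AvgState) { a.sum + b.sum, a.count + b.count };
}

/* Final step: divide once, over the merged state. */
static double
avg_finalfn(AvgState s)
{
	return s.sum / (double) s.count;
}
```

With one worker seeing {1, 2, 3} and another seeing {10}, combining the states gives the correct avg of 4.0, while avg-of-avgs gives (2.0 + 10.0) / 2 = 6.0, which is exactly the misreading the EXPLAIN output should avoid suggesting.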