Re: [HACKERS] Declarative partitioning

2016-04-17 Thread Ashutosh Bapat
> On 2016/04/15 18:46, Ashutosh Bapat wrote:
> >
> > 3. PartitionKeyData contains KeyTypeCollInfo, whose contents can be
> > obtained by calling the functions exprType, exprTypmod on partexprs.
> > Why do we need to store that information as a separate member?
>
> There was no KeyTypeCollInfo in early days of the patch and then I found
> myself doing a lot of:
>
> partexprs_item = list_head(key->partexprs);
> for (attr in key->partattrs)
> {
>     if (attr->attnum != 0)
>     {
>         // simple column reference, get type from attr
>     }
>     else
>     {
>         // expression, get type using exprType, etc.
>         partexprs_item = lnext(partexprs_item);
>     }
> }
>

At least the two loops can be flattened into a single loop if we keep only
the expressions list, with plain column references stored as Var nodes.
exprType() etc. would then work seamlessly.
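
Something along these lines (just a sketch, assuming plain key columns are
stored in key->partexprs as Var nodes):

ListCell   *lc;

foreach(lc, key->partexprs)
{
    Node   *partexpr = (Node *) lfirst(lc);

    /* works uniformly for Vars and arbitrary expressions */
    Oid     atttype = exprType(partexpr);
    int32   atttypmod = exprTypmod(partexpr);
    Oid     attcollation = exprCollation(partexpr);

    /* ... */
}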


>
> That ended up being quite a few places (though I managed to reduce the
> number of places over time).  So, I created this struct which is
> initialized when partition key is built (on first open of the partitioned
> table).
>

Hmm, I am just afraid that we might end up with some code using cached
information and some using exprType, exprTypmod etc.
-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


Re: [HACKERS] Declarative partitioning

2016-04-17 Thread Ashutosh Bapat
> >> Regarding 6, it seems to me that's because Append does not have an
> >> associated relid (like scan nodes have with scanrelid).  Maybe we need
> >> to either fix Append or create some enhanced version of Append which
> >> would also support dynamic pruning.
> >>
> >
> > Right, I think Append might store the relid of the parent table as well
> > as partitioning information at that level along with the subplans.
>
> For time being, I will leave this as yet unaddressed (I am thinking about
> what is reasonable to do for this also considering Robert's comment).  Is
> that OK?
>

Right now EXPLAIN of "select * from t1", where t1 is a partitioned table,
shows

Append
-> Seq Scan on t1
-> Seq Scan on partition 1
-> Seq Scan on partition 2
...

which shows t1 as well as all the partitions on the same level. Users might
have accepted that for an inheritance hierarchy, but for partitioning it can
be confusing, esp. when all the error messages and documentation indicate
that t1 is an empty (shell?) table. Instead showing it like

Append for t1 -- (essentially show that this is an Append for partitioned
table t1; the exact text might vary)
-> Seq Scan on partition 1
-> ...

would be more readable. Similarly, if we are going to collapse the whole
Append hierarchy, it might get even more confusing. Listing all the
intermediate partitioned tables as Seq Scans on them would be confusing for
the reasons mentioned above, and not listing them might make users wonder
about the reasons for their disappearance. I am not sure what the solution
is there.


> > Some more comments
> > Would it be better to declare PartitionDescData as
> > {
> >     int nparts;
> >     PartitionInfo *partinfo;    /* array of partition information structures */
> > }
>
> I think that might be better.  Will do it the way you suggest.
>
> > The new syntax allows CREATE TABLE to be specified as partition of an
> > already partitioned table. Is it possible to do the same for CREATE
> FOREIGN
> > TABLE? Or that's material for v2? Similarly for ATTACH PARTITION.
>
> OK, I will address this in the next version.  One question though: should
> foreign table be only allowed to be leaf partitions (ie, no PARTITION BY
> clause in CREATE FOREIGN TABLE ... PARTITION OF)?


That seems the better way. Otherwise users might wonder whether we keep the
partitions of a foreign table on the foreign server, which won't be true.
But then, we do allow foreign tables to have local tables as children in
inheritance, so somebody from that background might find this incompatible.
I think we shouldn't let the connection between partitioning and inheritance
linger longer than necessary.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


Re: [HACKERS] Declarative partitioning

2016-04-17 Thread Ashutosh Bapat
On Fri, Apr 15, 2016 at 10:30 PM, Robert Haas  wrote:

> On Fri, Apr 15, 2016 at 5:46 AM, Ashutosh Bapat
>  wrote:
> > Retaining the partition hierarchy would help to push-down join across
> > partition hierarchy effectively.
>
> -1.  You don't get to insert cruft into the final plan for the
> convenience of the optimizer.  I think the AppendPath needs to be
> annotated with sufficient information to do whatever query planning
> optimizations we want, and some or all of that may need to carry over
> to the Append plan to allow run-time partition pruning.  But I think
> that flattening nests of Appends is a good optimization and we should
> preserve it.  If that makes the additional information that any given
> Append needs to carry a bit more complex, so be it.
>
> I also think it's very good that Amit has kept the query planner
> unchanged in this initial patch.  Let's leave that work to phase two.
> What I suggest we do when the time comes is invent new nodes
> RangePartitionMap, ListPartitionMap, HashPartitionMap.  Each contains
> minimal metadata needed for tuple routing or planner transformation.
> For example, RangePartitionMap can contain an array of partition
> boundaries - represented as Datums - and an array of mappings, each a
> Node *.  The associated value can be another PartitionMap object if
> there is subpartitioning in use, or an OID.
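
For concreteness, such a node might look something like this (purely a
sketch; the field names are guesses, not from any posted patch):

typedef struct RangePartitionMap
{
    NodeTag     type;
    int         nparts;         /* number of partitions at this level */
    Datum      *bounds;         /* partition boundary values */
    Node      **mappings;       /* per partition: another *PartitionMap
                                 * node if subpartitioned, else the
                                 * partition OID (or a range table
                                 * index, per the comment below) */
} RangePartitionMap;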


A range table index instead of an OID would make it easy to look up the
corresponding relation.


>   This can be used both for
> matching up partitions for join pushdown, and also for fast tuple
> routing and runtime partition pruning.
>

I was thinking about join push-down (within the partitioning hierarchy) of
multiple tables which have a similar partitioning structure for a few upper
levels, but whose entire partitioning hierarchies do not match. Pushing the
join down as far as possible into the partition hierarchy will help. We
might end up with joins between a plain table and an Append relation at the
leaf level. For such join push-down it looks like we will have to construct
the corresponding Append hierarchy, push the joins down, and then (maybe)
collapse it to just collect the results of the joins at the leaf level. But
preparing for that kind of optimization need not be part of this work.


>
> > 2. The new syntax allows CREATE TABLE to be specified as partition of an
> > already partitioned table. Is it possible to do the same for CREATE
> FOREIGN
> > TABLE? Or that's material for v2? Similarly for ATTACH PARTITION.
>
> +1 for making CREATE FOREIGN TABLE support that also, and in version
> 1.  And same for ATTACH PARTITION.
>
>
-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


Re: [HACKERS] checkpoint_flush_after documentation inconsistency

2016-04-17 Thread Fujii Masao
On Fri, Apr 15, 2016 at 6:56 PM, Magnus Hagander  wrote:
> The documentation says that the default value is 128Kb on Linux, but the
> code says it's 256Kb.
>
> Not sure which one is correct, but the other one should be updated :) I'm
> guessing it's a copy/paste mistake in the docs, but since I'm not sure I'm
> not just pushing a fix.

I think you're right.

I also found several other small problems regarding the XXX_flush_after
parameters.

(1)
checkpoint_flush_after, backend_flush_after and bgwriter_flush_after
don't exist in postgresql.conf.sample. If there is no special reason for
that, it's better to add them to postgresql.conf.sample.
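
For example, entries along these lines could be added (values based on the
defaults discussed in this thread; the comment text is a guess):

#checkpoint_flush_after = 256kB         # 0 disables forced writeback
#bgwriter_flush_after = 512kB           # 0 disables forced writeback
#backend_flush_after = 128kB            # 0 disables forced writeback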

(2)
> /* see bufmgr.h: 16 on Linux, 0 otherwise */

The source code comment says that the default value of bgwriter_flush_after
is 16 on Linux, but it's actually 64.  The parameter whose default is 16 is
backend_flush_after.  This source comment needs to be updated.

(3)
The config groups assigned to them look strange.

The group of checkpoint_flush_after is RESOURCES_ASYNCHRONOUS,
but it's documented under the section "Checkpoints". IMO, it's better
to change the group to WAL_CHECKPOINTS.

The group of bgwriter_flush_after is WAL_CHECKPOINTS,
but it's documented under the section Background Writer. IMO,
it's better to change the group to RESOURCES_BGWRITER.

(4)
> This parameter can only be set in the postgresql.conf file or on
> the server command line.

The above description should be used for a PGC_SIGHUP parameter.
But it's used for backend_flush_after, which is a PGC_USERSET parameter.

(5)
bgwriter_flush_after (int)
backend_flush_after (int)
checkpoint_flush_after (int)

In the doc, "int" should be "integer" for the sake of consistency.

Regards,

-- 
Fujii Masao




Re: [HACKERS] Default Roles

2016-04-17 Thread Michael Paquier
On Mon, Apr 18, 2016 at 12:05 PM, Noah Misch  wrote:
> On Sun, Apr 17, 2016 at 08:04:03PM -0400, Noah Misch wrote:
>> On Thu, Apr 07, 2016 at 03:50:47PM -0400, Stephen Frost wrote:
>> > I'm planning to continue going over the patch tomorrow morning with
>> > plans to push this before the feature freeze deadline.
>>
>> > --- a/src/test/regress/expected/rolenames.out
>> > +++ b/src/test/regress/expected/rolenames.out
>>
>> > +GRANT testrol0 TO pg_abc; -- error
>> > +ERROR:  role "pg_abc" is reserved
>> > +DETAIL:  Cannot GRANT roles to a reserved role.
>>
>> The server still accepts "ALTER ROLE testrol0 USER pg_signal_backend".  It
>> should block this ALTER ROLE if it blocks the corresponding GRANT.

Following this logic, shouldn't CREATE ROLE ... USER be forbidden as well?
=# create role toto1 user pg_signal_backend;
CREATE ROLE
=# create role toto2;
CREATE ROLE
=# alter role toto2 user pg_signal_backend;
ALTER ROLE
=# \dgS+ pg_signal_backend
                       List of roles
     Role name     |  Attributes  |   Member of   | Description
-------------------+--------------+---------------+-------------
 pg_signal_backend | Cannot login | {toto1,toto2} |

In short, a reserved role should never be a member of another role/group,
as per the attached.
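
That is, every path that can make a reserved role a member of another role
should hit the same check the GRANT path already has (a sketch; the
attached patch may structure it differently):

    /* reject a reserved (pg_*) role as the would-be member */
    if (IsReservedName(rolename))
        ereport(ERROR,
                (errcode(ERRCODE_RESERVED_NAME),
                 errmsg("role \"%s\" is reserved", rolename),
                 errdetail("Cannot GRANT roles to a reserved role.")));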
-- 
Michael


catalog-acl-group.patch
Description: application/download



Re: [HACKERS] Pgbench with -f and -S

2016-04-17 Thread Fabien COELHO



>> For the future 9.6, script options are cumulative, so -f & -S can be
>> combined.
>>
>> Indeed, for <= 9.5 it seems that some options are silently ignored;
>> when combining -S/-f, only the last one is kept, which should be
>> warned about.
>
> Thanks Fabien, for confirming about the missing warning.
>
> Also, by 'combined' I think you mean that both (built-in SELECTs & Custom
> Script) run, although the dev docs don't (yet) say anything about that.

Hmmm... I think it does implicitly, with "add" and "and" in the
following:



From http://www.postgresql.org/docs/devel/static/pgbench.html:


  -b scriptname[@weight]
  Add the specified builtin script to the list of executed scripts. [...]

Idem -f.

And later at the beginning of the Notes:

  pgbench executes test scripts chosen randomly from a specified list.
  They include built-in scripts with -b and user-provided custom scripts
  with -f.
  [...]
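
For instance, with both options given, something like the following runs
both scripts, chosen randomly at the given weights (the invocation is made
up for illustration):

  pgbench -b select-only@9 -f custom.sql@1 -T 60 bench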

--
Fabien.


Re: [HACKERS] Declarative partitioning

2016-04-17 Thread Amit Langote

Hi,

On 2016/04/15 18:46, Ashutosh Bapat wrote:
> 
> 3. PartitionKeyData contains KeyTypeCollInfo, whose contents can be
> obtained by calling the functions exprType, exprTypmod on partexprs. Why
> do we need to store that information as a separate member?

There was no KeyTypeCollInfo in early days of the patch and then I found
myself doing a lot of:

partexprs_item = list_head(key->partexprs);
for (attr in key->partattrs)
{
    if (attr->attnum != 0)
    {
        // simple column reference, get type from attr
    }
    else
    {
        // expression, get type using exprType, etc.
        partexprs_item = lnext(partexprs_item);
    }
}

That ended up being quite a few places (though I managed to reduce the
number of places over time).  So, I created this struct which is
initialized when partition key is built (on first open of the partitioned
table).

Thanks,
Amit






Re: [HACKERS] Support for N synchronous standby servers - take 2

2016-04-17 Thread Kyotaro HORIGUCHI
At Fri, 15 Apr 2016 17:36:57 +0900, Masahiko Sawada wrote:
> >> How about if we do all the parsing stuff in temporary context and then copy
> >> the results using TopMemoryContext?  I don't think it will be a leak in
> >> TopMemoryContext, because next time we try to check/assign s_s_names, it
> >> will free the previous result.
> >
> > I agree with you. A temporary context for the parser seems
> > reasonable. TopMemoryContext is created very early in main() so
> > palloc on it is effectively the same with malloc.
> > One problem is that only the top memory block is assumed to be
> > free()'d, not pfree()'d by guc_set_extra. It makes this quite
> > ugly..
> >
> > Maybe we shouldn't use the extra for this purpose.
> >
> > Thoughts?
> >
> 
> How about if the check_hook just parses the parameter in
> CurrentMemoryContext (i.e., T_AllocSetContext), and then the
> assign_hook copies syncrep_parse_result to TopMemoryContext?
> Because syncrep_parse_result is a global variable, these hooks can see it.

Hmm. Somewhat uneasy but should work. The attached patch does it.

> Here are some comments.
> 
> -SyncRepUpdateConfig(void)
> +SyncRepFreeConfig(SyncRepConfigData *config, bool itself, MemoryContext cxt)
> 
> Sorry, it's my bad. The itself variable is no longer needed because
> SyncRepFreeConfig is called by only one function.
> 
> -void
> -SyncRepFreeConfig(SyncRepConfigData *config)
> +SyncRepConfigData *
> +SyncRepCopyConfig(SyncRepConfigData *oldconfig, MemoryContext targetcxt)
> 
> I'm not sure targetcxt argument is necessary.

Yes, these are just for signalling so removal doesn't harm.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center


diff --git a/src/backend/replication/syncrep.c b/src/backend/replication/syncrep.c
index 3c9142e..3d68fb5 100644
--- a/src/backend/replication/syncrep.c
+++ b/src/backend/replication/syncrep.c
@@ -68,6 +68,7 @@
 #include "storage/proc.h"
 #include "tcop/tcopprot.h"
 #include "utils/builtins.h"
+#include "utils/memutils.h"
 #include "utils/ps_status.h"
 
 /* User-settable parameters for sync rep */
@@ -361,11 +362,6 @@ SyncRepInitConfig(void)
 {
 	int			priority;
 
-	/* Update the config data of synchronous replication */
-	SyncRepFreeConfig(SyncRepConfig);
-	SyncRepConfig = NULL;
-	SyncRepUpdateConfig();
-
 	/*
 	 * Determine if we are a potential sync standby and remember the result
 	 * for handling replies from standby.
@@ -868,47 +864,50 @@ SyncRepUpdateSyncStandbysDefined(void)
 }
 
 /*
- * Parse synchronous_standby_names and update the config data
- * of synchronous standbys.
+ * Free a previously-allocated config data of synchronous replication.
  */
 void
-SyncRepUpdateConfig(void)
+SyncRepFreeConfig(SyncRepConfigData *config)
 {
-	int	parse_rc;
-
-	if (!SyncStandbysDefined())
+	if (!config)
 		return;
 
-	/*
-	 * check_synchronous_standby_names() verifies the setting value of
-	 * synchronous_standby_names before this function is called. So
-	 * syncrep_yyparse() must not cause an error here.
-	 */
-	syncrep_scanner_init(SyncRepStandbyNames);
-	parse_rc = syncrep_yyparse();
-	syncrep_scanner_finish();
-
-	if (parse_rc != 0)
-		ereport(ERROR,
-(errcode(ERRCODE_SYNTAX_ERROR),
- errmsg_internal("synchronous_standby_names parser returned %d",
- parse_rc)));
-
-	SyncRepConfig = syncrep_parse_result;
-	syncrep_parse_result = NULL;
+	list_free_deep(config->members);
+	pfree(config);
 }
 
 /*
- * Free a previously-allocated config data of synchronous replication.
+ * Returns a copy of a replication config data into the TopMemoryContext.
  */
-void
-SyncRepFreeConfig(SyncRepConfigData *config)
+SyncRepConfigData *
+SyncRepCopyConfig(SyncRepConfigData *oldconfig)
 {
-	if (!config)
-		return;
+	MemoryContext		oldcxt;
+	SyncRepConfigData  *newconfig;
+	ListCell		   *lc;
 
-	list_free_deep(config->members);
-	pfree(config);
+	if (!oldconfig)
+		return NULL;
+
+	oldcxt = MemoryContextSwitchTo(TopMemoryContext);
+
+	newconfig = (SyncRepConfigData *) palloc(sizeof(SyncRepConfigData));
+	newconfig->num_sync = oldconfig->num_sync;
+	newconfig->members = list_copy(oldconfig->members);
+
+	/*
+	 * The new members list is a combination of list cells on the new context
+	 * and data pointed from each cell on the old context. So we explicitly
+	 * copy the data.
+	 */
+	foreach (lc, newconfig->members)
+	{
+		lfirst(lc) = pstrdup((char *) lfirst(lc));
+	}
+
+	MemoryContextSwitchTo(oldcxt);
+
+	return newconfig;
 }
 
 #ifdef USE_ASSERT_CHECKING
@@ -957,6 +956,8 @@ check_synchronous_standby_names(char **newval, void **extra, GucSource source)
 {
 	int	parse_rc;
 
+	Assert(syncrep_parse_result == NULL);
+
 	if (*newval != NULL && (*newval)[0] != '\0')
 	{
 		syncrep_scanner_init(*newval);
@@ -965,6 +966,7 @@ check_synchronous_standby_names(char **newval, void **extra, GucSource source)
 
 		if (parse_rc != 0)
 		{
+			syncrep_parse_result = NULL;
 			GUC_check_errcode(ERRCODE_SYNTAX_ERROR);
 			GUC_check_errdetail("synchronous_standby_names parser returned %d",
 		

Re: [HACKERS] Declarative partitioning

2016-04-17 Thread Amit Langote

Hi Ashutosh,

On 2016/04/15 18:48, Ashutosh Bapat wrote:
> With the new set of patches, I am getting following warnings but no
> compilation failures. Patches apply smoothly.
> partition.c:1216:21: warning: variable ‘form’ set but not used
> [-Wunused-but-set-variable]
> partition.c:1637:20: warning: variable ‘form’ set but not used
> [-Wunused-but-set-variable]

Ah, will fix.

> For creating a partition, the documentation seems to suggest the syntax
> create table t1_p1 partition of t1 for values {start {0} end {100}
> exclusive;
> but the following syntax works
> create table t1_p1 partition of t1 for values start (0) end (100) exclusive;

Hmm, I see the following in docs:

...
and partition_bound_spec is:

FOR VALUES { list_spec | range_spec }

...

list_spec in FOR VALUES is:

{ IN ( expression [, ...] ) }

range_spec in FOR VALUES is:

{ START (lower-bound) [ INCLUSIVE | EXCLUSIVE ] |
  END (upper-bound) [ INCLUSIVE | EXCLUSIVE ] |
  START (lower-bound) [ INCLUSIVE | EXCLUSIVE ]
    END (upper-bound) [ INCLUSIVE | EXCLUSIVE ] }

By the way, I see that you are always specifying "exclusive" for the end
bound in your examples.  That specification is optional, if that wasn't
apparent.  The default (i.e., without explicit specification) is
[start, end).
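
In other words, given the grammar above, these two should be equivalent:

CREATE TABLE t1_p1 PARTITION OF t1
    FOR VALUES START (0) END (100);

CREATE TABLE t1_p1 PARTITION OF t1
    FOR VALUES START (0) INCLUSIVE END (100) EXCLUSIVE;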

>> I got rid of all the optimizer changes in the new version (except a line
>> or two).  I did that by switching to storing parent-child relationships in
>> pg_inherits so that all the existing optimizer code for inheritance sets
>> works unchanged.  Also, the implementation detail that required putting
>> only leaf partitions in the append list is also gone.  Previously no
>> storage was allocated for partitioned tables (either root or any of the
>> internal partitions), so it was being done that way.
>>
> 
> With this set of patches we are still flattening all the partitions.
> postgres=# create table t1 (a int, b int) partition by range(a);
> CREATE TABLE
> postgres=# create table t1_p1 partition of t1 for values start (0) end
> (100) exclusive partition by range(b);
> CREATE TABLE
> postgres=# create table t1_p1_p1 partition of t1_p1 for values start (0)
> end (100) exclusive;
> CREATE TABLE
> postgres=# explain verbose select * from t1;
>QUERY PLAN
> -
>  Append  (cost=0.00..32.60 rows=2262 width=8)
>->  Seq Scan on public.t1  (cost=0.00..0.00 rows=1 width=8)
>  Output: t1.a, t1.b
>->  Seq Scan on public.t1_p1  (cost=0.00..0.00 rows=1 width=8)
>  Output: t1_p1.a, t1_p1.b
>->  Seq Scan on public.t1_p1_p1  (cost=0.00..32.60 rows=2260 width=8)
>  Output: t1_p1_p1.a, t1_p1_p1.b
> (7 rows)
> Retaining the partition hierarchy would help to push-down join across
> partition hierarchy effectively.

Yes, it's still a flattened append (basically what the optimizer currently
creates for arbitrary inheritance trees).  I didn't modify any of that yet.

>> Regarding 6, it seems to me that's because Append does not have an
>> associated relid (like scan nodes have with scanrelid).  Maybe we need to
>> either fix Append or create some enhanced version of Append which would
>> also support dynamic pruning.
>>
> 
> Right, I think Append might store the relid of the parent table as well as
> partitioning information at that level along with the subplans.

For time being, I will leave this as yet unaddressed (I am thinking about
what is reasonable to do for this also considering Robert's comment).  Is
that OK?

> Some more comments
> Would it be better to declare PartitionDescData as
> {
>     int nparts;
>     PartitionInfo *partinfo;    /* array of partition information structures */
> }

I think that might be better.  Will do it the way you suggest.

> The new syntax allows CREATE TABLE to be specified as partition of an
> already partitioned table. Is it possible to do the same for CREATE FOREIGN
> TABLE? Or that's material for v2? Similarly for ATTACH PARTITION.

OK, I will address this in the next version.  One question though: should
foreign table be only allowed to be leaf partitions (ie, no PARTITION BY
clause in CREATE FOREIGN TABLE ... PARTITION OF)?  By the way, ATTACH
PARTITION only allows leaf partitions anyway.

Thanks a lot for the review!

Thanks,
Amit






Re: [HACKERS] Confusing comment in pg_upgrade in regards to VACUUM FREEZE

2016-04-17 Thread Craig Ringer
On 18 April 2016 at 12:11, David Rowley wrote:


> this is not helping me much, as I don't really understand why
> pg_statistic needs to be frozen.
>

Yeah, in particular it's not clear to me why pg_upgrade goes to such
efforts to freeze everything when it copies pg_clog over in
copy_clog_xlog_xid().

Is it mainly a defense against the multixact format change?

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Re: [COMMITTERS] pgsql: Add trigonometric functions that work in degrees.

2016-04-17 Thread Michael Paquier
On Mon, Apr 18, 2016 at 12:31 PM, Peter Eisentraut  wrote:
> I don't know if it's worth tracking this as an open item and thus
> kind-of release blocker if no one else has the problem.  But I
> definitely still have it.  My initial suspicion was that this had
> something to do with a partial upgrade to gcc 6 (not yet released), in
> other words a messed up system.  But I was able to reproduce it in a
> freshly installed chroot.  It only happens with various versions of gcc,
> but not with clang.  So I'm going to have to keep digging.

gcc is slowly but surely moving toward a 6.0 release, btw:
https://gcc.gnu.org/ml/gcc/2016-04/msg00103.html
-- 
Michael




Re: [HACKERS] Patch: fix typo, duplicated word in indexam.sgml

2016-04-17 Thread Fujii Masao
On Tue, Apr 5, 2016 at 12:11 AM, Artur Zakirov  wrote:
> Hello,
>
> There is a duplicated word in indexam.sgml. The patch is attached.

Applied. Thanks!

Regards,

-- 
Fujii Masao




Re: [HACKERS] Support for N synchronous standby servers - take 2

2016-04-17 Thread Kyotaro HORIGUCHI
At Sat, 16 Apr 2016 12:50:30 +0530, Amit Kapila wrote:
> On Fri, Apr 15, 2016 at 11:30 AM, Kyotaro HORIGUCHI <
> horiguchi.kyot...@lab.ntt.co.jp> wrote:
> >
> > At Fri, 15 Apr 2016 08:52:56 +0530, Amit Kapila wrote:
> > >
> > > How about if we do all the parsing stuff in temporary context and then
> copy
> > > the results using TopMemoryContext?  I don't think it will be a leak in
> > > TopMemoryContext, because next time we try to check/assign s_s_names, it
> > > will free the previous result.
> >
> > I agree with you. A temporary context for the parser seems
> > reasonable. TopMemoryContext is created very early in main() so
> > palloc on it is effectively the same with malloc.
> >
> > One problem is that only the top memory block is assumed to be
> > free()'d, not pfree()'d by guc_set_extra. It makes this quite
> > ugly..
> >
> 
> + newconfig = (SyncRepConfigData *) malloc(sizeof(SyncRepConfigData));
> Is there a reason to use malloc here, can't we use palloc directly?

The reason is that the memory block is to be released using free() in
guc_extra_field (not guc_set_extra). Even if we allocate and
deallocate it using palloc/pfree, the 'extra' pointer to the
block in gconf cannot be NULLed there, and guc_extra_field would try
to free it again using free(); then bang.
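
Roughly, the constraint is (a sketch, not actual guc.c code):

    /* guc.c releases GUC "extra" blobs with free() and cannot reset
     * gconf->extra to NULL afterwards, so the blob must come from
     * malloc(); free() on a palloc'd chunk would corrupt the context */
    newconfig = (SyncRepConfigData *) malloc(sizeof(SyncRepConfigData));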

> Also
> for both the functions SyncRepCopyConfig() and SyncRepFreeConfig(), if we
> directly use TopMemoryContext inside the function (if required) rather than
> taking it as argument, then it will simplify the code a lot.

Either is fine. I placed the parameter in order to emphasize where the
memory block lives (neither the current memory context nor the bare
heap), rather than for any practical reason.

> +SyncRepFreeConfig(SyncRepConfigData *config, bool itself, MemoryContext cxt)
>
> Do we really need the 'bool itself' parameter in the above function?
>
> + if (cxt)
> +     oldcxt = MemoryContextSwitchTo(cxt);
> + list_free_deep(config->members);
> +
> + if (oldcxt)
> +     MemoryContextSwitchTo(oldcxt);
>
> Why do you need MemoryContextSwitchTo for freeing members?

Ah, sorry. It's just a slip of my fingers.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center






[HACKERS] Confusing comment in pg_upgrade in regards to VACUUM FREEZE

2016-04-17 Thread David Rowley
I'd been wondering why a VACUUM FREEZE is performed during pg_upgrade,
so I opened up the code to see if that would help me determine why
this is.

I'm confronted with a comment, which states:

/*
* We do freeze after analyze so pg_statistic is also frozen. template0 is
* not frozen here, but data rows were frozen by initdb, and we set its
* datfrozenxid, relfrozenxids, and relminmxid later to match the new xid
* counter later.
*/

this is not helping me much, as I don't really understand why
pg_statistic needs to be frozen.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




[HACKERS] Fix of doc for synchronous_standby_names.

2016-04-17 Thread Kyotaro HORIGUCHI
Hello. synchronous_standby_names can now be used to require more than
one synchronous standby, but the doc for it still seems to assume only
one synchronous standby.

> There is no mechanism to enforce uniqueness. In case of
> duplicates one of the matching standbys will be considered as
> higher priority, though exactly which one is indeterminate.

The patch attached edits the above to the following.

> There is no mechanism to enforce uniqueness. In case of
> duplicates some of the matching standbys will be considered as
> higher priority, though they are chosen in an indeterminate way.

Does this make sense?

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index f9ba148..7b2edbe 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -3028,9 +3028,9 @@ include_dir 'conf.d'
 The name of a standby server for this purpose is the
 application_name setting of the standby, as set in the
 primary_conninfo of the standby's WAL receiver.  There is
-no mechanism to enforce uniqueness. In case of duplicates one of the
-matching standbys will be considered as higher priority, though
-exactly which one is indeterminate.
+no mechanism to enforce uniqueness. In case of duplicates some of the
+matching standbys will be considered as higher priority, though they
+are chosen in an indeterminate way.
 The special entry * matches any
 application_name, including the default application name
 of walreceiver.



Re: [HACKERS] snapshot too old, configured by time

2016-04-17 Thread Alvaro Herrera
Tom Lane wrote:

> After struggling with back-patching a GIN bug fix, I wish to offer up the
> considered opinion that this was an impressively bad idea.  It's inserted
> 450 or so pain points for back-patching, which we will have to deal with
> for the next five years.  Moreover, I do not believe that it will do a
> damn thing for ensuring that future calls of BufferGetPage think about
> what to do; they'll most likely be copied-and-pasted from nearby calls,
> just as people have always done.  With luck, the nearby calls will have
> the right semantics, but this change won't help very much at all if they
> don't.

I disagree.  A developer that sees an unadorned BufferGetPage() call
doesn't stop to think twice about whether they need to add a snapshot
test.  Many reviewers will miss the necessary addition also.  A
developer that sees BufferGetPage(NO_SNAPSHOT_TEST) will at least
consider the idea that the flag might be right; if that developer
doesn't think about it, some reviewer may notice a new call with the
flag and consider the idea that the flag may be wrong.

I understand the backpatching pain argument, but my opinion was the
contrary of yours even so.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Re: [COMMITTERS] pgsql: Add trigonometric functions that work in degrees.

2016-04-17 Thread Peter Eisentraut
On 04/15/2016 02:04 PM, Robert Haas wrote:
> Peter, are you going to look into this further?  This is on the open
> items list, but there seems to be nothing that can be done about it by
> anyone other than, maybe, you.

I don't know if it's worth tracking this as an open item and thus
kind-of release blocker if no one else has the problem.  But I
definitely still have it.  My initial suspicion was that this had
something to do with a partial upgrade to gcc 6 (not yet released), in
other words a messed up system.  But I was able to reproduce it in a
freshly installed chroot.  It only happens with various versions of gcc,
but not with clang.  So I'm going to have to keep digging.





Re: [HACKERS] Default Roles

2016-04-17 Thread Noah Misch
On Sun, Apr 17, 2016 at 08:04:03PM -0400, Noah Misch wrote:
> On Thu, Apr 07, 2016 at 03:50:47PM -0400, Stephen Frost wrote:
> > I'm planning to continue going over the patch tomorrow morning with
> > plans to push this before the feature freeze deadline.
> 
> > --- a/src/test/regress/expected/rolenames.out
> > +++ b/src/test/regress/expected/rolenames.out
> 
> > +GRANT testrol0 TO pg_abc; -- error
> > +ERROR:  role "pg_abc" is reserved
> > +DETAIL:  Cannot GRANT roles to a reserved role.
> 
> The server still accepts "ALTER ROLE testrol0 USER pg_signal_backend".  It
> should block this ALTER ROLE if it blocks the corresponding GRANT.

One more thing:

> --- a/src/bin/pg_dump/pg_dumpall.c
> +++ b/src/bin/pg_dump/pg_dumpall.c
> @@ -665,7 +665,7 @@ dumpRoles(PGconn *conn)
>   int i;
>  
>   /* note: rolconfig is dumped later */
> - if (server_version >= 90500)
> + if (server_version >= 90600)

This needs distinct branches for 9.5 and for 9.6+.  Today's code would treat
a 9.5 cluster like a 9.1 cluster and fail to dump rolbypassrls attributes.
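
I.e. something like (a sketch, not the actual pg_dumpall.c structure):

    if (server_version >= 90600)
    {
        /* 9.6+ query: rolbypassrls plus "WHERE rolname !~ '^pg_'" */
    }
    else if (server_version >= 90500)
    {
        /* 9.5 query: rolbypassrls, but no pg_* filter needed */
    }
    else if (server_version >= 90100)
    {
        /* pre-9.5 queries, unchanged */
    }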

>   printfPQExpBuffer(buf,
>     "SELECT oid, rolname, rolsuper, rolinherit, "
>     "rolcreaterole, rolcreatedb, "
> @@ -674,6 +674,7 @@ dumpRoles(PGconn *conn)
>     "pg_catalog.shobj_description(oid, 'pg_authid') as rolcomment, "
>     "rolname = current_user AS is_current_user "
>     "FROM pg_authid "
> +   "WHERE rolname !~ '^pg_' "
>     "ORDER BY 2");
>   else if (server_version >= 90100)
>   printfPQExpBuffer(buf,
>   else if (server_version >= 90100)
>   printfPQExpBuffer(buf,




Re: [HACKERS] pg_dump dump catalog ACLs

2016-04-17 Thread Noah Misch
On Tue, Apr 05, 2016 at 05:50:18PM -0400, Stephen Frost wrote:
> I'll be doing more testing, review and clean-up (there are some
> whitespace only changes in the later patches that really should be
> included in the earlier patches where the code was actually changed)
> tonight with plans to push this tomorrow night.

(1) I ran into trouble reconciling the scope of dumpable ACLs.  The commit
message indicated this scope:

> Subject: [PATCH 4/5] In pg_dump, include pg_catalog and extension ACLs, if
>  changed
> 
> Now that all of the infrastructure exists, add in the ability to
> dump out the ACLs of the objects inside of pg_catalog or the ACLs
> for objects which are members of extensions, but only if they have
> been changed from their original values.

I wrote the attached test script to verify which types of ACLs dump/reload
covers.  Based on the commit message, I expected these results:

  Dumped: type, class, attribute, proc, largeobject_metadata,
  foreign_data_wrapper, foreign_server,
  language(in-extension), namespace(in-extension)
  Not dumped: database, tablespace,
  language(from-initdb), namespace(from-initdb)

Did I interpret the commit message correctly?  The script gave these results:

  Dumped: type, class, attribute, namespace(from-initdb)
  Not dumped: database, tablespace, proc, language(from-initdb)
  Not tested: largeobject_metadata, foreign_data_wrapper, foreign_server,
  language(in-extension), namespace(in-extension)

I didn't dig into why that happened; the choice of object type may not even be
the root cause in each case.  Which of those results, if any, surprise you?


(2) pg_dump queries:

> @@ -14187,18 +14869,65 @@ dumpTable(Archive *fout, TableInfo *tbinfo)

> +   "FROM 
> pg_catalog.pg_attribute at "
> +"JOIN pg_catalog.pg_class c ON 
> (at.attrelid = c.oid) "
> +   "LEFT JOIN 
> pg_init_privs pip ON "

Since pg_attribute and pg_class require schema qualification here, so does
pg_init_privs.  Likewise elsewhere in the patch's pg_dump changes.


(3) pg_dumpall became much slower around the time of these commits.  On one
machine (POWER7 3.55 GHz), a pg_dumpall just after initdb slowed from 0.25s at
commit 6c268df^ to 4.0s at commit 7a54270.  On a slower machine (Opteron
1210), pg_dumpall now takes 19s against such a fresh cluster.


I doubt I'll review this patch, but those things have come to my attention.
SELECT proacl FROM pg_proc WHERE proname = 'pg_sleep';
\dT+ money
\z pg_catalog.pg_database
\z information_schema.domains
\dL+ sql
\dn+ public
\l+ template1
\db+ pg_default

REVOKE EXECUTE ON FUNCTION pg_sleep(float8) FROM PUBLIC;
REVOKE USAGE ON TYPE money FROM PUBLIC;
REVOKE SELECT ON pg_database FROM PUBLIC;
GRANT SELECT (oid) ON pg_database TO PUBLIC;
REVOKE SELECT ON information_schema.domains FROM PUBLIC;
GRANT SELECT (domain_name) ON information_schema.domains TO PUBLIC;
REVOKE USAGE ON LANGUAGE sql FROM PUBLIC;
-- TODO pg_largeobject_metadata.lomacl
REVOKE USAGE ON SCHEMA public FROM PUBLIC;
REVOKE CONNECT ON DATABASE template1 FROM PUBLIC;
GRANT CREATE ON TABLESPACE pg_default TO PUBLIC;
-- SKIP pltemplate: deprecated; no DDL for it
-- TODO pg_foreign_data_wrapper.fdwacl
-- TODO pg_foreign_server.srvacl

SELECT proacl FROM pg_proc WHERE proname = 'pg_sleep';
\dT+ money
\z pg_catalog.pg_database
\z information_schema.domains
\dL+ sql
\dn+ public
\l+ template1
\db+ pg_default

/* RESULT:
$ pg_dumpall | grep -E '(GRANT|REVOKE)'
GRANT ALL ON DATABASE template1 TO nm;
REVOKE ALL ON SCHEMA public FROM PUBLIC;
GRANT CREATE ON SCHEMA public TO PUBLIC;
REVOKE ALL ON TYPE money FROM PUBLIC;
REVOKE SELECT ON TABLE pg_database FROM PUBLIC;
GRANT SELECT(oid) ON TABLE pg_database TO PUBLIC;
*/



Re: [HACKERS] snapshot too old, configured by time

2016-04-17 Thread Michael Paquier
On Mon, Apr 18, 2016 at 9:52 AM, Kevin Grittner  wrote:
> On Sun, Apr 17, 2016 at 5:15 PM, Tom Lane  wrote:
>> Kevin Grittner  writes:
>>> On Wed, Mar 30, 2016 at 3:26 PM, Alvaro Herrera wrote:
>>>> Kevin Grittner wrote:
>> I think we should revert BufferGetPage to be what it was before (with
>> no snapshot test) and invent BufferGetPageExtended or similar to be
>> used in the small number of places that need a snapshot test.
>
> I'm not sure what BufferGetPageExtended() buys us over simply
> inserting TestForOldSnapshot() where it is needed.  Other than that
> question, I have no objections to the course outlined, but figure I
> should not jump on it without allowing at least a couple days for
> discussion.  That also may give me time to perform the benchmarks I
> wanted -- VPN issues have blocked me from the big test machines so
> far.  I think I see where the time may be going when the feature is
> disabled, and if I'm right I have a fix; but without a big NUMA
> machine there is no way to confirm it.

TBH, BufferGetPageExtended() still looks like a good idea to me.
Back-patching those code paths is going to make the maintenance far
harder, on top of breaking compilation of extensions for perhaps no good
reason. Even if this is a low-level change, if this feature goes in
with 9.6, it would be really good to also mention that callers of
BufferGetPage should update their calls accordingly if they care about
the old-snapshot checks. This is a routine used a lot in many
plugins and extensions. Usually such low-level things are not
mentioned in the release notes, but this time I think it's really
important to say it loudly.
-- 
Michael




Re: [HACKERS] snapshot too old, configured by time

2016-04-17 Thread Kevin Grittner
On Sun, Apr 17, 2016 at 5:15 PM, Tom Lane  wrote:
> Kevin Grittner  writes:
>> On Wed, Mar 30, 2016 at 3:26 PM, Alvaro Herrera wrote:
>>> Kevin Grittner wrote:
>>>> On Wed, Mar 30, 2016 at 2:22 PM, Alvaro Herrera wrote:
>>>> I said that we should change BufferGetPage into having the snapshot
>>>> check built-in, except in the cases where a flag is passed; and the flag
>>>> would be passed in all cases except those 30-something you identified.
>>>> In other words, the behavior in all the current callsites would be
>>>> identical to what's there today; we could have a macro do the first
>>>> check so that we don't introduce the overhead of a function call in the
>>>> 450 cases where it's not needed.
>
>> Attached is what I think you're talking about for the first patch.
>> AFAICS this should generate identical executable code to unpatched.
>> Then the patch to actually implement the feature would, instead
>> of adding 30-some lines with TestForOldSnapshot() would implement
>> that as the behavior for the other enum value, and alter those
>> 30-some BufferGetPage() calls.
>
>> Álvaro and Michael, is this what you were looking for?
>
>> Is everyone else OK with this approach?
>
> After struggling with back-patching a GIN bug fix, I wish to offer up the
> considered opinion that this was an impressively bad idea.  It's inserted
> 450 or so pain points for back-patching, which we will have to deal with
> for the next five years.  Moreover, I do not believe that it will do a
> damn thing for ensuring that future calls of BufferGetPage think about
> what to do; they'll most likely be copied-and-pasted from nearby calls,
> just as people have always done.  With luck, the nearby calls will have
> the right semantics, but this change won't help very much at all if they
> don't.
>
> I think we should revert BufferGetPage to be what it was before (with
> no snapshot test) and invent BufferGetPageExtended or similar to be
> used in the small number of places that need a snapshot test.

I'm not sure what BufferGetPageExtended() buys us over simply
inserting TestForOldSnapshot() where it is needed.  Other than that
question, I have no objections to the course outlined, but figure I
should not jump on it without allowing at least a couple days for
discussion.  That also may give me time to perform the benchmarks I
wanted -- VPN issues have blocked me from the big test machines so
far.  I think I see where the time may be going when the feature is
disabled, and if I'm right I have a fix; but without a big NUMA
machine there is no way to confirm it.
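
For reference, the two shapes being compared look roughly like this
(signatures are guesses from the thread, not final code):

    /* (a) plain accessor plus an explicit test only where needed */
    page = BufferGetPage(buf);
    TestForOldSnapshot(snapshot, rel, page);

    /* (b) an extended accessor with the test folded in */
    page = BufferGetPageExtended(buf, snapshot, rel,
                                 BGP_TEST_FOR_OLD_SNAPSHOT);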

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Default Roles

2016-04-17 Thread Noah Misch
On Thu, Apr 07, 2016 at 03:50:47PM -0400, Stephen Frost wrote:
> I'm planning to continue going over the patch tomorrow morning with
> plans to push this before the feature freeze deadline.

> --- a/src/test/regress/expected/rolenames.out
> +++ b/src/test/regress/expected/rolenames.out

> +GRANT testrol0 TO pg_abc; -- error
> +ERROR:  role "pg_abc" is reserved
> +DETAIL:  Cannot GRANT roles to a reserved role.

The server still accepts "ALTER ROLE testrol0 USER pg_signal_backend".  It
should block this ALTER ROLE if it blocks the corresponding GRANT.




Re: [HACKERS] Pgbench with -f and -S

2016-04-17 Thread Robins Tharakan
On 17 April 2016 at 21:24, Fabien COELHO  wrote:

> For the future 9.6, scripts options are cumulatives, so -f & -S can be
> combined.
>
> Indeed, for the <= 9.5 it seems that some options are silently ignores,
> when combining -S/-f, only the last one is kept, it should be warned about.
>

​
​Thanks Fabien, for confirming about the missing warning.

Also, by 'combined' I think you mean that both (built-in SELECTs & Custom
Script) run, although the dev docs don't (yet) say anything about that.

-
robins


Re: [HACKERS] snapshot too old, configured by time

2016-04-17 Thread Tom Lane
Kevin Grittner  writes:
> On Wed, Mar 30, 2016 at 3:26 PM, Alvaro Herrera wrote:
>> Kevin Grittner wrote:
>>> On Wed, Mar 30, 2016 at 2:22 PM, Alvaro Herrera wrote:
>>> I said that we should change BufferGetPage into having the snapshot
>>> check built-in, except in the cases where a flag is passed; and the flag
>>> would be passed in all cases except those 30-something you identified.
>>> In other words, the behavior in all the current callsites would be
>>> identical to what's there today; we could have a macro do the first
>>> check so that we don't introduce the overhead of a function call in the
>>> 450 cases where it's not needed.

> Attached is what I think you're talking about for the first patch.
> AFAICS this should generate identical executable code to unpatched.
> Then the patch to actually implement the feature would, instead
> of adding 30-some lines with TestForOldSnapshot() would implement
> that as the behavior for the other enum value, and alter those
> 30-some BufferGetPage() calls.

> Álvaro and Michael, is this what you were looking for?

> Is everyone else OK with this approach?

After struggling with back-patching a GIN bug fix, I wish to offer up the
considered opinion that this was an impressively bad idea.  It's inserted
450 or so pain points for back-patching, which we will have to deal with
for the next five years.  Moreover, I do not believe that it will do a
damn thing for ensuring that future calls of BufferGetPage think about
what to do; they'll most likely be copied-and-pasted from nearby calls,
just as people have always done.  With luck, the nearby calls will have
the right semantics, but this change won't help very much at all if they
don't.

I think we should revert BufferGetPage to be what it was before (with
no snapshot test) and invent BufferGetPageExtended or similar to be
used in the small number of places that need a snapshot test.

regards, tom lane




Re: [HACKERS] SSL certificate location

2016-04-17 Thread Christoph Moench-Tegeder
## Terence Ferraro (terencejferr...@gmail.com):

> At the moment, if a user has multiple applications on a single machine
> connecting with different SSL certificates, each process must be launched
> by a different logical user and the certificates placed within that user's
> home directory (and this is just for *nix, forget about Windows). The
> current method is not scalable, either.

That is incorrect.
http://www.postgresql.org/docs/current/static/libpq-ssl.html
http://www.postgresql.org/docs/current/static/libpq-connect.html#LIBPQ-PARAMKEYWORDS
http://www.postgresql.org/docs/current/static/libpq-envars.html

Connection parameters are "sslcert" and "sslkey", environment variables
"PGSSLCERT" and "PGSSLKEY".
You can also specify parameters in your .pg_service.conf.
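
For example (paths from the original mail; the certificate file names are
just this example's choice):

  psql "host=127.0.0.1 port=5432 user=postgres sslcert=/home/test/cert_directory/app_1/postgresql.crt sslkey=/home/test/cert_directory/app_1/postgresql.key"

  PGSSLCERT=/home/test/cert_directory/app_2/postgresql.crt \
  PGSSLKEY=/home/test/cert_directory/app_2/postgresql.key \
  psql -U postgres -h 127.0.0.1 -p 5433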

Regards,
Christoph

-- 
Spare Space




[HACKERS] SSL certificate location

2016-04-17 Thread Terence Ferraro
I'm not sure whether this is of any value to anyone else, but the
attached patch enables an environment variable to be provided to libpq
specifying where to find the SSL certificate/key files used for a secure
connection.

At the moment, if a user has multiple applications on a single machine
connecting with different SSL certificates, each process must be launched
by a different logical user and the certificates placed within that user's
home directory (and this is just for *nix, forget about Windows). The
current method is not scalable, either.

With the attached patch, the user just sets the environment variable e.g.

PGSQL_SSL_PATH=/home/test/cert_directory/app_1/ /usr/local/pgsql/bin/psql
-U postgres -h 127.0.0.1 -p 5432
PGSQL_SSL_PATH=/home/test/cert_directory/app_2/ /usr/local/pgsql/bin/psql
-U postgres -h 127.0.0.1 -p 5433

It follows the same existing conventions by looking for the actual
certificates within the .postgresql sub-directory of the provided path.


*Terence J. Ferraro*
--- a/postgresql-9.5.2/src/interfaces/libpq/fe-secure-openssl.c  2016-03-28 16:07:39.0 -0400
+++ b/postgresql-9.5.2/src/interfaces/libpq/fe-secure-openssl.c  2016-04-15 23:12:17.493355856 -0400
@@ -35,6 +35,7 @@
 #else
 #include 
 #include 
+#include 
 #include 
 #include 
 #ifdef HAVE_NETINET_TCP_H
@@ -936,7 +937,14 @@
 	bool		have_homedir;
 	bool		have_cert;
 	EVP_PKEY   *pkey = NULL;
-
+	char	   *custom_homedir;
+	bool		have_custom_homedir;
+
+	custom_homedir = getenv("PGSQL_SSL_PATH");
+
+	if (custom_homedir == NULL) { have_custom_homedir = false; }
+	else { have_custom_homedir = true; }
+
 	/*
 	 * We'll need the home directory if any of the relevant parameters are
 	 * defaulted.  If pqGetHomeDirectory fails, act as though none of the
@@ -953,6 +961,9 @@
 	/* Read the client certificate file */
 	if (conn->sslcert && strlen(conn->sslcert) > 0)
 		strlcpy(fnbuf, conn->sslcert, sizeof(fnbuf));
+	/* ENV variable specified, load that certificate file */
+	else if (have_custom_homedir)
+		snprintf(fnbuf, sizeof(fnbuf), "%s/%s", custom_homedir, USER_CERT_FILE);
 	else if (have_homedir)
 		snprintf(fnbuf, sizeof(fnbuf), "%s/%s", homedir, USER_CERT_FILE);
 	else
@@ -1146,6 +1157,11 @@
 			strlcpy(fnbuf, conn->sslkey, sizeof(fnbuf));
 		}
 	}
+	else if (have_custom_homedir)
+	{
+		/* ENV variable specified, load that file */
+		snprintf(fnbuf, sizeof(fnbuf), "%s/%s", custom_homedir, USER_KEY_FILE);
+	}
 	else if (have_homedir)
 	{
 		/* No PGSSLKEY specified, load default file */
@@ -1207,6 +1223,8 @@
 	 */
 	if (conn->sslrootcert && strlen(conn->sslrootcert) > 0)
 		strlcpy(fnbuf, conn->sslrootcert, sizeof(fnbuf));
+	else if (have_custom_homedir)
+		snprintf(fnbuf, sizeof(fnbuf), "%s/%s", custom_homedir, ROOT_CERT_FILE);
 	else if (have_homedir)
 		snprintf(fnbuf, sizeof(fnbuf), "%s/%s", homedir, ROOT_CERT_FILE);
 	else
@@ -1245,6 +1263,8 @@
 	{
 		if (conn->sslcrl && strlen(conn->sslcrl) > 0)
 			strlcpy(fnbuf, conn->sslcrl, sizeof(fnbuf));
+		else if (have_custom_homedir)
+			snprintf(fnbuf, sizeof(fnbuf), "%s/%s", custom_homedir, ROOT_CRL_FILE);
 		else if (have_homedir)
 			snprintf(fnbuf, sizeof(fnbuf), "%s/%s", homedir, ROOT_CRL_FILE);
 		else



Re: [HACKERS] GIN data corruption bug(s) in 9.6devel

2016-04-17 Thread Jeff Janes
On Tue, Apr 12, 2016 at 9:53 AM, Teodor Sigaev  wrote:
>
> With pending cleanup patch backend will try to get lock on metapage with
> ConditionalLockPage. Will it interrupt autovacum worker?


Correct, ConditionalLockPage should not interrupt the autovacuum worker.

>>
>> Alvaro's recommendation, to let the cleaner off the hook once it
>> passes the page which was the tail page at the time it started, would
>> prevent any process from getting pinned down indefinitely, but would
>> not prevent the size of the list from increasing without bound.  I
>> think that would probably be good enough, because the current
>> throttling behavior is purely accidentally and doesn't *guarantee* a
>> limit on the size of the pending list.
>
> Added, see attached patch (based on v3.1)

With this applied, I am getting a couple errors I have not seen before
after extensive crash recovery testing:

ERROR:  attempted to delete invisible tuple

ERROR:  unexpected chunk number 1 (expected 2) for toast value
100338365 in pg_toast_16425

I've restarted the test harness with intentional crashes turned off,
to see if the problems are related to crash recovery or are more
generic than that.

I've never seen these particular problems before, so don't have much
insight into what might be going on or how to debug it.

Cheers,

Jeff




[HACKERS] Can we improve this error message?

2016-04-17 Thread Bill Moran

Here's an interesting scenario I happened across recently.

If you have a single line in the pg_hba.conf:

hostssl all all 0.0.0.0/0 md5

Attempting to log in with an incorrect password results in an
error message about there not being a pg_hba.conf entry for the
user.

Reading carefully, the error message states that there's no
pg_hba.conf for the user with **ssl off**.

What I believe is happening is that the pg connection libs
first try to connect via ssl and get a password-failure error,
then fall back to trying to connect without ssl, and get a "no
pg_hba.conf entry" error. The problem is that the second error
masks the first one, hiding the real cause of the connection
failure and causing a lot of confusion.

If we could keep both errors and report them both, I feel like
it would be an improvement to our client library behavior.

-- 
Bill Moran




Re: [HACKERS] Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold

2016-04-17 Thread Andres Freund
> Second, I don't think it will improve and become performant without
> exposure to a wider audience.

Huh? The issue is a relatively simple-to-spot architectural issue
(taking a single exclusive lock during snapshot acquisition which only
needs shared locks otherwise) - I don't see how any input is needed.
And for that matter, I don't see why such a lock got through review.




Re: [HACKERS] Move PinBuffer and UnpinBuffer to atomics

2016-04-17 Thread Amit Kapila
On Fri, Apr 15, 2016 at 1:59 AM, Alexander Korotkov <
a.korot...@postgrespro.ru> wrote:

> On Thu, Apr 14, 2016 at 5:35 AM, Andres Freund  wrote:
>
>> On 2016-04-14 07:59:07 +0530, Amit Kapila wrote:
>> > What you want to see by prewarming?
>>
>> Prewarming appears to greatly reduce the per-run variance on that
>> machine, making it a lot easier to get meaningful results.  Thus it'd
>> make it easier to compare pre/post padding numbers.
>>
>> > Will it have safe effect, if the tests are run for 10 or 15 mins
>> > rather than 5 mins?
>>
>> s/safe/same/? If so, no, I doesn't look that way. The order of buffers
>> appears to play a large role; and it won't change in a memory resident
>> workload over one run.
>>
>
> I've tried to run read-only benchmark of pad_pgxact_v1.patch on 4x18 Intel
> machine.  The results are following.
>
> clients no-pad  pad
> 1   12997   13381
> 2   26728   25645
> 4   52539   51738
> 8   103785  102337
> 10  132606  126174
> 20  255844  252143
> 30  371359  357629
> 40  450429  443053
> 50  497705  488250
> 60  564385  564877
> 70  718860  725149
> 80  934170  931218
> 90  1152961 1146498
> 100 1240055 1300369
> 110 1207727 1375706
> 120 1167681 1417032
> 130 1120891 1448408
> 140 1085904 1449027
> 150 1022160 1437545
> 160 976487  1441720
> 170 978120  1435848
> 180 953843  1414925
>
> snapshot_too_old patch was reverted in the both cases.
> On high number of clients padding gives very significant benefit.
>


These results indicate that the patch is a win.  Are these results the
median of 3 runs or single-run data?  By the way, can you share the output
of the lscpu command on this m/c?

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Move PinBuffer and UnpinBuffer to atomics

2016-04-17 Thread Amit Kapila
On Thu, Apr 14, 2016 at 8:05 AM, Andres Freund  wrote:
>
> On 2016-04-14 07:59:07 +0530, Amit Kapila wrote:
> > What you want to see by prewarming?
>
> Prewarming appears to greatly reduce the per-run variance on that
> machine, making it a lot easier to get meaningful results.
>

I think you are referring to the tests done by Robert on the power-8 m/c,
but the performance results I have reported were on intel x86.  In the last
two days, I have spent quite some effort on performance testing of this
patch with pre-warming, using the same query [1] as used by Robert in his
tests.  The tests are done such that we first start the server, pre-warm
the relations, run the read-only test, stop the server, and then repeat
this for the next test.  I have observed that the variance in run-to-run
performance still occurs, especially at higher client counts (128).  Below
are results for a 128 client count, both when the tests ran first with the
patch and then with HEAD, and vice versa.

Test-1
--
client count - 128 (basically -c 128 -j 128)

first tests ran with patch and then with HEAD

Runs    HEAD (commit 70715e6a)    Patch
Run-1   156748                    174640
Run-2   151352                    150115
Run-3   177940                    165269


Test-2
--
client count - 128 (basically -c 128 -j 128)

first tests ran with HEAD and then with patch

Runs    HEAD (commit 70715e6a)    Patch
Run-1   173063                    151282
Run-2   173187                    140676
Run-3   177046                    166726

I think this patch (padding pgxact) certainly is beneficial, as reported in
the above thread.  At very high client counts some variation in performance
is observed with and without the patch, but I feel that in general it is a
win.


[1] - psql -c "select sum(x.x) from (select pg_prewarm(oid) as x from
pg_class where relkind in ('i', 'r') order by oid) x;"

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Pgbench with -f and -S

2016-04-17 Thread Fabien COELHO



Pgbench allows -f and -S to be combined, even though the documentation says
that -S effectively uses the internal select-only script.

Is it okay to assume that -f is disregarded here? Or are they run in
round-robin fashion (although then, how does it know which read-only part
of my script to run?), or something else like that?

Effectively, I think it should raise a NOTICE that -f is useless here.


For the upcoming 9.6, script options are cumulative, so -f and -S can be
combined.


Indeed, for <= 9.5 it seems that some options are silently ignored: when
combining -S/-f, only the last one is kept. This should be warned
about.
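
For example, with 9.6 something like the following runs both the built-in
select-only script and a custom file; each transaction picks one of the
scripts at random, per the weights (a sketch: the file name, weights, and
duration are illustrative):

pgbench -S -f custom_read.sql -T 60 bench

# or, naming the built-in explicitly and weighting the two scripts:
pgbench -b select-only@2 -f custom_read.sql@1 -T 60 bench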


--
Fabien.




Re: [HACKERS] Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold

2016-04-17 Thread David Steele
On 4/16/16 4:44 PM, Noah Misch wrote:

> The key judgment to finalize here is whether it's okay to release this feature
> given its current effect[1], when enabled, on performance.  That is more
> controversial than the potential ~2% regression for old_snapshot_threshold=-1.
> Alvaro[2] and Robert[3] are okay releasing that way, and Andres[4] is not.  If
> anyone else wants to weigh in, now is the time.

I'm in favor of releasing the feature even with the performance
regression when enabled.  First, there are use cases where a feature
like this is absolutely critical.  Second, I don't think it will improve
and become performant without exposure to a wider audience.

I think it's entirely within the PostgreSQL philosophy to release a
feature that has warts and doesn't perform as well as we'd like as long
as it is stable and does not corrupt data.

In my opinion this feature meets these criteria and it is an important
capability to add to PostgreSQL.

-- 
-David
da...@pgmasters.net




Re: [HACKERS] EXPLAIN VERBOSE with parallel Aggregate

2016-04-17 Thread Tom Lane
David Rowley  writes:
> On 16 April 2016 at 04:27, Tom Lane  wrote:
>> +1 for the latter, if we can do it conveniently.  I think exposing
>> the names of the aggregate implementation functions would be very
>> user-unfriendly, as nobody but us hackers knows what those are.

> It does not really seem all that convenient to do this. It also seems
> a bit strange to me to have a parent node report a column which does
> not exist in any nodes descending from it. Remember that the combine
> Aggref does not have the same ->args as its corresponding partial
> Aggref. It's not all that clear to me if there is any nice way to
> have this work the way you'd like. If we were to just call
> get_rule_expr() on the first arg of the combine aggregate node, it
> would re-print the PARTIAL keyword again.

This suggests to me that the parsetree representation for partial
aggregates was badly chosen.  If EXPLAIN can't recognize them, then
neither can anything else, and we may soon find needs for telling
the difference that are not just cosmetic.  Maybe we need more
fields in Aggref.
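
Something like an explicit stage marker, say (a hypothetical sketch only,
not a worked-out design):

/* Sketch: make the aggregation stage explicit in the parse tree */
typedef enum AggStage
{
	AGGSTAGE_NORMAL,	/* transfn then finalfn: the classic case */
	AGGSTAGE_PARTIAL,	/* transfn only; emits the transition state */
	AGGSTAGE_COMBINE	/* combinefn over partials, then finalfn */
} AggStage;

With a field of that sort on Aggref, EXPLAIN (and anything else) could
branch on the stage instead of guessing from the shape of the arguments.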

> Note that I've done nothing for the weird schema prefixing problem I
> mentioned. I think I'd need input on the best way to solve this. If
> it's actually a problem.

It sure looks like a problem to me.  Maybe it's only cosmetic, but
it's going to cause confusion and bug reports.  EXPLAIN output for
parallel queries is going to be confusing enough anyway; we need
to avoid having artifacts like this.

regards, tom lane




Re: [HACKERS] Timeline following for logical slots

2016-04-17 Thread Craig Ringer
Hi all

I made an unfortunate oversight in the logical decoding timeline following
code: it doesn't work for logical decoding from the walsender, because I
left the required glue code out of the final cut of the patch.

The reason the new src/test/recovery/ tests don't notice this is that using
pg_recvlogical from the TAP tests is currently pretty awkward.
pg_recvlogical has no way to specify a stop-point, or to return when there's
no immediately pending data, like the SQL interface does. So you have to
read from it until you figure it's not going to return anything more, then
kill it, look at what it did return, and hope you didn't lose anything in
buffering. I don't much like relying on timeouts as part of normal
successful results, since they can upset some of the older and slower
buildfarm members. I'd rather be able to pass a --stoppos= or an --n-xacts=
option, but it was a bit too late to add those.

Per the separate mail I sent to hackers, xlogreader is currently pretty
much timeline-agnostic. The timeline tracking code needs to know when the
xlogreader does a random read so it can do a fresh lookup; its state is
therefore part of the XLogReaderState struct. But the walsender's
xlogreader page read callback doesn't use the xlogreader's state, and it
can't, because it's also used for physical replication, which communicates
the timeline to read from to the page read function via globals. So for
now, I just set the globals before calling the page read function:

+   XLogReadDetermineTimeline(state);
+   sendTimeLineIsHistoric = state->currTLI != ThisTimeLineID;
+   sendTimeLine = state->currTLI;
+   sendTimeLineValidUpto = state->currTLIValidUntil;
+   sendTimeLineNextTLI = state->nextTLI;

Per separate mail, I'd quite like to tackle the level of duplication in
timeline following logic in 9.7 and maybe see if the _three_ separate read
xlog page functions can be unified at the same time. But that sure isn't
9.6 material.
From c56a32b5ace8a48908da366e5f778fa98a125740 Mon Sep 17 00:00:00 2001
From: Craig Ringer 
Date: Thu, 14 Apr 2016 15:43:20 +0800
Subject: [PATCH] Enable logical timeline following in the walsender

---
 src/backend/access/transam/xlogutils.c |  7 +++
 src/backend/replication/walsender.c| 11 +++
 src/include/access/xlogreader.h|  3 +++
 src/include/access/xlogutils.h |  2 ++
 4 files changed, 15 insertions(+), 8 deletions(-)

diff --git a/src/backend/access/transam/xlogutils.c b/src/backend/access/transam/xlogutils.c
index c3213ac..c3b0e5c 100644
--- a/src/backend/access/transam/xlogutils.c
+++ b/src/backend/access/transam/xlogutils.c
@@ -773,7 +773,7 @@ XLogRead(char *buf, TimeLineID tli, XLogRecPtr startptr, Size count)
  * might be reading from historical timeline data on a segment that's been
  * copied to a new timeline.
  */
-static void
+void
 XLogReadDetermineTimeline(XLogReaderState *state)
 {
 	/* Read the history on first time through */
@@ -856,12 +856,11 @@ XLogReadDetermineTimeline(XLogReaderState *state)
 			   state->currTLIValidUntil == InvalidXLogRecPtr)
 		{
 			XLogRecPtr	tliSwitch;
-			TimeLineID	nextTLI;
 
 			CHECK_FOR_INTERRUPTS();
 
 			tliSwitch = tliSwitchPoint(state->currTLI, state->timelineHistory,
-	   &nextTLI);
+	   &state->nextTLI);
 
 			/* round ValidUntil down to start of seg containing the switch */
 			state->currTLIValidUntil =
@@ -875,7 +874,7 @@ XLogReadDetermineTimeline(XLogReaderState *state)
  *
  * If that's the current TLI we'll stop searching.
  */
-state->currTLI = nextTLI;
+state->currTLI = state->nextTLI;
 state->currTLIValidUntil = InvalidXLogRecPtr;
 			}
 		}
diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index e4a0119..495bff2 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -47,6 +47,7 @@
 #include "access/transam.h"
 #include "access/xact.h"
 #include "access/xlog_internal.h"
+#include "access/xlogutils.h"
 
 #include "catalog/pg_type.h"
 #include "commands/dbcommands.h"
@@ -756,6 +757,12 @@ logical_read_xlog_page(XLogReaderState *state, XLogRecPtr targetPagePtr, int req
 	XLogRecPtr	flushptr;
 	int			count;
 
+	XLogReadDetermineTimeline(state);
+	sendTimeLineIsHistoric = state->currTLI != ThisTimeLineID;
+	sendTimeLine = state->currTLI;
+	sendTimeLineValidUpto = state->currTLIValidUntil;
+	sendTimeLineNextTLI = state->nextTLI;
+
 	/* make sure we have enough WAL available */
 	flushptr = WalSndWaitForWal(targetPagePtr + reqLen);
 
@@ -984,10 +991,6 @@ StartLogicalReplication(StartReplicationCmd *cmd)
 	pq_endmessage(&buf);
 	pq_flush();
 
-	/* setup state for XLogReadPage */
-	sendTimeLineIsHistoric = false;
-	sendTimeLine = ThisTimeLineID;
-
 	/*
 	 * Initialize position to the last ack'ed one, then the xlog records begin
 	 * to be shipped from that position.
diff --git a/src/include/access/xlogreader.h b/src/include/access/xlogreader.h
index 300747d..bbee552 100644
--- a/src

[HACKERS] Timeline following is a bit tangled...

2016-04-17 Thread Craig Ringer
Hi all

As part of the work I did on timeline following for logical decoding I
mapped out the various code paths relating to timeline following in Pg.

https://wiki.postgresql.org/wiki/TimelineFollowing97

It's surprisingly complex (to me), with lots of completely separate logic
for each different path. Redo has one way to decide when to switch
timelines and which WAL segment to read from. The walsender has two, one
for physical replication and one (with a small overlap) for logical
replication. Logical replication over the SQL interface now has another,
which overlaps mostly but not entirely with the logical walsender one.

One thing that makes it very hard to follow the code (IMO) is that the
xlogreader is totally timeline agnostic. The xlogreader's callers decide
which timeline to read from and when to switch timelines. The actual WAL
segment to read from is determined by the read page callback that the
xlogreader invokes. The callback figures out the timeline by looking
"around" the xlogreader at global state in xlog.c (for redo) or walsender.c
(for phys/logical walsender).

Each place has its own logic for things like the early timeline switch
required to ensure that we read from a segment that's actually locally
present, since older timelines of the same segment won't be present or will
have been renamed to .partial.

I'd like to reduce the duplication here and try to make it a bit easier to
follow. If doing so doesn't seem worth the (undeniable) risks of messing
with redo, then I'll just leave it untouched; I don't feel all that
strongly about it.


Because physical replication doesn't use the xlogreader, it doesn't make
sense to just add timeline following to the xlogreader directly. It has to
be separate, usable by both physical replication and the xlogreader. I
think it should be reasonable to have them both use the same state struct
and function, though, where they call a function before reading each page
to update the timeline to read from if needed, then have their page read
callback use that timeline. That state can track the next timeline, the TLI
switchpoint, whether the timeline became historical since the last page was
read, a copy of the latest timeline history, and so on, and should probably
live in timeline.c.
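
To sketch what I mean (hypothetical names only; nothing like this exists
yet):

#include "access/xlogdefs.h"	/* TimeLineID, XLogRecPtr */
#include "nodes/pg_list.h"	/* List */

/*
 * Shared timeline-following state, usable by physical replication and by
 * xlogreader page read callbacks alike; would live in timeline.c.
 */
typedef struct TimeLineReadState
{
	TimeLineID	currTLI;			/* timeline currently being read */
	XLogRecPtr	currTLIValidUntil;	/* switchpoint; reads past this must
									 * move to nextTLI */
	TimeLineID	nextTLI;			/* timeline following currTLI, if known */
	bool		becameHistoric;		/* currTLI turned historic since the
									 * last page was read */
	List	   *timelineHistory;	/* cached copy of the latest history */
} TimeLineReadState;

/* Call before each page read; updates the timeline to read from. */
extern void TimeLineReadDetermine(TimeLineReadState *state,
								  XLogRecPtr readUpto);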

I'm not especially thrilled with the code I wrote for logical decoding
timeline following there, and I'm actually more inclined to base that logic
on the physical walsender's code, which is IMO the clearest we currently
have. Extract it, move its state globals into a struct, generalize it.

Sound not completely insane?

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] EXPLAIN VERBOSE with parallel Aggregate

2016-04-17 Thread David Rowley
On 16 April 2016 at 04:27, Tom Lane  wrote:
> Robert Haas  writes:
>> I definitely agree that the current output is messed up, but I'm not
>> sure your proposed output is much better.  I wonder if it shouldn't
>> say something like:
>> Output: serialfn(transfn(args))
>> for the partial aggregate and
>> Output: finalfn(combinefn(deserialfn(args)))
>> for the finalize aggregate step.
>
>> Or maybe just insert the word PARTIAL before each partial aggregate
>> step, like PARTIAL sum(num) for the partial step and then just
>> sum(num) for the final step.
>
> +1 for the latter, if we can do it conveniently.  I think exposing
> the names of the aggregate implementation functions would be very
> user-unfriendly, as nobody but us hackers knows what those are.

It does not really seem all that convenient to do this. It also seems
a bit strange to me to have a parent node report a column which does
not exist in any nodes descending from it. Remember that the combine
Aggref does not have the same ->args as its corresponding partial
Aggref. It's not all that clear to me if there is any nice way to
have this work the way you'd like. If we were to just call
get_rule_expr() on the first arg of the combine aggregate node, it
would re-print the PARTIAL keyword again.

In the attached I have it displaying as:

postgres=# explain verbose select count(*),max(a),avg(a) filter (where a = 0) from t;
                                               QUERY PLAN
---------------------------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=14739.56..14739.57 rows=1 width=44)
   Output: pg_catalog.count(*), max((max(a))), pg_catalog.avg((PARTIAL avg(a) FILTER (WHERE (a = 0))))
   ->  Gather  (cost=14739.33..14739.54 rows=2 width=44)
         Output: (count(*)), (max(a)), (PARTIAL avg(a) FILTER (WHERE (a = 0)))
         Workers Planned: 2
         ->  Partial Aggregate  (cost=13739.33..13739.34 rows=1 width=44)
               Output: count(*), max(a), PARTIAL avg(a) FILTER (WHERE (a = 0))
               ->  Parallel Seq Scan on public.t  (cost=0.00..9572.67 rows=416667 width=4)
                     Output: a
(9 rows)

I like this much better, as there's no fudging of any function
arguments to print them as something they're not.

Note that max() does not get tagged with PARTIAL since it has no
finalfn. Obtaining this information does require a syscache lookup in
get_agg_expr(), but I've managed to make that only happen when
aggpartial is true.
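
The gating is shaped roughly like this (a sketch, not the exact patch code;
it assumes we want the PARTIAL tag only when there is a final function to
skip):

/* in get_agg_expr(): only pay for the lookup when the Aggref is partial */
if (aggref->aggpartial)
{
	HeapTuple	aggTuple;
	Form_pg_aggregate aggform;

	aggTuple = SearchSysCache1(AGGFNOID,
							   ObjectIdGetDatum(aggref->aggfnoid));
	if (!HeapTupleIsValid(aggTuple))
		elog(ERROR, "cache lookup failed for aggregate %u",
			 aggref->aggfnoid);
	aggform = (Form_pg_aggregate) GETSTRUCT(aggTuple);

	/* aggregates with no finalfn, like max(), skip the PARTIAL tag */
	if (OidIsValid(aggform->aggfinalfn))
		appendStringInfoString(buf, "PARTIAL ");

	ReleaseSysCache(aggTuple);
}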

>> I think ending up with sum(sum(num)) is
>> right out.  It doesn't look so bad for that case but avg(avg(num))
>> would certainly imply something that's not the actual behavior.
>
> Agreed.

Note that I've done nothing for the weird schema prefixing problem I
mentioned. I think I'd need input on the best way to solve this. If
it's actually a problem.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


parallel_agg_explain_verbose_v2.patch
Description: Binary data
