Re: [HACKERS] MD5 authentication needs help

2015-03-06 Thread Stephen Frost
* Bruce Momjian (br...@momjian.us) wrote:
 On Thu, Mar  5, 2015 at 11:15:55AM -0500, Stephen Frost wrote:
  * Bruce Momjian (br...@momjian.us) wrote:
   On Wed, Mar  4, 2015 at 05:56:25PM -0800, Josh Berkus wrote:
So, are we more worried about attackers getting a copy of pg_authid, or
sniffing the hash on the wire?
   
   Both.  Stephen is more worried about pg_authid, but I am more worried
   about sniffing.
  
  I'm also worried about both, but if the admin is worried about sniffing
  in their environment, they're much more likely to use TLS than to set up
  client side certificates, kerberos, or some other strong auth mechanism,
  simply because TLS is pretty darn easy to get working and distros set it
  up for you by default.
 
 I think your view might be skewed.  I think there are many people who care
 about password security who don't care to do TLS.

I'm not quite sure that I follow what you mean about caring for
password security.

I'll try to clarify by suggesting a few things that I think you might
mean and will respond to them.  Please clarify if I've completely missed
what you're getting at here.

If the concern is database access due to an attacker who can sniff the
network data then the approach which I suggested would make things
materially worse for users who do not employ TLS.  Once the attacker has
sniffed the network for a little while, they'll have one and likely more
challenge/responses which they could use to attempt to log in with.  As
discussed, our existing challenge/response system is vulnerable to
network sniffing based replay attacks also.
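
To make the replay risk concrete, here is a rough sketch written with the
SQL md5() function purely for illustration ('alice' and 'secret' are
made-up credentials, and the real salt is 4 random bytes chosen by the
server, shown here as the literal 'salt'):

    -- value stored in pg_authid.rolpassword today
    SELECT 'md5' || md5('secret' || 'alice');

    -- response the client sends for a given connection salt
    SELECT 'md5' || md5(md5('secret' || 'alice') || 'salt');

A sniffed response is only useful against the particular salt it was
computed for, which is why the number of distinct salts the server can
issue matters.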

If the concern is database access due to an attacker who can see the
on-disk data, then the current situation makes it trivial for the
attacker to log in, while the approach I've suggested would require the
attacker to reverse hash(salt+auth_token) (where auth_token here is
taken to represent what we currently store on disk; even with my
suggestion, the attacker would not need to reverse the
hash(username+password) to gain database access).

If the concern is about getting at the cleartext password, then I don't
believe things are materially worse with either approach assuming a
network-based sniff attack.  Both require that the attacker reverse
hash(salt+hash(username+password)) where the salt and password are not
known.

If the concern is about getting the cleartext password as it resides on
disk or in backups, we are not in a good position today.  While the
password is hash'd, the salt used is the username which the attacker
may know ahead of the compromise.  The approach I am suggesting improves
that situation because it would bring it up to the same level of
difficulty as that of the network-based sniff attack: the attacker would
have to reverse hash(salt+hash(username+password)).
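
As a rough SQL sketch of that difference (the role name, password, and
random salt are all made up for illustration):

    -- what is stored today: the salt is effectively the user name,
    -- which an attacker may know ahead of the compromise
    SELECT 'md5' || md5('secret' || 'alice');

    -- one reading of the suggestion: re-hash that value with a random,
    -- separately stored salt
    SELECT md5('w3kVxK5q' || md5('secret' || 'alice'));

With the second form, someone reading pg_authid or a backup has to reverse
the outer hash before the value is of any use for logging in.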

 Also, my suggestion to use a counter for the session salt, to reduce
 replay from 16k to 4 billion, has not received any comments, and it does
 not break the wire protocol.  I feel that is an incremental improvement
 we should consider.

You are correct, that would improve the existing protocol regarding
database-access risk due to network-based sniffing attacks.
Unfortunately, it would not improve cleartext or database access
risk due to disk-based attacks.

 I think you are minimizing the downside of your idea of using X challenges
 instead of 16k challenges to get in.  Again, if my idea is valid, it
 would be X challenges vs 4 billion challenges.

The reason I have not been as excited by this approach is that we
already have a solution for network-based sniffing attacks.  As Josh
mentioned, there are users out there who even go to extraordinary
lengths to thwart network-based sniffing attacks by using stunnel with
pg_bouncer.  On the flip side, while there are ways to protect backups
through encryption, many other vectors exist for disk-based attacks
which result in either database access or finding the cleartext
password with much less difficulty.

Further, this only improves the authentication handling and doesn't
reduce our exposure to other network-based attacks, including connection
hijacking, sniffing during password set/reset, data compromise as it's
sent across the wire, etc.  Encouraging use of TLS addresses all of
those risks.  I don't recall any complaints about these other
network-based attacks and I do believe that's because TLS is available.
Combined with the approach I've suggested, we would reduce the risk of
disk-based attacks to the extent we're able to without breaking the
protocol.

For my part, doing this, or going with my suggestion, or doing nothing
with md5, really doesn't move us forward very much, which frustrates me
greatly.  I brought this suggestion to this list because it was
suggested to me as one change we could make that would reduce the risk
of disk-based attacks while trading that off for a higher risk of
network-based attacks, without breaking the existing network
protocol.  To make it very clear- it is not a 

Re: [HACKERS] MD5 authentication needs help

2015-03-06 Thread Bruce Momjian
On Thu, Mar  5, 2015 at 11:15:55AM -0500, Stephen Frost wrote:
 * Bruce Momjian (br...@momjian.us) wrote:
  On Wed, Mar  4, 2015 at 05:56:25PM -0800, Josh Berkus wrote:
   So, are we more worried about attackers getting a copy of pg_authid, or
   sniffing the hash on the wire?
  
  Both.  Stephen is more worried about pg_authid, but I am more worried
  about sniffing.
 
 I'm also worried about both, but if the admin is worried about sniffing
 in their environment, they're much more likely to use TLS than to set up
 client side certificates, kerberos, or some other strong auth mechanism,
 simply because TLS is pretty darn easy to get working and distros set it
 up for you by default.

I think your view might be skewed.  I think there are many people who care
about password security who don't care to do TLS.

Also, my suggestion to use a counter for the session salt, to reduce
replay from 16k to 4 billion, has not received any comments, and it does
not break the wire protocol.  I feel that is an incremental improvement
we should consider.

I think you are minimizing the downside of your idea of using X challenges
instead of 16k challenges to get in.  Again, if my idea is valid, it
would be X challenges vs 4 billion challenges.

-- 
  Bruce Momjian  br...@momjian.us        http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +




Re: [HACKERS] Clamping result row number of joins.

2015-03-06 Thread Tom Lane
I wrote:
 Hm, the problem evidently is that we get a default selectivity estimate
 for the ON condition.  I think a proper fix for this would involve
 teaching eqjoinsel (and ideally other join selectivity functions) how
 to drill down into appendrels and combine estimates for the child
 relations.

I wrote a prototype patch for this.  The additions to examine_variable()
seem pretty reasonable.  However, the only selectivity function I've fixed
is eqjoinsel_inner().  If we do it like this, we're going to need
similar recursive-boilerplate additions in basically every selectivity
function, which seems like a PITA as well as a lot of code bloat.
Can anyone think of a cute way to minimize that overhead?

regards, tom lane

diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index 4dd3f9f..aaf76be 100644
*** a/src/backend/utils/adt/selfuncs.c
--- b/src/backend/utils/adt/selfuncs.c
*** static double convert_one_bytea_to_scala
*** 181,187 
  			int rangelo, int rangehi);
  static char *convert_string_datum(Datum value, Oid typid);
  static double convert_timevalue_to_scalar(Datum value, Oid typid);
! static void examine_simple_variable(PlannerInfo *root, Var *var,
  		VariableStatData *vardata);
  static bool get_variable_range(PlannerInfo *root, VariableStatData *vardata,
     Oid sortop, Datum *min, Datum *max);
--- 181,187 
  			int rangelo, int rangehi);
  static char *convert_string_datum(Datum value, Oid typid);
  static double convert_timevalue_to_scalar(Datum value, Oid typid);
! static void examine_simple_variable(PlannerInfo *root, Var *var, Index varno,
  		VariableStatData *vardata);
  static bool get_variable_range(PlannerInfo *root, VariableStatData *vardata,
     Oid sortop, Datum *min, Datum *max);
*** eqjoinsel_inner(Oid operator,
*** 2221,2226 
--- 2221,2276 
  	float4	   *numbers2 = NULL;
  	int			nnumbers2 = 0;
  
+ 	if (vardata1->children)
+ 	{
+ 		/* Recurse to appendrel children and compute weighted selectivity */
+ 		double		appendrelrows;
+ 		ListCell   *lc;
+ 
+ 		selec = 0;
+ 		appendrelrows = 0;
+ 		foreach(lc, vardata1->children)
+ 		{
+ 			VariableStatData *childdata = (VariableStatData *) lfirst(lc);
+ 			double		cselec;
+ 
+ 			if (childdata->rel == NULL)
+ 				continue;		/* safety check */
+ 			cselec = eqjoinsel_inner(operator, childdata, vardata2);
+ 			selec += cselec * childdata->rel->rows;
+ 			appendrelrows += childdata->rel->rows;
+ 		}
+ 		if (appendrelrows > 0)
+ 			selec /= appendrelrows;
+ 		CLAMP_PROBABILITY(selec);
+ 		return selec;
+ 	}
+ 
+ 	if (vardata2->children)
+ 	{
+ 		/* Recurse to appendrel children and compute weighted selectivity */
+ 		double		appendrelrows;
+ 		ListCell   *lc;
+ 
+ 		selec = 0;
+ 		appendrelrows = 0;
+ 		foreach(lc, vardata2->children)
+ 		{
+ 			VariableStatData *childdata = (VariableStatData *) lfirst(lc);
+ 			double		cselec;
+ 
+ 			if (childdata->rel == NULL)
+ 				continue;		/* safety check */
+ 			cselec = eqjoinsel_inner(operator, vardata1, childdata);
+ 			selec += cselec * childdata->rel->rows;
+ 			appendrelrows += childdata->rel->rows;
+ 		}
+ 		if (appendrelrows > 0)
+ 			selec /= appendrelrows;
+ 		CLAMP_PROBABILITY(selec);
+ 		return selec;
+ 	}
+ 
  	nd1 = get_variable_numdistinct(vardata1, isdefault1);
  	nd2 = get_variable_numdistinct(vardata2, isdefault2);
  
*** get_restriction_variable(PlannerInfo *ro
*** 4192,4198 
  	{
  		*varonleft = true;
  		*other = estimate_expression_value(root, rdata.var);
! 		/* Assume we need no ReleaseVariableStats(rdata) here */
  		return true;
  	}
  
--- 4242,4248 
  	{
  		*varonleft = true;
  		*other = estimate_expression_value(root, rdata.var);
! 		ReleaseVariableStats(rdata);	/* usually unnecessary, but ... */
  		return true;
  	}
  
*** get_restriction_variable(PlannerInfo *ro
*** 4200,4206 
  	{
  		*varonleft = false;
  		*other = estimate_expression_value(root, vardata-var);
! 		/* Assume we need no ReleaseVariableStats(*vardata) here */
  		*vardata = rdata;
  		return true;
  	}
--- 4250,4256 
  	{
  		*varonleft = false;
  		*other = estimate_expression_value(root, vardata-var);
! 		ReleaseVariableStats(*vardata);
  		*vardata = rdata;
  		return true;
  	}
*** get_join_variables(PlannerInfo *root, Li
*** 4259,4283 
   *	node: the expression tree to examine
   *	varRelid: see specs for restriction selectivity functions
   *
!  * Outputs: *vardata is filled as follows:
!  *	var: the input expression (with any binary relabeling stripped, if
!  *		it is or contains a variable; but otherwise the type is preserved)
!  *	rel: RelOptInfo for relation containing variable; NULL if expression
!  *		contains no Vars (NOTE this could point to a RelOptInfo of a
!  *		subquery, not one in the current query).
!  *	statsTuple: the pg_statistic entry for the variable, if one exists;
!  *		otherwise NULL.
!  *	freefunc: pointer to a 

Re: [HACKERS] alter user/role CURRENT_USER

2015-03-06 Thread Alvaro Herrera
There is something odd going on here:

alvherre=# alter group test add user current_user;
ERROR:  role test is a member of role (null)

Surely (null) is not the right name to be reporting there ...

Attached is a rebased patch, which also has some incomplete doc changes.

With this patch applied, doing
\h ALTER ROLE
in psql looks quite odd: note how wide it has become.  Maybe we should
be doing this differently?  (Hmm, why don't we accept ALL in the first SET
line?  Maybe that's just a mistake and the four lines should be all
identical in the first half ...)


alvherre=# \h alter role
Command: ALTER ROLE
Description: change a database role
Syntax:
ALTER ROLE { name | CURRENT_USER | SESSION_USER } [ WITH ] option [ ... ]

where option can be:

  SUPERUSER | NOSUPERUSER
| CREATEDB | NOCREATEDB
| CREATEROLE | NOCREATEROLE
| CREATEUSER | NOCREATEUSER
| INHERIT | NOINHERIT
| LOGIN | NOLOGIN
| REPLICATION | NOREPLICATION
| BYPASSRLS | NOBYPASSRLS
| CONNECTION LIMIT connlimit
| [ ENCRYPTED | UNENCRYPTED ] PASSWORD 'password'
| VALID UNTIL 'timestamp'

ALTER ROLE name RENAME TO new_name

ALTER ROLE { name | CURRENT_USER | SESSION_USER } [ IN DATABASE database_name ] 
SET configuration_parameter { TO | = } { value | DEFAULT }
ALTER ROLE { name | CURRENT_USER | SESSION_USER | ALL } [ IN DATABASE 
database_name ] SET configuration_parameter FROM CURRENT
ALTER ROLE { name | CURRENT_USER | SESSION_USER | ALL } [ IN DATABASE 
database_name ] RESET configuration_parameter
ALTER ROLE { name | CURRENT_USER | SESSION_USER | ALL } [ IN DATABASE 
database_name ] RESET ALL

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
diff --git a/doc/src/sgml/ref/alter_group.sgml b/doc/src/sgml/ref/alter_group.sgml
index 1432242..5f0d8b4 100644
--- a/doc/src/sgml/ref/alter_group.sgml
+++ b/doc/src/sgml/ref/alter_group.sgml
@@ -21,10 +21,10 @@ PostgreSQL documentation
 
  <refsynopsisdiv>
 <synopsis>
-ALTER GROUP <replaceable class="PARAMETER">group_name</replaceable> ADD USER <replaceable class="PARAMETER">user_name</replaceable> [, ... ]
-ALTER GROUP <replaceable class="PARAMETER">group_name</replaceable> DROP USER <replaceable class="PARAMETER">user_name</replaceable> [, ... ]
+ALTER GROUP { <replaceable class="PARAMETER">group_name</replaceable> | CURRENT_USER | SESSION_USER } ADD USER <replaceable class="PARAMETER">user_name</replaceable> [, ... ]
+ALTER GROUP { <replaceable class="PARAMETER">group_name</replaceable> | CURRENT_USER | SESSION_USER } DROP USER <replaceable class="PARAMETER">user_name</replaceable> [, ... ]
 
-ALTER GROUP <replaceable class="PARAMETER">group_name</replaceable> RENAME TO <replaceable>new_name</replaceable>
+ALTER GROUP { <replaceable class="PARAMETER">group_name</replaceable> | CURRENT_USER | SESSION_USER } RENAME TO <replaceable>new_name</replaceable>
 </synopsis>
  </refsynopsisdiv>
 
diff --git a/doc/src/sgml/ref/alter_role.sgml b/doc/src/sgml/ref/alter_role.sgml
index 0471daa..59f6499 100644
--- a/doc/src/sgml/ref/alter_role.sgml
+++ b/doc/src/sgml/ref/alter_role.sgml
@@ -21,7 +21,7 @@ PostgreSQL documentation
 
  <refsynopsisdiv>
 <synopsis>
-ALTER ROLE <replaceable class="PARAMETER">name</replaceable> [ [ WITH ] <replaceable class="PARAMETER">option</replaceable> [ ... ] ]
+ALTER ROLE { <replaceable class="PARAMETER">name</replaceable> | CURRENT_USER | SESSION_USER } [ WITH ] <replaceable class="PARAMETER">option</replaceable> [ ... ]
 
 <phrase>where <replaceable class="PARAMETER">option</replaceable> can be:</phrase>
 
@@ -39,10 +39,10 @@ ALTER ROLE replaceable class=PARAMETERname/replaceable [ [ WITH ] replace
 
 ALTER ROLE <replaceable class="PARAMETER">name</replaceable> RENAME TO <replaceable>new_name</replaceable>
 
-ALTER ROLE <replaceable class="PARAMETER">name</replaceable> [ IN DATABASE <replaceable class="PARAMETER">database_name</replaceable> ] SET <replaceable>configuration_parameter</replaceable> { TO | = } { <replaceable>value</replaceable> | DEFAULT }
-ALTER ROLE { <replaceable class="PARAMETER">name</replaceable> | ALL } [ IN DATABASE <replaceable class="PARAMETER">database_name</replaceable> ] SET <replaceable>configuration_parameter</replaceable> FROM CURRENT
-ALTER ROLE { <replaceable class="PARAMETER">name</replaceable> | ALL } [ IN DATABASE <replaceable class="PARAMETER">database_name</replaceable> ] RESET <replaceable>configuration_parameter</replaceable>
-ALTER ROLE { <replaceable class="PARAMETER">name</replaceable> | ALL } [ IN DATABASE <replaceable class="PARAMETER">database_name</replaceable> ] RESET ALL
+ALTER ROLE { <replaceable class="PARAMETER">name</replaceable> | CURRENT_USER | SESSION_USER } [ IN DATABASE <replaceable class="PARAMETER">database_name</replaceable> ] SET <replaceable>configuration_parameter</replaceable> { TO | = } { <replaceable>value</replaceable> | DEFAULT }
+ALTER ROLE { <replaceable class="PARAMETER">name</replaceable> | CURRENT_USER | SESSION_USER | ALL } [ IN DATABASE <replaceable class="PARAMETER">database_name</replaceable> ] SET <replaceable>configuration_parameter</replaceable> FROM CURRENT
+ALTER ROLE { replaceable 

Re: [HACKERS] MD5 authentication needs help

2015-03-06 Thread Bruce Momjian
On Fri, Mar  6, 2015 at 12:50:14PM -0800, Josh Berkus wrote:
 On 03/06/2015 08:19 AM, Stephen Frost wrote:
  * Alvaro Herrera (alvhe...@2ndquadrant.com) wrote:
  Stephen Frost wrote:
  Sure.  I was thinking we would have some mechanism to authenticate the
  connection as coming from a pooler that has been previously authorized;
  something simple as a new pg_hba.conf entry type for poolers that are
  only authorized to connect to such-and-such databases, perhaps limit to
  such-and-such users, etc.
  
  Well, server-side, we already have that- have pgbouncer run on the
  database server (something which I'm typically in favor of anyway) and
  use 'peer'.  If it supported TLS then it could use certificates instead.
  The question is what to do after the pooler has connected and that's
  actually a generic issue which goes beyond poolers and into
  applications, basically, how can I re-authenticate this connection
  using a different role.  We have SET ROLE, but that gives a lot of
  power to the role the pooler logs in as.  It'd definitely be neat to
  provide a way to use SCRAM or similar to do that re-authentication after
  the initial connection.
 
 Using pgbouncer on the DB server is common, but less common than using
 it on an intermediate server or even the app server itself.  So anything
 we create needs to be implementable with all three configurations in
 some way.

I think the best solution to this would be to introduce a per-cluster
salt that is used for every password hash.  That way, you could not
replay a pg_authid hash on another server _unless_ you had manually
assigned the same cluster salt to both servers, or to the connection pooler.
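
As a purely illustrative sketch of one way such a salt might enter the
stored hash (whether this can be done without changing the wire protocol
is exactly what is questioned in the reply below; the role, password, and
salts are made up):

    -- same role and password, different clusters, different stored hashes
    SELECT 'md5' || md5('secret' || 'alice' || 'cluster-A-salt');
    SELECT 'md5' || md5('secret' || 'alice' || 'cluster-B-salt');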

-- 
  Bruce Momjian  br...@momjian.us        http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +




Re: [HACKERS] MD5 authentication needs help

2015-03-06 Thread Stephen Frost
* Bruce Momjian (br...@momjian.us) wrote:
 On Fri, Mar  6, 2015 at 12:50:14PM -0800, Josh Berkus wrote:
  On 03/06/2015 08:19 AM, Stephen Frost wrote:
   Well, server-side, we already have that- have pgbouncer run on the
   database server (something which I'm typically in favor of anyway) and
   use 'peer'.  If it supported TLS then it could use certificates instead.
   The question is what to do after the pooler has connected and that's
   actually a generic issue which goes beyond poolers and into
   applications, basically, how can I re-authenticate this connection
   using a different role.  We have SET ROLE, but that gives a lot of
   power to the role the pooler logs in as.  It'd definitely be neat to
   provide a way to use SCRAM or similar to do that re-authentication after
   the initial connection.
  
  Using pgbouncer on the DB server is common, but less common than using
  it on an intermediate server or even the app server itself.  So anything
  we create needs to be implementable with all three configurations in
  some way.
 
 I think the best solution to this would be to introduce a per-cluster
 salt that is used for every password hash.  That way, you could not
 replay a pg_authid hash on another server _unless_ you had manually
 assigned the same cluster salt to both servers, or to the connection pooler.

Wouldn't that break the wireline protocol, unless you used a fixed set
of known challenges?  Perhaps I'm not following what you mean by a
cluster-wide salt here.

Thanks,

Stephen




Re: [HACKERS] MD5 authentication needs help

2015-03-06 Thread Stephen Frost
Alvaro,

* Alvaro Herrera (alvhe...@2ndquadrant.com) wrote:
 Stephen Frost wrote:
  * Josh Berkus (j...@agliodbs.com) wrote:
 
3)  Using the user name for the MD5 storage salt allows the MD5 stored
hash to be used on a different cluster if the user used the same
password. 
   
   This is a feature as well as a bug. For example, pgBouncer relies on
   this aspect of md5 auth.
  
  It's not a feature and pgBouncer could be made to not rely on this.
 
 Perhaps one of the requirements of a new auth method should be to allow
 middlemen such as connection poolers.  It's been over two years since I
 had a look, but IIRC pgbouncer had the very ugly requirement of its own
 copy of user/passwords in a file, and of course you had to update it
 separately if you changed the password in the server.  We need to make
 it possible for it not to require any such thing.

If we go this direction, we've got to be *very* careful that it's only
when the admin enables it.  man-in-the-middle attacks are quite real and
you're essentially asking that we support them intentionally.  I agree
that we want to support connection poolers but they have an inherent
MITM profile.

Note that this is also something which is up to the pooling system and
which we can't control.  A good example is Kerberos.  Kerberos has had a
way for authentication to be proxied for a long time (with some controls
to say which principals are allowed to be proxied, and which systems are
allowed to proxy on behalf of other principals), but pgbouncer doesn't
support that even though it'd eliminate the need for it to have a user /
password file.

Also, I don't expect we're going to remove md5 any time soon and,
frankly, people using pgbouncer probably aren't worried about the issues
which exist with that mechanism and care much more about performance, as
it doesn't even support TLS..

Thanks!

Stephen




Re: [HACKERS] Rethinking pg_dump's function sorting code

2015-03-06 Thread Tom Lane
Noah Misch n...@leadboat.com writes:
 Comparing argument type names sounds fine.  Comparing argument type OID does
 not offer enough to justify the loss of cross-cluster sort equivalence.

Fair enough.

 So as to stably compare f(nsp0.t) to f(nsp1.t), this should also compare the
 dobj.namespace name.

Ah, good point.  Will fix.

regards, tom lane




Re: [HACKERS] [PATCH] Add transforms feature

2015-03-06 Thread Pavel Stehule
Hi

I am checking this patch, but it is still broken.

Regards

Pavel




2015-02-13 8:14 GMT+01:00 Michael Paquier michael.paqu...@gmail.com:



 On Mon, Dec 22, 2014 at 12:19 PM, Michael Paquier 
 michael.paqu...@gmail.com wrote:

 On Mon, Dec 15, 2014 at 3:10 PM, Peter Eisentraut pete...@gmx.net
 wrote:
  fixed
 This patch needs a rebase, it does not apply correctly in a couple of
 places on latest HEAD (699300a):
 ./src/include/catalog/catversion.h.rej
 ./src/include/catalog/pg_proc.h.rej
 ./src/pl/plpython/plpy_procedure.c.rej


 I moved this patch to 2015-02 to not lose track of it and because it did
 not receive much reviews...
 --
 Michael



Re: [HACKERS] MD5 authentication needs help

2015-03-06 Thread Alvaro Herrera
Stephen Frost wrote:
 * Josh Berkus (j...@agliodbs.com) wrote:

   3)  Using the user name for the MD5 storage salt allows the MD5 stored
   hash to be used on a different cluster if the user used the same
   password. 
  
  This is a feature as well as a bug. For example, pgBouncer relies on
  this aspect of md5 auth.
 
 It's not a feature and pgBouncer could be made to not rely on this.

Perhaps one of the requirements of a new auth method should be to allow
middlemen such as connection poolers.  It's been over two years since I
had a look, but IIRC pgbouncer had the very ugly requirement of its own
copy of user/passwords in a file, and of course you had to update it
separately if you changed the password in the server.  We need to make
it possible for it not to require any such thing.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Clamping result row number of joins.

2015-03-06 Thread Stephen Frost
* Tom Lane (t...@sss.pgh.pa.us) wrote:
 BTW, is that JOIN (VALUES(...)) thing common in applications, or did you
 just use it to make a compact example?  If it were something worth
 optimizing, it seems like we could teach the planner to pull up VALUES
 in the same way that it flattens sub-selects.  I'm not sure if this is
 worth the trouble or not, though.

I've certainly seen and used values() constructs in joins for a variety
of reasons and I do think it'd be worthwhile for the planner to know how
to pull up a VALUES construct.

Would that be a lot of effort, either code-wise or runtime-wise?  My gut
feeling is that it wouldn't be, but you're clearly in a much better
position to determine that.

Thanks!

Stephen




Re: [HACKERS] File based Incremental backup v8

2015-03-06 Thread Gabriele Bartolini
Hi Robert,

2015-03-06 3:10 GMT+11:00 Robert Haas robertmh...@gmail.com:

 But I agree with Fujii to the extent that I see little value in
 committing this patch in the form proposed.  Being smart enough to use
 the LSN to identify changed blocks, but then sending the entirety of
 every file anyway because you don't want to go to the trouble of
 figuring out how to revise the wire protocol to identify the
 individual blocks being sent and write the tools to reconstruct a full
 backup based on that data, does not seem like enough of a win.


I believe the main point is to look at it from a user-interface point of view.
If/when we switch to block-level incremental support, this will be
completely transparent to the end user, even if we start with a file-level
approach with an LSN check.

The win is already determined by the average space/time gained by users of
VLDBs with a good chunk of read-only data. Our Barman users with incremental
backup (released recently - its algorithm can be compared to that of the
file-level backup proposed by Marco) benefit on average from a data
deduplication ratio ranging between 50 and 70% of the cluster size.

A tangible example is depicted here, with Navionics saving 8.2TB a week
thanks to this approach (and 17 hours instead of 50 for backup time):
http://blog.2ndquadrant.com/incremental-backup-barman-1-4-0/

However, even smaller databases will benefit. It is clear that very small
databases as well as frequently updated ones won't be interested in
incremental backup, but that has never been the use case for this feature.

I believe that if we still think that this approach is not worth it, we are
making a big mistake. The way I see it, this patch follows an agile
approach and it is an important step towards incremental backup on a block
basis.


 As Fujii says, if we ship this patch as written, people will just keep
 using the timestamp-based approach anyway.


I think that allowing users to back up in an incremental way
through streaming replication (even though based on files) will give more
flexibility to system and database administrators for their disaster
recovery solutions.

Thanks,
Gabriele


Re: [HACKERS] INSERT ... ON CONFLICT UPDATE and RLS

2015-03-06 Thread Stephen Frost
* Dean Rasheed (dean.a.rash...@gmail.com) wrote:
 On 26 February 2015 at 09:50, Dean Rasheed dean.a.rash...@gmail.com wrote:
  On 26 February 2015 at 05:43, Stephen Frost sfr...@snowman.net wrote:
  I wonder if there are some documentation updates which need to be done
  for this also?  I'm planning to look as I vaguely recall mentioning the
  ordering of operations somewhere along the way.
 
 I couldn't find any mention of the timing of the check in the existing
 documentation, although it does vaguely imply that the check is done
 before inserting any new data. There is an existing paragraph
 describing the timing of USING conditions, so I added a new paragraph
 immediately after that to explain when CHECK expressions are enforced,
 since that seemed like the best place for it.

Ah, ok, I must have been thinking about the USING discussion.  Thanks
for adding the paragraph about the CHECK timing.

  I also addressed the bitrot from the column-priv leak patch.  Would be
  great to have you take a look at the latest and let me know if you have
  any further comments or suggestions.
 
 I looked it over again, and I'm happy with these updates, except there
 was a missing break in the switch statement in
 ExecWithCheckOptions().

Oh, thanks for catching that!

 Here's an updated patch.

Excellent, will review.

Thanks again!

Stephen




Re: [HACKERS] Clamping result row number of joins.

2015-03-06 Thread Tom Lane
Kyotaro HORIGUCHI horiguchi.kyot...@lab.ntt.co.jp writes:
 Hello, I had a report that a query gets weird estimated row
 numbers and it makes subsequent plans wrong.

 Although the detailed explanation is shown later in this mail,
 the problem here was that a query like the following makes a path
 with an apparently wrong row number.

 EXPLAIN SELECT a FROM (SELECT a FROM t1 UNION ALL SELECT 0) t
  JOIN (VALUES (1)) tt(x) ON tt.x = t.a;

Hm, the problem evidently is that we get a default selectivity estimate
for the ON condition.  I think a proper fix for this would involve
teaching eqjoinsel (and ideally other join selectivity functions) how
to drill down into appendrels and combine estimates for the child
relations.

 I suppose that join nodes can safely clamp the number of their
 result rows to the product of the row counts of the two underlying
 paths. And if that is correct, the attached patch can correct the
 estimation like this.

Unfortunately, this patch is completely horrid, because it gives an unfair
advantage to the parameterized-nestloop plan.  (The hacks in the other
two functions are no-ops, because only a nestloop plan will have a
parameterized path for the appendrel.)  What's more, it's only a cosmetic
fix: it won't correct the planner's actual idea of the joinrel size, which
means it won't have an impact on planning any additional levels of
joining.

We really need to fix this on the selectivity-estimate side.

BTW, is that JOIN (VALUES(...)) thing common in applications, or did you
just use it to make a compact example?  If it were something worth
optimizing, it seems like we could teach the planner to pull up VALUES
in the same way that it flattens sub-selects.  I'm not sure if this is
worth the trouble or not, though.
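
For illustration, a hand-flattened version of the query from the report,
roughly what such a pull-up would let the planner see for a single-row
VALUES list (same tables as the example above):

    EXPLAIN SELECT a FROM (SELECT a FROM t1 UNION ALL SELECT 0) t
     WHERE t.a = 1;

The one-row VALUES join degenerates into a plain equality restriction,
which the planner can push down to each appendrel child where ordinary
per-child statistics apply.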

regards, tom lane




Re: [HACKERS] CATUPDATE confusion?

2015-03-06 Thread Adam Brightwell
All,

Thanks for all the feedback and review.

 I think I vote for (1).  I do not like (2) because of the argument I made
  upthread that write access on system catalogs is far more dangerous than
  a naive DBA might think.  (0) has some attraction but really CATUPDATE
  is one more concept than we need.

 I agree with #1 on this.


#1 makes sense to me as well.  The current version of the patch takes this
approach.  Also, I have verified against 'master' as of c6ee39b.

Thanks,
Adam

-- 
Adam Brightwell - adam.brightw...@crunchydatasolutions.com
Database Engineer - www.crunchydatasolutions.com


[HACKERS] extend pgbench expressions with functions

2015-03-06 Thread Fabien COELHO


This patch extends pgbench expressions with functions. Currently only one
function, abs, is added. The point is rather to bootstrap the
infrastructure for other functions (such as hash, random variants...) to 
be added later.


--
Fabien.diff --git a/contrib/pgbench/exprparse.y b/contrib/pgbench/exprparse.y
index 243c6b9..fd396bd 100644
--- a/contrib/pgbench/exprparse.y
+++ b/contrib/pgbench/exprparse.y
@@ -20,6 +20,7 @@ static PgBenchExpr *make_integer_constant(int64 ival);
 static PgBenchExpr *make_variable(char *varname);
 static PgBenchExpr *make_op(char operator, PgBenchExpr *lexpr,
 		PgBenchExpr *rexpr);
+static PgBenchExpr *make_func(const char *fname, PgBenchExpr *arg1);
 
 %}
 
@@ -35,8 +36,8 @@ static PgBenchExpr *make_op(char operator, PgBenchExpr *lexpr,
 
 %type <expr> expr
 %type <ival> INTEGER
-%type <str> VARIABLE
-%token INTEGER VARIABLE
+%type <str> VARIABLE FUNCTION
+%token INTEGER VARIABLE FUNCTION
 %token CHAR_ERROR /* never used, will raise a syntax error */
 
 %left	'+' '-'
@@ -57,6 +58,7 @@ expr: '(' expr ')'			{ $$ = $2; }
 	| expr '%' expr			{ $$ = make_op('%', $1, $3); }
 	| INTEGER{ $$ = make_integer_constant($1); }
 	| VARIABLE { $$ = make_variable($1); }
+	| FUNCTION '(' expr ')' { $$ = make_func($1, $3); pg_free($1); }
 	;
 
 %%
@@ -93,4 +95,32 @@ make_op(char operator, PgBenchExpr *lexpr, PgBenchExpr *rexpr)
 	return expr;
 }
 
+static struct {
+	char * fname;
+	int nargs;
+	PgBenchFunction tag;
+} PGBENCH_FUNCTIONS[] = {
+	{ "abs", 1, PGBENCH_ABS }
+};
+
+static PgBenchExpr *
+make_func(const char * fname, PgBenchExpr *arg1)
+{
+	PgBenchExpr *expr = pg_malloc(sizeof(PgBenchExpr));
+	int nfunctions = sizeof(PGBENCH_FUNCTIONS) / sizeof(PGBENCH_FUNCTIONS[0]);
+	int i;
+
+	expr->etype = ENODE_FUNCTION;
+	expr->u.function.function = PGBENCH_NONE;
+
+	for (i = 0; i < nfunctions; i++)
+		if (pg_strcasecmp(fname, PGBENCH_FUNCTIONS[i].fname) == 0)
+			expr->u.function.function = PGBENCH_FUNCTIONS[i].tag;
+
+	Assert(expr->u.function.function != PGBENCH_NONE);
+
+	expr->u.function.arg1 = arg1;
+	return expr;
+}
+
 #include "exprscan.c"
diff --git a/contrib/pgbench/exprscan.l b/contrib/pgbench/exprscan.l
index 4c9229c..a276444 100644
--- a/contrib/pgbench/exprscan.l
+++ b/contrib/pgbench/exprscan.l
@@ -46,6 +46,7 @@ space			[ \t\r\f]
 ")"			{ yycol += yyleng; return ')'; }
 :[a-zA-Z0-9_]+	{ yycol += yyleng; yylval.str = pg_strdup(yytext + 1); return VARIABLE; }
 [0-9]+			{ yycol += yyleng; yylval.ival = strtoint64(yytext); return INTEGER; }
+[a-zA-Z]+   { yycol += yyleng; yylval.str = pg_strdup(yytext); return FUNCTION; }
 
 [\n]			{ yycol = 0; yyline++; }
 {space}			{ yycol += yyleng; /* ignore */ }
diff --git a/contrib/pgbench/pgbench.c b/contrib/pgbench/pgbench.c
index 706fdf5..1e9604d 100644
--- a/contrib/pgbench/pgbench.c
+++ b/contrib/pgbench/pgbench.c
@@ -959,6 +959,26 @@ evaluateExpr(CState *st, PgBenchExpr *expr, int64 *retval)
 return false;
 			}
 
+		case ENODE_FUNCTION:
+			{
+				switch (expr->u.function.function)
+				{
+					case PGBENCH_ABS:
+					{
+						int64 arg1;
+						if (!evaluateExpr(st, expr->u.function.arg1, &arg1))
+							return false;
+
+						*retval = arg1 > 0 ? arg1 : -arg1;
+						return true;
+					}
+					default:
+						fprintf(stderr, "bad function (%d)\n",
+								expr->u.function.function);
+						return false;
+				}
+			}
+
 		default:
 			break;
 	}
diff --git a/contrib/pgbench/pgbench.h b/contrib/pgbench/pgbench.h
index 128bf11..8697c7b 100644
--- a/contrib/pgbench/pgbench.h
+++ b/contrib/pgbench/pgbench.h
@@ -15,12 +15,19 @@ typedef enum PgBenchExprType
 {
 	ENODE_INTEGER_CONSTANT,
 	ENODE_VARIABLE,
-	ENODE_OPERATOR
+	ENODE_OPERATOR,
+	ENODE_FUNCTION
 } PgBenchExprType;
 
 struct PgBenchExpr;
 typedef struct PgBenchExpr PgBenchExpr;
 
+typedef enum PgBenchFunction
+{
+	PGBENCH_NONE,
+	PGBENCH_ABS
+} PgBenchFunction;
+
 struct PgBenchExpr
 {
 	PgBenchExprType	etype;
@@ -40,6 +47,11 @@ struct PgBenchExpr
 			PgBenchExpr	*lexpr;
 			PgBenchExpr *rexpr;
 		} operator;
+		struct
+		{
+			int function;
+			PgBenchExpr *arg1;
+		} function;
 	} u;
 };
 



Re: [HACKERS] parallel mode and parallel contexts

2015-03-06 Thread Amit Kapila
On Sun, Feb 15, 2015 at 11:48 AM, Robert Haas robertmh...@gmail.com wrote:

 On Fri, Feb 13, 2015 at 2:22 AM, Michael Paquier
 michael.paqu...@gmail.com wrote:
  On Thu, Feb 12, 2015 at 3:59 AM, Robert Haas robertmh...@gmail.com
wrote:
 
  We're not seeing eye to eye here yet, since I don't accept your
  example scenario and you don't accept mine.  Let's keep discussing.
 
  Meanwhile, here's an updated patch.
 
  A lot of cool activity is showing up here, so moved the patch to CF
2015-02.
  Perhaps Andres you could add yourself as a reviewer?

 Here's a new version of the patch with (a) some additional
 restrictions on heavyweight locking both at the start of, and during,
 parallel mode and (b) a write-up in the README explaining the
 restrictions and my theory of why the handling of heavyweight locking
 is safe.  Hopefully this goes some way towards addressing Andres's
 concerns.  I've also replaced the specific (and wrong) messages about
 advisory locks with a more generic message, as previously proposed;
 and I've fixed at least one bug.


Today, while testing the parallel_seqscan patch, I encountered one
intermittent issue (it hangs in the function below) and I have a question
about that function.

+void
+WaitForParallelWorkersToFinish(ParallelContext *pcxt)
+{
..
+ for (;;)
+ {
..
+ CHECK_FOR_INTERRUPTS();
+ for (i = 0; i < pcxt->nworkers; ++i)
+ {
+ if (pcxt->worker[i].error_mqh != NULL)
+ {
+ anyone_alive = true;
+ break;
+ }
+ }
+
+ if (!anyone_alive)
+ break;
+
+ WaitLatch(&MyProc->procLatch, WL_LATCH_SET, -1);
+ ResetLatch(&MyProc->procLatch);
+ }

Isn't there a race condition in this function such that after it finds
that there is some alive worker and before it does WaitLatch(), the
worker completes its work and exits?  In such a case, who is
going to wake the backend waiting on procLatch?

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] parallel mode and parallel contexts

2015-03-06 Thread Robert Haas
On Fri, Mar 6, 2015 at 7:01 AM, Amit Kapila amit.kapil...@gmail.com wrote:
 Today, while testing the parallel_seqscan patch, I encountered one
 intermittent issue (it hangs in the function below) and I have a question
 about that function.

 +void
 +WaitForParallelWorkersToFinish(ParallelContext *pcxt)
 +{
 ..
 + for (;;)
 + {
 ..
 + CHECK_FOR_INTERRUPTS();
 + for (i = 0; i < pcxt->nworkers; ++i)
 + {
 + if (pcxt->worker[i].error_mqh != NULL)
 + {
 + anyone_alive = true;
 + break;
 + }
 + }
 +
 + if (!anyone_alive)
 + break;
 +
 + WaitLatch(&MyProc->procLatch, WL_LATCH_SET, -1);
 + ResetLatch(&MyProc->procLatch);
 + }

 Isn't there a race condition in this function such that after it finds
 that there is some alive worker and before it does WaitLatch(), the
 worker completes its work and exits?  In such a case, who is
 going to wake the backend waiting on procLatch?

It doesn't matter whether some other backend sets the process latch
before we reach WaitLatch() or after we begin waiting: if the latch is
already set, WaitLatch() returns immediately instead of sleeping.  Either
way, it's fine.

It would be bad if the process could exit without setting the latch at
all, though.  I hope that's not the case.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] MD5 authentication needs help

2015-03-06 Thread Stephen Frost
* Albe Laurenz (laurenz.a...@wien.gv.at) wrote:
 Stephen Frost wrote:
  Yes, it certainly was.  I think Bruce was thinking that we could simply
  hash what goes on to disk with an additional salt that's stored, but
  that wouldn't actually work without requiring a change to the wireline
  protocol, which is the basis of this entire line of discussion, in my
  view.
 
 This article
 https://hashcat.net/misc/postgres-pth/postgres-pth.pdf
 has some ideas about how to improve the situation.

This falls into the same category as some other proposed changes- it
requires wireline protocol changes, which means it really isn't
interesting to consider.

While I'm not surprised, it's certainly unfortunate that none of these
articles bother to point out what would be really useful to PG users-
how they can decide which risks they want to accept by choosing the
authentication method.  Using 'password', while it isn't great because
of the poor salt used (username), it isn't vulnerable to the 'PTH'
attack, and better authentication methods are available (certificates,
Kerberos, PAM, etc).  Admittedly, the default is md5 for most
distributions, but that's because the better auth methods require
depending on external systems and distribution installers can't know if
those systems have been set up or not.

Thanks,

Stephen




Re: [HACKERS] MD5 authentication needs help

2015-03-06 Thread Stephen Frost
Greg,

* Greg Stark (st...@mit.edu) wrote:
 Locked accounts are a terrible terrible idea. All they do is hand attackers
 an easy DOS vulnerability. They're pure security theatre if your
 authentication isn't vulnerable to brute force attacks and an unreliable
 band-aid if they are.

For starters, our authentication *is* vulnerable to brute force attacks
(as is any password-based system, and I doubt we're going to completely
drop support for them regardless of what else we do here), and second
the account lock-out capability is still required in NIST 800-53 rev4 and
is covered by AC-7.

http://nvlpubs.nist.gov/nistpubs/SpecialPublications/NIST.SP.800-53r4.pdf

AC-7 does address the DOS risk and allows organizations to unlock the
account after an organization-specified delay.

I've been able to address that in the past by using Kerberos for PG
instead and implementing the lock-out and other requirements in this
area that PG doesn't support that way, but that isn't available in all
situations which leads to far worse solutions having to be used to meet
these requirements (sorry, but hacking up a PAM-based approach which
uses cracklib and pam_deny is *really* ugly).

 Having dealt with mechanisms for locking accounts in other database they're
 much more complicated than they appear. You need to deal with different
 requirements for different users, have multiple knobs for how it triggers
 and resolves, have tools for auditing the connection attempts to determine
 if they're legitimate and identify where the incorrect attempts are coming
 from, and so on. And all that accomplishes in the best case scenario is
 having lots of busy-work support requests responding to locked accounts
 and in the worst case scenario upgrading minor issues into major service
 outages.

I agree that they're complicated and that auditing is another necessary
component that we don't currently have.  We are woefully behind in these
areas and should certainly look to what others have done and learned
over the past 10 years, during which these issues have more-or-less been
ignored, but I don't believe we can or should continue to ignore them as
it makes PG unnecessarily more difficult to use in many areas.

Thanks,

Stephen




Re: [HACKERS] MD5 authentication needs help

2015-03-06 Thread Alvaro Herrera
Stephen Frost wrote:
 Alvaro,
 
 * Alvaro Herrera (alvhe...@2ndquadrant.com) wrote:

  Perhaps one of the requirements of a new auth method should be to allow
  middlemen such as connection poolers.  It's been over two years since I
  had a look, but IIRC pgbouncer had the very ugly requirement of its own
  copy of user/passwords in a file, and of course you had to update it
  separately if you changed the password in the server.  We need to make
  it possible for it not to require any such thing.
 
 If we go this direction, we've got to be *very* careful that it's only
 when the admin enables it.  man-in-the-middle attacks are quite real and
 you're essentially asking that we support them intentionally.  I agree
 that we want to support connection poolers but they have an inherent
 MITM profile.

Sure.  I was thinking we would have some mechanism to authenticate the
connection as coming from a pooler that has been previously authorized;
something simple as a new pg_hba.conf entry type for poolers that are
only authorized to connect to such-and-such databases, perhaps limit to
such-and-such users, etc.

 Note that this is also something which is up to the pooling system and
 which we can't control.  A good example is Kerberos.  Kerberos has had a
 way for authentication to be proxied for a long time (with some controls
 to say which principals are allowed to be proxied, and which systems are
 allowed to proxy on behalf of other principals), but pgbouncer doesn't
 support that even though it'd eliminate the need for it to have a user /
 password file.

True.

I think Kerberos implementations are uncommon, and the complexity of
getting the whole thing up and running is probably the major reason; or
at least there's the common belief that it's so.

My guess is that since there are few users, pgbouncer devs see little
reason to implement it also.  Chicken and egg, perhaps.

(Actually, is pgbouncer under active maintenance at all these days?)

 Also, I don't expect we're going to remove md5 any time soon and,
 frankly, people using pgbouncer probably aren't worried about the issues
 which exist with that mechanism and care much more about performance, as
 it doesn't even support TLS..

I think they accept the issues because they have no other choice, not
because they are really all that OK with them.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Clamping result row number of joins.

2015-03-06 Thread Tom Lane
Stephen Frost sfr...@snowman.net writes:
 * Tom Lane (t...@sss.pgh.pa.us) wrote:
 BTW, is that JOIN (VALUES(...)) thing common in applications, or did you
 just use it to make a compact example?  If it were something worth
 optimizing, it seems like we could teach the planner to pull up VALUES
 in the same way that it flattens sub-selects.  I'm not sure if this is
 worth the trouble or not, though.

 I've certainly seen and used values() constructs in joins for a variety
 of reasons and I do think it'd be worthwhile for the planner to know how
 to pull up a VALUES construct.

 Would that be a lot of effort, either code-wise or runtime-wise?  My gut
 feeling is that it wouldn't be, but you're clearly in a much better
 position to determine that.

My guess is that it'd be pretty simple to do if we want to do it.
I've not looked at the code yet though.

regards, tom lane




Re: [HACKERS] pg_upgrade and rsync

2015-03-06 Thread Bruce Momjian
On Fri, Mar  6, 2015 at 10:50:27AM +0300, Vladimir Borodin wrote:
 What I have done is to update the pg_upgrade instructions to add this
 required step.  Updated doc patch attached.  (I also added the --delete
 flag to rsync.)  Thanks so much for your detailed report.
 
 
 It seems to work fine now. The only thing that would be nice to change is
 to explicitly write that these instructions are correct for hot standby
 installations too.
 
 + <para>
 +  If you have Log-Shipping Standby Servers (<xref
 +  linkend="warm-standby">), follow these steps to upgrade them (before
 +  starting any servers):
 + </para>
 
 Actually, I’ve entered this thread because it is not obvious from the 
 paragraph
 above or any other places.

Oh, very good point.  I was trying to match the wording we use in the
docs, but forgot that log shipping and streaming replication are
specified separately.

Updated patch attached.  You can view the output at:

http://momjian.us/tmp/pgsql/pgupgrade.html

Thanks much!

-- 
  Bruce Momjian  br...@momjian.us        http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +
diff --git a/doc/src/sgml/backup.sgml b/doc/src/sgml/backup.sgml
new file mode 100644
index 07ca0dc..e25e0d0
*** a/doc/src/sgml/backup.sgml
--- b/doc/src/sgml/backup.sgml
*** tar -cf backup.tar /usr/local/pgsql/data
*** 438,445 
 Another option is to use <application>rsync</> to perform a file
 system backup.  This is done by first running <application>rsync</>
 while the database server is running, then shutting down the database
!server just long enough to do a second <application>rsync</>.  The
!second <application>rsync</> will be much quicker than the first,
 because it has relatively little data to transfer, and the end result
 will be consistent because the server was down.  This method
 allows a file system backup to be performed with minimal downtime.
--- 438,447 ----
 Another option is to use <application>rsync</> to perform a file
 system backup.  This is done by first running <application>rsync</>
 while the database server is running, then shutting down the database
!server long enough to do an <command>rsync --checksum</>.
!(<option>--checksum</> is necessary because <command>rsync</> only
!has file modification-time granularity of one second.)  The
!second <application>rsync</> will be quicker than the first,
 because it has relatively little data to transfer, and the end result
 will be consistent because the server was down.  This method
 allows a file system backup to be performed with minimal downtime.
diff --git a/doc/src/sgml/pgupgrade.sgml b/doc/src/sgml/pgupgrade.sgml
new file mode 100644
index e1cd260..0d79fb5
*** a/doc/src/sgml/pgupgrade.sgml
--- b/doc/src/sgml/pgupgrade.sgml
*** NET STOP postgresql-8.4
*** 315,320 
--- 315,325 
  NET STOP postgresql-9.0
  </programlisting>
  </para>
+ 
+ <para>
+  Streaming replication and log-shipping standby servers can remain running until
+  a later step.
+ </para>
 </step>
  
 <step>
*** pg_upgrade.exe
*** 399,404 
--- 404,539 
 </step>
  
 <step>
+ <title>Upgrade Streaming Replication and Log-Shipping standby
+ servers</title>
+ 
+ <para>
+  If you have Streaming Replication (<xref
+  linkend="streaming-replication">) or Log-Shipping (<xref
+  linkend="warm-standby">) standby servers, follow these steps to
+  upgrade them (before starting any servers):
+ </para>
+ 
+ <procedure>
+ 
+  <step>
+   <title>Install the new PostgreSQL binaries on standby servers</title>
+ 
+   <para>
+    Make sure the new binaries and support files are installed on all
+    standby servers.
+   </para>
+  </step>
+ 
+  <step>
+   <title>Make sure the new standby data directories do <emphasis>not</>
+   exist</title>
+ 
+   <para>
+    Make sure the new standby data directories do <emphasis>not</>
+    exist or are empty.  If <application>initdb</> was run, delete
+    the standby server data directories.
+   </para>
+  </step>
+ 
+  <step>
+   <title>Install custom shared object files</title>
+ 
+   <para>
+    Install the same custom shared object files on the new standbys
+    that you installed in the new master cluster.
+   </para>
+  </step>
+ 
+  <step>
+   <title>Stop standby servers</title>
+ 
+   <para>
+    If the standby servers are still running, stop them now using the
+    above instructions.
+   </para>
+  </step>
+ 
+  <step>
+   <title>Verify standby servers</title>
+ 
+   <para>
+    To prevent old standby servers from being modified, run
+    <application>pg_controldata</> against the primary and standby
+    clusters and verify that the <quote>Latest checkpoint location</>
+    values match in all clusters.  (This requires the standbys to be
+    shut down after the 

Re: [HACKERS] File based Incremental backup v8

2015-03-06 Thread Robert Haas
On Fri, Mar 6, 2015 at 9:38 AM, Gabriele Bartolini
gabriele.bartol...@2ndquadrant.it wrote:
 I believe the main point is to look at it from a user-interface point of view.
 If/when we switch to block-level incremental support, this will be
 completely transparent to the end user, even if we start with a file-level
 approach with an LSN check.

I don't think that's true.  To have a real file-level incremental
backup you need the ability to take the incremental backup, and then
you also need the ability to take a full backup + an incremental
backup taken later and reassemble a full image of the cluster on which
you can run recovery.  The means of doing that is going to be
different for an approach that only copies certain blocks vs. one that
copies whole files.  Once we have the block-based approach, nobody
will ever use the file-based approach, so whatever code or tools we
write to do that will all be dead code, yet we'll still have to
support them for many years.

By the way, unless I'm missing something, this patch only seems to
include the code to construct an incremental backup, but no tools
whatsoever to do anything useful with it once you've got it.  I think
that's 100% unacceptable.  Users need to be able to manipulate
PostgreSQL backups using either standard operating system tools or
tools provided with PostgreSQL.  Some people may prefer to use
something like repmgr or pitrtools or omniptr in addition, but that
shouldn't be a requirement for incremental backup to be usable.

Agile development is good, but that does not mean you can divide a big
project into arbitrarily small chunks.  At some point the chunks are
too small to be sure that the overall direction is right, and/or
individually useless.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] MD5 authentication needs help

2015-03-06 Thread Stephen Frost
* Alvaro Herrera (alvhe...@2ndquadrant.com) wrote:
 Stephen Frost wrote:
  * Alvaro Herrera (alvhe...@2ndquadrant.com) wrote:
   Perhaps one of the requirements of a new auth method should be to allow
   middlemen such as connection poolers.  It's been over two years since I
   had a look, but IIRC pgbouncer had the very ugly requirement of its own
   copy of user/passwords in a file, and of course you had to update it
   separately if you changed the password in the server.  We need to make
   it possible for it not to require any such thing.
  
  If we go this direction, we've got to be *very* careful that it's only
  when the admin enables it.  man-in-the-middle attacks are quite real and
  you're essentially asking that we support them intentionally.  I agree
  that we want to support connection poolers but they have an inherent
  MITM profile.
 
 Sure.  I was thinking we would have some mechanism to authenticate the
 connection as coming from a pooler that has been previously authorized;
 something simple as a new pg_hba.conf entry type for poolers that are
 only authorized to connect to such-and-such databases, perhaps limit to
 such-and-such users, etc.

Well, server-side, we already have that- have pgbouncer run on the
database server (something which I'm typically in favor of anyway) and
use 'peer'.  If it supported TLS then it could use certificates instead.
The question is what to do after the pooler has connected and that's
actually a generic issue which goes beyond poolers and into
applications, basically, how can I re-authenticate this connection
using a different role.  We have SET ROLE, but that gives a lot of
power to the role the pooler logs in as.  It'd definitely be neat to
provide a way to use SCRAM or similar to do that re-authentication after
the initial connection.
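
A minimal sketch of the SET ROLE approach being described, assuming the
pooler's login role has been granted membership in the application roles
(role names here are hypothetical):

    SET ROLE app_alice;   -- act as the client's role for this session
    -- ... run the client's queries ...
    RESET ROLE;           -- back to the pooler's own role

Nothing prevents the client itself from issuing RESET ROLE and regaining
the pooler role's privileges, which is the lot of power referred to
above.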

 I think Kerberos implementations are uncommon, and the complexity of
 getting the whole thing up and running is probably the major reason; or
 at least there's the common belief that it's so.

Kerberos is *extremely* common- it's what Active Directory uses, after
all.  Where it isn't used are hosting/cloud providers and the like where
they want to support any and every device their client might want to use
to connect, or places which don't realize that AD uses Kerberos and that
it can be leveraged to provide auth to PG.

 My guess is that since there are few users, pgbouncer devs see little
 reason to implement it also.  Chicken and egg, perhaps.

pgbouncer isn't as necessary in the kinds of environments that use
Kerberos because you don't have lots of connections/s from distinct
principals to a given server.  It's just not the profile that pgbouncer
is built for, but that's kind of my point here- pgbouncer is far more
concerned with performance than security because the assumption going in
is that you have a trusted server connecting to a trusted server.
That's an acceptable assumption for a lot of environments, though not
all.

 (Actually, is pgbouncer under active maintenance at all these days?)

I understand Andrew has been helping with it, but my guess is that's
more maintenance and less active development.

  Also, I don't expect we're going to remove md5 any time soon and,
  frankly, people using pgbouncer probably aren't worried about the issues
  which exist with that mechanism and care much more about performance, as
  it doesn't even support TLS..
 
 I think they accept the issues because they have no other choice, not
 because they are really all that OK with them.

I'd certainly be very happy to see someone interested enough in this
issue to dedicate resources (either human time or funding) to implement
TLS for pgbouncer...

Thanks!

Stephen




Re: [HACKERS] Weirdly pesimistic estimates in optimizer

2015-03-06 Thread Tom Lane
I wrote:
 I chewed on this for awhile and decided that there'd be no real harm in
 taking identification of the unique expressions out of 
 create_unique_path() and doing it earlier, in initsplan.c; we'd need a
 couple more fields in SpecialJoinInfo but that doesn't seem like a
 problem.  However, rel->rows is a *big* problem; we simply have not made
 any join size estimates yet, and can't, because these things are done
 bottom up.

 However ... estimate_num_groups's dependency on its rowcount input is not
 large (it's basically using it as a clamp).  So conceivably we could have
 get_loop_count just multiply together the sizes of the base relations
 included in the semijoin's RHS to get a preliminary estimate of that
 number.  This would be the right thing anyway for a single relation in the
 RHS, which is the most common case.  It would usually be an overestimate
 for join RHS, but we could hope that the output of estimate_num_groups
 wouldn't be affected too badly.

Attached is a draft patch that does those two things.  I like the first
part (replacing SpecialJoinInfo's rather ad-hoc join_quals field with
something more explicitly attuned to semijoin uniqueness processing).
The second part is still pretty much of a kluge, but then get_loop_count
was a kluge already.  This arguably makes it better.

Now, on the test case you presented, this has the unfortunate effect that
it now reliably chooses the wrong plan for both cases :-(.  But I think
that's a reflection of poor cost parameters (ie, test case fits handily in
RAM but we've not set the cost parameters to reflect that).  We do get the
same rowcount and roughly-same cost estimates for both the random_fk_dupl
and random_fk_uniq queries, so from that standpoint it's doing the right
thing.  If I reduce random_page_cost to 2 or so, it makes the choices you
wanted.

regards, tom lane

diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index 9fe8008..9f65d66 100644
*** a/src/backend/nodes/copyfuncs.c
--- b/src/backend/nodes/copyfuncs.c
*** _copySpecialJoinInfo(const SpecialJoinIn
*** 1942,1948 
  	COPY_SCALAR_FIELD(jointype);
  	COPY_SCALAR_FIELD(lhs_strict);
  	COPY_SCALAR_FIELD(delay_upper_joins);
! 	COPY_NODE_FIELD(join_quals);
  
  	return newnode;
  }
--- 1942,1951 
  	COPY_SCALAR_FIELD(jointype);
  	COPY_SCALAR_FIELD(lhs_strict);
  	COPY_SCALAR_FIELD(delay_upper_joins);
! 	COPY_SCALAR_FIELD(semi_can_btree);
! 	COPY_SCALAR_FIELD(semi_can_hash);
! 	COPY_NODE_FIELD(semi_operators);
! 	COPY_NODE_FIELD(semi_rhs_exprs);
  
  	return newnode;
  }
diff --git a/src/backend/nodes/equalfuncs.c b/src/backend/nodes/equalfuncs.c
index fe509b0..fd876fb 100644
*** a/src/backend/nodes/equalfuncs.c
--- b/src/backend/nodes/equalfuncs.c
*** _equalSpecialJoinInfo(const SpecialJoinI
*** 798,804 
  	COMPARE_SCALAR_FIELD(jointype);
  	COMPARE_SCALAR_FIELD(lhs_strict);
  	COMPARE_SCALAR_FIELD(delay_upper_joins);
! 	COMPARE_NODE_FIELD(join_quals);
  
  	return true;
  }
--- 798,807 
  	COMPARE_SCALAR_FIELD(jointype);
  	COMPARE_SCALAR_FIELD(lhs_strict);
  	COMPARE_SCALAR_FIELD(delay_upper_joins);
! 	COMPARE_SCALAR_FIELD(semi_can_btree);
! 	COMPARE_SCALAR_FIELD(semi_can_hash);
! 	COMPARE_NODE_FIELD(semi_operators);
! 	COMPARE_NODE_FIELD(semi_rhs_exprs);
  
  	return true;
  }
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index 775f482..75c57a2 100644
*** a/src/backend/nodes/outfuncs.c
--- b/src/backend/nodes/outfuncs.c
*** _outSpecialJoinInfo(StringInfo str, cons
*** 1945,1951 
  	WRITE_ENUM_FIELD(jointype, JoinType);
  	WRITE_BOOL_FIELD(lhs_strict);
  	WRITE_BOOL_FIELD(delay_upper_joins);
! 	WRITE_NODE_FIELD(join_quals);
  }
  
  static void
--- 1945,1954 
  	WRITE_ENUM_FIELD(jointype, JoinType);
  	WRITE_BOOL_FIELD(lhs_strict);
  	WRITE_BOOL_FIELD(delay_upper_joins);
! 	WRITE_BOOL_FIELD(semi_can_btree);
! 	WRITE_BOOL_FIELD(semi_can_hash);
! 	WRITE_NODE_FIELD(semi_operators);
! 	WRITE_NODE_FIELD(semi_rhs_exprs);
  }
  
  static void
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index 5a9daf0..1a0d358 100644
*** a/src/backend/optimizer/path/costsize.c
--- b/src/backend/optimizer/path/costsize.c
*** compute_semi_anti_join_factors(PlannerIn
*** 3294,3300 
  	/* we don't bother trying to make the remaining fields valid */
  	norm_sjinfo.lhs_strict = false;
  	norm_sjinfo.delay_upper_joins = false;
! 	norm_sjinfo.join_quals = NIL;
  
  	nselec = clauselist_selectivity(root,
  	joinquals,
--- 3294,3303 
  	/* we don't bother trying to make the remaining fields valid */
  	norm_sjinfo.lhs_strict = false;
  	norm_sjinfo.delay_upper_joins = false;
! 	norm_sjinfo.semi_can_btree = false;
! 	norm_sjinfo.semi_can_hash = false;
! 	norm_sjinfo.semi_operators = NIL;
! 	norm_sjinfo.semi_rhs_exprs = NIL;
  
  	nselec = clauselist_selectivity(root,
  	joinquals,
*** 

Re: [HACKERS] MD5 authentication needs help

2015-03-06 Thread Greg Stark
Locked accounts are a terrible, terrible idea. All they do is hand attackers
an easy DoS vulnerability. They're pure security theatre if your
authentication isn't vulnerable to brute force attacks, and an unreliable
band-aid if it is.

Having dealt with mechanisms for locking accounts in other databases, they're
much more complicated than they appear. You need to deal with different
requirements for different users, have multiple knobs for how the lockout
triggers and resolves, have tools for auditing the connection attempts to
determine whether they're legitimate and to identify where the incorrect
attempts are coming from, and so on. And all that accomplishes, in the best
case scenario, is lots of busy-work support requests responding to locked
accounts, and in the worst case scenario, upgrading minor issues into major
service outages.


Re: [HACKERS] Join push-down support for foreign tables

2015-03-06 Thread Ashutosh Bapat
Hi Kaigai-san, Hanada-san,
Attached please find a patch to print the column names prefixed by the
relation names. I haven't tested the patch fully. The same changes will be
needed for CustomPlan node specific code.

Now I am able to make sense out of the Output information

postgres=# explain verbose select * from ft1 join ft2 using (val);

                                 QUERY PLAN
---------------------------------------------------------------------------
 Foreign Scan  (cost=100.00..125.60 rows=2560 width=12)
   Output: ft1.val, ft1.val2, ft2.val2
   Remote SQL: SELECT r.a_0, r.a_1, l.a_1 FROM (SELECT val, val2 FROM
public.lt) l (a_0, a_1) INNER JOIN (SELECT val, val2 FROM public.lt) r
(a_0, a_1)
  ON ((r.a_0 = l.a_0))
(3 rows)


On Fri, Mar 6, 2015 at 6:41 AM, Kouhei Kaigai kai...@ak.jp.nec.com wrote:

  Actually val and val2 come from public.lt in r side, but as you say
  it's too difficult to know that from EXPLAIN output.  Do you have any
  idea to make the Output item more readable?
 
 A fundamental reason why we need to have symbolic aliases here is that
 postgres_fdw keeps the remote query in cstring form. That makes it
 complicated to deconstruct and reconstruct a query once it has been
 constructed at the underlying foreign-path level.
 If ForeignScan kept the items needed to construct the remote query in
 expression-node form (with construction of the remote query probably
 delayed to the beginning of the executor), we would be able to construct
 a more human-readable remote query.

 However, I don't recommend working on this big refactoring effort
 within the scope of the join push-down support project.

 Thanks,
 --
 NEC OSS Promotion Center / PG-Strom Project
 KaiGai Kohei kai...@ak.jp.nec.com


  -Original Message-
  From: pgsql-hackers-ow...@postgresql.org
  [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Shigeru Hanada
  Sent: Thursday, March 05, 2015 10:00 PM
  To: Ashutosh Bapat
  Cc: Kaigai Kouhei(海外 浩平); Robert Haas; PostgreSQL-development
  Subject: Re: [HACKERS] Join push-down support for foreign tables
 
  Hi Ashutosh, thanks for the review.
 
  2015-03-04 19:17 GMT+09:00 Ashutosh Bapat 
 ashutosh.ba...@enterprisedb.com:
   In create_foreignscan_path() we have lines like -
   1587 pathnode-path.param_info = get_baserel_parampathinfo(root,
 rel,
   1588
   required_outer);
   Now, that the same function is being used for creating foreign scan
 paths
   for joins, we should be calling get_joinrel_parampathinfo() on a join
 rel
   and get_baserel_parampathinfo() on base rel.
 
  Got it.  Please let me check the difference.
 
  
   The patch seems to handle all the restriction clauses in the same way.
   There are two kinds of restriction clauses - a. join quals (specified
   using ON clause; optimizer might move them to the other class if that
   doesn't affect correctness) and b. quals on join relation (specified in
   the WHERE clause, optimizer might move them to the other class if that
   doesn't affect correctness). The quals in a are applied while the join
   is being computed whereas those in b are applied after the join is
   computed. For example,

   postgres=# select * from lt;
    val | val2
   -----+------
      1 |    2
      1 |    3
   (2 rows)

   postgres=# select * from lt2;
    val | val2
   -----+------
      1 |    2
   (1 row)

   postgres=# select * from lt left join lt2 on (lt.val2 = lt2.val2);
    val | val2 | val | val2
   -----+------+-----+------
      1 |    2 |   1 |    2
      1 |    3 |     |
   (2 rows)

   postgres=# select * from lt left join lt2 on (true) where (lt.val2 =
   lt2.val2);
    val | val2 | val | val2
   -----+------+-----+------
      1 |    2 |   1 |    2
   (1 row)

   The difference between these two kinds is evident in case of outer
   joins; for an inner join the optimizer puts all of them in class b.
   The remote query sent to the foreign server has all those in the ON
   clause. Consider foreign tables ft1 and ft2 pointing to local tables
   on the same server.
   postgres=# \d ft1
        Foreign table "public.ft1"
    Column |  Type   | Modifiers | FDW Options
   --------+---------+-----------+-------------
    val    | integer |           |
    val2   | integer |           |
   Server: loopback
   FDW Options: (table_name 'lt')

   postgres=# \d ft2
        Foreign table "public.ft2"
    Column |  Type   | Modifiers | FDW Options
   --------+---------+-----------+-------------
    val    | integer |           |
    val2   | integer |           |
   Server: loopback
   FDW Options: (table_name 'lt2')

   postgres=# explain verbose select * from ft1 left join ft2 on (ft1.val2 =
   ft2.val2) where ft1.val + ft2.val  ft1.val2 or ft2.val is null;

                                    QUERY PLAN
   ---------------------------------------------------------------------------

Re: [HACKERS] MD5 authentication needs help

2015-03-06 Thread Albe Laurenz
Stephen Frost wrote:
 * Tom Lane (t...@sss.pgh.pa.us) wrote:
 Bruce Momjian br...@momjian.us writes:
 Let me update my list of possible improvements:

 1)  MD5 makes users feel uneasy (though our usage is mostly safe)

 2)  The per-session salt sent to the client is only 32-bits, meaning
 that it is possible to replay an observed MD5 hash in ~16k connection
 attempts.

 3)  Using the user name for the MD5 storage salt allows the MD5 stored
 hash to be used on a different cluster if the user used the same
 password.

 4)  Using the user name for the MD5 storage salt allows the MD5 stored
 hash to be used on the _same_ cluster.

 5)  Using the user name for the MD5 storage salt causes the renaming of
 a user to break the stored password.

 What happened to "possession of the contents of pg_authid is sufficient
 to log in"?  I thought fixing that was one of the objectives here.
 
 Yes, it certainly was.  I think Bruce was thinking that we could simply
 hash what goes on to disk with an additional salt that's stored, but
 that wouldn't actually work without requiring a change to the wireline
 protocol, which is the basis of this entire line of discussion, in my
 view.

This article
https://hashcat.net/misc/postgres-pth/postgres-pth.pdf
has some ideas about how to improve the situation.
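
For reference, the stored form being discussed in 3) and 4) above is simply
the MD5 of the password concatenated with the user name, prefixed with
"md5".  You can reproduce what ends up in pg_authid.rolpassword with
(made-up user name and password, of course):

    SELECT 'md5' || md5('secretpw' || 'alice');

which is why the same user name and password produce the same stored hash
on any cluster.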

Yours,
Laurenz Albe



Re: [HACKERS] INSERT ... ON CONFLICT UPDATE and RLS

2015-03-06 Thread Dean Rasheed
On 26 February 2015 at 09:50, Dean Rasheed dean.a.rash...@gmail.com wrote:
 On 26 February 2015 at 05:43, Stephen Frost sfr...@snowman.net wrote:
 I wonder if there are some documentation updates which need to be done
 for this also?  I'm planning to look as I vaguely recall mentioning the
 ordering of operations somewhere along the way.


I couldn't find any mention of the timing of the check in the existing
documentation, although it does vaguely imply that the check is done
before inserting any new data. There is an existing paragraph
describing the timing of USING conditions, so I added a new paragraph
immediately after that to explain when CHECK expressions are enforced,
since that seemed like the best place for it.


 I also addressed the bitrot from the column-priv leak patch.  Would be
 great to have you take a look at the latest and let me know if you have
 any further comments or suggestions.

I looked it over again, and I'm happy with these updates, except there
was a missing break in the switch statement in
ExecWithCheckOptions().

Here's an updated patch.

Regards,
Dean
diff --git a/doc/src/sgml/ref/create_policy.sgml b/doc/src/sgml/ref/create_policy.sgml
new file mode 100644
index 868a6c1..49eaadc
*** a/doc/src/sgml/ref/create_policy.sgml
--- b/doc/src/sgml/ref/create_policy.sgml
*** CREATE POLICY replaceable class=parame
*** 61,66 
--- 61,74 
/para
  
para
+ For INSERT and UPDATE queries, WITH CHECK expressions are enforced after
+ BEFORE triggers are fired, and before any data modifications are made.
+ Thus a BEFORE ROW trigger may modify the data to be inserted, affecting
+ the result of the security policy check.  WITH CHECK expressions are
+ enforced before any other constraints.
+   /para
+ 
+   para
 Policy names are per-table, therefore one policy name can be used for many
 different tables and have a definition for each table which is appropriate to
 that table.
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
new file mode 100644
index 33b172b..8dfb952
*** a/src/backend/executor/execMain.c
--- b/src/backend/executor/execMain.c
*** ExecConstraints(ResultRelInfo *resultRel
*** 1667,1675 
  
  /*
   * ExecWithCheckOptions -- check that tuple satisfies any WITH CHECK OPTIONs
   */
  void
! ExecWithCheckOptions(ResultRelInfo *resultRelInfo,
  	 TupleTableSlot *slot, EState *estate)
  {
  	Relation	rel = resultRelInfo->ri_RelationDesc;
--- 1667,1676 
  
  /*
   * ExecWithCheckOptions -- check that tuple satisfies any WITH CHECK OPTIONs
+  * of the specified kind.
   */
  void
! ExecWithCheckOptions(WCOKind kind, ResultRelInfo *resultRelInfo,
  	 TupleTableSlot *slot, EState *estate)
  {
  	Relation	rel = resultRelInfo->ri_RelationDesc;
*** ExecWithCheckOptions(ResultRelInfo *resu
*** 1694,1699 
--- 1695,1703 
  		WithCheckOption *wco = (WithCheckOption *) lfirst(l1);
  		ExprState  *wcoExpr = (ExprState *) lfirst(l2);
  
+ 		if (wco->kind != kind)
+ 			continue;
+ 
  		/*
  		 * WITH CHECK OPTION checks are intended to ensure that the new tuple
  		 * is visible (in the case of a view) or that it passes the
*** ExecWithCheckOptions(ResultRelInfo *resu
*** 1715,1726 
  	 modifiedCols,
  	 64);
  
! 			ereport(ERROR,
! 	(errcode(ERRCODE_WITH_CHECK_OPTION_VIOLATION),
!  errmsg("new row violates WITH CHECK OPTION for \"%s\"",
! 		wco->viewname),
! 	val_desc ? errdetail("Failing row contains %s.", val_desc) :
! 			   0));
  		}
  	}
  }
--- 1719,1747 
  	 modifiedCols,
  	 64);
  
! 			switch (wco->kind)
! 			{
! case WCO_VIEW_CHECK:
! 	ereport(ERROR,
! 			(errcode(ERRCODE_WITH_CHECK_OPTION_VIOLATION),
! 		errmsg("new row violates WITH CHECK OPTION for \"%s\"",
! 			   wco->relname),
! 			 val_desc ? errdetail("Failing row contains %s.",
!   val_desc) : 0));
! 	break;
! case WCO_RLS_INSERT_CHECK:
! case WCO_RLS_UPDATE_CHECK:
! 	ereport(ERROR,
! 			(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
! 		 errmsg("new row violates row level security policy for \"%s\"",
! wco->relname),
! 			 val_desc ? errdetail("Failing row contains %s.",
!   val_desc) : 0));
! 	break;
! default:
! 	elog(ERROR, "unrecognized WCO kind: %u", wco->kind);
! 	break;
! 			}
  		}
  	}
  }
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
new file mode 100644
index f96fb24..be879a4
*** a/src/backend/executor/nodeModifyTable.c
--- b/src/backend/executor/nodeModifyTable.c
*** ExecInsert(TupleTableSlot *slot,
*** 253,258 
--- 253,265 
  		tuple->t_tableOid = RelationGetRelid(resultRelationDesc);
  
  		/*
+ 		 * Check any RLS INSERT WITH CHECK policies
+ 		 */
+ 		if (resultRelInfo->ri_WithCheckOptions != NIL)
+ 			ExecWithCheckOptions(WCO_RLS_INSERT_CHECK,
+  

Re: [HACKERS] extend pgbench expressions with functions

2015-03-06 Thread Fabien COELHO


This patch extends pgbench expressions with functions. Currently only one
function, abs, is added. The point is rather to bootstrap the infrastructure
for other functions (such as hash, random variants...) to be added later.


Here is a small v2 update:
 - with better error messages on non-existing functions
 - a minimal documentation
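
For example, with the patch applied a custom script can now do something
like this (trivial illustration):

    \set x -5
    \set absx abs(:x)
    SELECT :absx;

where :absx evaluates to 5.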

--
Fabien.

diff --git a/contrib/pgbench/exprparse.y b/contrib/pgbench/exprparse.y
index 243c6b9..254f081 100644
--- a/contrib/pgbench/exprparse.y
+++ b/contrib/pgbench/exprparse.y
@@ -20,6 +20,7 @@ static PgBenchExpr *make_integer_constant(int64 ival);
 static PgBenchExpr *make_variable(char *varname);
 static PgBenchExpr *make_op(char operator, PgBenchExpr *lexpr,
 		PgBenchExpr *rexpr);
+static PgBenchExpr *make_func(const char *fname, PgBenchExpr *arg1);
 
 %}
 
@@ -35,8 +36,8 @@ static PgBenchExpr *make_op(char operator, PgBenchExpr *lexpr,
 
 %type <expr> expr
 %type <ival> INTEGER
-%type <str> VARIABLE
-%token INTEGER VARIABLE
+%type <str> VARIABLE FUNCTION
+%token INTEGER VARIABLE FUNCTION
 %token CHAR_ERROR /* never used, will raise a syntax error */
 
 %left	'+' '-'
@@ -57,6 +58,7 @@ expr: '(' expr ')'			{ $$ = $2; }
 	| expr '%' expr			{ $$ = make_op('%', $1, $3); }
 	| INTEGER{ $$ = make_integer_constant($1); }
 	| VARIABLE { $$ = make_variable($1); }
+	| FUNCTION '(' expr ')' { $$ = make_func($1, $3); pg_free($1); }
 	;
 
 %%
@@ -93,4 +95,33 @@ make_op(char operator, PgBenchExpr *lexpr, PgBenchExpr *rexpr)
 	return expr;
 }
 
+static struct {
+	char * fname;
+	int nargs;
+	PgBenchFunction tag;
+} PGBENCH_FUNCTIONS[] = {
+	{ "abs", 1, PGBENCH_ABS }
+};
+
+static PgBenchExpr *
+make_func(const char * fname, PgBenchExpr *arg1)
+{
+	PgBenchExpr *expr = pg_malloc(sizeof(PgBenchExpr));
+	int nfunctions = sizeof(PGBENCH_FUNCTIONS) / sizeof(PGBENCH_FUNCTIONS[0]);
+	int i;
+
+	expr->etype = ENODE_FUNCTION;
+	expr->u.function.function = PGBENCH_NONE;
+
+	for (i = 0; i < nfunctions; i++)
+		if (pg_strcasecmp(fname, PGBENCH_FUNCTIONS[i].fname) == 0)
+			expr->u.function.function = PGBENCH_FUNCTIONS[i].tag;
+
+	if (expr->u.function.function == PGBENCH_NONE)
+		yyerror("unexpected function name");
+
+	expr->u.function.arg1 = arg1;
+	return expr;
+}
+
 #include "exprscan.c"
diff --git a/contrib/pgbench/exprscan.l b/contrib/pgbench/exprscan.l
index 4c9229c..a276444 100644
--- a/contrib/pgbench/exprscan.l
+++ b/contrib/pgbench/exprscan.l
@@ -46,6 +46,7 @@ space			[ \t\r\f]
 ")"	{ yycol += yyleng; return ')'; }
 :[a-zA-Z0-9_]+	{ yycol += yyleng; yylval.str = pg_strdup(yytext + 1); return VARIABLE; }
 [0-9]+			{ yycol += yyleng; yylval.ival = strtoint64(yytext); return INTEGER; }
+[a-zA-Z]+   { yycol += yyleng; yylval.str = pg_strdup(yytext); return FUNCTION; }
 
 [\n]			{ yycol = 0; yyline++; }
 {space}			{ yycol += yyleng; /* ignore */ }
diff --git a/contrib/pgbench/pgbench.c b/contrib/pgbench/pgbench.c
index 706fdf5..e0cffc2 100644
--- a/contrib/pgbench/pgbench.c
+++ b/contrib/pgbench/pgbench.c
@@ -959,6 +959,26 @@ evaluateExpr(CState *st, PgBenchExpr *expr, int64 *retval)
 return false;
 			}
 
+		case ENODE_FUNCTION:
+			{
+switch (expr->u.function.function)
+{
+	case PGBENCH_ABS:
+	{
+		int64 arg1;
+		if (!evaluateExpr(st, expr->u.function.arg1, &arg1))
+			return false;
+
+		*retval = arg1 > 0 ? arg1 : -arg1;
+		return true;
+	}
+	default:
+		fprintf(stderr, "bad function (%d)\n",
+expr->u.function.function);
+		return false;
+}
+			}
+
 		default:
 			break;
 	}
diff --git a/contrib/pgbench/pgbench.h b/contrib/pgbench/pgbench.h
index 128bf11..b7d30ba 100644
--- a/contrib/pgbench/pgbench.h
+++ b/contrib/pgbench/pgbench.h
@@ -15,12 +15,19 @@ typedef enum PgBenchExprType
 {
 	ENODE_INTEGER_CONSTANT,
 	ENODE_VARIABLE,
-	ENODE_OPERATOR
+	ENODE_OPERATOR,
+	ENODE_FUNCTION
 } PgBenchExprType;
 
 struct PgBenchExpr;
 typedef struct PgBenchExpr PgBenchExpr;
 
+typedef enum PgBenchFunction
+{
+	PGBENCH_NONE,
+	PGBENCH_ABS
+} PgBenchFunction;
+
 struct PgBenchExpr
 {
 	PgBenchExprType	etype;
@@ -40,6 +47,11 @@ struct PgBenchExpr
 			PgBenchExpr	*lexpr;
 			PgBenchExpr *rexpr;
 		} operator;
+		struct
+		{
+			PgBenchFunction function;
+			PgBenchExpr *arg1;
+		} function;
 	} u;
 };
 
diff --git a/doc/src/sgml/pgbench.sgml b/doc/src/sgml/pgbench.sgml
index ed12e27..0c58412 100644
--- a/doc/src/sgml/pgbench.sgml
+++ b/doc/src/sgml/pgbench.sgml
@@ -762,7 +762,7 @@ pgbench <optional> <replaceable>options</> </optional> <replaceable>dbname</>
   references to variables <literal>:</><replaceable>variablename</>,
   and expressions composed of unary (<literal>-</>) or binary operators
   (<literal>+</>, <literal>-</>, <literal>*</>, <literal>/</>, <literal>%</>)
-  with their usual associativity, and parentheses.
+  with their usual associativity, function <literal>abs</> and parentheses.
  /para
 
  para


Re: [HACKERS] Clamping result row number of joins.

2015-03-06 Thread Tom Lane
Tom Lane t...@sss.pgh.pa.us writes:
 Stephen Frost sfr...@snowman.net writes:
 * Tom Lane (t...@sss.pgh.pa.us) wrote:
 BTW, is that JOIN (VALUES(...)) thing common in applications, or did you
 just use it to make a compact example?  If it were something worth
 optimizing, it seems like we could teach the planner to pull up VALUES
 in the same way that it flattens sub-selects.  I'm not sure if this is
 worth the trouble or not, though.

 I've certainly seen and used values() constructs in joins for a variety
 of reasons and I do think it'd be worthwhile for the planner to know how
 to pull up a VALUES construct.

 Would that be a lot of effort, either code-wise or runtime-wise?  My gut
 feeling is that it wouldn't be, but you're clearly in a much better
 position to determine that.

 My guess is that it'd be pretty simple to do if we want to do it.
 I've not looked at the code yet though.

I spent a bit of time looking at this, and realized that the blocker
is the same as the reason why we don't pull up sub-selects with empty
rangetables (SELECT expression(s)).  Namely, that the upper query's
jointree can't handle a null subtree.  (This is not only a matter of
the jointree data structure, but the fact that the whole planner
identifies relations by bitmapsets of RTE indexes, and subtrees with
empty RTE sets couldn't be told apart.)

We could probably fix both cases for order-of-a-hundred lines of new code
in prepjointree.  The plan I'm thinking about is to allow such vacuous
subquery jointrees to be pulled up, but only if they are in a place in
the upper query's jointree where it's okay to delete the subtree.  This
would basically be two cases: (1) the immediate parent is a FromExpr that
would have at least one remaining child, or (2) the immediate parent is
an INNER JOIN whose other child isn't also being deleted (so that we can
convert the JoinExpr to a nonempty FromExpr, or just use the other child
as-is if the JoinExpr has no quals).
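
To make the intent concrete, the sort of transformation this would allow
(illustration only) is turning

	SELECT t.* FROM t JOIN (VALUES (1)) AS v(x) ON t.a = v.x;

into the equivalent of

	SELECT t.* FROM t WHERE t.a = 1;

that is, the single-row VALUES gets flattened into the parent query and its
now-vacuous jointree slot is deleted, instead of leaving a separate RTE to
be scanned at runtime.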

More generally, it occurs to me that maybe Oracle wasn't being totally
silly when they invented DUAL.  If we had a jointree representation for a
dummy relation with exactly one row then we could substitute that in
all vacuous-jointree cases.  However, it's not clear that there's any
functional advantage to replacing a VALUES Scan with a DUAL Scan,
which is basically what would happen if we flattened a VALUES and then
had to put one of these things in instead.  And having such things
propagate all through the planner, executor, EXPLAIN, etc is way more
code churn than I want to contemplate right now.  The plan proposed in
the preceding para is a bit more ugly logically, but it would limit the
code effects to basically pull_up_subqueries() and its child routines.

regards, tom lane




Re: [HACKERS] MD5 authentication needs help

2015-03-06 Thread Josh Berkus
On 03/06/2015 08:19 AM, Stephen Frost wrote:
 * Alvaro Herrera (alvhe...@2ndquadrant.com) wrote:
 Stephen Frost wrote:
 Sure.  I was thinking we would have some mechanism to authenticate the
 connection as coming from a pooler that has been previously authorized;
 something as simple as a new pg_hba.conf entry type for poolers that are
 only authorized to connect to such-and-such databases, perhaps limited to
 such-and-such users, etc.
 
 Well, server-side, we already have that: have pgbouncer run on the
 database server (something which I'm typically in favor of anyway) and
 use 'peer'.  If it supported TLS then it could use certificates instead.
 The question is what to do after the pooler has connected, and that's
 actually a generic issue which goes beyond poolers and into
 applications: basically, "how can I re-authenticate this connection
 using a different role?"  We have SET ROLE, but that gives a lot of
 power to the role the pooler logs in as.  It'd definitely be neat to
 provide a way to use SCRAM or similar to do that re-authentication after
 the initial connection.

Using pgbouncer on the DB server is common, but less common than using
it on an intermediate server or even the app server itself.  So anything
we create needs to be implementable with all three configurations in
some way.

What I'd particularly like to see, of course, is some way for a
pgbouncer-like proxy to authenticate a client connection, then to use
those credentials to authenticate to the database server, without every
storing a reusable auth token (such as the current md5) on the pgbouncer
server.  And, of course, TLS support.

 pgbouncer isn't as necessary in the kinds of environments that use
 Kerberos because you aren't having lots of connections/s from distinct
 principals to a given server.  It's just not the profile that pgbouncer
 is built for, but that's kind of my point here- pgbouncer is far more
 concerned with performance than security because the assumption going in
 is that you have a trusted server connecting to a trusted server.
 That's an acceptable assumption for a lot of environments, though not
 all.

It's not that pgbouncer users don't want more secure connections; it's
that they can't have them.  I know users who are using stunnel to
connect to pgbouncer because that's the only way they can secure it.

 (Actually, is pgbouncer under active maintenance at all these days?)
 
 I understand Andrew has been helping with it, but my guess is that's
 more maintenance and less active development.

No, there's active development; look for some new pgbouncer features
over the next 6-9 months.

More importantly, pgbouncer is *used* by a large portion of our users --
somewhere between 20% and 50% based on my experience with our clients
and at conferences (our clients are actually around 70%
pgbouncer-using).  So if our md5-replacement solution isn't compatible
with pgbouncer, or doesn't arrive with a replacement for pgbouncer with
which it is compatible, it will result in us supporting old-style md5 for a
long time, or in a lot of people using password.

 Also, I don't expect we're going to remove md5 any time soon and,
 frankly, people using pgbouncer probably aren't worried about the issues
 which exist with that mechanism and care much more about performance, as
 it doesn't even support TLS..

 I think they accept the issues because they have no other choice, not
 because they are really all that OK with them.
 
 I'd certainly be very happy to see someone interested enough in this
 issue to dedicate resources (either human time or funding) to implement
 TLS for pgbouncer...

PGX would be happy to implement this if someone wanted to find funding.
I'd expect that a couple of other consulting companies would jump at it as
well.  It's just that so far nobody has wanted to foot the bill.

Kickstarter, maybe?

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




[HACKERS] get_object_address support for additional object types

2015-03-06 Thread Alvaro Herrera
This is extracted from the DDL deparse series.  These patches add
get_object_address support for the following object types:

- user mappings
- default ACLs
- operators and functions of operator families

In each case I had to create a new value in ObjectType.  These object
types can not be created from the parser, which is why they don't exist
yet.  But if we want to be able to process DDL for them, then we need to
cope at this level.  This is the kind of fix we need to handle the
failures related to commit a486841eb11517e.

There is one strange thing in the last one, which is that an opfamily
member is represented in two arrays like this (objname, objargs):
{opfamily identity, access method identity, number} , {left type, right type}

This is a bit odd considering that operator families themselves are
addressed like this instead:
{opfamily identity} , {access method identity}

Note that the AM name is originally in objargs and moves to objnames.
The reason I did it this way is that the objargs elements can be
processed completely as an array of TypeName, and therefore there's no
need for an extra strange case in pg_get_object_address.  But it does
mean that there is some code that knows to search the strategy or
function number in a specific position in the objname array.
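
In other words, with these patches an opfamily member would be looked up
roughly like this (made-up example; the exact object type string and the
ordering of the array elements follow what the patch implements):

    SELECT * FROM pg_get_object_address('operator of access method',
                                        '{integer_ops, btree, 1}',
                                        '{integer, integer}');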

If we had more freedom on general object representation I'm sure we
could do better, but it's what we have.  I don't think it's a tremendous
problem, considering that get_object_address gets a fairly ad-hoc
representation for each object type anyway, as each gets constructed by
the grammar.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From fad598488c0c15e0962808ad13825374e8a3640e Mon Sep 17 00:00:00 2001
From: Alvaro Herrera alvhe...@alvh.no-ip.org
Date: Thu, 5 Mar 2015 12:04:39 -0300
Subject: [PATCH 1/3] deparse/core: get_object_address support user mappings

---
 src/backend/catalog/objectaddress.c  | 67 +++-
 src/backend/commands/event_trigger.c |  1 +
 src/include/nodes/parsenodes.h   |  1 +
 src/test/regress/expected/event_trigger.out  | 12 -
 src/test/regress/expected/object_address.out | 19 +---
 src/test/regress/sql/event_trigger.sql   | 11 -
 src/test/regress/sql/object_address.sql  |  9 ++--
 7 files changed, 107 insertions(+), 13 deletions(-)

diff --git a/src/backend/catalog/objectaddress.c b/src/backend/catalog/objectaddress.c
index 541912b..5553ec7 100644
--- a/src/backend/catalog/objectaddress.c
+++ b/src/backend/catalog/objectaddress.c
@@ -520,7 +520,7 @@ ObjectTypeMap[] =
 	/* OCLASS_FOREIGN_SERVER */
 	{ "server", OBJECT_FOREIGN_SERVER },
 	/* OCLASS_USER_MAPPING */
-	{ "user mapping", -1 },		/* unmapped */
+	{ "user mapping", OBJECT_USER_MAPPING },
 	/* OCLASS_DEFACL */
 	{ "default acl", -1 },		/* unmapped */
 	/* OCLASS_EXTENSION */
@@ -555,6 +555,8 @@ static ObjectAddress get_object_address_type(ObjectType objtype,
 		List *objname, bool missing_ok);
 static ObjectAddress get_object_address_opcf(ObjectType objtype, List *objname,
 		List *objargs, bool missing_ok);
+static ObjectAddress get_object_address_usermapping(List *objname,
+			   List *objargs, bool missing_ok);
 static const ObjectPropertyType *get_object_property_data(Oid class_id);
 
 static void getRelationDescription(StringInfo buffer, Oid relid);
@@ -769,6 +771,10 @@ get_object_address(ObjectType objtype, List *objname, List *objargs,
 address.objectId = get_ts_config_oid(objname, missing_ok);
 address.objectSubId = 0;
 break;
+			case OBJECT_USER_MAPPING:
+address = get_object_address_usermapping(objname, objargs,
+		 missing_ok);
+break;
 			default:
 elog(ERROR, "unrecognized objtype: %d", (int) objtype);
 /* placate compiler, in case it thinks elog might return */
@@ -1373,6 +1379,64 @@ get_object_address_opcf(ObjectType objtype,
 }
 
 /*
+ * Find the ObjectAddress for a user mapping.
+ */
+static ObjectAddress
+get_object_address_usermapping(List *objname, List *objargs, bool missing_ok)
+{
+	ObjectAddress address;
+	Oid			userid;
+	char	   *username;
+	char	   *servername;
+	ForeignServer *server;
+	HeapTuple	tp;
+
+	ObjectAddressSet(address, UserMappingRelationId, InvalidOid);
+
+	username = strVal(linitial(objname));
+	servername = strVal(linitial(objargs));
+	server = GetForeignServerByName(servername, false);
+
+	if (strcmp(username, "public") == 0)
+		userid = InvalidOid;
+	else
+	{
+		tp = SearchSysCache1(AUTHNAME,
+			 CStringGetDatum(username));
+		if (!HeapTupleIsValid(tp))
+		{
+			if (!missing_ok)
+ereport(ERROR,
+		(errcode(ERRCODE_UNDEFINED_OBJECT),
+		 errmsg("user mapping for user \"%s\" in server \"%s\" does not exist",
+username, servername)));
+			return address;
+		}
+		userid = HeapTupleGetOid(tp);
+		ReleaseSysCache(tp);
+	}
+
+	tp = SearchSysCache2(USERMAPPINGUSERSERVER,
+		 ObjectIdGetDatum(userid),
+			

Re: [HACKERS] Bootstrap DATA is a pita

2015-03-06 Thread Alvaro Herrera
Robert Haas wrote:
 On Wed, Mar 4, 2015 at 2:27 PM, Alvaro Herrera alvhe...@2ndquadrant.com 
 wrote:

  BTW one solution to the merge problem is to have unique separators for
  each entry.  For instance, instead of

 Speaking from entirely too much experience, that's not nearly enough.
 git only needs 3 lines of context to apply a hunk with no qualms at
 all, and it'll shade that to just 1 or 2 with little fanfare.  If your
 pg_proc entries are each 20 lines long, this sort of thing will
 provide little protection.

Yeah, you're right.  This is going to be a problem, and we need some
solution for it.  I'm out of ideas, other than of course getting each
entry to be at most two lines long, which nobody seems to like (for good
reasons).

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Question about lazy_space_alloc() / linux over-commit

2015-03-06 Thread Noah Misch
On Thu, Mar 05, 2015 at 03:28:12PM -0600, Jim Nasby wrote:
 On 3/4/15 9:10 AM, Robert Haas wrote:
 On Wed, Feb 25, 2015 at 5:06 PM, Jim Nasby jim.na...@bluetreble.com wrote:
 Could the large allocation[2] for the dead tuple array in lazy_space_alloc
 cause problems with linux OOM? [1] and some other things I've read indicate
 that a large mmap will count towards total system memory, including
 producing a failure if overcommit is disabled.
 
 I believe that this is possible.

I have seen that in the field, albeit on a server with a 10 GiB allocation
limit, years ago.

 Would it be worth avoiding the full size allocation when we can?
 
 Maybe.  I'm not aware of any evidence that this is an actual problem
 as opposed to a theoretical one.  vacrelstats-dead_tuples is limited
 to a 1GB allocation, which is not a trivial amount of memory, but it's
 not huge, either.  But we could consider changing the representation
 from a single flat array to a list of chunks, with each chunk capped
 at say 64MB.  That would not only reduce the amount of memory that we
 
 I was thinking the simpler route of just repalloc'ing... the memcpy would
 suck, but much less so than the extra index pass. 64M gets us 11M tuples,
 which probably isn't very common.

+1.  Start far below 64 MiB; grow geometrically using repalloc_huge(); cap
growth at vac_work_mem.




Re: [HACKERS] Question about lazy_space_alloc() / linux over-commit

2015-03-06 Thread Tom Lane
Noah Misch n...@leadboat.com writes:
 On Thu, Mar 05, 2015 at 03:28:12PM -0600, Jim Nasby wrote:
 I was thinking the simpler route of just repalloc'ing... the memcpy would
 suck, but much less so than the extra index pass. 64M gets us 11M tuples,
 which probably isn't very common.

 +1.  Start far below 64 MiB; grow geometrically using repalloc_huge(); cap
 growth at vac_work_mem.

+1 for repalloc'ing at need, but I'm not sure about the start far below
64 MiB part.  64MB is a pretty small amount on nearly any machine these
days (and for anybody who thinks it isn't, that's why maintenance_work_mem
is a tunable).  I think min(64MB, vac_work_mem) might be a reasonable
start point.

A different line of thought is that it would seem to make sense to have
the initial allocation vary depending on the relation size.  For instance,
you could assume there might be 10 dead tuples per page, and hence try to
alloc that much if it fits in vac_work_mem.
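
Concretely, I'm imagining something like the following (an untested sketch;
the helper name is invented for illustration, and lazy_space_alloc would
need rejiggering to match):

/*
 * Sketch only: size the dead-tuple array from the relation size, then
 * grow it geometrically up to "maxtuples", the cap we already derive
 * from maintenance_work_mem / vac_work_mem.
 */
static void
dead_tuples_reserve(LVRelStats *vacrelstats, BlockNumber rel_pages,
					long maxtuples)
{
	if (vacrelstats->dead_tuples == NULL)
	{
		long		inittuples = Min((long) rel_pages * 10, maxtuples);

		inittuples = Max(inittuples, MaxHeapTuplesPerPage);
		vacrelstats->max_dead_tuples = (int) inittuples;
		vacrelstats->dead_tuples = (ItemPointer)
			palloc(inittuples * sizeof(ItemPointerData));
	}
	else if (vacrelstats->num_dead_tuples >= vacrelstats->max_dead_tuples &&
			 vacrelstats->max_dead_tuples < maxtuples)
	{
		long		newtuples = Min((long) vacrelstats->max_dead_tuples * 2,
									maxtuples);

		vacrelstats->dead_tuples = (ItemPointer)
			repalloc_huge(vacrelstats->dead_tuples,
						  (Size) newtuples * sizeof(ItemPointerData));
		vacrelstats->max_dead_tuples = (int) newtuples;
	}
}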

regards, tom lane




Re: [HACKERS] CATUPDATE confusion?

2015-03-06 Thread Peter Eisentraut
On 12/29/14 7:16 PM, Adam Brightwell wrote:
 Given this discussion, I have attached a patch that removes CATUPDATE
 for review/discussion.

committed this version





Re: [HACKERS] CATUPDATE confusion?

2015-03-06 Thread Tom Lane
Peter Eisentraut pete...@gmx.net writes:
 On 12/29/14 7:16 PM, Adam Brightwell wrote:
 Given this discussion, I have attached a patch that removes CATUPDATE
 for review/discussion.

 committed this version

Hmm .. I'm not sure that summarily removing usecatupd from those three
system views was well thought out.  pg_shadow, especially, has no reason
to live at all except for backwards compatibility, and clients might well
expect that column to still be there.  I wonder if we'd not be better off
to keep the column in the views but have it read from rolsuper.
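
Something like this for the column in question, say (a sketch of just the
relevant bit; the real view definitions obviously carry all the other
columns too):

    -- sketch: keep exposing usecatupd, but derive it from rolsuper
    CREATE OR REPLACE VIEW pg_shadow_sketch AS
        SELECT rolname AS usename,
               rolsuper AS usesuper,
               rolsuper AS usecatupd
        FROM pg_authid
        WHERE rolcanlogin;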

regards, tom lane




Re: [HACKERS] Question about lazy_space_alloc() / linux over-commit

2015-03-06 Thread Noah Misch
On Sat, Mar 07, 2015 at 12:46:42AM -0500, Tom Lane wrote:
 Noah Misch n...@leadboat.com writes:
  On Thu, Mar 05, 2015 at 03:28:12PM -0600, Jim Nasby wrote:
  I was thinking the simpler route of just repalloc'ing... the memcpy would
  suck, but much less so than the extra index pass. 64M gets us 11M tuples,
  which probably isn't very common.
 
  +1.  Start far below 64 MiB; grow geometrically using repalloc_huge(); cap
  growth at vac_work_mem.
 
 +1 for repalloc'ing at need, but I'm not sure about the start far below
 64 MiB part.  64MB is a pretty small amount on nearly any machine these
 days (and for anybody who thinks it isn't, that's why maintenance_work_mem
 is a tunable).

True; nothing would explode, especially since the allocation would be strictly
smaller than it is today.  However, I can't think of a place in PostgreSQL
where a growable allocation begins so aggressively, nor a reason to break new
ground in that respect.  For comparison, tuplestore/tuplesort start memtupsize
at 1 KiB.  (One could make a separate case for that practice being wrong.)

 A different line of thought is that it would seem to make sense to have
 the initial allocation vary depending on the relation size.  For instance,
 you could assume there might be 10 dead tuples per page, and hence try to
 alloc that much if it fits in vac_work_mem.

Sounds better than a fixed 64 MiB start, though I'm not sure it's better than
a fixed 256 KiB start.




Re: [HACKERS] Rethinking pg_dump's function sorting code

2015-03-06 Thread Marko Tiikkaja

On 2015-03-06 01:28, Tom Lane wrote:

In bug #12832 Marko Tiikkaja points out that commit
7b583b20b1c95acb621c71251150beef958bb603 created a rather unnecessary
dump failure hazard, since it applies pg_get_function_identity_arguments()
to every function in the database, even those that won't get dumped.
I think we should fix this by getting rid of pg_dump's use of that
function altogether.  A low-tech way to sort functions of identical names
would be to compare argument type OIDs, as in the attached simple patch.
If people feel it's important to avoid depending on numerical OID order,
we could instead look up type names locally and compare them as in the
attached less-simple patch.  (Both patches neglect reverting the data
collection aspects of the prior commit, since that's mechanical; the only
interesting part is what we'll do to sort.)

Neither patch will exactly preserve the sort behavior of the current
code, but I don't think that's important.

Comments?


I have my own cow in this ditch, but I would much prefer the sort to be 
done based on the type name.  That way the order is still consistent 
between two databases where the objects were created in a different order.



.m




Re: [HACKERS] Rethinking pg_dump's function sorting code

2015-03-06 Thread Noah Misch
On Thu, Mar 05, 2015 at 07:28:33PM -0500, Tom Lane wrote:
 In bug #12832 Marko Tiikkaja points out that commit
 7b583b20b1c95acb621c71251150beef958bb603 created a rather unnecessary
 dump failure hazard, since it applies pg_get_function_identity_arguments()
 to every function in the database, even those that won't get dumped.
 I think we should fix this by getting rid of pg_dump's use of that
 function altogether.  A low-tech way to sort functions of identical names
 would be to compare argument type OIDs, as in the attached simple patch.
 If people feel it's important to avoid depending on numerical OID order,
 we could instead look up type names locally and compare them as in the
 attached less-simple patch.

Comparing argument type names sounds fine.  Comparing argument type OID does
not offer enough to justify the loss of cross-cluster sort equivalence.

 Neither patch will exactly preserve the sort behavior of the current
 code, but I don't think that's important.

Agreed.

 --- 291,313 
   {
   FuncInfo   *fobj1 = *(FuncInfo *const *) p1;
   FuncInfo   *fobj2 = *(FuncInfo *const *) p2;
 + int i;
   
  	cmpval = fobj1->nargs - fobj2->nargs;
   if (cmpval != 0)
   return cmpval;
! 	for (i = 0; i < fobj1->nargs; i++)
! 	{
! 		TypeInfo   *argtype1 = findTypeByOid(fobj1->argtypes[i]);
! 		TypeInfo   *argtype2 = findTypeByOid(fobj2->argtypes[i]);
! 
! 		if (argtype1 && argtype2)
! 		{
! 			cmpval = strcmp(argtype1->dobj.name,
! 							argtype2->dobj.name);
 ! if (cmpval != 0)
 ! return cmpval;
 ! }
 ! }

So as to stably compare f(nsp0.t) to f(nsp1.t), this should also compare the
dobj.namespace name.
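
Something along these lines, that is (a sketch only; using pg_dump's usual
field layout, and ignoring the possibility of a NULL namespace for brevity):

	if (argtype1 && argtype2)
	{
		/* compare schema name first, then type name, for a stable order */
		cmpval = strcmp(argtype1->dobj.namespace->dobj.name,
						argtype2->dobj.namespace->dobj.name);
		if (cmpval != 0)
			return cmpval;
		cmpval = strcmp(argtype1->dobj.name, argtype2->dobj.name);
		if (cmpval != 0)
			return cmpval;
	}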




[HACKERS] Clamping result row number of joins.

2015-03-06 Thread Kyotaro HORIGUCHI
Hello, I had a report that a query gets weird estimated row
numbers, which makes subsequent plans wrong.

Although the detailed explanation is shown later in this mail,
the problem here was that a query like the following makes a path
with an apparently wrong row number.

EXPLAIN SELECT a FROM (SELECT a FROM t1 UNION ALL SELECT 0) t
 JOIN (VALUES (1)) tt(x) ON tt.x = t.a;

0 Nested Loop  (cost=0.43..8.51 rows=68732 width=4)
1   ->  Values Scan on *VALUES*  (cost=0.00..0.01 rows=1 width=4)
2   ->  Append  (cost=0.43..8.48 rows=2 width=4)
3         ->  Index Only Scan using i_t1_a on t1
4               (cost=0.43..8.46 rows=1 width=4)
5               Index Cond: (a = *VALUES*.column1)
6         ->  Subquery Scan on *SELECT* 2  (cost=0.00..0.02 rows=1 width=4)
7               Filter: (*VALUES*.column1 = *SELECT* 2.?column?)
8               ->  Result  (cost=0.00..0.01 rows=1 width=0)

The Nested Loop at line 0 says that it emits 68732 rows even
though the underlying nodes, the Values at line 1 and the Append at line
2, give 1 and 2 rows respectively as their results. This query
actually returns only 1 row. The estimate is clearly wrong and causes the
plans above it to go wrong.

This is caused by the Append node, which doesn't have statistics,
so eqjoinsel takes 0.5 as the default selectivity for the nested
loop. Even though it is difficult to get statistics for an Append
node, the nested loop could get a saner row number if it
didn't ignore the parameterized path selected for the scan on
t1.

I suppose that join nodes can safely clamp the number of their
result rows to the product of the row counts of the two underlying
paths. If that is correct, the attached patch corrects the
estimation like this.

  Nested Loop  (cost=0.43..16.51 rows=2 width=4)
    ->  Values Scan on *VALUES*  (cost=0.00..0.01 rows=1 width=4)
    ->  Append  (cost=0.43..16.48 rows=2 width=4)
          ->  Index Only Scan using i_t1_a on t1
                (cost=0.43..16.45 rows=1 width=4)
                Index Cond: (a = *VALUES*.column1)
          ->  Subquery Scan on *SELECT* 2  (cost=0.00..0.02 rows=1 width=4)
                Filter: (*VALUES*.column1 = *SELECT* 2.?column?)
                ->  Result  (cost=0.00..0.01 rows=1 width=0)
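
The core of the change is just this kind of clamp (shown standalone here
only for illustration; see the attached patch for where it actually goes):

	/* Never let a join's row estimate exceed the product of its inputs. */
	static double
	clamp_join_row_est(double join_rows, double outer_rows, double inner_rows)
	{
		double		upper_bound = outer_rows * inner_rows;

		return (join_rows > upper_bound) ? upper_bound : join_rows;
	}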

Is it correct? Does it have any bad side effects?
And is this applicable for 9.5?


The detailed test environment is described below.

Before this fix, the inner path of the top-level join is hashed
because the inner path says that it will give 68732 rows (but
actually only 1).

After this fix, the outer path declares that it gives 2 rows, so
the top-level join becomes a nested loop and the inner path becomes
an index scan.

===
CREATE TABLE t1 (a int);
INSERT INTO t1 (SELECT (a%2)*50 + a / 2 FROM generate_series(0, 100) a);
CREATE INDEX i_t1_a ON t1 (a);
CREATE TABLE t2 AS SELECT * from t1;
ANALYZE t1;
ANALYZE t2;
-- emulating the problematic situation
SET join_collapse_limit TO 1;
SET random_page_cost TO 8.0;
SET seq_page_cost TO 0.5;

EXPLAIN
  SELECT tt2.a
  FROM (SELECT a
FROM (SELECT a FROM t1
  UNION ALL SELECT 0) t
JOIN (VALUES (1)) tt(x) ON tt.x = t.a) tt2
  JOIN t2 ON (tt2.a = t2.a);


 The plan before fix
 Hash Join  (cost=30832.46..36230.60 rows=68732 width=4)
            (actual time=1258.880..1260.519 rows=1 loops=1)
   Hash Cond: (*VALUES*.column1 = t2.a)
   ->  Nested Loop  (cost=0.43..8.51 rows=68732 width=8)
                    (actual time=0.071..0.104 rows=1 loops=1)
         ->  Values Scan on *VALUES*  (cost=0.00..0.01 rows=1 width=4)
                                      (actual time=0.002..0.003 rows=1 loops=1)
         ->  Append  (cost=0.43..8.48 rows=2 width=4)
                     (actual time=0.063..0.093 rows=1 loops=1)
               ->  Index Only Scan using i_t1_a on t1
                     (cost=0.43..8.46 rows=1 width=4)
                     (actual time=0.059..0.075 rows=1 loops=1)
                     Index Cond: (a = *VALUES*.column1)
                     Heap Fetches: 2
               ->  Subquery Scan on *SELECT* 2
                     (cost=0.00..0.02 rows=1 width=4)
                     (actual time=0.010..0.010 rows=0 loops=1)
                     Filter: (*VALUES*.column1 = *SELECT* 2.?column?)
                     Rows Removed by Filter: 1
                     ->  Result  (cost=0.00..0.01 rows=1 width=0)
                                 (actual time=0.001..0.002 rows=1 loops=1)
   ->  Hash  (cost=14425.01..14425.01 rows=101 width=4)
             (actual time=1250.274..1250.274 rows=101 loops=1)
         Buckets: 131072  Batches: 16  Memory Usage: 3227kB
         ->  Seq Scan on t2  (cost=0.00..14425.01 rows=101 width=4)
                             (actual time=0.021..608.245 rows=101 loops=1)
 Planning time: 0.319 ms
 Execution time: 1260.792 ms

 The plan after fix
 Nested Loop  (cost=0.86..17.43 rows=2 width=4)
  (actual time=0.103..0.118 rows=1 loops=1)
   Join Filter: (*VALUES*.column1 = 

Re: [HACKERS] Add more tests on event triggers

2015-03-06 Thread Alvaro Herrera
Robert Haas wrote:
 On Wed, Feb 25, 2015 at 10:03 AM, Fabien COELHO coe...@cri.ensmp.fr wrote:
  You can add tests in src/test/modules/dummy_seclabel.
 
  Patch attached to test sec label there, in addition to the other more
  standard checks in event_trigger.
 
 These tests seem worthwhile to me.

Pushed, thanks.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

