Re: postgres_fdw vs data formatting GUCs (was Re: [HACKERS] [v9.3] writable foreign tables)

2013-03-19 Thread Daniel Farina
On Tue, Mar 19, 2013 at 10:37 PM, Daniel Farina  wrote:
> I added handling for the various NULL returns when reading GUCs in
> this revision, v4.

Okay, one more of those fridge-logic bugs.  Sorry for the noise. v5.

A missing PG_RE_THROW, needed to get proper finally-esque semantics:

--- a/contrib/dblink/dblink.c
+++ b/contrib/dblink/dblink.c
@@ -642,7 +642,10 @@ dblink_fetch(PG_FUNCTION_ARGS)
  }
  PG_CATCH();
  {
+ /* Pop any set GUCs, if necessary */
  restoreLocalGucs(&rgs);
+
+ PG_RE_THROW();
  }
  PG_END_TRY();

It was easy to add a regression test exercising this, and so I did (not
shown here).
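
For anyone skimming the thread: PG_CATCH() only provides a place to run
cleanup code; unless the block ends with PG_RE_THROW(), the error stops
propagating and control simply falls out of PG_END_TRY().  The general
shape of the pattern, with stand-in function names (not the actual
dblink code), looks like this:

#include "postgres.h"			/* PG_TRY / PG_CATCH / PG_RE_THROW */

extern void work_that_may_error(void);	/* stand-in */
extern void cleanup_state(void);		/* stand-in, like restoreLocalGucs */

static void
do_with_cleanup(void)
{
	PG_TRY();
	{
		work_that_may_error();
	}
	PG_CATCH();
	{
		/* "finally"-style cleanup runs first... */
		cleanup_state();

		/* ...then the error is propagated to the next handler */
		PG_RE_THROW();
	}
	PG_END_TRY();
}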

--
fdr


dblink-guc-sensitive-types-v5.patch
Description: Binary data



Re: [HACKERS] citext like searches using index

2013-03-19 Thread David E. Wheeler
On Mar 17, 2013, at 6:35 AM, Thorbjørn Weidemann  
wrote:

> Hi David,
> 
> I found your email-address on 
> http://www.postgresql.org/docs/9.2/static/citext.html. I hope it's ok to 
> contact you this way.
> I would like to thank you for taking the time to make citext available for 
> Postgres, and I hope you can help me with this problem.
> 
> In my workplace we are considering using citext in a Progress -> PostgreSQL 
> conversion that is underway. The Progress database always searches 
> case-insensitively.
> 
> Simply creating a normal btree index on a citext column makes = searches use 
> the index, but I have not been able to create an index that can be used for 
> searches like
> select <column> from <table> where <column> LIKE 'ide%';
> 
> During this investigation I found out that even for varchar columns I have to 
> append the varchar_pattern_ops opclass to the column when creating the index 
> for it to be used for LIKE searches. But there is no citext_pattern_ops 
> opclass.
> 
> Is there currently any way to create an index that can be used to speed up 
> searches like the one above?
> If not, do you have any idea how it might be implemented? Perhaps I could 
> give it a try myself.
> 
> Thank you in advance for any suggestions you might have.

I would think that text_pattern_ops would work, no?

Best,

David






Re: postgres_fdw vs data formatting GUCs (was Re: [HACKERS] [v9.3] writable foreign tables)

2013-03-19 Thread Daniel Farina
On Tue, Mar 19, 2013 at 6:10 PM, Daniel Farina  wrote:
> On Tue, Mar 19, 2013 at 3:06 PM, Daniel Farina  wrote:
>> On Tue, Mar 19, 2013 at 2:41 PM, Tom Lane  wrote:
>>> Daniel Farina  writes:
>>>> On Tue, Mar 19, 2013 at 1:16 PM, Tom Lane  wrote:
>>>>> I'd be inclined to eat the cost of calling PQparameterStatus every time
>>>>> (which won't be that much) and instead try to optimize by avoiding the
>>>>> GUC-setting overhead if the current value matches the local setting.
>>>>> But even that might be premature optimization.  Did you do any
>>>>> performance testing to see if there was a problem worth avoiding?
>>>
>>>> Nope; should I invent a new way to do that, or would it be up to
>>>> commit standard based on inspection alone?  I'm okay either way, let
>>>> me know.
>>>
>>> Doesn't seem that hard to test: run a dblink query that pulls back a
>>> bunch of data under best-case conditions (ie, local not remote server),
>>> and time it with and without the proposed fix.  If the difference
>>> is marginal then it's not worth working hard to optimize.
>>
>> Okay, will do, and here's the shorter and less mechanically intensive
>> naive version that I think is the baseline: it doesn't try to optimize
>> out any GUC settings and sets up the GUCs before the two
>> materialization paths in dblink.
>
> The results.  Summary: seems like grabbing the GUC and strcmp-ing is
> worthwhile, but the amount of ping-ponging between processes adds some
> noise to the timing results: utilization is far short of 100% on
> either processor involved.  Attached is a cumulative diff of the new
> version, and also reproduced below are the changes to v2 that make up
> v3.

I added handling for the various NULL returns when reading GUCs in this
revision, v4.

The non-cumulative changes:

--- a/contrib/dblink/dblink.c
+++ b/contrib/dblink/dblink.c
@@ -3005,8 +3005,22 @@ applyRemoteGucs(remoteGucs *rgs)
  /*
  * Attempt to avoid GUC setting if the remote and local GUCs
  * already have the same value.
+ *
+ * NB: Must error if the GUC is not found.
  */
- localVal = GetConfigOption(gucName, true, true);
+ localVal = GetConfigOption(gucName, false, true);
+
+ if (remoteVal == NULL)
+ ereport(ERROR,
+ (errmsg("could not load parameter status of %s",
+ gucName)));
+
+ /*
+ * An error must have been raised by now if GUC values could
+ * not be loaded for any reason.
+ */
+ Assert(localVal != NULL);
+ Assert(remoteVal != NULL);

  if (strcmp(remoteVal, localVal) == 0)
  continue;

--
fdr


dblink-guc-sensitive-types-v4.patch
Description: Binary data



Re: [HACKERS] machine-parseable object descriptions

2013-03-19 Thread Alvaro Herrera
Alvaro Herrera wrote:
> .. and here's the patch.

I forgot an example involving the funniest of all object classes:
default ACLs.  Here it is.

alvherre=# create role rol1;
CREATE ROLE
alvherre=# create role rol2;
CREATE ROLE
alvherre=# create schema rol1 authorization rol1;
CREATE SCHEMA
alvherre=# alter default privileges for role rol1, rol2 grant insert on tables 
to alvherre;
ALTER DEFAULT PRIVILEGES
alvherre=# alter default privileges for role rol1 in schema rol1 grant insert 
on tables to alvherre;
ALTER DEFAULT PRIVILEGES
alvherre=# select oid,foo.* from pg_default_acl, lateral (select * from 
pg_identify_object(tableoid, oid, 0)) foo ;
  oid  |    type     | schema | name |                identity
-------+-------------+--------+------+----------------------------------------
 48839 | default acl |        |      | for role rol1 on tables
 48840 | default acl |        |      | for role rol2 on tables
 48844 | default acl |        |      | for role rol1 in schema rol1 on tables
(4 rows)

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] machine-parseable object descriptions

2013-03-19 Thread Alvaro Herrera
.. and here's the patch.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
*** a/src/backend/catalog/dependency.c
--- b/src/backend/catalog/dependency.c
***
*** 67,72 
--- 67,73 
  #include "commands/trigger.h"
  #include "commands/typecmds.h"
  #include "foreign/foreign.h"
+ #include "funcapi.h"
  #include "miscadmin.h"
  #include "nodes/nodeFuncs.h"
  #include "parser/parsetree.h"
***
*** 198,203  static bool stack_address_present_add_flags(const ObjectAddress *object,
--- 199,210 
  ObjectAddressStack *stack);
  static void getRelationDescription(StringInfo buffer, Oid relid);
  static void getOpFamilyDescription(StringInfo buffer, Oid opfid);
+ static void getRelationTypeDescription(StringInfo buffer, Oid relid,
+ 		   int32 objectSubId);
+ static void getProcedureTypeDescription(StringInfo buffer, Oid procid);
+ static void getConstraintTypeDescription(StringInfo buffer, Oid constroid);
+ static void getOpFamilyIdentity(StringInfo buffer, Oid opfid);
+ static void getRelationIdentity(StringInfo buffer, Oid relid);
  
  
  /*
***
*** 2193,2199  getObjectClass(const ObjectAddress *object)
  	/* only pg_class entries can have nonzero objectSubId */
  	if (object->classId != RelationRelationId &&
  		object->objectSubId != 0)
! 		elog(ERROR, "invalid objectSubId 0 for object class %u",
  			 object->classId);
  
  	switch (object->classId)
--- 2200,2206 
  	/* only pg_class entries can have nonzero objectSubId */
  	if (object->classId != RelationRelationId &&
  		object->objectSubId != 0)
! 		elog(ERROR, "invalid non-zero objectSubId for object class %u",
  			 object->classId);
  
  	switch (object->classId)
***
*** 3087,3093  pg_describe_object(PG_FUNCTION_ARGS)
  	Oid			classid = PG_GETARG_OID(0);
  	Oid			objid = PG_GETARG_OID(1);
  	int32		subobjid = PG_GETARG_INT32(2);
! 	char	   *description = NULL;
  	ObjectAddress address;
  
  	/* for "pinned" items in pg_depend, return null */
--- 3094,3100 
  	Oid			classid = PG_GETARG_OID(0);
  	Oid			objid = PG_GETARG_OID(1);
  	int32		subobjid = PG_GETARG_INT32(2);
! 	char	   *description;
  	ObjectAddress address;
  
  	/* for "pinned" items in pg_depend, return null */
***
*** 3101,3103  pg_describe_object(PG_FUNCTION_ARGS)
--- 3108,4145 
  	description = getObjectDescription(&address);
  	PG_RETURN_TEXT_P(cstring_to_text(description));
  }
+ 
+ Datum
+ pg_identify_object(PG_FUNCTION_ARGS)
+ {
+ 	Oid			classid = PG_GETARG_OID(0);
+ 	Oid			objid = PG_GETARG_OID(1);
+ 	int32		subobjid = PG_GETARG_INT32(2);
+ 	Oid			schema_oid = InvalidOid;
+ 	const char *objname = NULL;
+ 	ObjectAddress address;
+ 	Datum		values[4];
+ 	bool		nulls[4];
+ 	TupleDesc	tupdesc;
+ 	HeapTuple	htup;
+ 
+ 	address.classId = classid;
+ 	address.objectId = objid;
+ 	address.objectSubId = subobjid;
+ 
+ 	/*
+ 	 * Construct a tuple descriptor for the result row.  This must match this
+ 	 * function's pg_proc entry!
+ 	 */
+ 	tupdesc = CreateTemplateTupleDesc(4, false);
+ 	TupleDescInitEntry(tupdesc, (AttrNumber) 1, "type",
+ 	   TEXTOID, -1, 0);
+ 	TupleDescInitEntry(tupdesc, (AttrNumber) 2, "schema",
+ 	   TEXTOID, -1, 0);
+ 	TupleDescInitEntry(tupdesc, (AttrNumber) 3, "name",
+ 	   TEXTOID, -1, 0);
+ 	TupleDescInitEntry(tupdesc, (AttrNumber) 4, "identity",
+ 	   TEXTOID, -1, 0);
+ 
+ 	tupdesc = BlessTupleDesc(tupdesc);
+ 
+ 	if (is_objectclass_supported(address.classId))
+ 	{
+ 		HeapTuple	objtup;
+ 		Relation	catalog = heap_open(address.classId, AccessShareLock);
+ 
+ 		objtup = get_catalog_object_by_oid(catalog, address.objectId);
+ 		if (objtup != NULL)
+ 		{
+ 			bool		isnull;
+ 			AttrNumber	nspAttnum;
+ 			AttrNumber	nameAttnum;
+ 
+ 			nspAttnum = get_object_attnum_namespace(address.classId);
+ 			if (nspAttnum != InvalidAttrNumber)
+ 			{
+ schema_oid = heap_getattr(objtup, nspAttnum,
+ 		  RelationGetDescr(catalog), &isnull);
+ if (isnull)
+ 	elog(ERROR, "invalid null namespace in object %u/%u/%d",
+ 		 address.classId, address.objectId, address.objectSubId);
+ 			}
+ 
+ 			nameAttnum = get_object_attnum_name(address.classId);
+ 			if (nameAttnum != InvalidAttrNumber)
+ 			{
+ Datum	nameDatum;
+ 
+ nameDatum = heap_getattr(objtup, nameAttnum,
+ 		 RelationGetDescr(catalog), &isnull);
+ if (isnull)
+ 	elog(ERROR, "invalid null name in object %u/%u/%d",
+ 		 address.classId, address.objectId, address.objectSubId);
+ objname = quote_identifier(NameStr(*(DatumGetName(nameDatum))));
+ 			}
+ 		}
+ 
+ 		heap_close(catalog, AccessShareLock);
+ 	}
+ 
+ 	/* object type */
+ 	values[0] = CStringGetTextDatum(getObjectTypeDescription(&address));
+ 	nulls[0] = false;
+ 
+ 	/* schema name */
+ 	if (OidIsValid(schema_oid))
+ 	{
+ 		const char	*schema = quote_identifier(get_namespace_name(schema_oid));
+ 
+ 		values[1] = CStringGetTextDatum

Re: [HACKERS] machine-parseable object descriptions

2013-03-19 Thread Alvaro Herrera
Dimitri Fontaine wrote:
> Tom Lane  writes:
> > I could also live with keeping the schema column as proposed, if people
> > think it has a use, but letting it be redundant with a schema name
> > included in the identity string.  But it seems like a bad idea to try to
> > shear schema off of identity.
> 
> +1
> 
> Use case for keeping the extra column: replication to a federated
> database where you want to tweak the target schema.

If I understood our IM discussion correctly, you were saying that for
federated database replication you wanted a separate "name" column, from
which you could extract a table name easily; not that you wanted a
separate schema column.  Anyway the schema column is also present.  We
can easily remove columns before commit, if we decide we don't want
them.

In the attached patch, we have these three columns: a "name" column,
which is the object's bare name; a "schema" column, which is the schema;
and an "identity" column, which is the whole thing, with all the schema
qualifications that apply.  There's also the type, of course.

I added the name column because it seems to me that it is genuinely
useful for some use cases.  However, there are two problems: first, the
original implementation had a slight bug in the case of column objects
(it displayed the name of the relation, not the name of the column), and
second, I was afraid it'd be easily misused.  One way to attack both things
at once would be to make it NULL except in the cases where it's a truly
unique identifier (tables, schemas, etc).  To avoid this being a
standalone "whitelist" of catalogs (which would get outdated quickly, I
fear), I propose to add a new boolean option in ObjectProperty, which
specifies whether the name can be used as an identifier.  I have
implemented it that way in the attached patch.


The new identity column is amazingly verbose on things like pg_amproc entries:

 10650 | 1 (pg_catalog.point, pg_catalog.point) of pg_catalog.point_ops for 
gist: 
pg_catalog.gist_point_consistent(pg_catalog.internal,pg_catalog.point,integer,pg_catalog.oid,pg_catalog.internal)
 10651 | 2 (pg_catalog.point, pg_catalog.point) of pg_catalog.point_ops for 
gist: pg_catalog.gist_box_union(pg_catalog.internal,pg_catalog.internal)
 10652 | 3 (pg_catalog.point, pg_catalog.point) of pg_catalog.point_ops for 
gist: pg_catalog.gist_point_compress(pg_catalog.internal)
 10653 | 4 (pg_catalog.point, pg_catalog.point) of pg_catalog.point_ops for 
gist: pg_catalog.gist_box_decompress(pg_catalog.internal)
 10654 | 5 (pg_catalog.point, pg_catalog.point) of pg_catalog.point_ops for 
gist: 
pg_catalog.gist_box_penalty(pg_catalog.internal,pg_catalog.internal,pg_catalog.internal)
 10655 | 6 (pg_catalog.point, pg_catalog.point) of pg_catalog.point_ops for 
gist: pg_catalog.gist_box_picksplit(pg_catalog.internal,pg_catalog.internal)
 10656 | 7 (pg_catalog.point, pg_catalog.point) of pg_catalog.point_ops for 
gist: 
pg_catalog.gist_box_same(pg_catalog.box,pg_catalog.box,pg_catalog.internal)
 10657 | 8 (pg_catalog.point, pg_catalog.point) of pg_catalog.point_ops for 
gist: 
pg_catalog.gist_point_distance(pg_catalog.internal,pg_catalog.point,integer,pg_catalog.oid)

Also, note how types with standard-specified names ("integer") are not
qualified (which is the right thing, AFAICT).   Here's another interesting
example:

alvherre=# create type public.integer as enum ('uno', 'dos');
CREATE TYPE

alvherre=# select * from pg_identify_object('pg_type'::regclass, 
'integer'::regtype, 0);
 type |   schema   | name | identity 
------+------------+------+----------
 type | pg_catalog | int4 | integer
(1 row)

alvherre=# select * from pg_identify_object('pg_type'::regclass, 
'public.integer'::regtype, 0);
 type | schema |   name    |     identity     
------+--------+-----------+------------------
 type | public | "integer" | public."integer"
(1 row)

If you create a public.int4 type, there's no confusion either, so it's
all consistent.

Here's another bit of sample output, from pg_depend contents (at the
left there's the referencing object, at the right the referenced
object):

alvherre=# select deptype, refd.*, ref.* from pg_depend, lateral (select * from 
pg_identify_object(classid, objid, objsubid) ) refd, lateral (select * from 
pg_identify_object(refclassid, refobjid, refobjsubid)) ref where classid <> 0 
and refd.schema <> 'pg_catalog' and ref.schema <> 'information_schema' and 
refd.schema <> 'pg_toast';
 deptype |   type|schema| name |
   identity   | type |schema|
name | identity  
-+---+--+--+--+--+--+-+---
 a   | domain constraint | public   |  | "my constr" on 
public.mydom  | type

Re: [HACKERS] Enabling Checksums

2013-03-19 Thread Greg Smith

On 3/19/13 10:05 PM, Tom Lane wrote:

FWIW, I would argue that any tradeoffs we make in this area must be made
on the assumption of no such acceleration.  If we can later make things
better for Intel(TM) users, that's cool, but let's not screw those using
other CPUs.


I see compatibility with the acceleration as a tie-breaker.  If there are 
two approaches that are otherwise about equal, such as choosing the 
exact CRC polynomial, you might as well pick the one that works faster 
with Intel's SSE.  I'll make sure that this gets benchmarked soon on a 
decent AMD system too, though.  I've been itching to assemble a 24-core 
AMD box at home anyway, so this gives me an excuse to pull the trigger on that.


Thanks for the summary of how you view the zlib/libpng project state.  I 
saw 4 releases from zlib in 2012, so it seemed possible development 
might still move forward there.


--
Greg Smith   2ndQuadrant US   g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com




[HACKERS] Ignore invalid indexes in pg_dump

2013-03-19 Thread Michael Paquier
Hi,

If failures happen with CREATE INDEX CONCURRENTLY, the system will be left
with invalid indexes. I don't think that the user would like to see invalid
indexes of an existing system being recreated as valid after a restore.
So why not remove invalid indexes from a dump, with something like the
patch attached?
This should perhaps be applied in pg_dump for versions down to 8.2, where
CREATE INDEX CONCURRENTLY was implemented?

I noticed some recent discussions about that:
http://www.postgresql.org/message-id/20121207141236.gb4...@alvh.no-ip.org
In this case the problem has been fixed in pg_upgrade directly.

-- 
Michael


20130317_dump_only_valid_index.patch
Description: Binary data



Re: [HACKERS] Enabling Checksums

2013-03-19 Thread Tom Lane
Greg Smith  writes:
> While being a lazy researcher today instead of writing code, I 
> discovered that the PNG file format includes a CRC-32 on its data 
> chunks, and to support that there's a CRC32 function inside of zlib: 
> http://www.zlib.net/zlib_tech.html

Hah, old sins keep coming back to haunt one ;-)

Keep in mind that PNG was designed in 1995, and that any speed
considerations in that spec were decided in the context of whether it
would take noticeably longer to view an image downloaded over analog
dialup.  That design context also informed a greater interest in error
checking than has been seen in any other image file format before (or
since, I believe).

> And they've already put some work into optimizing its table-driven 
> implementation.  Seems possible to punt the whole problem of how to do 
> this efficiently toward the zlib developers, let them drop into assembly 
> to get the best possible Intel acceleration etc. one day.

I would not hold my breath waiting for any such work from either the
zlib or libpng developers; both of those projects are basically in
maintenance mode AFAIK.  If we want hardware acceleration we're going
to have to deal with the portability issues ourselves.

FWIW, I would argue that any tradeoffs we make in this area must be made
on the assumption of no such acceleration.  If we can later make things
better for Intel(TM) users, that's cool, but let's not screw those using
other CPUs.

regards, tom lane




Re: postgres_fdw vs data formatting GUCs (was Re: [HACKERS] [v9.3] writable foreign tables)

2013-03-19 Thread Daniel Farina
On Tue, Mar 19, 2013 at 3:06 PM, Daniel Farina  wrote:
> On Tue, Mar 19, 2013 at 2:41 PM, Tom Lane  wrote:
>> Daniel Farina  writes:
>>> On Tue, Mar 19, 2013 at 1:16 PM, Tom Lane  wrote:
>>>> I'd be inclined to eat the cost of calling PQparameterStatus every time
>>>> (which won't be that much) and instead try to optimize by avoiding the
>>>> GUC-setting overhead if the current value matches the local setting.
>>>> But even that might be premature optimization.  Did you do any
>>>> performance testing to see if there was a problem worth avoiding?
>>
>>> Nope; should I invent a new way to do that, or would it be up to
>>> commit standard based on inspection alone?  I'm okay either way, let
>>> me know.
>>
>> Doesn't seem that hard to test: run a dblink query that pulls back a
>> bunch of data under best-case conditions (ie, local not remote server),
>> and time it with and without the proposed fix.  If the difference
>> is marginal then it's not worth working hard to optimize.
>
> Okay, will do, and here's the shorter and less mechanically intensive
> naive version that I think is the baseline: it doesn't try to optimize
> out any GUC settings and sets up the GUCs before the two
> materialization paths in dblink.

The results.  Summary: seems like grabbing the GUC and strcmp-ing is
worthwhile, but the amount of ping-ponging between processes adds some
noise to the timing results: utilization is far short of 100% on
either processor involved.  Attached is a cumulative diff of the new
version, and also reproduced below are the changes to v2 that make up
v3.

## Benchmark

SELECT dblink_connect('benchconn','dbname=contrib_regression');

CREATE FUNCTION bench() RETURNS integer AS $$
DECLARE
iterations integer;
BEGIN
iterations := 0;

WHILE iterations < 30 LOOP
  PERFORM * FROM dblink('benchconn', 'SELECT 1') AS t(a int);
  iterations := iterations + 1;
END LOOP;

RETURN iterations;
END;
$$ LANGUAGE plpgsql;

SELECT clock_timestamp() INTO TEMP beginning;
SELECT bench();
SELECT clock_timestamp() INTO TEMP ending;

SELECT 'dblink-benchmark-lines';
SELECT ending.clock_timestamp - beginning.clock_timestamp
FROM beginning, ending;

## Data Setup

CREATE TEMP TABLE bench_results (version text, span interval);

COPY bench_results FROM STDIN WITH CSV;
no-patch,@ 41.308122 secs
no-patch,@ 36.63597 secs
no-patch,@ 34.264119 secs
no-patch,@ 34.760179 secs
no-patch,@ 32.991257 secs
no-patch,@ 34.538258 secs
no-patch,@ 42.576354 secs
no-patch,@ 39.335557 secs
no-patch,@ 37.493206 secs
no-patch,@ 37.812205 secs
v2-applied,@ 36.550553 secs
v2-applied,@ 38.608723 secs
v2-applied,@ 39.415744 secs
v2-applied,@ 46.091052 secs
v2-applied,@ 45.425438 secs
v2-applied,@ 48.219506 secs
v2-applied,@ 43.514878 secs
v2-applied,@ 45.892302 secs
v2-applied,@ 48.479335 secs
v2-applied,@ 47.632041 secs
v3-strcmp,@ 32.524385 secs
v3-strcmp,@ 34.982098 secs
v3-strcmp,@ 34.487222 secs
v3-strcmp,@ 44.394681 secs
v3-strcmp,@ 44.638309 secs
v3-strcmp,@ 44.113088 secs
v3-strcmp,@ 45.497769 secs
v3-strcmp,@ 33.530176 secs
v3-strcmp,@ 32.9704 secs
v3-strcmp,@ 40.84764 secs
\.

=> SELECT version, avg(extract(epoch from span)), stddev(extract(epoch
from span))
   FROM bench_results
   GROUP BY version;
  version   |avg |  stddev
++--
 no-patch   | 37.1715227 | 3.17076487912923
 v2-applied | 43.9829572 | 4.30572672565711
 v3-strcmp  | 38.7985768 | 5.54760393720725

## Changes to v2:

--- a/contrib/dblink/dblink.c
+++ b/contrib/dblink/dblink.c
@@ -2981,9 +2981,11 @@ initRemoteGucs(remoteGucs *rgs, PGconn *conn)
 static void
 applyRemoteGucs(remoteGucs *rgs)
 {
- int i;
  const int numGucs = sizeof parseAffectingGucs / sizeof *parseAffectingGucs;

+ int i;
+ int addedGucNesting = false;
+
  /*
  * Affected types require local GUC manipulations.  Create a new
  * GUC NestLevel to overlay the remote settings.
@@ -2992,14 +2994,27 @@ applyRemoteGucs(remoteGucs *rgs)
  * structure, so expect it to come with an invalid NestLevel.
  */
  Assert(rgs->localGUCNestLevel == -1);
- rgs->localGUCNestLevel = NewGUCNestLevel();

  for (i = 0; i < numGucs; i += 1)
  {
+ int gucApplyStatus;
+
  const char *gucName   = parseAffectingGucs[i];
  const char *remoteVal = PQparameterStatus(rgs->conn, gucName);
+ const char *localVal  = GetConfigOption(gucName, true, true);

- int gucApplyStatus;
+ /*
+ * Attempt to avoid GUC setting if the remote and local GUCs
+ * already have the same value.
+ */
+ if (strcmp(remoteVal, localVal) == 0)
+ continue;
+
+ if (!addedGucNesting)
+ {
+ rgs->localGUCNestLevel = NewGUCNestLevel();
+ addedGucNesting = true;
+ }

  gucApplyStatus = set_config_option(gucName, remoteVal,
PGC_USERSET, PGC_S_SESSION,

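For anyone following along without the full patch: the machinery around
this uses the standard GUC nest-level pattern that the backend already
uses for functions with SET clauses.  A rough sketch, with
remote_datestyle standing in for PQparameterStatus(conn, "DateStyle"):

#include "postgres.h"
#include "utils/guc.h"

static void
apply_and_restore(const char *remote_datestyle)
{
	/* open a new nesting level before overriding anything */
	int			save_nestlevel = NewGUCNestLevel();

	(void) set_config_option("datestyle", remote_datestyle,
							 PGC_USERSET, PGC_S_SESSION,
							 GUC_ACTION_SAVE, true, 0);

	/* ... parse the remote text output with remote settings in effect ... */

	/* pop everything set at this nesting level back to its prior value */
	AtEOXact_GUC(true, save_nestlevel);
}
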
--
fdr


dblink-guc-sensitive-types-v3.patch
Description: Binary data


Re: [HACKERS] Enabling Checksums

2013-03-19 Thread Greg Smith

On 3/19/13 8:17 PM, Simon Riggs wrote:

We know that will work, has reasonable distribution characteristics
and might even speed things up rather than have two versions of CRC in
the CPU cache.


That sounds reasonable to me.  All of these CRC options have space/time 
trade-offs via how large the lookup tables they use are.  And if those 
are already sitting in the CPU data cache via their use in the WAL 
writes, using them for this purpose too could give them an advantage 
that's not obvious in a synthetic test.  I'm curious how that plays out 
when multiple cores are involved too.


It would be hilarious if optimizing the CRC calculation makes WAL-heavy 
workloads with checksums still net faster in the next release.  Makes me 
wonder how much of the full-page write overhead is being gobbled up by 
CRC time already, on systems with a good sized write cache.



I'd rather get this committed with a safe option and then y'all can
discuss the fine merits of each algorithm at leisure.


Yes, that's what we're already doing--it just looks like work :)

--
Greg Smith   2ndQuadrant US   g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com




Re: [HACKERS] Enabling Checksums

2013-03-19 Thread Simon Riggs
On 20 March 2013 00:03, Greg Smith  wrote:

> Simon suggested the other day that we should make the
> exact checksum mechanism used pluggable at initdb time, just some last
> minute alternatives checking on the performance of the real server code.
> I've now got the WAL CRC32, the zlib CRC32, and the Intel-derived versions
> Ants hacked on to compare.

Selectable, not pluggable.

I think the safe option is to calculate WAL CRC32, take the lowest 16
bits and use that.

We know that will work, has reasonable distribution characteristics
and might even speed things up rather than have two versions of CRC in
the CPU cache. It also gives us just one set of code to tune to cover
both.
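
Concretely, that amounts to something like the following, using the
existing pg_crc.h macros.  This is a sketch of the idea only, not
proposed code:

#include "postgres.h"
#include "utils/pg_crc.h"

static uint16
page_checksum16(const char *page, int len)
{
	pg_crc32	crc;

	INIT_CRC32(crc);
	COMP_CRC32(crc, page, len);
	FIN_CRC32(crc);

	return (uint16) (crc & 0xFFFF);		/* keep only the low 16 bits */
}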

I'd rather get this committed with a safe option and then y'all can
discuss the fine merits of each algorithm at leisure.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Enabling Checksums

2013-03-19 Thread Ants Aasma
On Wed, Mar 20, 2013 at 12:52 AM, Greg Smith  wrote:
> On 3/19/13 6:08 PM, Ants Aasma wrote:
>>
>> My main worry is that there is a reasonably
>> large population of users out there that don't have that acceleration
>> capability and will have to settle for performance overhead 4x worse
>> than what you currently measured for a shared buffer swapping
>> workload.
>
>
> That would be very bad.  I want to keep hammering on this part of the
> implementation.  If the only style of checksum that's computationally
> feasible is the Fletcher one that's already been done--if that approach is
> basically the most expensive one that's practical to use--I'd still consider
> that a major win over doing nothing.

Well, there is also the SIMD checksum: it outperforms hardware-assisted
CRCs, is almost 3 times as fast as Fletcher on the most popular
platform, should run fast on every CPU that has vector instructions
(almost all server CPUs from the last 10 years), should run fast even
on the last two generations of cellphone CPUs, and I don't see any
obvious errors that it misses. It will require some portability work
(maybe using intrinsics instead of relying on the vectorizer) but I
don't see why it wouldn't work.

> While being a lazy researcher today instead of writing code, I discovered
> that the PNG file format includes a CRC-32 on its data chunks, and to
> support that there's a CRC32 function inside of zlib:
> http://www.zlib.net/zlib_tech.html
>
> Is there anywhere that compiles a PostgreSQL --without-zlib that matters?
>
> The UI looks like this:
>
> ZEXTERN uLong ZEXPORT crc32 OF((uLong crc, const Bytef *buf, uInt len));
>
> And they've already put some work into optimizing its table-driven
> implementation.  Seems possible to punt the whole problem of how to do this
> efficiently toward the zlib developers, let them drop into assembly to get
> the best possible Intel acceleration etc. one day.

That's the same byte-at-a-time lookup-table algorithm that Intel uses
in the slice-by-8 method; zlib uses a 4-level lookup table, which gives
a smaller table but more overhead. Also, zlib uses the 0x04C11DB7
polynomial, which is not supported by the Intel-accelerated crc32c
instruction. I believe that if we go the crc32 route we should
definitely pick the Castagnoli polynomial, which at least has the hope
of being accelerated.
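
For reference, the hardware path in question is the SSE4.2 crc32
instruction, which implements only the Castagnoli polynomial.  A
standalone sketch of how it is typically used (not the code that was
benchmarked above; needs -msse4.2):

#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <nmmintrin.h>			/* SSE4.2 crc32 intrinsics */

static uint32_t
crc32c_hw(uint32_t crc, const unsigned char *buf, size_t len)
{
	while (len >= 8)
	{
		uint64_t	chunk;

		memcpy(&chunk, buf, 8);	/* avoid unaligned access problems */
		crc = (uint32_t) _mm_crc32_u64(crc, chunk);
		buf += 8;
		len -= 8;
	}
	while (len-- > 0)
		crc = _mm_crc32_u8(crc, *buf++);

	/* initial-value and final-inversion conventions are up to the caller */
	return crc;
}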

I copied crc32.c, crc32.h and zutil.h from zlib to the test framework
and ran the tests. While at it I also did a version where the fletcher
loop was unrolled by hand 8 times.

Results on sandy bridge (plain -O2 compile):
CRC32 slicing by 8 Algorithm (bytes/cycle), 0.522284
CRC32 zlib (bytes/cycle), 0.308307
Fletcher Algorithm: (bytes/cycle), 1.891964
Fletcher Algorithm hand unrolled: (bytes/cycle), 3.30
SIMD Algorithm (gcc): (bytes/cycle), 0.572407
SIMD Algorithm (hand coded): (bytes/cycle), 9.124589

Results from papers:
crc32c instruction (castagnoli only): 6.8 bytes/cycle
pclmulqdq-based crc32: 0.9 bytes/cycle

Fletcher is also still a strong contender; we just need to replace the
255 modulus with something less prone to common errors, maybe using
65521 as the modulus. I'd have to think about how best to combine the
values in that case. I believe we can lose the property that neither
byte can be zero; just avoiding both being zero seems good enough to me.
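
To make that concrete, here is a naive sketch of a Fletcher-style sum
with a 65521 modulus (essentially Adler-32 without its deferred-modulo
optimization, so treat the numbers above, not this loop, as the guide
to performance):

#include <stddef.h>
#include <stdint.h>

static uint32_t
fletcher_mod65521(const unsigned char *buf, size_t len)
{
	uint32_t	sum1 = 1;		/* Adler-32 convention; 0 would work too */
	uint32_t	sum2 = 0;
	size_t		i;

	for (i = 0; i < len; i++)
	{
		sum1 = (sum1 + buf[i]) % 65521;		/* largest prime below 2^16 */
		sum2 = (sum2 + sum1) % 65521;
	}
	return (sum2 << 16) | sum1;
}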

Regards,
Ants Aasma
-- 
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de




Re: [HACKERS] Enabling Checksums

2013-03-19 Thread Greg Smith

On 3/19/13 7:13 PM, Daniel Farina wrote:

I'm confused. Postgres includes a CRC32 implementation for WAL, does
it not?  Are you referring to something else?


I'm just pointing out that zlib includes one, too, and they might be 
more motivated/able as a project to chase after Intel's hardware 
acceleration for CRCs one day.  They already have code switching from C 
to assembly to get extra performance out of their longest_match() 
function.  The PostgreSQL CRC code is unlikely to go into twiddling 
assembly code, but zlib--which is usually linked in anyway--will.


And Adler-32 isn't just an option, it's named after a dude who works on 
zlib, and I can see he's already playing with the Intel acceleration by 
some of his recent answers at 
http://stackoverflow.com/users/1180620/mark-adler


I just re-discovered Ross Williams' CRC guide, which was already 
referenced in pg_crc_tables.h, so I think I'm getting close to being 
caught up on all the options here.  Simon suggested the other day that 
we should make the exact checksum mechanism used pluggable at initdb 
time, just some last minute alternatives checking on the performance of 
the real server code.  I've now got the WAL CRC32, the zlib CRC32, and 
the Intel-derived versions Ants hacked on to compare.


--
Greg Smith   2ndQuadrant US   g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com




Re: [HACKERS] Enabling Checksums

2013-03-19 Thread Andrew Dunstan


On 03/19/2013 06:52 PM, Greg Smith wrote:



While being a lazy researcher today instead of writing code, I 
discovered that the PNG file format includes a CRC-32 on its data 
chunks, and to support that there's a CRC32 function inside of zlib: 
http://www.zlib.net/zlib_tech.html


Is there anywhere that compiles a PostgreSQL --without-zlib that matters?



Some of the smaller platforms might not have it readily available. I 
doubt there is any common server class or general computing platform 
where it's not available.


cheers

andrew





Re: [HACKERS] Enabling Checksums

2013-03-19 Thread Daniel Farina
On Tue, Mar 19, 2013 at 3:52 PM, Greg Smith  wrote:
> On 3/19/13 6:08 PM, Ants Aasma wrote:
>>
>> My main worry is that there is a reasonably
>> large population of users out there that don't have that acceleration
>> capability and will have to settle for performance overhead 4x worse
>> than what you currently measured for a shared buffer swapping
>> workload.
>
>
> That would be very bad.  I want to keep hammering on this part of the
> implementation.  If the only style of checksum that's computationally
> feasible is the Fletcher one that's already been done--if that approach is
> basically the most expensive one that's practical to use--I'd still consider
> that a major win over doing nothing.
>
> While being a lazy researcher today instead of writing code, I discovered
> that the PNG file format includes a CRC-32 on its data chunks, and to
> support that there's a CRC32 function inside of zlib:
> http://www.zlib.net/zlib_tech.html
>
> Is there anywhere that compiles a PostgreSQL --without-zlib that matters?

I'm confused. Postgres includes a CRC32 implementation for WAL, does
it not?  Are you referring to something else?

I happen to remember this because I moved some things around to enable
third party programs (like xlogdump) to be separately compiled:
http://www.postgresql.org/message-id/e1s2xo0-0004uv...@gemulon.postgresql.org

--
fdr




Re: [HACKERS] Enabling Checksums

2013-03-19 Thread Greg Smith

On 3/19/13 6:08 PM, Ants Aasma wrote:

My main worry is that there is a reasonably
large population of users out there that don't have that acceleration
capability and will have to settle for performance overhead 4x worse
than what you currently measured for a shared buffer swapping
workload.


That would be very bad.  I want to keep hammering on this part of the 
implementation.  If the only style of checksum that's computationally 
feasible is the Fletcher one that's already been done--if that approach 
is basically the most expensive one that's practical to use--I'd still 
consider that a major win over doing nothing.


While being a lazy researcher today instead of writing code, I 
discovered that the PNG file format includes a CRC-32 on its data 
chunks, and to support that there's a CRC32 function inside of zlib: 
http://www.zlib.net/zlib_tech.html


Is there anywhere that compiles a PostgreSQL --without-zlib that matters?

The UI looks like this:

ZEXTERN uLong ZEXPORT crc32 OF((uLong crc, const Bytef *buf, uInt len));

And they've already put some work into optimizing its table-driven 
implementation.  Seems possible to punt the whole problem of how to do 
this efficiently toward the zlib developers, let them drop into assembly 
to get the best possible Intel acceleration etc. one day.
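
Using it is about as simple as it gets.  A sketch for checksumming one
page, with the buffer and length as placeholders:

#include <zlib.h>

static unsigned long
page_crc32(const unsigned char *page, unsigned int len)
{
	/* crc32(0L, Z_NULL, 0) returns the required initial value */
	unsigned long crc = crc32(0L, Z_NULL, 0);

	return crc32(crc, page, len);
}

e.g. page_crc32(buf, 8192) for the default block size.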


--
Greg Smith   2ndQuadrant US   g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com




[HACKERS] Add regression tests for ROLE (USER)

2013-03-19 Thread Robins Tharakan
Hi,

Please find attached a patch to take 'make check' code-coverage of ROLE
(USER) from 59% to 91%.

Any feedback is more than welcome.
--
Robins Tharakan


regress_user.patch
Description: Binary data



Re: [HACKERS] Enabling Checksums

2013-03-19 Thread Ants Aasma
On Tue, Mar 19, 2013 at 11:28 PM, Greg Smith  wrote:
> I don't remember if there's any good precedent for whether this form of BSD
> licensed code can be assimilated into PostgreSQL without having to give
> credit to Intel in impractical places.  I hate these licenses with the
> binary restrictions in them.

It's easy enough to re-implement this from scratch, including the
table generation if that is an issue. It's a very simple algorithm.

> Yes, there are two Intel patents on how they actually implement the CRC32C
> in their processor.  I just read them both, and they have many very specific
> claims.  I suspect their purpose is to keep AMD from knocking off the exact
> way Intel does this in hardware.  But they also contributed CRC32C code to
> Linux:
>
> https://lwn.net/Articles/292984/
> http://git.kernel.org/cgit/linux/kernel/git/herbert/cryptodev-2.6.git/tree/arch/x86/crypto/crc32c-pcl-intel-asm_64.S
>
> with a dual license, GPLv2 and the 2 clause BSD again.  In theory any CRC32C
> implementation might get dragged into court over Intel's patents if they
> wanted to shake someone down.

Thanks for checking that out. The kernel code is indeed the same method
described in the paper: three parallel CRCs combined at the end.
Looks like that is thankfully a non-issue.

> The main message I took away from this paper is that it's possible to speed
> up CRC computation if you fix a) the CRC polynomial and b) the size of the
> input buffer.  There may be some good optimization possibilities in both
> those, given I'd only expect Postgres to use one polynomial and the typical
> database page sizes.  Intel's processor acceleration has optimizations for
> running against 1K blocks for example.  I don't think requiring the database
> page size to be a multiple of 1K is ever going to be an unreasonable
> limitation, if that's what it takes to get useful hardware acceleration.

The variable size CRC seemed to asymptotically approach the fixed
block speed at 1k. It only affects the specifics of the final
recombination. That said, the fixed-size 1k looks good enough if we
decide to go this route. My main worry is that there is a reasonably
large population of users out there that don't have that acceleration
capability and will have to settle for performance overhead 4x worse
than what you currently measured for a shared buffer swapping
workload.

Regards,
Ants Aasma
-- 
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de




Re: postgres_fdw vs data formatting GUCs (was Re: [HACKERS] [v9.3] writable foreign tables)

2013-03-19 Thread Daniel Farina
On Tue, Mar 19, 2013 at 2:41 PM, Tom Lane  wrote:
> Daniel Farina  writes:
>> On Tue, Mar 19, 2013 at 1:16 PM, Tom Lane  wrote:
>>> I'd be inclined to eat the cost of calling PQparameterStatus every time
>>> (which won't be that much) and instead try to optimize by avoiding the
>>> GUC-setting overhead if the current value matches the local setting.
>>> But even that might be premature optimization.  Did you do any
>>> performance testing to see if there was a problem worth avoiding?
>
>> Nope; should I invent a new way to do that, or would it be up to
>> commit standard based on inspection alone?  I'm okay either way, let
>> me know.
>
> Doesn't seem that hard to test: run a dblink query that pulls back a
> bunch of data under best-case conditions (ie, local not remote server),
> and time it with and without the proposed fix.  If the difference
> is marginal then it's not worth working hard to optimize.

Okay, will do, and here's the shorter and less mechanically intensive
naive version that I think is the baseline: it doesn't try to optimize
out any GUC settings and sets up the GUCs before the two
materialization paths in dblink.

Something I forgot to ask about is whether a strangely-minded type
input function could whack around the GUC as records are being
remitted, which would necessitate per-tuple polling of
PQparameterStatus (e.g. in the inner loop of a materialization).
That seemed pretty "out there," but I'm broaching it for completeness.

I'll see how much of a penalty it is vs. not applying any patch at all next.

--
fdr


dblink-guc-sensitive-types-v2.patch
Description: Binary data



Re: postgres_fdw vs data formatting GUCs (was Re: [HACKERS] [v9.3] writable foreign tables)

2013-03-19 Thread Tom Lane
Daniel Farina  writes:
> On Tue, Mar 19, 2013 at 1:16 PM, Tom Lane  wrote:
>> I'd be inclined to eat the cost of calling PQparameterStatus every time
>> (which won't be that much) and instead try to optimize by avoiding the
>> GUC-setting overhead if the current value matches the local setting.
>> But even that might be premature optimization.  Did you do any
>> performance testing to see if there was a problem worth avoiding?

> Nope; should I invent a new way to do that, or would it be up to
> commit standard based on inspection alone?  I'm okay either way, let
> me know.

Doesn't seem that hard to test: run a dblink query that pulls back a
bunch of data under best-case conditions (ie, local not remote server),
and time it with and without the proposed fix.  If the difference
is marginal then it's not worth working hard to optimize.

regards, tom lane




Re: [HACKERS] Enabling Checksums

2013-03-19 Thread Greg Smith

On 3/18/13 8:17 PM, Ants Aasma wrote:

I looked for fast CRC implementations on the net. The fastest plain C
variant I could find was one produced by Intels R&D department
(available with a BSD license [1], requires some porting).


Very specifically, it references 
http://opensource.org/licenses/bsd-license.html as the 2 clause BSD 
license it is released under.  If PostgreSQL wanted to use that as its 
implementation, the source file would need to have Intel's copyright, 
and there's this ugly thing:


"Redistributions in binary form must reproduce the above copyright 
notice, this list of conditions and the following disclaimer in the 
documentation and/or other materials provided with the distribution."


I don't remember if there's any good precedent for whether this form of 
BSD licensed code can be assimilated into PostgreSQL without having to 
give credit to Intel in impractical places.  I hate these licenses with 
the binary restrictions in them.



For CRC32C there is also an option to use the crc32 instructions
available on newer Intel machines and run 3 parallel CRC calculations
to cover for the 3 cycle latency on that instruction, combining them
in the end [2]. Skimming the paper it looks like there are some
patents in this area, so if we wish to implement this, we would have
to see how we can navigate around them.


Discussing patent issues, especially about how someone else implemented 
a feature on list, is generally bad news.  But since as you noted Intel 
has interacted with other open-source communities already with code 
related to those patents, I think it's OK to talk about that for a bit.


Yes, there are two Intel patents on how they actually implement the 
CRC32C in their processor.  I just read them both, and they have many 
very specific claims.  I suspect their purpose is to keep AMD from 
knocking off the exact way Intel does this in hardware.  But they also 
contributed CRC32C code to Linux:


https://lwn.net/Articles/292984/
http://git.kernel.org/cgit/linux/kernel/git/herbert/cryptodev-2.6.git/tree/arch/x86/crypto/crc32c-pcl-intel-asm_64.S

with a dual license, GPLv2 and the 2 clause BSD again.  In theory any 
CRC32C implementation might get dragged into court over Intel's patents 
if they wanted to shake someone down.


But they would bring a world of hurt upon themselves for asserting a 
CRC32C patent claim against any open-source project, considering that 
they contributed this code themselves under a pair of liberal licenses. 
 This doesn't set off any of my "beware of patents" alarms.  Intel 
wants projects to use this approach, detect their acceleration when it's 
available, and run faster on Intel than AMD.  Dragging free software 
packages into court over code they submitted would create a PR disaster 
for Intel.  That would practically be entrapment on their part.



perf accounted 25% of execution time to
rdtsc instructions in the measurement loop for the handcoded variant;
not all of that is from the pipeline flush.


To clarify this part, rdtsc is the instruction that gets timing information 
from the processor:  "Read Time Stamp Counter".  So Ants is saying a lot 
of the runtime is the timing itself.  rdtsc execution time is the 
overhead that the pg_test_timing utility estimates in some cases.
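
For anyone who hasn't seen that idiom, the measurement is just
bracketing the code under test with time stamp counter reads; a sketch
using the GCC/Clang intrinsic:

#include <x86intrin.h>			/* __rdtsc() */

static unsigned long long
cycles_for(void (*fn) (void))
{
	unsigned long long start = __rdtsc();

	fn();						/* code under measurement */
	return __rdtsc() - start;	/* the rdtsc pair itself adds overhead */
}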



http://www.intel.com/content/dam/www/public/us/en/documents/white-papers/crc-iscsi-polynomial-crc32-instruction-paper.pdf


The main message I took away from this paper is that it's possible to 
speed up CRC computation if you fix a) the CRC polynomial and b) the 
size of the input buffer.  There may be some good optimization 
possibilities in both those, given I'd only expect Postgres to use one 
polynomial and the typical database page sizes.  Intel's processor 
acceleration has optimizations for running against 1K blocks for 
example.  I don't think requiring the database page size to be a 
multiple of 1K is ever going to be an unreasonable limitation, if that's 
what it takes to get useful hardware acceleration.


--
Greg Smith   2ndQuadrant US   g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com




Re: postgres_fdw vs data formatting GUCs (was Re: [HACKERS] [v9.3] writable foreign tables)

2013-03-19 Thread Daniel Farina
On Tue, Mar 19, 2013 at 1:16 PM, Tom Lane  wrote:
> Daniel Farina  writes:
>> Similar in purpose to cc3f281ffb0a5d9b187e7a7b7de4a045809ff683, but
>> taking into account that a dblink caller may choose to cause arbitrary
>> changes to DateStyle and IntervalStyle.  To handle this, it is
>> necessary to use PQparameterStatus before parsing any input, every
>> time.  This is avoided in cases that do not include the two
>> problematic types treated -- interval and timestamptz -- by scanning
>> the TupleDesc's types first.
>
> Hm.  Is that really a win?  Examining the tupdesc wouldn't be free
> either, and what's more, I think you're making false assumptions about
> which types have behavior dependent on the GUCs.  You definitely missed
> out on date and plain timestamp, and what of domains over these types?

Yes, I also forgot about arrays, and nested composites in arrays or
nested composites.  As soon as recursive types are introduced the scan
is not sufficient for sure: it's necessary to copy every GUC unless
one wants to recurse through the catalogs which feels like a heavy
loss.

I presumed at the time that skimming over the tupdescs to compare a few
Oids would be significantly cheaper than the guts of
PQparameterStatus, which I believe to be a linked-list traversal.  I
mostly wrote that optimization because it was easy coincidentally from
how I chose to structure the code rather than belief in its absolute
necessity.

And, as you said, I forgot a few types even for the simple case, which
is a bit of a relief because some of the machinery I wrote for the n =
3 case will come in useful for that.

> I'd be inclined to eat the cost of calling PQparameterStatus every time
> (which won't be that much) and instead try to optimize by avoiding the
> GUC-setting overhead if the current value matches the local setting.
> But even that might be premature optimization.  Did you do any
> performance testing to see if there was a problem worth avoiding?

Nope; should I invent a new way to do that, or would it be up to
commit standard based on inspection alone?  I'm okay either way, let
me know.

I'll take into account these problems and post a new version.

--
fdr




Re: postgres_fdw vs data formatting GUCs (was Re: [HACKERS] [v9.3] writable foreign tables)

2013-03-19 Thread Tom Lane
Daniel Farina  writes:
> Similar in purpose to cc3f281ffb0a5d9b187e7a7b7de4a045809ff683, but
> taking into account that a dblink caller may choose to cause arbitrary
> changes to DateStyle and IntervalStyle.  To handle this, it is
> necessary to use PQparameterStatus before parsing any input, every
> time.  This is avoided in cases that do not include the two
> problematic types treated -- interval and timestamptz -- by scanning
> the TupleDesc's types first.

Hm.  Is that really a win?  Examining the tupdesc wouldn't be free
either, and what's more, I think you're making false assumptions about
which types have behavior dependent on the GUCs.  You definitely missed
out on date and plain timestamp, and what of domains over these types?

I'd be inclined to eat the cost of calling PQparameterStatus every time
(which won't be that much) and instead try to optimize by avoiding the
GUC-setting overhead if the current value matches the local setting.
But even that might be premature optimization.  Did you do any
performance testing to see if there was a problem worth avoiding?

> Although it has been suggested that extra_float_digits would need
> similar treatment as IntervalStyle and DateStyle (as it's seen in
> pg_dump), extra_float_digits is not an GUC_REPORT-ed GUC and is not
> able to be checked in PQparameterStatus, and furthermore, the float4
> and float8 parsers are not sensitive to the GUC anyway: both accept as
> much precision as is provided in an unambiguous way.

Agreed, we don't need to worry so much about that one; or at least,
if we do need to worry, it's on the other end of the connection from
what we're fixing here.

regards, tom lane




Re: [HACKERS] Enabling Checksums

2013-03-19 Thread Greg Smith

On 3/8/13 4:40 PM, Greg Stark wrote:

On Fri, Mar 8, 2013 at 5:46 PM, Josh Berkus  wrote:

After some examination of the systems involved, we concluded that the
issue was the FreeBSD drivers for the new storage, which were unstable
and had custom source patches.  However, without PostgreSQL checksums,
we couldn't *prove* it wasn't PostgreSQL at fault.  It ended up taking
weeks of testing, most of which was useless, to prove to them they had a
driver problem so it could be fixed.  If Postgres had had checksums, we
could have avoided wasting a couple weeks looking for non-existent
PostgreSQL bugs.


How would Postgres checksums have proven that?


It's hard to prove this sort of thing definitively.  I see this more as 
a source of evidence that can increase confidence that the database is 
doing the right thing, most usefully in a replication environment. 
Systems that care about data integrity nowadays are running with a WAL 
shipping replica of some sort.  Right now there's no way to grade the 
master vs. standby copies of data, to figure out which is likely to be 
the better copy.  In a checksum environment, here's a new 
troubleshooting workflow that becomes possible:


1) Checksum error happens on the master.
2) The same block is checked on the standby.  It has the same 16 bit 
checksum, but different data, and its checksum matches its data.
3) The copy of that block on the standby, which was shipped over the 
network instead of being stored locally, is probably good.
4) The database must have been consistent when the data was in RAM on 
the master.
5) Conclusion:  there's probably something wrong at a storage layer 
below the database on the master.


Now, of course this doesn't automatically point the finger correctly 
with every possible corruption possibility.  But this example is a 
situation I've seen in the real world when a bad driver flips a random 
bit in a block.  If Josh had been able to show his client the standby 
server built from streaming replication was just fine, and corruption 
was limited to the master, that doesn't *prove* the database isn't the 
problem.  But it does usefully adjust the perception of what faults are 
likely and unlikely away from it.  Right now when I see master/standby 
differences in data blocks, it's the old problem of telling the true 
time when you have two clocks.  Having a checksum helps pick the right 
copy when there is more than one, and one has been corrupted by storage 
layer issues.



If I understand the performance issues right, the main problem is the
extra round trip to the wal log which can require a sync. Is that
right?


I don't think this changes things such that there is a second fsync per 
transaction.  That is a worthwhile test workload to add though.  Right 
now the tests Jeff and I have ran have specifically avoided systems with 
slow fsync, because you can't really test the CPU/memory overhead very 
well if you're hitting the rotational latency bottleneck.


--
Greg Smith   2ndQuadrant US   g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com




Re: [HACKERS] Enabling Checksums

2013-03-19 Thread Tom Lane
Jeff Davis  writes:
> I will move back to verifying the page hole, as well.

> There are a few approaches:

> 1. Verify that the page hole is zero before write and after read.
> 2. Include it in the calculation (if we think there are some corner
> cases where the hole might not be all zero).
> 3. Zero the page hole before write, and verify that it's zero on read.
> This can be done during the memcpy at no performance penalty in
> PageSetChecksumOnCopy(), but that won't work for
> PageSetChecksumInplace().

TBH, I do not think that the checksum code ought to be so familiar with
the page format as to know that there *is* a hole, much less be willing
to zero out what it thinks is a hole.  I consider #3 totally
unacceptable from a safety standpoint, and don't much care for #1
either.  #2 sounds like the thing to do.

regards, tom lane




Re: [HACKERS] Improving avg performance for numeric

2013-03-19 Thread Tom Lane
I wrote:
> [ looks at patch... ]  Oh, I see what's affecting the plan: you changed
> the aggtranstypes to internal for a bunch of aggregates.  That's not
> very good, because right now the planner takes that to mean that the
> aggregate could eat a lot of space.  We don't want that to happen for
> these aggregates, I think.

After thinking about that for awhile: if we pursue this type of
optimization, what would probably be appropriate is to add an aggregate
property (stored in pg_aggregate) that allows direct specification of
the size that the planner should assume for the aggregate's transition
value.  We were getting away with a hardwired assumption of 8K for
"internal" because the existing aggregates that used that transtype all
had similar properties, but it was always really a band-aid not a proper
solution.  A per-aggregate override could be useful in other cases too.

This was looking like 9.4 material already, but adding such a property
would definitely put it over the top of what we could think about
squeezing into 9.3, IMO.
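
As a purely hypothetical sketch of the planner-side sizing logic such a
property would enable -- the name "aggtransspace" and the constants below
are invented for illustration, not anything that exists today:

typedef struct AggSizingInfo
{
    unsigned int aggtranstype;   /* pg_type OID of the transition value */
    long         aggtransspace;  /* hypothetical per-aggregate override; 0 = unset */
} AggSizingInfo;

#define ASSUMED_INTERNALOID      2281    /* OID of "internal" (assumed) */
#define HARDWIRED_INTERNAL_SIZE  8192    /* today's blanket guess */
#define ASSUMED_CHUNK_OVERHEAD   16      /* rough palloc overhead, assumed */

static long
estimated_transvalue_size(const AggSizingInfo *agg, long typwidth)
{
    if (agg->aggtransspace > 0)
        return agg->aggtransspace + ASSUMED_CHUNK_OVERHEAD;  /* explicit override */
    if (agg->aggtranstype == ASSUMED_INTERNALOID)
        return HARDWIRED_INTERNAL_SIZE;      /* current band-aid for "internal" */
    return typwidth + ASSUMED_CHUNK_OVERHEAD;    /* ordinary pass-by-ref type */
}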

regards, tom lane




Re: [HACKERS] Enabling Checksums

2013-03-19 Thread Simon Riggs
On 19 March 2013 00:17, Ants Aasma  wrote:

> I looked for fast CRC implementations on the net.

Thanks very much for great input.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Enabling Checksums

2013-03-19 Thread Simon Riggs
On 19 March 2013 17:18, Jeff Davis  wrote:

> I will move back to verifying the page hole, as well.

That was agreed long ago...

> There are a few approaches:
>
> 1. Verify that the page hole is zero before write and after read.
> 2. Include it in the calculation (if we think there are some corner
> cases where the hole might not be all zero).
> 3. Zero the page hole before write, and verify that it's zero on read.
> This can be done during the memcpy at no performance penalty in
> PageSetChecksumOnCopy(), but that won't work for
> PageSetChecksumInplace().
>
> With option #2 or #3, we might also verify that the hole is all-zero if
> asserts are enabled.

(3) seems likely to be more expensive than (2), since we're talking
unaligned memory writes rather than a single pre-fetchable block read.

In any case, at initial patch commit, we should CRC the whole block
and allow for the possibility of improvement following measurements.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] [pgsql-advocacy] Call for Google Summer of Code mentors, admins

2013-03-19 Thread Thom Brown
On 14 February 2013 18:02, Josh Berkus  wrote:
> Folks,
>
> Once again, Google is holding Summer of Code.  We need to assess whether
> we want to participate this year.
>
> Questions:
>
> - Who wants to mentor for GSOC?
>
> - Who can admin for GSOC?  Thom?
>
> - Please suggest project ideas for GSOC
>
> - Students seeing this -- please speak up if you have projects you plan
> to submit.

If anyone else has more project ideas to suggest, please do share.
Students, please feel free to review the PostgreSQL Todo list for
inspiration: http://wiki.postgresql.org/wiki/Todo  Of course, ensure
you don't choose anything too ambitious or too trivial.

--
Thom




Re: [HACKERS] Enabling Checksums

2013-03-19 Thread Jeff Davis
On Fri, 2013-03-15 at 14:32 +0200, Ants Aasma wrote:
> The most obvious case here is that you
> can swap any number of bytes from 0x00 to 0xFF or back without
> affecting the hash.

That's a good point. Someone (Simon?) had brought that up before, but
you and Tom convinced me that it's a problem. As I said in my reply to
Tom, one option is to change the modulus.
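
To make the failure mode concrete, here's a toy program (not the patch's
actual checksum code) showing that a simplified Fletcher-style sum with
modulus 255 cannot see 0x00 -> 0xFF flips, while a prime modulus such as
251 catches them:

#include <stdio.h>
#include <stdint.h>
#include <string.h>

/* Simplified byte-wise Fletcher-style sum, parameterized by modulus. */
static uint16_t
fletcher_mod(const unsigned char *buf, size_t len, uint32_t mod)
{
    uint32_t sum1 = 0, sum2 = 0;
    size_t   i;

    for (i = 0; i < len; i++)
    {
        sum1 = (sum1 + buf[i]) % mod;
        sum2 = (sum2 + sum1) % mod;
    }
    return (uint16_t) ((sum2 << 8) | sum1);
}

int
main(void)
{
    unsigned char clean[64], corrupt[64];

    memset(clean, 0, sizeof(clean));
    memcpy(corrupt, clean, sizeof(corrupt));
    corrupt[10] = 0xFF;     /* 0x00 -> 0xFF flips, as flaky hardware might do */
    corrupt[40] = 0xFF;

    /* mod 255: identical results, corruption missed (0xFF == 0 mod 255) */
    printf("mod 255: %04x vs %04x\n",
           (unsigned) fletcher_mod(clean, 64, 255),
           (unsigned) fletcher_mod(corrupt, 64, 255));

    /* mod 251 (prime): results differ, this corruption is caught */
    printf("mod 251: %04x vs %04x\n",
           (unsigned) fletcher_mod(clean, 64, 251),
           (unsigned) fletcher_mod(corrupt, 64, 251));
    return 0;
}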

> I took a look at how the fletcher-64 compiles.

Great analysis, thank you.

> I'm not really sure if parallel checksums would be worth doing or not.
> On one hand, enabling data parallelism would make it more future
> proof, on the other hand, the unvectorized variant is slower than
> Fletcher-64.

Looks like we still have several options being discussed. I think the
checksum with modulo 255 is out, but perhaps a different modulus is
still on the table. And if we can get a CRC to be fast enough, then we'd
all be happy with that option.

Another thing to consider is that, right now, the page is copied and
then checksummed. If we can calculate the checksum during the copy, that
might save us a small amount of effort. My feeling is that it would only
really help if the checksum is very cheap and works on large word sizes,
but I'm not sure.
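
A rough sketch of what a fused copy-and-checksum pass could look like --
the mixing step here is only a placeholder, not any of the algorithms
under discussion, and it assumes an aligned block that is a multiple of
eight bytes:

#include <stddef.h>
#include <stdint.h>

/* Copy the block and fold its words into a checksum in the same pass,
 * so the data is pulled through the cache only once. */
static uint32_t
copy_and_checksum(uint64_t *dst, const uint64_t *src, size_t nwords)
{
    uint64_t acc = 0;
    size_t   i;

    for (i = 0; i < nwords; i++)
    {
        uint64_t w = src[i];

        dst[i] = w;                       /* the copy */
        acc += w;                         /* placeholder mixing step */
        acc = (acc << 13) | (acc >> 51);  /* rotate so word position matters */
    }
    return (uint32_t) (acc ^ (acc >> 32));
}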

> On another note, I think I found a bug with the current latest patch.

Ugh. Great catch, thank you!

Regards,
Jeff Davis






Re: [HACKERS] Enabling Checksums

2013-03-19 Thread Jeff Davis
On Sat, 2013-03-16 at 20:41 -0400, Tom Lane wrote:
> Simon Riggs  writes:
> > On 15 March 2013 13:08, Andres Freund  wrote:
> >> I commented on this before, I personally think this property makes
> >> fletcher a not so good fit for this. It's not uncommon for parts of a
> >> block being all-zero and many disk corruptions actually change whole
> >> runs of bytes.

[ referring to Ants's comment that the existing algorithm doesn't
distinguish between 0x00 and 0xFF ]

> Meh.  I don't think that argument holds a lot of water.  The point of
> having checksums is not so much to notice corruption as to be able to
> point the finger at flaky hardware.  If we have an 8K page with only
> 1K of data in it, and we fail to notice that the hardware dropped a lot
> of bits in the other 7K, we're not doing our job; and that's not really
> something to write off, because it would be a lot better if we complain
> *before* the hardware manages to corrupt something valuable.

I will move back to verifying the page hole, as well.

There are a few approaches:

1. Verify that the page hole is zero before write and after read.
2. Include it in the calculation (if we think there are some corner
cases where the hole might not be all zero).
3. Zero the page hole before write, and verify that it's zero on read.
This can be done during the memcpy at no performance penalty in
PageSetChecksumOnCopy(), but that won't work for
PageSetChecksumInplace().

With option #2 or #3, we might also verify that the hole is all-zero if
asserts are enabled.

> So I think we'd be best off to pick an algorithm whose failure modes
> don't line up so nicely with probable hardware failure modes.  It's
> worth noting that one of the reasons that CRCs are so popular is
> precisely that they were designed to detect burst errors with high
> probability.

Another option is to use a different modulus. The page
http://en.wikipedia.org/wiki/Fletcher%27s_checksum suggests that a prime
number can be a good modulus for Fletcher-32. Perhaps we could use 251
instead of 255? That would make it less likely to miss a common form of
hardware failure, although it would also reduce the number of possible
checksums slightly (about 4% fewer than 2^16).

I'm leaning toward this option now, or a CRC of some kind if the
performance is reasonable.

Regards,
Jeff Davis





Re: [HACKERS] Improving avg performance for numeric

2013-03-19 Thread Tom Lane
[ please do not top-reply ]

Hadi Moshayedi  writes:
> On Tue, Mar 19, 2013 at 6:25 PM, Tom Lane  wrote:
>> Uh, what?  Fooling around with the implementation of avg() should surely
>> not change any planning decisions.

> I am not sure how this works, but I also changed numeric sum(), and the
> views in question had a numeric sum() column. Can that have any impact?

[ looks at patch... ]  Oh, I see what's affecting the plan: you changed
the aggtranstypes to internal for a bunch of aggregates.  That's not
very good, because right now the planner takes that to mean that the
aggregate could eat a lot of space.  We don't want that to happen for
these aggregates, I think.

regards, tom lane




Re: [HACKERS] Improving avg performance for numeric

2013-03-19 Thread Hadi Moshayedi
I am not sure how this works, but I also changed numeric sum(), and the
views in question had a numeric sum() column. Can that have any impact?

I am going to dig deeper to see why this happens.

On Tue, Mar 19, 2013 at 6:25 PM, Tom Lane  wrote:

> Kevin Grittner  writes:
> > Hadi Moshayedi  wrote:
> >> I also noticed that this patch makes matview test fail. It seems
> >> that it just changes the ordering of rows for queries like
> >> "SELECT * FROM tv;". Does this seem like a bug in my patch, or
> >> should we add "ORDER BY" clauses to this test to make it more
> >> deterministic?
>
> > I added some ORDER BY clauses.  That is probably a good thing
> > anyway for purposes of code coverage.  Does that fix it for you?
>
> Uh, what?  Fooling around with the implementation of avg() should surely
> not change any planning decisions.
>
> regards, tom lane
>


Re: [HACKERS] Improving avg performance for numeric

2013-03-19 Thread Tom Lane
Kevin Grittner  writes:
> Hadi Moshayedi  wrote:
>> I also noticed that this patch makes matview test fail. It seems
>> that it just changes the ordering of rows for queries like
>> "SELECT * FROM tv;". Does this seem like a bug in my patch, or
>> should we add "ORDER BY" clauses to this test to make it more
>> deterministic?

> I added some ORDER BY clauses.  That is probably a good thing
> anyway for purposes of code coverage.  Does that fix it for you?

Uh, what?  Fooling around with the implementation of avg() should surely
not change any planning decisions.

regards, tom lane




Re: [HACKERS] Improving avg performance for numeric

2013-03-19 Thread Pavel Stehule
2013/3/19 Kevin Grittner :
> Hadi Moshayedi  wrote:
>
>> I updated the patch by taking ideas from your patch, and unifying
>> the transition struct and update function for different
>> aggregates. The speed of avg improved even more. It now has 60%
>> better performance than the current committed version.
>
> Outstanding!

I did some tests on OLAP-style queries and I think a ~40% speedup for
queries with AVG is realistic, depending on other conditions.

But there are a lot of situations where the data is already in shared
buffers or the filesystem cache, and then this patch can deliver a
significant speedup - probably even more with a better algorithm for
summing two numeric values in the aggregate.

Regards

Pavel

>
>> I also noticed that this patch makes matview test fail. It seems
>> that it just changes the ordering of rows for queries like
>> "SELECT * FROM tv;". Does this seem like a bug in my patch, or
>> should we add "ORDER BY" clauses to this test to make it more
>> deterministic?
>
> I added some ORDER BY clauses.  That is probably a good thing
> anyway for purposes of code coverage.  Does that fix it for you?
>
> --
> Kevin Grittner
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company




Re: [HACKERS] Improving avg performance for numeric

2013-03-19 Thread Kevin Grittner
Hadi Moshayedi  wrote:

> I updated the patch by taking ideas from your patch, and unifying
> the transition struct and update function for different
> aggregates. The speed of avg improved even more. It now has 60%
> better performance than the current committed version.

Outstanding!

> I also noticed that this patch makes matview test fail. It seems
> that it just changes the ordering of rows for queries like
> "SELECT * FROM tv;". Does this seem like a bug in my patch, or
> should we add "ORDER BY" clauses to this test to make it more
> deterministic?

I added some ORDER BY clauses.  That is probably a good thing
anyway for purposes of code coverage.  Does that fix it for you?

-- 
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Writable FDW: returning clauses.

2013-03-19 Thread Tom Lane
Ronan Dunklau  writes:
> While implementing the new writable API for FDWs, I wondered whether there
> are any obvious problems with the following behavior for handling returning
> clauses (for the delete case).

> The goal is to fetch all required attributes during the initial scan, and 
> avoid fetching data on the delete operation itself.

> - in the AddForeignUpdateTargets hook, add resjunk entries for the columns in 
> the returning clause
> - in the ExecForeignDelete hook, fill the returned slot with values taken 
> from 
> the planSlot.

You can try it if you want.  There were some reasons not to try it in
postgres_fdw:

* this would foreclose doing something closer to the semantics for local
tables, in which the initial scan doesn't lock the rows.

* at least update operations have to pull back the actual row anyhow in
order to tell the truth when a BEFORE UPDATE trigger on the remote
changes the stored data.

Both of those boil down to the fact that the initial scan may not see
the right data to return, if there are other actors involved.  I grant
that some remote data sources don't have any such issues to worry about,
but you need to be careful.

There are some comments in postgres_fdw about eventually optimizing the
case where all update/delete quals are remote into a scan node that does
UPDATE/DELETE RETURNING to start with, and then the Modify node would
have to do what you suggest in order to have data to return.  It didn't
seem like something to tackle in the first iteration though.

regards, tom lane




Re: [HACKERS] pg_upgrade segfaults when given an invalid PGSERVICE value

2013-03-19 Thread Steve Singer

On 13-03-18 09:17 PM, Bruce Momjian wrote:

On Mon, Mar 18, 2013 at 12:08:09PM -0400, Steve Singer wrote:

If you try running pg_upgrade with the PGSERVICE environment
variable set to some invalid/non-existent service, pg_upgrade
segfaults.

Program received signal SIGSEGV, Segmentation fault.
0x0040bdd1 in check_pghost_envvar () at server.c:304
304 for (option = start; option->keyword != NULL; option++)
(gdb) p start
$5 = (PQconninfoOption *) 0x0


PQconndefaults can return NULL if it has issues.

The attached patch prints a minimally useful error message. I don't
have a good way of getting a more useful error message out of
PQconndefaults().

I checked this against master, but it was reported to me as an issue in 9.2.


Well, that's interesting.  There is no mention of PQconndefaults()
returning NULL except for out of memory:

Returns a connection options array.  This can be used to determine
all possible PQconnectdb options and their
current default values.  The return value points to an array of
PQconninfoOption structures, which ends
with an entry having a null keyword pointer.  The
-->null pointer is returned if memory could not be allocated. Note that
the current default values (val fields)
will depend on environment variables and other context.  Callers
must treat the connection options data as read-only.

Looking at libpq/fe-connect.c::PQconndefaults(), it calls
conninfo_add_defaults(), which has this:

 /*
  * If there's a service spec, use it to obtain any not-explicitly-given
  * parameters.
  */
 if (parseServiceInfo(options, errorMessage) != 0)
     return false;

so it is clearly possible for PQconndefaults() to return NULL for
service file failures.  The questions are:

*  Is this what we want?


What other choices do we have? I don't think PQconndefaults() should 
continue on as if PGSERVICE wasn't set in the environment after a 
failure from parseServiceInfo.
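
For illustration, the shape of the guard could be something like this
(not necessarily the exact committed fix; pg_upgrade's own logging
functions would be used rather than fprintf/exit):

#include <stdio.h>
#include <stdlib.h>
#include "libpq-fe.h"

static void
check_pghost_envvar_guarded(void)
{
    PQconninfoOption *start,
                     *option;

    start = PQconndefaults();
    if (start == NULL)
    {
        /* can happen on out-of-memory or a bad PGSERVICE definition */
        fprintf(stderr,
                "could not get default connection options "
                "(out of memory or invalid service definition?)\n");
        exit(1);
    }

    for (option = start; option->keyword != NULL; option++)
    {
        /* ... existing per-option environment variable checks ... */
    }

    PQconninfoFree(start);
}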




*  Should we document this?


Yes, the documentation should indicate that PQconndefaults() can return
NULL for more than just memory failures.



*  Should we change this to just throw a warning?


How would we communicate warnings from PQconndefaults() back to the caller?




Also, it seems pg_upgrade isn't the only utility that is confused:

contrib/postgres_fdw/option.c and contrib/dblink/dblink.c think
PQconndefaults() returning NULL means out of memory and report that
as the error string.

bin/scripts/pg_isready.c and contrib/pg_upgrade/server.c have no
check for NULL return.

libpq/test/uri-regress.c knows to throw a generic error message.

So, we have some decisions and work to do.







Re: [HACKERS] Trust intermediate CA for client certificates

2013-03-19 Thread Stephen Frost
* Craig Ringer (cr...@2ndquadrant.com) wrote:
> As far as I'm concerned that's the immediate problem fixed. It may be
> worth adding a warning on startup if we find non-self-signed certs in
> root.crt too, something like 'WARNING: Intermediate certificate found in
> root.crt. This does not do what you expect and your configuration may be
> insecure; see the Client Certificates chapter in the documentation.'

I'm not sure that I follow this logic, unless you're proposing that
intermediate CAs only be allowed to be picked up from system-wide
configuration?  That strikes me as overly constrained as I imagine there
are valid configurations today which have intermediate CAs listed, with
the intention that they be available for PG to build the chain from a
client cert that is presented back up to the root.  Now, the client
might be able to provide such an intermediate CA cert too (one of the
fun things about SSL is that the client can send any 'missing' certs to
the server, if it has them available..), but it also might not.

> We could then look at using more flexible approaches to match PostgreSQL
> users to client certificates, like regexps or (preferably, IMO)
> DN-component based solutions to extract usernames from cert DNs, etc.
> Various ways to specify *authorization*.

Sure.

> It's looking more and more like the *authentication* side is basically
> "do you trust this CA root not to sign certs for fraudlent/fake
> SubjectDNs or issue intermediate certs that might do so? Trust: include
> it. No trust: Don't." That's what we have now, it just needs to be
> explained better in the docs.

I certainly agree that the docs could be improved in this area. :)

Thanks,

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] Trust intermediate CA for client certificates

2013-03-19 Thread Bruce Momjian
On Tue, Mar 19, 2013 at 09:37:18PM +0800, Craig Ringer wrote:
> On 03/19/2013 08:39 PM, Stephen Frost wrote:
> > Craig,
> >
> > * Craig Ringer (cr...@2ndquadrant.com) wrote:
> >> Yep, in most applications I've seen you usually store a list of
> >> authorized SubjectDNs or you just use your own self-signed root and
> >> issue certs from it.
> >
> > Even with a self-signed root issuing certs, you need to map the
> > individual cert to a PG user in some fashion.
> >
> 
> The more I look at this, the more it looks like trying to use
> intermediate CAs as authentication roots is largely wrong anyway. We
> should document this with something like:
> 
> NOTE: Only self-signed root CA certificates should be added to
> ssl_ca_file. If you add an intermediate CA certificate (one that's not
> self-signed) then PostgreSQL will not be able to validate client
> certificates against it because it will not have access to the full
> certificate chain. You can't fix that by adding the full certificate
> chain either, because PostgreSQL will then accept client certificates trusted by
> any member of the chain, including the root, so the effect is the same
> as placing only the root certificate in the file. It is not currently
> possible to trust certificates signed by an intermediate CA but not the
> parents in its certificate chain.
> 
> ... plus some explanation that having a valid trusted cert doesn't mean
> you're authorized for access, you still have to meet the requirements in
> pg_hba.conf, have a valid username/password or match an authorized
> certificate DN (depending on config), etc.
> 
> As far as I'm concerned that's the immediate problem fixed. It may be
> worth adding a warning on startup if we find non-self-signed certs in
> root.crt too, something like 'WARNING: Intermediate certificate found in
> root.crt. This does not do what you expect and your configuration may be
> insecure; see the Client Certificates chapter in the documentation.'

Yes, I like this.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +




Re: [HACKERS] Trust intermediate CA for client certificates

2013-03-19 Thread Craig Ringer

On 03/19/2013 08:39 PM, Stephen Frost wrote:
> Craig,
>
> * Craig Ringer (cr...@2ndquadrant.com) wrote:
>> Yep, in most applications I've seen you usually store a list of
>> authorized SubjectDNs or you just use your own self-signed root and
>> issue certs from it.
>
> Even with a self-signed root issuing certs, you need to map the
> individual cert to a PG user in some fashion.
>

The more I look at this, the more it looks like trying to use
intermediate CAs as authentication roots is largely wrong anyway. We
should document this with something like:

NOTE: Only self-signed root CA certificates should be added to
ssl_ca_file. If you add an intermediate CA certificate (one that's not
self-signed) then PostgreSQL will not be able to validate client
certificates against it because it will not have access to the full
certificate chain. You can't fix that by adding the full certificate
chain either, because PostgreSQL will then accept client certificates trusted by
any member of the chain, including the root, so the effect is the same
as placing only the root certificate in the file. It is not currently
possible to trust certificates signed by an intermediate CA but not the
parents in its certificate chain.

... plus some explanation that having a valid trusted cert doesn't mean
you're authorized for access, you still have to meet the requirements in
pg_hba.conf, have a valid username/password or match an authorized
certificate DN (depending on config), etc.

As far as I'm concerned that's the immediate problem fixed. It may be
worth adding a warning on startup if we find non-self-signed certs in
root.crt too, something like 'WARNING: Intermediate certificate found in
root.crt. This does not do what you expect and your configuration may be
insecure; see the Client Certificates chapter in the documentation.'



We could then look at using more flexible approaches to match PostgreSQL
users to client certificates, like regexps or (preferably, IMO)
DN-component based solutions to extract usernames from cert DNs, etc.
Various ways to specify *authorization*.

It's looking more and more like the *authentication* side is basically
"do you trust this CA root not to sign certs for fraudlent/fake
SubjectDNs or issue intermediate certs that might do so? Trust: include
it. No trust: Don't." That's what we have now, it just needs to be
explained better in the docs.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services





Re: postgres_fdw vs data formatting GUCs (was Re: [HACKERS] [v9.3] writable foreign tables)

2013-03-19 Thread Daniel Farina
On Thu, Mar 14, 2013 at 8:07 PM, Tom Lane  wrote:
> Daniel Farina  writes:
>> On Tue, Mar 12, 2013 at 11:51 AM, Tom Lane  wrote:
>>> Yeah, watching the remote side's datestyle and intervalstyle and
>>> matching them (for both input and output) would probably work.
>
>> Alright, so I've been whacking at this and there's one interesting
>> thing to ask about: saving and restoring the local GUCs.  There are a
>> bunch of things about GUCs besides their value that are maintained,
>> such as their 'source', so writing a little ad-hoc save/restore is not
>> going to do the right thing.
>
> Right, you should use NewGUCNestLevel/AtEOXact_GUC.  Look at the fixes
> I committed in postgres_fdw a day or two ago for an example.

Thanks.  Here's a patch.  Here are the comments from the top of the patch file, inlined:

Similar in purpose to cc3f281ffb0a5d9b187e7a7b7de4a045809ff683, but
taking into account that a dblink caller may choose to cause arbitrary
changes to DateStyle and IntervalStyle.  To handle this, it is
necessary to use PQparameterStatus before parsing any input, every
time.  That overhead is avoided for result sets that do not include the
two problematic types -- interval and timestamptz -- by scanning
the TupleDesc's types first.
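
In outline, the type-scanning approach looks something like this
standalone sketch (illustration only, not the patch itself; the pg_type
OIDs are written out here as assumptions):

#include <stdbool.h>
#include "libpq-fe.h"

#define ASSUMED_TIMESTAMPTZOID 1184
#define ASSUMED_INTERVALOID    1186

static bool
result_needs_style_sync(const unsigned int *coltypes, int ncols)
{
    int i;

    for (i = 0; i < ncols; i++)
        if (coltypes[i] == ASSUMED_TIMESTAMPTZOID ||
            coltypes[i] == ASSUMED_INTERVALOID)
            return true;
    return false;
}

static void
sync_formatting_gucs(PGconn *conn, const unsigned int *coltypes, int ncols)
{
    const char *datestyle;
    const char *intervalstyle;

    if (!result_needs_style_sync(coltypes, ncols))
        return;                 /* nothing GUC-sensitive; skip the overhead */

    datestyle = PQparameterStatus(conn, "DateStyle");
    intervalstyle = PQparameterStatus(conn, "IntervalStyle");

    /* ... apply these locally inside a GUC nest level (NewGUCNestLevel /
     * AtEOXact_GUC in the real patch) before parsing the returned rows ... */
    (void) datestyle;
    (void) intervalstyle;
}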

Although it has been suggested that extra_float_digits would need
similar treatment to IntervalStyle and DateStyle (as it's seen in
pg_dump), extra_float_digits is not a GUC_REPORT-ed GUC and so cannot
be checked via PQparameterStatus; furthermore, the float4
and float8 parsers are not sensitive to the GUC anyway: both accept as
much precision as is provided in an unambiguous way.

>> So, I can add one more such use of AtEOXact_GUC for the dblink fix,
>> but would it also be appropriate to find some new names for the
>> concepts (instead of AtEOXact_GUC and isCommit) here to more
>> accurately express what's going on?
>
> Meh.  I guess we could invent an "EndGUCNestLevel" that's a wrapper
> around AtEOXact_GUC, but I'm not that excited about it ...

Per your sentiment, I won't pursue that then.

Overall, I think the patch has reasonably thorough regression testing
(I tried to hit the several code paths whereby one would have to scan
TupleDescs, as well as a few other edge cases) and has some of the
obvious optimizations in place: it doesn't call PQparameterStatus more
than once per vulnerable type per TupleDesc scan, and avoids using the
 parameter status procedure at all if there are no affected types.

The mechanisms may be overwrought though, since it was originally
intended to generalize to three types before I realized that
extra_float_digits is another kind of problem entirely, leaving just
IntervalStyle and DateStyle remaining, which perhaps could have been
treated even more specially to make the patch more terse.

I'll add it to the commitfest.

--
fdr


dblink-guc-sensitive-types-v1.patch
Description: Binary data



Re: [HACKERS] Trust intermediate CA for client certificates

2013-03-19 Thread Stephen Frost
Craig,

* Craig Ringer (cr...@2ndquadrant.com) wrote:
> Yep, in most applications I've seen you usually store a list of
> authorized SubjectDNs or you just use your own self-signed root and
> issue certs from it.

Even with a self-signed root issuing certs, you need to map the
individual cert to a PG user in some fashion.

> I'm pretty sure I've seen tools match on part of the DN, like the
> organisation field, but since I can't remember *where* I'm not sure
> that's all that useful.

Looking at what other tools do and how they handle this question would
certainly be a good idea.

> I don't know about "very rare" but it's certainly not common outside
> very large orgs. I tried to find a CA that'd let me issue intermediate
> client certs for 2ndQuadrant but found nobody that'd do it for
> certificate volumes less than several thousand new certs a month. I'd
> been using intermediate CAs based on my own self-signed CA root quite
> heavily in infrastructure elsewhere, so I was rather surprised that the same
> sort of thing wasn't easily available for public CAs.

It's pretty simple, really- issuing certs is how the public CAs make
their money.  If they give you a CA cert that can issue certs, they're
cut out of the loop.

> I get the impression it's fairly common in internal infrastructure,
> especially with the cert management tools offered by Microsoft Active
> Directory servers, but have no strong information to substantiate this.
> Nor do I know whether we need to support this mode of operation.

In general, CAs view intermediate certs as a way to provide automated
systems while having their self-signed root private key highly
protected.  The intermediate CAs therefore have a shorter life-span and
end up changing much more frequently than the root certs (though root
certs certainly also do change, but it's much more painful).  That's one
of the reasons that they're bad to use as part of the authentication
criteria.

The problem with trying to use intermediate CAs as a way of dividing up
organizational responsibility is simply that there's very few systems
out there which will support that kind of configuration, from what I've
seen.  Wrt Active Directory this problem is actually very well solved
through use of Kerberos, where you have multiple realms, directional
trust between the realms, and most tools (including PG) understand that
a principal is the combination of username@REALM and let you authorize
based on that.

> BTW, This discussion has made me realise that I know less about SSL/TLS
> and X.509 certificate extensions than I'd like to when dealing with this
> topic. In particular, I don't know whether a CA can issue an
> intermediate CA with extensions that restrict it to validly signing only
> host certificates for hosts under a particular domain or
> user-identifying client certs with CNs under a particular organisation -
> and whether, if such extensions exist, applications actually check them
> when verifying the certificate trust chain.

You can set flags on certificates but I've not seen the kind of complex
restrictions that you're describing.  Remember that anything that the CA
sets for an intermediate CA cert would have to be checked by whatever
software is doing the certificate validation, meaning that you'd have to
make sure all your certificate-based software is updated to do that kind
of validation (and do it in a consistent way, or you'd get all kinds of
fun failures).

> Ugh, that's not something I've ever had the ... privilege ... to deal
> with before.

I worked for the company that built the original federal bridge
software. :)  I wasn't directly involved in that project, but I
certainly gained some understanding of the complexities through working
with the folks who were.

> Only for using intermediate certs as authorization roots, and it may be
> reasonable to say "we don't support that, use an authorized DN list". Or
> come up with a better solution like checking attributes of the SubjectDN
> for authorization purposes after validating the signature chain to prove
> authenticity.

I think it'd be valuable to distinguish "trusted CAs" from "intermediate
CAs" in PG explicitly (as I recall, you can already do this by simply
ensuring that your OpenSSL config is set up correctly for the system
wide defaults).  That's what most serious SSL users will be familiar
with anyway.  I'm on the fence about whether only supporting a list of
"trusted CAs" (through the cert files that we currently have) rises to
the level of being a security issue, but we should at least update the
documentation to reflect that all CAs listed in the file are fully
trusted and plan to provide an intermediate CA list option.  To be
honest, our entire SSL support mechanism could really use some work and
it'd be great if we had some folks looking into it seriously.  One of
the problems we've long had is the dependency on OpenSSL (yes, it's a
problem) and it'd be good to consider, if we're changing the SSL
configuration, how 

Re: [GENERAL] [HACKERS] Trust intermediate CA for client certificates

2013-03-19 Thread Bruce Momjian
On Tue, Mar 19, 2013 at 01:46:32AM -0400, Stephen Frost wrote:
> > I guess that suggests we should be calling this something like
> > 'ssl_authorized_client_roots'.
> 
> I'm no longer convinced that this really makes sense and I'm a bit
> worried about the simple authentication issue which I thought was at the
> heart of this concern.  Is there anything there that you see as being an
> issue with what we're doing currently..?

I too am worried that making SSL even more flexible will make simple
setups more complex to set up.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +




Re: [HACKERS] backward incompatible pg_basebackup and pg_receivexlog

2013-03-19 Thread Magnus Hagander
On Tue, Mar 19, 2013 at 11:39 AM, Heikki Linnakangas
 wrote:
> On 19.03.2013 04:42, Peter Eisentraut wrote:
>>
>> pg_basebackup and pg_receivexlog from 9.3 won't work with earlier
>> servers anymore.  I wonder if this has been fully thought through.  We
>> have put in a lot of effort to make client programs compatible with many
>> server versions as well as keeping the client/server protocol compatible
>> across many versions.  Both of these assumptions are now being broken,
>> which will result in all kinds of annoyances.
>>
>> It seems to me that these tools could probably be enhanced to understand
>> both old and new formats.
>
>
> Yes, this was discussed, and the consensus was to break
> backwards-compatibility in 9.3, so that we can clean up the protocol to be
> architecture-independent. That makes it easier to write portable clients,
> from 9.3 onwards. See the thread ending at
> http://www.postgresql.org/message-id/4fe2279c.2070...@enterprisedb.com.
>
>
>> Also, using the old tools against new server versions either behaves
>> funny or silently appears to work, both of which might be a problem.
>
>
> Hmm, it would be good to fix that. I wonder how, though. The most
> straightforward way would be to add an explicit version check in the
> clients, in backbranches. That would give a nice error message, but that
> would only help with new minor versions.

Still better to do it in a backbranch than not at all. At least we
are then nicer to the ones that do keep up with upgrades, which we
recommend they do...


>> I think if we are documenting the replication protocol as part of the
>> frontend/backend protocol and are exposing client tools that use it,
>> changes need to be done with the same rigor as other protocol changes.
>
>
> Agreed. The plan is that we're going to be more careful with it from now on.
>
>
>> As far as I can tell, there is no separate version number for the
>> replication part of the protocol, so either there needs to be one or the
>> protocol as a whole needs to be updated.
>
>
> Good point.
>
> I propose that we add a version number, and call the 9.3 version version 2.
> Let's add a new field to the result set of the IDENTIFY_SYSTEM command for
> the replication protocol version number. The version number should be bumped
> if the replication protocol is changed in a non-backwards-compatible way.

+1.

> That includes changes to the messages sent in the COPY-both mode, after the
> START_REPLICATION command. If we just add new commands, there's no need to
> bump the version; a client can still check the server version number to
> determine if a command exists or not.

Sounds good.


> We could also try to support old client + new server combination to some
> extent by future-proofing the protocol a bit. We could specify that the
> client should ignore any message types that it does not understand, and also
> add a header length field to the WalData message ('w'), so that we can add
> new header fields to it that old clients can just ignore. That way we can
> keep the protocol version unchanged if we just add some optional stuff to
> it. I'm not sure how useful that is in practice though; it's not
> unreasonable that you must upgrade to the latest client, as long as the new
> client works with old server versions.

I think that's quite reasonable, as long as we detect it, and can give
a nice error message telling the user how to deal with it.


-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/




Re: [HACKERS] backward incompatible pg_basebackup and pg_receivexlog

2013-03-19 Thread Heikki Linnakangas

On 19.03.2013 04:42, Peter Eisentraut wrote:

pg_basebackup and pg_receivexlog from 9.3 won't work with earlier
servers anymore.  I wonder if this has been fully thought through.  We
have put in a lot of effort to make client programs compatible with many
server versions as well as keeping the client/server protocol compatible
across many versions.  Both of these assumptions are now being broken,
which will result in all kinds of annoyances.

It seems to me that these tools could probably be enhanced to understand
both old and new formats.


Yes, this was discussed, and the consensus was to break 
backwards-compatibility in 9.3, so that we can clean up the protocol to 
be architecture-independent. That makes it easier to write portable 
clients, from 9.3 onwards. See the thread ending at 
http://www.postgresql.org/message-id/4fe2279c.2070...@enterprisedb.com.



Also, using the old tools against new server versions either behaves
funny or silently appears to work, both of which might be a problem.


Hmm, it would be good to fix that. I wonder how, though. The most 
straightforward way would be to add an explicit version check in the 
clients, in backbranches. That would give a nice error message, but that 
would only help with new minor versions.



I think if we are documenting the replication protocol as part of the
frontend/backend protocol and are exposing client tools that use it,
changes need to be done with the same rigor as other protocol changes.


Agreed. The plan is that we're going to be more careful with it from now on.


As far as I can tell, there is no separate version number for the
replication part of the protocol, so either there needs to be one or the
protocol as a whole needs to be updated.


Good point.

I propose that we add a version number, and call the 9.3 version version 
2. Let's add a new field to the result set of the IDENTIFY_SYSTEM 
command for the replication protocol version number. The version number 
should be bumped if the replication protocol is changed in a 
non-backwards-compatible way. That includes changes to the messages sent 
in the COPY-both mode, after the START_REPLICATION command. If we just 
add new commands, there's no need to bump the version; a client can 
still check the server version number to determine if a command exists 
or not.
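
To illustrate, a client could consume such a field along roughly these
lines; the column name "protover" below is made up, since no such field
exists yet:

#include <stdlib.h>
#include "libpq-fe.h"

static int
replication_protocol_version(PGconn *conn)
{
    PGresult *res = PQexec(conn, "IDENTIFY_SYSTEM");
    int       ver = 1;          /* assume the pre-9.3 protocol by default */

    if (PQresultStatus(res) == PGRES_TUPLES_OK && PQntuples(res) > 0)
    {
        int col = PQfnumber(res, "protover");   /* hypothetical column */

        if (col >= 0 && !PQgetisnull(res, 0, col))
            ver = atoi(PQgetvalue(res, 0, col));
    }
    PQclear(res);
    return ver;
}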


We could also try to support old client + new server combination to some 
extent by future-proofing the protocol a bit. We could specify that the 
client should ignore any message types that it does not understand, and 
also add a header length field to the WalData message ('w'), so that we 
can add new header fields to it that old clients can just ignore. That 
way we can keep the protocol version unchanged if we just add some 
optional stuff to it. I'm not sure how useful that is in practice 
though; it's not unreasonable that you must upgrade to the latest 
client, as long as the new client works with old server versions.


- Heikki




[HACKERS] Writable FDW: returning clauses.

2013-03-19 Thread Ronan Dunklau
Hello.

While implementing the new writable API for FDWs, I wondered whether there
are any obvious problems with the following behavior for handling returning
clauses (for the delete case).

The goal is to fetch all required attributes during the initial scan, and 
avoid fetching data on the delete operation itself.

- in the AddForeignUpdateTargets hook, add resjunk entries for the columns in 
the returning clause
- in the ExecForeignDelete hook, fill the returned slot with values taken from
the planSlot. (A rough sketch of both hooks is below.)
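
Roughly, and glossing over the bookkeeping of which resjunk entry maps to
which output column, the two hooks could look like this (sketched against
the 9.3 FDW callbacks, untested, with the column hardcoded purely for
illustration):

#include "postgres.h"
#include "catalog/pg_type.h"
#include "executor/tuptable.h"
#include "nodes/execnodes.h"
#include "nodes/makefuncs.h"
#include "nodes/parsenodes.h"
#include "utils/rel.h"

static void
myAddForeignUpdateTargets(Query *parsetree,
                          RangeTblEntry *target_rte,
                          Relation target_relation)
{
    /* Add a resjunk entry for each column the RETURNING clause needs, so
     * the scan fetches it and it survives into the planSlot.  Pretend a
     * single text column at attno 1 is needed; real code would walk
     * parsetree->returningList and use the relation's tuple descriptor. */
    Var         *var = makeVar(parsetree->resultRelation,
                               1,               /* attno, hardcoded here */
                               TEXTOID, -1, InvalidOid, 0);
    TargetEntry *tle = makeTargetEntry((Expr *) var,
                                       list_length(parsetree->targetList) + 1,
                                       pstrdup("returning_junk"),
                                       true);   /* resjunk */

    parsetree->targetList = lappend(parsetree->targetList, tle);
}

static TupleTableSlot *
myExecForeignDelete(EState *estate, ResultRelInfo *rinfo,
                    TupleTableSlot *slot, TupleTableSlot *planSlot)
{
    /* The junk attno would be resolved once in BeginForeignModify; it is
     * hardcoded here to keep the sketch short. */
    int         junkattno = 1;

    /* ... issue the remote DELETE here ... */

    /* Fill the result slot from values the scan already fetched. */
    ExecClearTuple(slot);
    slot->tts_values[0] = slot_getattr(planSlot, junkattno,
                                       &slot->tts_isnull[0]);
    ExecStoreVirtualTuple(slot);

    return slot;
}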


--
Ronan Dunklau

signature.asc
Description: This is a digitally signed message part.


Re: [HACKERS] Identity projection

2013-03-19 Thread Kyotaro HORIGUCHI
Thank you to all involved.

> On Friday, March 15, 2013 12:52 AM Tom Lane wrote:
> > I wrote:
> > > ... So I think this patch is missing a bet by not
> > > accepting equal() expressions.
> > 
> > I've committed this with that logic, a comment explaining exactly why
> > this is the way to do it, and some other cosmetic improvements.
> 
> Thank you. 

Thank you for your time and the good job.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: [HACKERS] Re: proposal: a width specification for s specifier (format function), fix behave when positional and ordered placeholders are used

2013-03-19 Thread Kyotaro HORIGUCHI
Thank you for committing this patch.

> Applied with some mostly-cosmetic adjustments.  I also took the
> liberty of changing some of the error message texts to line up
> more closely with the expanded documentation (eg, use "format
> specifier" not "conversion specifier" because that's the phrase
> used in the docs).

I looked over the modifications. Thanks for refining a rather large
portion of the documentation and comments... and the code.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

