Re: postgres_fdw vs data formatting GUCs (was Re: [HACKERS] [v9.3] writable foreign tables)
On Tue, Mar 19, 2013 at 10:37 PM, Daniel Farina wrote:
> I added programming around various NULL returns reading GUCs in this
> revision, v4.

Okay, one more of those fridge-logic bugs. Sorry for the noise. v5. A
missing PG_RE_THROW to get the properly finally-esque semantics:

--- a/contrib/dblink/dblink.c
+++ b/contrib/dblink/dblink.c
@@ -642,7 +642,10 @@ dblink_fetch(PG_FUNCTION_ARGS)
 	}
 	PG_CATCH();
 	{
+		/* Pop any set GUCs, if necessary */
 		restoreLocalGucs(&rgs);
+
+		PG_RE_THROW();
 	}
 	PG_END_TRY();

This was easy to add a regression test to exercise, and so I did (not
displayed here).

--
fdr

dblink-guc-sensitive-types-v5.patch
Description: Binary data

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] citext like searches using index
On Mar 17, 2013, at 6:35 AM, Thorbjørn Weidemann wrote:

> Hi David,
>
> I found your email-address on
> http://www.postgresql.org/docs/9.2/static/citext.html. I hope it's ok
> to contact you this way.
> I would like to thank you for taking the time to make citext available
> for Postgres, and I hope you can help me with this problem.
>
> In my workplace we are considering using citext in a Progress ->
> Postgresql conversion that is underway. The Progress database always
> searches case-insensitively.
>
> Simply creating a normal btree index on a citext column makes =
> searches use the index, but I have not been able to create an index
> that can be used for searches like
> select from where LIKE 'ide%';
>
> During this investigation I found out that even for varchar columns I
> have to append the varchar_pattern_ops opclass to the column when
> creating the index for it to be used for LIKE searches. But there is
> no citext_pattern_ops opclass.
>
> Is there currently any way to create an index that can be used to
> speed up searches like the one above?
> If not, do you have any idea how it might be implemented? Perhaps I
> could give it a try myself.
>
> Thank you in advance for any suggestions you might have.

I would think that text_pattern_ops would work, no?

Best,

David
Re: postgres_fdw vs data formatting GUCs (was Re: [HACKERS] [v9.3] writable foreign tables)
On Tue, Mar 19, 2013 at 6:10 PM, Daniel Farina wrote:
> On Tue, Mar 19, 2013 at 3:06 PM, Daniel Farina wrote:
>> On Tue, Mar 19, 2013 at 2:41 PM, Tom Lane wrote:
>>> Daniel Farina writes:
>>>> On Tue, Mar 19, 2013 at 1:16 PM, Tom Lane wrote:
>>>>> I'd be inclined to eat the cost of calling PQparameterStatus every
>>>>> time (which won't be that much) and instead try to optimize by
>>>>> avoiding the GUC-setting overhead if the current value matches the
>>>>> local setting.  But even that might be premature optimization.
>>>>> Did you do any performance testing to see if there was a problem
>>>>> worth avoiding?
>>>
>>>> Nope; should I invent a new way to do that, or would it be up to
>>>> commit standard based on inspection alone?  I'm okay either way,
>>>> let me know.
>>>
>>> Doesn't seem that hard to test: run a dblink query that pulls back a
>>> bunch of data under best-case conditions (ie, local not remote
>>> server), and time it with and without the proposed fix.  If the
>>> difference is marginal then it's not worth working hard to optimize.
>>
>> Okay, will do, and here's the shorter and less mechanically intensive
>> naive version that I think is the baseline: it doesn't try to
>> optimize out any GUC settings and sets up the GUCs before the two
>> materialization paths in dblink.
>
> The results.  Summary: seems like grabbing the GUC and strcmp-ing is
> worthwhile, but the amount of ping-ponging between processes adds some
> noise to the timing results: utilization is far short of 100% on
> either processor involved.  Attached is a cumulative diff of the new
> version, and also reproduced below are the changes to v2 that make up
> v3.

I added programming around various NULL returns reading GUCs in this
revision, v4.  The non-cumulative changes:

--- a/contrib/dblink/dblink.c
+++ b/contrib/dblink/dblink.c
@@ -3005,8 +3005,22 @@ applyRemoteGucs(remoteGucs *rgs)
 		/*
 		 * Attempt to avoid GUC setting if the remote and local GUCs
 		 * already have the same value.
+		 *
+		 * NB: Must error if the GUC is not found.
 		 */
-		localVal = GetConfigOption(gucName, true, true);
+		localVal = GetConfigOption(gucName, false, true);
+
+		if (remoteVal == NULL)
+			ereport(ERROR,
+					(errmsg("could not load parameter status of %s",
+							gucName)));
+
+		/*
+		 * An error must have been raised by now if GUC values could
+		 * not be loaded for any reason.
+		 */
+		Assert(localVal != NULL);
+		Assert(remoteVal != NULL);

 		if (strcmp(remoteVal, localVal) == 0)
 			continue;

--
fdr

dblink-guc-sensitive-types-v4.patch
Description: Binary data
Re: [HACKERS] machine-parseable object descriptions
Alvaro Herrera wrote:
> .. and here's the patch.

I forgot an example involving the funniest of all object classes:
default ACLs.  Here it is.

alvherre=# create role rol1;
CREATE ROLE
alvherre=# create role rol2;
CREATE ROLE
alvherre=# create schema rol1 authorization rol1;
CREATE SCHEMA
alvherre=# alter default privileges for role rol1, rol2 grant insert on tables to alvherre;
ALTER DEFAULT PRIVILEGES
alvherre=# alter default privileges for role rol1 in schema rol1 grant insert on tables to alvherre;
ALTER DEFAULT PRIVILEGES
alvherre=# select oid, foo.* from pg_default_acl,
    lateral (select * from pg_identify_object(tableoid, oid, 0)) foo;
  oid  |    type     | schema | name |                identity
-------+-------------+--------+------+----------------------------------------
 48839 | default acl |        |      | for role rol1 on tables
 48840 | default acl |        |      | for role rol2 on tables
 48844 | default acl |        |      | for role rol1 in schema rol1 on tables
(4 filas)

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] machine-parseable object descriptions
.. and here's the patch.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

*** a/src/backend/catalog/dependency.c
--- b/src/backend/catalog/dependency.c
***************
*** 67,72 ****
--- 67,73 ----
  #include "commands/trigger.h"
  #include "commands/typecmds.h"
  #include "foreign/foreign.h"
+ #include "funcapi.h"
  #include "miscadmin.h"
  #include "nodes/nodeFuncs.h"
  #include "parser/parsetree.h"
***************
*** 198,203 **** static bool stack_address_present_add_flags(const ObjectAddress *object,
--- 199,210 ----
  					ObjectAddressStack *stack);
  static void getRelationDescription(StringInfo buffer, Oid relid);
  static void getOpFamilyDescription(StringInfo buffer, Oid opfid);
+ static void getRelationTypeDescription(StringInfo buffer, Oid relid,
+ 					int32 objectSubId);
+ static void getProcedureTypeDescription(StringInfo buffer, Oid procid);
+ static void getConstraintTypeDescription(StringInfo buffer, Oid constroid);
+ static void getOpFamilyIdentity(StringInfo buffer, Oid opfid);
+ static void getRelationIdentity(StringInfo buffer, Oid relid);

 /*
***************
*** 2193,2199 **** getObjectClass(const ObjectAddress *object)
  	/* only pg_class entries can have nonzero objectSubId */
  	if (object->classId != RelationRelationId &&
  		object->objectSubId != 0)
! 		elog(ERROR, "invalid objectSubId 0 for object class %u",
  			 object->classId);

  	switch (object->classId)
--- 2200,2206 ----
  	/* only pg_class entries can have nonzero objectSubId */
  	if (object->classId != RelationRelationId &&
  		object->objectSubId != 0)
! 		elog(ERROR, "invalid non-zero objectSubId for object class %u",
  			 object->classId);

  	switch (object->classId)
***************
*** 3087,3093 **** pg_describe_object(PG_FUNCTION_ARGS)
  	Oid			classid = PG_GETARG_OID(0);
  	Oid			objid = PG_GETARG_OID(1);
  	int32		subobjid = PG_GETARG_INT32(2);
! 	char	   *description = NULL;
  	ObjectAddress address;

  	/* for "pinned" items in pg_depend, return null */
--- 3094,3100 ----
  	Oid			classid = PG_GETARG_OID(0);
  	Oid			objid = PG_GETARG_OID(1);
  	int32		subobjid = PG_GETARG_INT32(2);
! 	char	   *description;
  	ObjectAddress address;

  	/* for "pinned" items in pg_depend, return null */
***************
*** 3101,3103 **** pg_describe_object(PG_FUNCTION_ARGS)
--- 3108,4145 ----
  	description = getObjectDescription(&address);
  	PG_RETURN_TEXT_P(cstring_to_text(description));
  }
+ 
+ Datum
+ pg_identify_object(PG_FUNCTION_ARGS)
+ {
+ 	Oid			classid = PG_GETARG_OID(0);
+ 	Oid			objid = PG_GETARG_OID(1);
+ 	int32		subobjid = PG_GETARG_INT32(2);
+ 	Oid			schema_oid = InvalidOid;
+ 	const char *objname = NULL;
+ 	ObjectAddress address;
+ 	Datum		values[4];
+ 	bool		nulls[4];
+ 	TupleDesc	tupdesc;
+ 	HeapTuple	htup;
+ 
+ 	address.classId = classid;
+ 	address.objectId = objid;
+ 	address.objectSubId = subobjid;
+ 
+ 	/*
+ 	 * Construct a tuple descriptor for the result row.  This must match this
+ 	 * function's pg_proc entry!
+ 	 */
+ 	tupdesc = CreateTemplateTupleDesc(4, false);
+ 	TupleDescInitEntry(tupdesc, (AttrNumber) 1, "type",
+ 					   TEXTOID, -1, 0);
+ 	TupleDescInitEntry(tupdesc, (AttrNumber) 2, "schema",
+ 					   TEXTOID, -1, 0);
+ 	TupleDescInitEntry(tupdesc, (AttrNumber) 3, "name",
+ 					   TEXTOID, -1, 0);
+ 	TupleDescInitEntry(tupdesc, (AttrNumber) 4, "identity",
+ 					   TEXTOID, -1, 0);
+ 
+ 	tupdesc = BlessTupleDesc(tupdesc);
+ 
+ 	if (is_objectclass_supported(address.classId))
+ 	{
+ 		HeapTuple	objtup;
+ 		Relation	catalog = heap_open(address.classId, AccessShareLock);
+ 
+ 		objtup = get_catalog_object_by_oid(catalog, address.objectId);
+ 		if (objtup != NULL)
+ 		{
+ 			bool		isnull;
+ 			AttrNumber	nspAttnum;
+ 			AttrNumber	nameAttnum;
+ 
+ 			nspAttnum = get_object_attnum_namespace(address.classId);
+ 			if (nspAttnum != InvalidAttrNumber)
+ 			{
+ 				schema_oid = heap_getattr(objtup, nspAttnum,
+ 										  RelationGetDescr(catalog), &isnull);
+ 				if (isnull)
+ 					elog(ERROR, "invalid null namespace in object %u/%u/%d",
+ 						 address.classId, address.objectId,
+ 						 address.objectSubId);
+ 			}
+ 
+ 			nameAttnum = get_object_attnum_name(address.classId);
+ 			if (nameAttnum != InvalidAttrNumber)
+ 			{
+ 				Datum		nameDatum;
+ 
+ 				nameDatum = heap_getattr(objtup, nameAttnum,
+ 										 RelationGetDescr(catalog), &isnull);
+ 				if (isnull)
+ 					elog(ERROR, "invalid null name in object %u/%u/%d",
+ 						 address.classId, address.objectId,
+ 						 address.objectSubId);
+ 				objname = quote_identifier(NameStr(*(DatumGetName(nameDatum))));
+ 			}
+ 		}
+ 
+ 		heap_close(catalog, AccessShareLock);
+ 	}
+ 
+ 	/* object type */
+ 	values[0] = CStringGetTextDatum(getObjectTypeDescription(&address));
+ 	nulls[0] = false;
+ 
+ 	/* schema name */
+ 	if (OidIsValid(schema_oid))
+ 	{
+ 		const char *schema = quote_identifier(get_namespace_name(schema_oid));
+ 
+ 		values[1] = CStringGetTextDatum
Re: [HACKERS] machine-parseable object descriptions
Dimitri Fontaine wrote:
> Tom Lane writes:
>> I could also live with keeping the schema column as proposed, if
>> people think it has a use, but letting it be redundant with a schema
>> name included in the identity string.  But it seems like a bad idea
>> to try to shear schema off of identity.
>
> +1
>
> Use case for keeping the extra column: replication to a federated
> database where you want to tweak the target schema.

If I understood our IM discussion correctly, you were saying that for
federated database replication you wanted a separate "name" column, from
which you could extract a table name easily; not that you wanted a
separate schema column.  Anyway, the schema column is also present.  We
can easily remove columns before commit, if we decide we don't want
them.

In the attached patch, we have these three columns: a "name" column,
which is the object's bare name; a "schema" column, which is the schema;
and an "identity" column, which is the whole thing, with all the schema
qualifications that apply.  There's also the type, of course.

I added the name column because it seems to me that it is genuinely
useful for some use cases.  However, there are two problems: first, the
original implementation had a slight bug in the case of column objects
(it displayed the name of the relation, not the name of the column),
and second, I was afraid it would be easily misused.  One way to attack
both things at once would be to make it NULL except in the cases where
it's a truly unique identifier (tables, schemas, etc).  To avoid this
being a standalone "whitelist" of catalogs (which would get outdated
quickly, I fear), I propose to add a new boolean option in
ObjectProperty, which specifies whether the name can be used as an
identifier.  I have implemented it that way in the attached patch.

The new identity column is amazingly verbose on things like pg_amproc
entries:

 10650 | 1 (pg_catalog.point, pg_catalog.point) of pg_catalog.point_ops for gist: pg_catalog.gist_point_consistent(pg_catalog.internal,pg_catalog.point,integer,pg_catalog.oid,pg_catalog.internal)
 10651 | 2 (pg_catalog.point, pg_catalog.point) of pg_catalog.point_ops for gist: pg_catalog.gist_box_union(pg_catalog.internal,pg_catalog.internal)
 10652 | 3 (pg_catalog.point, pg_catalog.point) of pg_catalog.point_ops for gist: pg_catalog.gist_point_compress(pg_catalog.internal)
 10653 | 4 (pg_catalog.point, pg_catalog.point) of pg_catalog.point_ops for gist: pg_catalog.gist_box_decompress(pg_catalog.internal)
 10654 | 5 (pg_catalog.point, pg_catalog.point) of pg_catalog.point_ops for gist: pg_catalog.gist_box_penalty(pg_catalog.internal,pg_catalog.internal,pg_catalog.internal)
 10655 | 6 (pg_catalog.point, pg_catalog.point) of pg_catalog.point_ops for gist: pg_catalog.gist_box_picksplit(pg_catalog.internal,pg_catalog.internal)
 10656 | 7 (pg_catalog.point, pg_catalog.point) of pg_catalog.point_ops for gist: pg_catalog.gist_box_same(pg_catalog.box,pg_catalog.box,pg_catalog.internal)
 10657 | 8 (pg_catalog.point, pg_catalog.point) of pg_catalog.point_ops for gist: pg_catalog.gist_point_distance(pg_catalog.internal,pg_catalog.point,integer,pg_catalog.oid)

Also, note how types with standard-specified names ("integer") are not
qualified (which is the right thing, AFAICT).

Here's another interesting example:

alvherre=# create type public.integer as enum ('uno', 'dos');
CREATE TYPE
alvherre=# select * from pg_identify_object('pg_type'::regclass, 'integer'::regtype, 0);
 type |   schema   | name | identity
------+------------+------+----------
 type | pg_catalog | int4 | integer
(1 fila)

alvherre=# select * from pg_identify_object('pg_type'::regclass, 'public.integer'::regtype, 0);
 type | schema |   name    |     identity
------+--------+-----------+------------------
 type | public | "integer" | public."integer"
(1 fila)

If you create a public.int4 type, there's no confusion either, so it's
all consistent.

Here's another bit of sample output, from pg_depend contents (at the
left there's the referencing object, at the right the referenced
object):

alvherre=# select deptype, refd.*, ref.* from pg_depend,
    lateral (select * from pg_identify_object(classid, objid, objsubid)) refd,
    lateral (select * from pg_identify_object(refclassid, refobjid, refobjsubid)) ref
  where classid <> 0 and refd.schema <> 'pg_catalog'
    and ref.schema <> 'information_schema' and refd.schema <> 'pg_toast';
 deptype |       type        | schema |   name    |          identity           | type
---------+-------------------+--------+-----------+-----------------------------+------
 a       | domain constraint | public |           | "my constr" on public.mydom | type
Re: [HACKERS] Enabling Checksums
On 3/19/13 10:05 PM, Tom Lane wrote:
> FWIW, I would argue that any tradeoffs we make in this area must be
> made on the assumption of no such acceleration.  If we can later make
> things better for Intel(TM) users, that's cool, but let's not screw
> those using other CPUs.

I see compatibility with the acceleration as a tie-breaker.  If there
are two approaches that are otherwise about equal, such as choosing the
exact CRC polynomial, you might as well pick the one that works faster
with Intel's SSE.

I'll make sure that this gets benchmarked soon on a decent AMD system
too, though.  I've been itching to assemble a 24 core AMD box at home
anyway; this gives me an excuse to pull the trigger on that.

Thanks for the summary of how you view the zlib/libpng project state.
I saw 4 releases from zlib in 2012, so it seemed possible development
might still move forward there.

-- 
Greg Smith   2ndQuadrant US    g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.com
[HACKERS] Ignore invalid indexes in pg_dump
Hi,

If failures happen with CREATE INDEX CONCURRENTLY, the system will be
left with invalid indexes.  I don't think that the user would like to
see invalid indexes of an existing system being recreated as valid
after a restore.  So why not remove invalid indexes from a dump, with
something like the patch attached?

This should perhaps be applied in pg_dump for versions down to 8.2,
where CREATE INDEX CONCURRENTLY was implemented?

I noticed some recent discussions about that:
http://www.postgresql.org/message-id/20121207141236.gb4...@alvh.no-ip.org
In that case the problem was fixed in pg_upgrade directly.

-- 
Michael

20130317_dump_only_valid_index.patch
Description: Binary data
Re: [HACKERS] Enabling Checksums
Greg Smith writes:
> While being a lazy researcher today instead of writing code, I
> discovered that the PNG file format includes a CRC-32 on its data
> chunks, and to support that there's a CRC32 function inside of zlib:
> http://www.zlib.net/zlib_tech.html

Hah, old sins keep coming back to haunt one ;-)  Keep in mind that PNG
was designed in 1995, and that any speed considerations in that spec
were decided in the context of whether it would take noticeably longer
to view an image downloaded over analog dialup.  That design context
also informed a greater interest in error checking than has been seen
in any other image file format before (or since, I believe).

> And they've already put some work into optimizing its table-driven
> implementation.  Seems possible to punt the whole problem of how to do
> this efficiently toward the zlib developers, let them drop into
> assembly to get the best possible Intel acceleration etc. one day.

I would not hold my breath waiting for any such work from either the
zlib or libpng developers; both of those projects are basically in
maintenance mode AFAIK.  If we want hardware acceleration we're going
to have to deal with the portability issues ourselves.

FWIW, I would argue that any tradeoffs we make in this area must be
made on the assumption of no such acceleration.  If we can later make
things better for Intel(TM) users, that's cool, but let's not screw
those using other CPUs.

			regards, tom lane
Re: postgres_fdw vs data formatting GUCs (was Re: [HACKERS] [v9.3] writable foreign tables)
On Tue, Mar 19, 2013 at 3:06 PM, Daniel Farina wrote:
> On Tue, Mar 19, 2013 at 2:41 PM, Tom Lane wrote:
>> Daniel Farina writes:
>>> On Tue, Mar 19, 2013 at 1:16 PM, Tom Lane wrote:
>>>> I'd be inclined to eat the cost of calling PQparameterStatus every
>>>> time (which won't be that much) and instead try to optimize by
>>>> avoiding the GUC-setting overhead if the current value matches the
>>>> local setting.  But even that might be premature optimization.  Did
>>>> you do any performance testing to see if there was a problem worth
>>>> avoiding?
>>
>>> Nope; should I invent a new way to do that, or would it be up to
>>> commit standard based on inspection alone?  I'm okay either way, let
>>> me know.
>>
>> Doesn't seem that hard to test: run a dblink query that pulls back a
>> bunch of data under best-case conditions (ie, local not remote
>> server), and time it with and without the proposed fix.  If the
>> difference is marginal then it's not worth working hard to optimize.
>
> Okay, will do, and here's the shorter and less mechanically intensive
> naive version that I think is the baseline: it doesn't try to optimize
> out any GUC settings and sets up the GUCs before the two
> materialization paths in dblink.

The results.  Summary: seems like grabbing the GUC and strcmp-ing is
worthwhile, but the amount of ping-ponging between processes adds some
noise to the timing results: utilization is far short of 100% on either
processor involved.  Attached is a cumulative diff of the new version,
and also reproduced below are the changes to v2 that make up v3.

## Benchmark

SELECT dblink_connect('benchconn','dbname=contrib_regression');

CREATE FUNCTION bench() RETURNS integer AS $$
DECLARE
    iterations integer;
BEGIN
    iterations := 0;

    WHILE iterations < 30 LOOP
        PERFORM * FROM dblink('benchconn', 'SELECT 1') AS t(a int);
        iterations := iterations + 1;
    END LOOP;

    RETURN iterations;
END;
$$ LANGUAGE plpgsql;

SELECT clock_timestamp() INTO TEMP beginning;
SELECT bench();
SELECT clock_timestamp() INTO TEMP ending;

SELECT 'dblink-benchmark-lines';
SELECT ending.clock_timestamp - beginning.clock_timestamp
FROM beginning, ending;

## Data Setup

CREATE TEMP TABLE bench_results (version text, span interval);

COPY bench_results FROM STDIN WITH CSV;
no-patch,@ 41.308122 secs
no-patch,@ 36.63597 secs
no-patch,@ 34.264119 secs
no-patch,@ 34.760179 secs
no-patch,@ 32.991257 secs
no-patch,@ 34.538258 secs
no-patch,@ 42.576354 secs
no-patch,@ 39.335557 secs
no-patch,@ 37.493206 secs
no-patch,@ 37.812205 secs
v2-applied,@ 36.550553 secs
v2-applied,@ 38.608723 secs
v2-applied,@ 39.415744 secs
v2-applied,@ 46.091052 secs
v2-applied,@ 45.425438 secs
v2-applied,@ 48.219506 secs
v2-applied,@ 43.514878 secs
v2-applied,@ 45.892302 secs
v2-applied,@ 48.479335 secs
v2-applied,@ 47.632041 secs
v3-strcmp,@ 32.524385 secs
v3-strcmp,@ 34.982098 secs
v3-strcmp,@ 34.487222 secs
v3-strcmp,@ 44.394681 secs
v3-strcmp,@ 44.638309 secs
v3-strcmp,@ 44.113088 secs
v3-strcmp,@ 45.497769 secs
v3-strcmp,@ 33.530176 secs
v3-strcmp,@ 32.9704 secs
v3-strcmp,@ 40.84764 secs
\.

=> SELECT version, avg(extract(epoch from span)),
          stddev(extract(epoch from span))
   FROM bench_results GROUP BY version;
  version   |    avg     |      stddev
------------+------------+------------------
 no-patch   | 37.1715227 | 3.17076487912923
 v2-applied | 43.9829572 | 4.30572672565711
 v3-strcmp  | 38.7985768 | 5.54760393720725

## Changes to v2:

--- a/contrib/dblink/dblink.c
+++ b/contrib/dblink/dblink.c
@@ -2981,9 +2981,11 @@ initRemoteGucs(remoteGucs *rgs, PGconn *conn)
 static void
 applyRemoteGucs(remoteGucs *rgs)
 {
-	int			i;
 	const int	numGucs = sizeof parseAffectingGucs / sizeof *parseAffectingGucs;
+	int			i;
+	int			addedGucNesting = false;
+
 	/*
 	 * Affected types require local GUC manipulations.  Create a new
 	 * GUC NestLevel to overlay the remote settings.
@@ -2992,14 +2994,27 @@ applyRemoteGucs(remoteGucs *rgs)
 	 * structure, so expect it to come with an invalid NestLevel.
 	 */
 	Assert(rgs->localGUCNestLevel == -1);
-	rgs->localGUCNestLevel = NewGUCNestLevel();

 	for (i = 0; i < numGucs; i += 1)
 	{
+		int			gucApplyStatus;
+
 		const char *gucName = parseAffectingGucs[i];
 		const char *remoteVal = PQparameterStatus(rgs->conn, gucName);
+		const char *localVal = GetConfigOption(gucName, true, true);

-		int			gucApplyStatus;
+		/*
+		 * Attempt to avoid GUC setting if the remote and local GUCs
+		 * already have the same value.
+		 */
+		if (strcmp(remoteVal, localVal) == 0)
+			continue;
+
+		if (!addedGucNesting)
+		{
+			rgs->localGUCNestLevel = NewGUCNestLevel();
+			addedGucNesting = true;
+		}

 		gucApplyStatus = set_config_option(gucName, remoteVal,
 										   PGC_USERSET, PGC_S_SESSION,

--
fdr

dblink-guc-sensitive-types-v3.patch
Description: Binary data
Re: [HACKERS] Enabling Checksums
On 3/19/13 8:17 PM, Simon Riggs wrote:
> We know that will work, has reasonable distribution characteristics
> and might even speed things up rather than have two versions of CRC in
> the CPU cache.

That sounds reasonable to me.  All of these CRC options have space/time
trade-offs via how large the lookup tables they use are.  And if those
are already sitting in the CPU data cache via their use in the WAL
writes, using them for this purpose too could give them an advantage
that's not obvious in a synthetic test.  I'm curious how that plays out
when multiple cores are involved too.

It would be hilarious if optimizing the CRC calculation makes WAL-heavy
workloads with checksums still net faster in the next release.  Makes
me wonder how much of the full-page write overhead is being gobbled up
by CRC time already, on systems with a good sized write cache.

> I'd rather get this committed with a safe option and then y'all can
> discuss the fine merits of each algorithm at leisure.

Yes, that's what we're already doing--it just looks like work :)

-- 
Greg Smith   2ndQuadrant US    g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.com
Re: [HACKERS] Enabling Checksums
On 20 March 2013 00:03, Greg Smith wrote:
> Simon suggested the other day that we should make the exact checksum
> mechanism used pluggable at initdb time, just some last minute
> alternatives checking on the performance of the real server code.
> I've now got the WAL CRC32, the zlib CRC32, and the Intel-derived
> versions Ants hacked on to compare.

Selectable, not pluggable.

I think the safe option is to calculate the WAL CRC32, take the lowest
16 bits and use that.  We know that will work, has reasonable
distribution characteristics and might even speed things up rather than
have two versions of CRC in the CPU cache.  It also gives us just one
set of code to tune to cover both.

I'd rather get this committed with a safe option and then y'all can
discuss the fine merits of each algorithm at leisure.

-- 
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Enabling Checksums
On Wed, Mar 20, 2013 at 12:52 AM, Greg Smith wrote:
> On 3/19/13 6:08 PM, Ants Aasma wrote:
>> My main worry is that there is a reasonably
>> large population of users out there that don't have that acceleration
>> capability and will have to settle for performance overhead 4x worse
>> than what you currently measured for a shared buffer swapping
>> workload.
>
> That would be very bad.  I want to keep hammering on this part of the
> implementation.  If the only style of checksum that's computationally
> feasible is the Fletcher one that's already been done--if that
> approach is basically the most expensive one that's practical to
> use--I'd still consider that a major win over doing nothing.

Well, there is also the SIMD checksum that outperforms hardware-assisted
CRCs, is almost 3 times as fast as Fletcher on the most popular
platform, should run fast on every CPU that has vector instructions
(almost all server CPUs from the last 10 years), should run fast even on
the last two generations of cellphone CPUs, and I don't see any obvious
errors that it misses.  It will require some portability work (maybe
use intrinsics instead of relying on the vectorizer) but I don't see
why it wouldn't work.

> While being a lazy researcher today instead of writing code, I
> discovered that the PNG file format includes a CRC-32 on its data
> chunks, and to support that there's a CRC32 function inside of zlib:
> http://www.zlib.net/zlib_tech.html
>
> Is there anywhere that compiles a PostgreSQL --without-zlib that
> matters?
>
> The UI looks like this:
>
> ZEXTERN uLong ZEXPORT crc32 OF((uLong crc, const Bytef *buf, uInt len));
>
> And they've already put some work into optimizing its table-driven
> implementation.  Seems possible to punt the whole problem of how to do
> this efficiently toward the zlib developers, let them drop into
> assembly to get the best possible Intel acceleration etc. one day.

That's the same byte-at-a-time lookup-table algorithm that Intel uses
in the slice-by-8 method; zlib uses a 4-level lookup table for a
smaller table but more overhead.  Also, zlib uses the 0x04C11DB7
polynomial, which is not supported by the Intel-accelerated crc32c
instruction.  I believe that if we go the crc32 route we should
definitely pick the Castagnoli polynomial, which at least has the hope
of being accelerated.

I copied crc32.c, crc32.h and zutil.h from zlib to the test framework
and ran the tests.  While at it I also did a version where the Fletcher
loop was unrolled by hand 8 times.  Results on Sandy Bridge (plain -O2
compile):

CRC32 slicing by 8 Algorithm (bytes/cycle), 0.522284
CRC32 zlib (bytes/cycle), 0.308307
Fletcher Algorithm: (bytes/cycle), 1.891964
Fletcher Algorithm hand unrolled: (bytes/cycle), 3.30
SIMD Algorithm (gcc): (bytes/cycle), 0.572407
SIMD Algorithm (hand coded): (bytes/cycle), 9.124589

Results from papers:
crc32c instruction (Castagnoli only): 6.8 bytes/cycle
pclmulqdq based crc32: 0.9 bytes/cycle

Fletcher is also still a strong contender; we just need to replace the
255 modulus with something less prone to common errors, maybe use 65521
as the modulus.  I'd have to think about how to best combine the values
in that case.  I believe we can lose the property that neither byte can
be zero; just avoiding both being zero seems good enough to me.

Regards,
Ants Aasma
-- 
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de
Re: [HACKERS] Enabling Checksums
On 3/19/13 7:13 PM, Daniel Farina wrote:
> I'm confused.  Postgres includes a CRC32 implementation for WAL, does
> it not?  Are you referring to something else?

I'm just pointing out that zlib includes one, too, and they might be
more motivated/able as a project to chase after Intel's hardware
acceleration for CRCs one day.  They already have code switching from C
to assembly to get extra performance out of their longest_match()
function.  The PostgreSQL CRC code is unlikely to go into twiddling
assembly code, but zlib--which is usually linked in anyway--will.

And Adler-32 isn't just an option, it's named after a dude who works on
zlib, and I can see he's already playing with the Intel acceleration by
some of his recent answers at
http://stackoverflow.com/users/1180620/mark-adler

I just re-discovered Ross Williams' CRC guide, which was already
referenced in pg_crc_tables.h, so I think I'm getting close to being
caught up on all the options here.  Simon suggested the other day that
we should make the exact checksum mechanism used pluggable at initdb
time, just some last minute alternatives checking on the performance of
the real server code.  I've now got the WAL CRC32, the zlib CRC32, and
the Intel-derived versions Ants hacked on to compare.

-- 
Greg Smith   2ndQuadrant US    g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.com
Re: [HACKERS] Enabling Checksums
On 03/19/2013 06:52 PM, Greg Smith wrote:
> While being a lazy researcher today instead of writing code, I
> discovered that the PNG file format includes a CRC-32 on its data
> chunks, and to support that there's a CRC32 function inside of zlib:
> http://www.zlib.net/zlib_tech.html
>
> Is there anywhere that compiles a PostgreSQL --without-zlib that
> matters?

Some of the smaller platforms might not have it readily available.  I
doubt there is any common server class or general computing platform
where it's not available.

cheers

andrew
Re: [HACKERS] Enabling Checksums
On Tue, Mar 19, 2013 at 3:52 PM, Greg Smith wrote: > On 3/19/13 6:08 PM, Ants Aasma wrote: >> >> My main worry is that there is a reasonably >> large population of users out there that don't have that acceleration >> capability and will have to settle for performance overhead 4x worse >> than what you currently measured for a shared buffer swapping >> workload. > > > That would be very bad. I want to keep hammering on this part of the > implementation. If the only style of checksum that's computationally > feasible is the Fletcher one that's already been done--if that approach is > basically the most expensive one that's practical to use--I'd still consider > that a major win over doing nothing. > > While being a lazy researcher today instead of writing code, I discovered > that the PNG file format includes a CRC-32 on its data chunks, and to > support that there's a CRC32 function inside of zlib: > http://www.zlib.net/zlib_tech.html > > Is there anywhere that compiles a PostgreSQL --without-zlib that matters? I'm confused. Postgres includes a CRC32 implementation for WAL, does it not? Are you referring to something else? I happen to remember this because I moved some things around to enable third party programs (like xlogdump) to be separately compiled: http://www.postgresql.org/message-id/e1s2xo0-0004uv...@gemulon.postgresql.org -- fdr -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Enabling Checksums
On 3/19/13 6:08 PM, Ants Aasma wrote: My main worry is that there is a reasonably large population of users out there that don't have that acceleration capability and will have to settle for performance overhead 4x worse than what you currently measured for a shared buffer swapping workload. That would be very bad. I want to keep hammering on this part of the implementation. If the only style of checksum that's computationally feasible is the Fletcher one that's already been done--if that approach is basically the most expensive one that's practical to use--I'd still consider that a major win over doing nothing. While being a lazy researcher today instead of writing code, I discovered that the PNG file format includes a CRC-32 on its data chunks, and to support that there's a CRC32 function inside of zlib: http://www.zlib.net/zlib_tech.html Is there anywhere that compiles a PostgreSQL --without-zlib that matters? The UI looks like this: ZEXTERN uLong ZEXPORT crc32 OF((uLong crc, const Bytef *buf, uInt len)); And they've already put some work into optimizing its table-driven implementation. Seems possible to punt the whole problem of how to do this efficiently toward the zlib developers, let them drop into assembly to get the best possible Intel acceleration etc. one day. -- Greg Smith 2ndQuadrant USg...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Add regression tests for ROLE (USER)
Hi, Please find attached a patch to take 'make check' code-coverage of ROLE (USER) from 59% to 91%. Any feedback is more than welcome. -- Robins Tharakan regress_user.patch Description: Binary data
Re: [HACKERS] Enabling Checksums
On Tue, Mar 19, 2013 at 11:28 PM, Greg Smith wrote: > I don't remember if there's any good precedent for whether this form of BSD > licensed code can be assimilated into PostgreSQL without having to give > credit to Intel in impractical places. I hate these licenses with the > binary restrictions in them. It's easy enough to re-implement this from scratch, including the table generation if that is an issue. It's a very simple algorithm. > Yes, there are two Intel patents on how they actually implement the CRC32C > in their processor. I just read them both, and they have many very specific > claims. I suspect their purpose is to keep AMD from knocking off the exact > way Intel does this in hardware. But they also contributed CRC32C code to > Linux: > > https://lwn.net/Articles/292984/ > http://git.kernel.org/cgit/linux/kernel/git/herbert/cryptodev-2.6.git/tree/arch/x86/crypto/crc32c-pcl-intel-asm_64.S > > with a dual license, GPLv2 and the 2 clause BSD again. In theory any CRC32C > implementation might get dragged into court over Intel's patents if they > wanted to shake someone down. Thanks for checking that out. The kernel code is indeed the same 3 parallel CRC's combined at the end method described in the paper. Looks like that is thankfully a non-issue. > The main message I took away from this paper is that it's possible to speed > up CRC computation if you fix a) the CRC polynomial and b) the size of the > input buffer. There may be some good optimization possibilities in both > those, given I'd only expect Postgres to use one polynomial and the typical > database page sizes. Intel's processor acceleration has optimizations for > running against 1K blocks for example. I don't think requiring the database > page size to be a multiple of 1K is ever going to be an unreasonable > limitation, if that's what it takes to get useful hardware acceleration. The variable size CRC seemed to asymptotically approach the fixed block speed at 1k. 
It only affects the specifics of the final recombination. That said, the fixed-size 1k looks good enough if we decide to go this route. My main worry is that there is a reasonably large population of users out there that don't have that acceleration capability and will have to settle for performance overhead 4x worse than what you currently measured for a shared buffer swapping workload. Regards, Ants Aasma -- Cybertec Schönig & Schönig GmbH Gröhrmühlgasse 26 A-2700 Wiener Neustadt Web: http://www.postgresql-support.de
Re: postgres_fdw vs data formatting GUCs (was Re: [HACKERS] [v9.3] writable foreign tables)
On Tue, Mar 19, 2013 at 2:41 PM, Tom Lane wrote: > Daniel Farina writes: >> On Tue, Mar 19, 2013 at 1:16 PM, Tom Lane wrote: >>> I'd be inclined to eat the cost of calling PQparameterStatus every time >>> (which won't be that much) and instead try to optimize by avoiding the >>> GUC-setting overhead if the current value matches the local setting. >>> But even that might be premature optimization. Did you do any >>> performance testing to see if there was a problem worth avoiding? > >> Nope; should I invent a new way to do that, or would it be up to >> commit standard based on inspection alone? I'm okay either way, let >> me know. > > Doesn't seem that hard to test: run a dblink query that pulls back a > bunch of data under best-case conditions (ie, local not remote server), > and time it with and without the proposed fix. If the difference > is marginal then it's not worth working hard to optimize. Okay, will do, and here's the shorter and less mechanically intensive naive version that I think is the baseline: it doesn't try to optimize out any GUC settings, and sets up the GUCs before the two materialization paths in dblink. Something I forgot to ask about is whether a strangely-minded type input function could whack around the GUC as records are being emitted, which would necessitate per-tuple polling of PQparameterStatus (e.g. in the inner loop of a materialization). That seemed pretty "out there," but I'm broaching it for completeness. I'll see how much of a penalty it is vs. not applying any patch at all next. -- fdr dblink-guc-sensitive-types-v2.patch Description: Binary data
Re: postgres_fdw vs data formatting GUCs (was Re: [HACKERS] [v9.3] writable foreign tables)
Daniel Farina writes: > On Tue, Mar 19, 2013 at 1:16 PM, Tom Lane wrote: >> I'd be inclined to eat the cost of calling PQparameterStatus every time >> (which won't be that much) and instead try to optimize by avoiding the >> GUC-setting overhead if the current value matches the local setting. >> But even that might be premature optimization. Did you do any >> performance testing to see if there was a problem worth avoiding? > Nope; should I invent a new way to do that, or would it be up to > commit standard based on inspection alone? I'm okay either way, let > me know. Doesn't seem that hard to test: run a dblink query that pulls back a bunch of data under best-case conditions (ie, local not remote server), and time it with and without the proposed fix. If the difference is marginal then it's not worth working hard to optimize. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Enabling Checksums
On 3/18/13 8:17 PM, Ants Aasma wrote: I looked for fast CRC implementations on the net. The fastest plain C variant I could find was one produced by Intel's R&D department (available with a BSD license [1], requires some porting). Very specifically, it references http://opensource.org/licenses/bsd-license.html as the 2 clause BSD license it is released under. If PostgreSQL wanted to use that as its implementation, the source file would need to have Intel's copyright, and there's this ugly thing: "Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution." I don't remember if there's any good precedent for whether this form of BSD licensed code can be assimilated into PostgreSQL without having to give credit to Intel in impractical places. I hate these licenses with the binary restrictions in them. For CRC32C there is also an option to use the crc32 instructions available on newer Intel machines and run 3 parallel CRC calculations to cover for the 3-cycle latency of that instruction, combining them at the end [2]. Skimming the paper it looks like there are some patents in this area, so if we wish to implement this, we would have to see how we can navigate around them. Discussing patent issues, especially about how someone else implemented a feature, on list is generally bad news. But since, as you noted, Intel has interacted with other open-source communities already with code related to those patents, I think it's OK to talk about that for a bit. Yes, there are two Intel patents on how they actually implement the CRC32C in their processor. I just read them both, and they have many very specific claims. I suspect their purpose is to keep AMD from knocking off the exact way Intel does this in hardware. 
But they also contributed CRC32C code to Linux: https://lwn.net/Articles/292984/ http://git.kernel.org/cgit/linux/kernel/git/herbert/cryptodev-2.6.git/tree/arch/x86/crypto/crc32c-pcl-intel-asm_64.S with a dual license, GPLv2 and the 2 clause BSD again. In theory any CRC32C implementation might get dragged into court over Intel's patents if they wanted to shake someone down. But they would bring a world of hurt upon themselves for asserting a CRC32C patent claim against any open-source project, considering that they contributed this code themselves under a pair of liberal licenses. This doesn't set off any of my "beware of patents" alarms. Intel wants projects to use this approach, detect their acceleration when it's available, and run faster on Intel than AMD. Dragging free software packages into court over code they submitted would create a PR disaster for Intel. That would practically be entrapment on their part. perf accounted 25% of execution time to rdtsc instructions in the measurement loop for the handcoded variant; not all of that is from the pipeline flush. To clarify this part, rdtsc is an instruction that gets timing information from the processor: "Read Time Stamp Counter". So Ants is saying a lot of the runtime is the timing itself. rdtsc execution time is the overhead that the pg_test_timing utility estimates in some cases. http://www.intel.com/content/dam/www/public/us/en/documents/white-papers/crc-iscsi-polynomial-crc32-instruction-paper.pdf The main message I took away from this paper is that it's possible to speed up CRC computation if you fix a) the CRC polynomial and b) the size of the input buffer. There may be some good optimization possibilities in both of those, given I'd only expect Postgres to use one polynomial and the typical database page sizes. Intel's processor acceleration has optimizations for running against 1K blocks, for example. 
I don't think requiring the database page size to be a multiple of 1K is ever going to be an unreasonable limitation, if that's what it takes to get useful hardware acceleration. -- Greg Smith 2ndQuadrant USg...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com
Re: postgres_fdw vs data formatting GUCs (was Re: [HACKERS] [v9.3] writable foreign tables)
On Tue, Mar 19, 2013 at 1:16 PM, Tom Lane wrote: > Daniel Farina writes: >> Similar in purpose to cc3f281ffb0a5d9b187e7a7b7de4a045809ff683, but >> taking into account that a dblink caller may choose to cause arbitrary >> changes to DateStyle and IntervalStyle. To handle this, it is >> necessary to use PQparameterStatus before parsing any input, every >> time. This is avoided in cases that do not include the two >> problematic types treated -- interval and timestamptz -- by scanning >> the TupleDesc's types first. > > Hm. Is that really a win? Examining the tupdesc wouldn't be free > either, and what's more, I think you're making false assumptions about > which types have behavior dependent on the GUCs. You definitely missed > out on date and plain timestamp, and what of domains over these types? Yes, I also forgot about arrays, and nested composites in arrays or nested composites. As soon as recursive types are introduced the scan is not sufficient for sure: it's necessary to copy every GUC unless one wants to recurse through the catalogs which feels like a heavy loss. I presumed at the time that skimming over the tupdecs to compare a few Oids would be significantly cheaper than the guts of PQparameterStatus, which I believe to be a linked list traversal. I mostly wrote that optimization because it was easy coincidentally from how I chose to structure the code rather than belief in its absolute necessity. And, as you said, I forgot a few types even for the simple case, which is a bit of a relief because some of the machinery I wrote for the n = 3 case will come in useful for that. > I'd be inclined to eat the cost of calling PQparameterStatus every time > (which won't be that much) and instead try to optimize by avoiding the > GUC-setting overhead if the current value matches the local setting. > But even that might be premature optimization. Did you do any > performance testing to see if there was a problem worth avoiding? 
Nope; should I invent a new way to do that, or would it be up to commit standard based on inspection alone? I'm okay either way, let me know. I'll take into account these problems and post a new version. -- fdr -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: postgres_fdw vs data formatting GUCs (was Re: [HACKERS] [v9.3] writable foreign tables)
Daniel Farina writes: > Similar in purpose to cc3f281ffb0a5d9b187e7a7b7de4a045809ff683, but > taking into account that a dblink caller may choose to cause arbitrary > changes to DateStyle and IntervalStyle. To handle this, it is > necessary to use PQparameterStatus before parsing any input, every > time. This is avoided in cases that do not include the two > problematic types treated -- interval and timestamptz -- by scanning > the TupleDesc's types first. Hm. Is that really a win? Examining the tupdesc wouldn't be free either, and what's more, I think you're making false assumptions about which types have behavior dependent on the GUCs. You definitely missed out on date and plain timestamp, and what of domains over these types? I'd be inclined to eat the cost of calling PQparameterStatus every time (which won't be that much) and instead try to optimize by avoiding the GUC-setting overhead if the current value matches the local setting. But even that might be premature optimization. Did you do any performance testing to see if there was a problem worth avoiding? > Although it has been suggested that extra_float_digits would need > similar treatment as IntervalStyle and DateStyle (as it's seen in > pg_dump), extra_float_digits is not an GUC_REPORT-ed GUC and is not > able to be checked in PQparameterStatus, and furthermore, the float4 > and float8 parsers are not sensitive to the GUC anyway: both accept as > much precision as is provided in an unambiguous way. Agreed, we don't need to worry so much about that one; or at least, if we do need to worry, it's on the other end of the connection from what we're fixing here. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Enabling Checksums
On 3/8/13 4:40 PM, Greg Stark wrote: On Fri, Mar 8, 2013 at 5:46 PM, Josh Berkus wrote: After some examination of the systems involved, we concluded that the issue was the FreeBSD drivers for the new storage, which were unstable and had custom source patches. However, without PostgreSQL checksums, we couldn't *prove* it wasn't PostgreSQL at fault. It ended up taking weeks of testing, most of which was useless, to prove to them they had a driver problem so it could be fixed. If Postgres had had checksums, we could have avoided wasting a couple weeks looking for non-existent PostgreSQL bugs. How would Postgres checksums have proven that? It's hard to prove this sort of thing definitively. I see this more as a source of evidence that can increase confidence that the database is doing the right thing, most usefully in a replication environment. Systems that care about data integrity nowadays are running with a WAL shipping replica of some sort. Right now there's no way to grade the master vs. standby copies of data, to figure out which is likely to be the better copy. In a checksum environment, here's a new troubleshooting workflow that becomes possible: 1) Checksum error happens on the master. 2) The same block is checked on the standby. It has the same 16 bit checksum, but different data, and its checksum matches its data. 3) The copy of that block on the standby, which was shipped over the network instead of being stored locally, is probably good. 4) The database must have been consistent when the data was in RAM on the master. 5) Conclusion: there's probably something wrong at a storage layer below the database on the master. Now, of course this doesn't automatically point the finger correctly in every possible corruption scenario. But this example is a situation I've seen in the real world when a bad driver flips a random bit in a block. 
If Josh had been able to show his client the standby server built from streaming replication was just fine, and corruption was limited to the master, that doesn't *prove* the database isn't the problem. But it does usefully adjust the perception of what faults are likely and unlikely away from it. Right now when I see master/standby differences in data blocks, it's the old problem of telling the true time when you have two clocks. Having a checksum helps pick the right copy when there is more than one, and one has been corrupted by storage layer issues. If I understand the performance issues right, the main problem is the extra round trip to the WAL log, which can require a sync. Is that right? I don't think this changes things such that there is a second fsync per transaction. That is a worthwhile test workload to add, though. Right now the tests Jeff and I have run have specifically avoided systems with slow fsync, because you can't really test the CPU/memory overhead very well if you're hitting the rotational latency bottleneck. -- Greg Smith 2ndQuadrant USg...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com
Re: [HACKERS] Enabling Checksums
Jeff Davis writes: > I will move back to verifying the page hole, as well. > There are a few approaches: > 1. Verify that the page hole is zero before write and after read. > 2. Include it in the calculation (if we think there are some corner > cases where the hole might not be all zero). > 3. Zero the page hole before write, and verify that it's zero on read. > This can be done during the memcpy at no performance penalty in > PageSetChecksumOnCopy(), but that won't work for > PageSetChecksumInplace(). TBH, I do not think that the checksum code ought to be so familiar with the page format as to know that there *is* a hole, much less be willing to zero out what it thinks is a hole. I consider #3 totally unacceptable from a safety standpoint, and don't much care for #1 either. #2 sounds like the thing to do. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Improving avg performance for numeric
I wrote: > [ looks at patch... ] Oh, I see what's affecting the plan: you changed > the aggtranstypes to internal for a bunch of aggregates. That's not > very good, because right now the planner takes that to mean that the > aggregate could eat a lot of space. We don't want that to happen for > these aggregates, I think. After thinking about that for awhile: if we pursue this type of optimization, what would probably be appropriate is to add an aggregate property (stored in pg_aggregate) that allows direct specification of the size that the planner should assume for the aggregate's transition value. We were getting away with a hardwired assumption of 8K for "internal" because the existing aggregates that used that transtype all had similar properties, but it was always really a band-aid not a proper solution. A per-aggregate override could be useful in other cases too. This was looking like 9.4 material already, but adding such a property would definitely put it over the top of what we could think about squeezing into 9.3, IMO. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Enabling Checksums
On 19 March 2013 00:17, Ants Aasma wrote: > I looked for fast CRC implementations on the net. Thanks very much for great input. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Enabling Checksums
On 19 March 2013 17:18, Jeff Davis wrote: > I will move back to verifying the page hole, as well. That was agreed long ago... > There are a few approaches: > > 1. Verify that the page hole is zero before write and after read. > 2. Include it in the calculation (if we think there are some corner > cases where the hole might not be all zero). > 3. Zero the page hole before write, and verify that it's zero on read. > This can be done during the memcpy at no performance penalty in > PageSetChecksumOnCopy(), but that won't work for > PageSetChecksumInplace(). > > With option #2 or #3, we might also verify that the hole is all-zero if > asserts are enabled. (3) seems likely to be more expensive than (2), since we're talking unaligned memory writes rather than a single pre-fetchable block read. In any case, at initial patch commit, we should CRC the whole block and allow for the possibility of improvement following measurements. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [pgsql-advocacy] Call for Google Summer of Code mentors, admins
On 14 February 2013 18:02, Josh Berkus wrote: > Folks, > > Once again, Google is holding Summer of Code. We need to assess whether > we want to participate this year. > > Questions: > > - Who wants to mentor for GSOC? > > - Who can admin for GSOC? Thom? > > - Please suggest project ideas for GSOC > > - Students seeing this -- please speak up if you have projects you plan > to submit. If anyone else has more projects ideas to suggest, please do share. Students, please feel free to review the PostgreSQL Todo list for inspiration: http://wiki.postgresql.org/wiki/Todo Of course ensure you don't choose anything too ambitious or trivial. -- Thom -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Enabling Checksums
On Fri, 2013-03-15 at 14:32 +0200, Ants Aasma wrote: > The most obvious case here is that you > can swap any number of bytes from 0x00 to 0xFF or back without > affecting the hash. That's a good point. Someone (Simon?) had brought that up before, but you and Tom convinced me that it's a problem. As I said in my reply to Tom, one option is to change the modulus. > I took a look at how the fletcher-64 compiles. Great analysis, thank you. > I'm not really sure if parallel checksums would be worth doing or not. > On one hand, enabling data parallelism would make it more future > proof, on the other hand, the unvectorized variant is slower than > Fletcher-64. Looks like we still have several options being discussed. I think the checksum with modulo 255 is out, but perhaps a different modulus is still on the table. And if we can get a CRC to be fast enough, then we'd all be happy with that option. Another thing to consider is that, right now, the page is copied and then checksummed. If we can calculate the checksum during the copy, that might save us a small amount of effort. My feeling is that it would only really help if the checksum is very cheap and works on large word sizes, but I'm not sure. > On another note, I think I found a bug with the current latest patch. Ugh. Great catch, thank you! Regards, Jeff Davis -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Enabling Checksums
On Sat, 2013-03-16 at 20:41 -0400, Tom Lane wrote: > Simon Riggs writes: > > On 15 March 2013 13:08, Andres Freund wrote: > >> I commented on this before, I personally think this property makes > >> fletcher a > >> not so good fit for this. Its not uncommon for parts of a block being > >> all-zero > >> and many disk corruptions actually change whole runs of bytes. [ referring to Ants's comment that the existing algorithm doesn't distinguish between 0x00 and 0xFF ] > Meh. I don't think that argument holds a lot of water. The point of > having checksums is not so much to notice corruption as to be able to > point the finger at flaky hardware. If we have an 8K page with only > 1K of data in it, and we fail to notice that the hardware dropped a lot > of bits in the other 7K, we're not doing our job; and that's not really > something to write off, because it would be a lot better if we complain > *before* the hardware manages to corrupt something valuable. I will move back to verifying the page hole, as well. There are a few approaches: 1. Verify that the page hole is zero before write and after read. 2. Include it in the calculation (if we think there are some corner cases where the hole might not be all zero). 3. Zero the page hole before write, and verify that it's zero on read. This can be done during the memcpy at no performance penalty in PageSetChecksumOnCopy(), but that won't work for PageSetChecksumInplace(). With option #2 or #3, we might also verify that the hole is all-zero if asserts are enabled. > So I think we'd be best off to pick an algorithm whose failure modes > don't line up so nicely with probable hardware failure modes. It's > worth noting that one of the reasons that CRCs are so popular is > precisely that they were designed to detect burst errors with high > probability. Another option is to use a different modulus. The page http://en.wikipedia.org/wiki/Fletcher%27s_checksum suggests that a prime number can be a good modulus for Fletcher-32. 
Perhaps we could use 251 instead of 255? That would make it less likely to miss a common form of hardware failure, although it would also reduce the number of possible checksums slightly (about 4% fewer than 2^16). I'm leaning toward this option now, or a CRC of some kind if the performance is reasonable. Regards, Jeff Davis -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Improving avg performance for numeric
[ please do not top-reply ] Hadi Moshayedi writes: > On Tue, Mar 19, 2013 at 6:25 PM, Tom Lane wrote: >> Uh, what? Fooling around with the implementation of avg() should surely >> not change any planning decisions. > I am not sure how this works, but I also changed numeric sum(), and the > views in question had a numeric sum() column. Can that have any impact? [ looks at patch... ] Oh, I see what's affecting the plan: you changed the aggtranstypes to internal for a bunch of aggregates. That's not very good, because right now the planner takes that to mean that the aggregate could eat a lot of space. We don't want that to happen for these aggregates, I think. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Improving avg performance for numeric
I am not sure how this works, but I also changed numeric sum(), and the views in question had a numeric sum() column. Can that have any impact? I am going to dig deeper to see why this happens. On Tue, Mar 19, 2013 at 6:25 PM, Tom Lane wrote: > Kevin Grittner writes: > > Hadi Moshayedi wrote: > >> I also noticed that this patch makes matview test fail. It seems > >> that it just changes the ordering of rows for queries like > >> "SELECT * FROM tv;". Does this seem like a bug in my patch, or > >> should we add "ORDER BY" clauses to this test to make it more > >> deterministic? > > > I added some ORDER BY clauses. That is probably a good thing > > anyway for purposes of code coverage. Does that fix it for you? > > Uh, what? Fooling around with the implementation of avg() should surely > not change any planning decisions. > > regards, tom lane >
Re: [HACKERS] Improving avg performance for numeric
Kevin Grittner writes: > Hadi Moshayedi wrote: >> I also noticed that this patch makes matview test fail. It seems >> that it just changes the ordering of rows for queries like >> "SELECT * FROM tv;". Does this seem like a bug in my patch, or >> should we add "ORDER BY" clauses to this test to make it more >> deterministic? > I added some ORDER BY clauses. That is probably a good thing > anyway for purposes of code coverage. Does that fix it for you? Uh, what? Fooling around with the implementation of avg() should surely not change any planning decisions. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Improving avg performance for numeric
2013/3/19 Kevin Grittner : > Hadi Moshayedi wrote: > >> I updated the patch by taking ideas from your patch, and unifying >> the transition struct and update function for different >> aggregates. The speed of avg improved even more. It now has 60% >> better performance than the current committed version. > > Outstanding! I did some tests along the lines of OLAP queries, and I think a ~40% speedup for queries with AVG is realistic, depending on other conditions. There are many situations where the data are already in shared buffers or the filesystem cache, and in those cases this patch can bring a significant speedup - probably even more if a better algorithm is used for summing two numeric values in the aggregate. Regards Pavel > >> I also noticed that this patch makes matview test fail. It seems >> that it just changes the ordering of rows for queries like >> "SELECT * FROM tv;". Does this seem like a bug in my patch, or >> should we add "ORDER BY" clauses to this test to make it more >> deterministic? > > I added some ORDER BY clauses. That is probably a good thing > anyway for purposes of code coverage. Does that fix it for you? > > -- > Kevin Grittner > EnterpriseDB: http://www.enterprisedb.com > The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Improving avg performance for numeric
Hadi Moshayedi wrote: > I updated the patch by taking ideas from your patch, and unifying > the transition struct and update function for different > aggregates. The speed of avg improved even more. It now has 60% > better performance than the current committed version. Outstanding! > I also noticed that this patch makes matview test fail. It seems > that it just changes the ordering of rows for queries like > "SELECT * FROM tv;". Does this seem like a bug in my patch, or > should we add "ORDER BY" clauses to this test to make it more > deterministic? I added some ORDER BY clauses. That is probably a good thing anyway for purposes of code coverage. Does that fix it for you? -- Kevin Grittner EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Writable FDW: returning clauses.
Ronan Dunklau writes: > While implementing the new writable API for FDWs, I wondered wether they > are any obvious problems with the following behavior for handling returning > clauses (for the delete case). > The goal is to fetch all required attributes during the initial scan, and > avoid fetching data on the delete operation itself. > - in the AddForeignUpdateTargets hook, add resjunk entries for the columns in > the returning clause > - in the ExecForeignDelete hook, fill the returned slot with values taken > from > the planSlot. You can try it if you want. There were some reasons not to try it in postgres_fdw: * this would foreclose doing something closer to the semantics for local tables, in which the initial scan doesn't lock the rows. * at least update operations have to pull back the actual row anyhow in order to tell the truth when a BEFORE UPDATE trigger on the remote changes the stored data. Both of those boil down to the fact that the initial scan may not see the right data to return, if there are other actors involved. I grant that some remote data sources don't have any such issues to worry about, but you need to be careful. There are some comments in postgres_fdw about eventually optimizing the case where all update/delete quals are remote into a scan node that does UPDATE/DELETE RETURNING to start with, and then the Modify node would have to do what you suggest in order to have data to return. It didn't seem like something to tackle in the first iteration though. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_upgrade segfaults when given an invalid PGSERVICE value
On 13-03-18 09:17 PM, Bruce Momjian wrote: On Mon, Mar 18, 2013 at 12:08:09PM -0400, Steve Singer wrote: If you try running pg_upgrade with the PGSERVICE environment variable set to some invalid/non-existent service, pg_upgrade segfaults Program received signal SIGSEGV, Segmentation fault. 0x0040bdd1 in check_pghost_envvar () at server.c:304 304 for (option = start; option->keyword != NULL; option++) (gdb) p start $5 = (PQconninfoOption *) 0x0 PQconndefaults can return NULL if it has issues. The attached patch prints a minimally useful error message. I don't know of a good way to get a more useful error message out of PQconndefaults() I checked this against master but it was reported to me as an issue in 9.2 Well, that's interesting. There is no mention of PQconndefaults() returning NULL except for out of memory: Returns a connection options array. This can be used to determine all possible PQconnectdb options and their current default values. The return value points to an array of PQconninfoOption structures, which ends with an entry having a null keyword pointer. The -->null pointer is returned if memory could not be allocated. Note that the current default values (val fields) will depend on environment variables and other context. Callers must treat the connection options data as read-only. Looking at libpq/fe-connect.c::PQconndefaults(), it calls conninfo_add_defaults(), which has this: /* * If there's a service spec, use it to obtain any not-explicitly-given * parameters. */ if (parseServiceInfo(options, errorMessage) != 0) return false; so it is clearly possible for PQconndefaults() to return NULL for service file failures. The questions are: * Is this what we want? What other choices do we have? I don't think PQconndefaults() should continue on as if PGSERVICE wasn't set in the environment after a failure from parseServiceInfo. * Should we document this? Yes, the documentation should indicate that PQconndefaults() can return NULL for more than just memory failures. 
* Should we change this to just throw a warning? How would we communicate warnings from PQconndefaults() back to the caller? Also, it seems pg_upgrade isn't the only utility that is confused: contrib/postgres_fdw/option.c and contrib/dblink/dblink.c think PQconndefaults() returning NULL means out of memory and report that as the error string. bin/scripts/pg_isready.c and contrib/pg_upgrade/server.c have no check for NULL return. libpq/test/uri-regress.c knows to throw a generic error message. So, we have some decisions and work to do. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Trust intermediate CA for client certificates
* Craig Ringer (cr...@2ndquadrant.com) wrote: > As far as I'm concerned that's the immediate problem fixed. It may be > worth adding a warning on startup if we find non-self-signed certs in > root.crt too, something like 'WARNING: Intermediate certificate found in > root.crt. This does not do what you expect and your configuration may be > insecure; see the Client Certificates chapter in the documentation.' I'm not sure that I follow this logic, unless you're proposing that intermediate CAs only be allowed to be picked up from system-wide configuration? That strikes me as overly constrained as I imagine there are valid configurations today which have intermediate CAs listed, with the intention that they be available for PG to build the chain from a client cert that is presented back up to the root. Now, the client might be able to provide such an intermediate CA cert too (one of the fun things about SSL is that the client can send any 'missing' certs to the server, if it has them available..), but it also might not. > We could then look at using more flexible approaches to match PostgreSQL > users to client certificates, like regexps or (preferably, IMO) > DN-component based solutions to extract usernames from cert DNs, etc. > Various ways to specify *authorization*. Sure. > It's looking more and more like the *authentication* side is basically > "do you trust this CA root not to sign certs for fraudlent/fake > SubjectDNs or issue intermediate certs that might do so? Trust: include > it. No trust: Don't." That's what we have now, it just needs to be > explained better in the docs. I certainly agree that the docs could be improved in this area. :) Thanks, Stephen signature.asc Description: Digital signature
Re: [HACKERS] Trust intermediate CA for client certificates
On Tue, Mar 19, 2013 at 09:37:18PM +0800, Craig Ringer wrote: > On 03/19/2013 08:39 PM, Stephen Frost wrote: > > Craig, > > > > * Craig Ringer (cr...@2ndquadrant.com) wrote: > >> Yep, in most applications I've seen you usually store a list of > >> authorized SubjectDNs or you just use your own self-signed root and > >> issue certs from it. > > > > Even with a self-signed root issuing certs, you need to map the > > individual cert to a PG user in some fashion. > > > > The more I look a this, the more it looks like trying to use > intermediate CAs as authentication roots is largely wrong anyway. We > should document this with something like: > > NOTE: Only self-signed root CA certificates should be added to > ssl_ca_file. If you add an intermediate CA certificate (one that's not > self-signed) then PostgreSQL will not be able to validate client > certificates against it because it will not have access to the full > certificate chain. You can't fix that by adding the full certificate > chain then PostgreSQL will then accept client certificates trusted by > any member of the chain, including the root, so the effect is the same > as placing only the root certificate in the file. It is not currently > possible to trust certificates signed by an intermediate CA but not the > parents in its certificate chain. > > ... plus some explanation that having a valid trusted cert doesn't mean > you're authorized for access, you still have to meet the requrements in > pg_hba.conf, have a valid username/password or match an authorized > certificate DN (depending on config), etc. > > As far as I'm concerned that's the immediate problem fixed. It may be > worth adding a warning on startup if we find non-self-signed certs in > root.crt too, something like 'WARNING: Intermediate certificate found in > root.crt. This does not do what you expect and your configuration may be > insecure; see the Client Certificates chapter in the documentation.' Yes, I like this. 
-- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Trust intermediate CA for client certificates
On 03/19/2013 08:39 PM, Stephen Frost wrote: > Craig, > > * Craig Ringer (cr...@2ndquadrant.com) wrote: >> Yep, in most applications I've seen you usually store a list of >> authorized SubjectDNs or you just use your own self-signed root and >> issue certs from it. > > Even with a self-signed root issuing certs, you need to map the > individual cert to a PG user in some fashion. > The more I look at this, the more it looks like trying to use intermediate CAs as authentication roots is largely wrong anyway. We should document this with something like: NOTE: Only self-signed root CA certificates should be added to ssl_ca_file. If you add an intermediate CA certificate (one that's not self-signed) then PostgreSQL will not be able to validate client certificates against it because it will not have access to the full certificate chain. You can't fix that by adding the full certificate chain, because PostgreSQL will then accept client certificates trusted by any member of the chain, including the root, so the effect is the same as placing only the root certificate in the file. It is not currently possible to trust certificates signed by an intermediate CA but not the parents in its certificate chain. ... plus some explanation that having a valid trusted cert doesn't mean you're authorized for access, you still have to meet the requirements in pg_hba.conf, have a valid username/password or match an authorized certificate DN (depending on config), etc. As far as I'm concerned that's the immediate problem fixed. It may be worth adding a warning on startup if we find non-self-signed certs in root.crt too, something like 'WARNING: Intermediate certificate found in root.crt. This does not do what you expect and your configuration may be insecure; see the Client Certificates chapter in the documentation.' 
We could then look at using more flexible approaches to match PostgreSQL users to client certificates, like regexps or (preferably, IMO) DN-component based solutions to extract usernames from cert DNs, etc. Various ways to specify *authorization*. It's looking more and more like the *authentication* side is basically "do you trust this CA root not to sign certs for fraudulent/fake SubjectDNs or issue intermediate certs that might do so? Trust: include it. No trust: Don't." That's what we have now, it just needs to be explained better in the docs. -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: postgres_fdw vs data formatting GUCs (was Re: [HACKERS] [v9.3] writable foreign tables)
On Thu, Mar 14, 2013 at 8:07 PM, Tom Lane wrote: > Daniel Farina writes: >> On Tue, Mar 12, 2013 at 11:51 AM, Tom Lane wrote: >>> Yeah, watching the remote side's datestyle and intervalstyle and >>> matching them (for both input and output) would probably work. > >> Alright, so I've been whacking at this and there's one interesting >> thing to ask about: saving and restoring the local GUCs. There are a >> bunch of things about GUCs besides their value that are maintained, >> such as their 'source', so writing a little ad-hoc save/restore is not >> going to do the right thing. > > Right, you should use NewGUCNestLevel/AtEOXact_GUC. Look at the fixes > I committed in postgres_fdw a day or two ago for an example. Thanks. Here's a patch. Here are the comments from the top of the patch file, inlined: Similar in purpose to cc3f281ffb0a5d9b187e7a7b7de4a045809ff683, but taking into account that a dblink caller may choose to cause arbitrary changes to DateStyle and IntervalStyle. To handle this, it is necessary to use PQparameterStatus before parsing any input, every time. This is avoided in cases that do not include the two problematic types treated -- interval and timestamptz -- by scanning the TupleDesc's types first. Although it has been suggested that extra_float_digits would need similar treatment as IntervalStyle and DateStyle (as it's seen in pg_dump), extra_float_digits is not a GUC_REPORT-ed GUC and so cannot be checked via PQparameterStatus, and furthermore, the float4 and float8 parsers are not sensitive to the GUC anyway: both accept as much precision as is provided in an unambiguous way. >> So, I can add one more such use of AtEOXact_GUC for the dblink fix, >> but would it also be appropriate to find some new names for the >> concepts (instead of AtEOXact_GUC and isCommit) here to more >> accurately express what's going on? > > Meh. I guess we could invent an "EndGUCNestLevel" that's a wrapper > around AtEOXact_GUC, but I'm not that excited about it ... 
Per your sentiment, I won't pursue that then. Overall, the patch I think has reasonably thorough regression testing (I tried to hit the several code paths whereby one would have to scan TupleDescs, as well as a few other edge cases) and has some of the obvious optimizations in place: it doesn't call PQparameterStatus more than once per vulnerable type per TupleDesc scan, and avoids using the parameter status procedure at all if there are no affected types. The mechanisms may be overwrought though, since it was originally intended to generalize to three types before I realized that extra_float_digits is another kind of problem entirely, leaving just IntervalStyle and DateStyle remaining, which perhaps could have been treated even more specially to make the patch more terse. I'll add it to the commitfest. -- fdr dblink-guc-sensitive-types-v1.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Trust intermediate CA for client certificates
Craig, * Craig Ringer (cr...@2ndquadrant.com) wrote: > Yep, in most applications I've seen you usually store a list of > authorized SubjectDNs or you just use your own self-signed root and > issue certs from it. Even with a self-signed root issuing certs, you need to map the individual cert to a PG user in some fashion. > I'm pretty sure I've seen tools match on part of the DN, like the > organisation field, but since I can't remember *where* I'm not sure > that's all that useful. Looking at what other tools do and how they handle this question would certainly be a good idea. > I don't know about "very rare" but it's certainly not common outside > very large orgs. I tried to find a CA that'd let me issue intermediate > client certs for 2ndQuadrant but found nobody that'd do it for > certificate volumes less than several thousand new certs a month. I'd > been using intermediate CAs based on my own self-signed CA root quite > heavily in infrastructure elsewhere I was rather surprised that the same > sort of thing wasn't easily available for public CAs. It's pretty simple, really- issuing certs is how the public CAs make their money. If they give you a CA cert that can issue certs, they're cut out of the loop. > I get the impression it's fairly common in internal infrastructure, > especially with the cert management tools offered by Microsoft Active > Directory servers, but have no strong information to substantiate this. > Nor do I know whether we need to support this mode of operation. In general, CAs view intermediate certs as a way to provide automated systems while having their self-signed root private key highly protected. The intermediate CAs therefore have a shorter life-span and end up changing much more frequently than the root certs (though root certs certainly also do change, but it's much more painful). That's one of the reasons that they're bad to use as part of the authentication criteria. 
The problem with trying to use intermediate CAs as a way of dividing up organizational responsibility is simply that there's very few systems out there which will support that kind of configuration, from what I've seen. Wrt Active Directory this problem is actually very well solved through use of Kerberos, where you have multiple realms, directional trust between the realms, and most tools (including PG) understand that a principal is the combination of username@REALM and let you authorize based on that. > BTW, This discussion has made me realise that I know less about SSL/TLS > and X.509 certificate extensions than I'd like to when dealing with this > topic. In particular, I don't know whether a CA can issue an > intermediate CA with extensions that restrict it to validly signing only > host certificates for hosts under a particular domain or > user-identifying client certs with CNs under a particular organisation - > and whether, if such extensions exist, applications actually check them > when verifying the certificate trust chain. You can set flags on certificates but I've not seen the kind of complex restrictions that you're describing. Remember that anything that the CA sets for an intermediate CA cert would have to be checked by whatever software is doing the certificate validation, meaning that you'd have to make sure all your certificate-based software is updated to do that kind of validation (and do it in a consistent way, or you'd get all kinds of fun failures). > Ugh, that's not something I've ever had the ... privilege ... to deal > with before. I worked for the company that built the original federal bridge software. :) I wasn't directly involved in that project, but I certainly gained some understanding of the complexities through working with the folks who were. > Only for using intermediate certs as authorization roots, and it may be > reasonable to say "we don't support that, use an authorized DN list". 
Or > come up with a better solution like checking attributes of the SubjectDN > for authorization purposes after validating the signature chain to prove > authenticity. I think it'd be valuable to distinguish "trusted CAs" from "intermediate CAs" in PG explicitly (as I recall, you can already do this by simply ensuring that your OpenSSL config is set up correctly for the system wide defaults). That's what most serious SSL users will be familiar with anyway. I'm on the fence about if only supporting a list of "trusted CAs" (through the cert files that we currently have) rises to the level of being a security issue, but we should at least update the documentation to reflect that all CAs listed in the file are fully trusted and plan to provide an intermediate CA list option. To be honest, our entire SSL support mechanism could really use some work and it'd be great if we had some folks looking into it seriously. One of the problems we've long had is the dependency on OpenSSL (yes, it's a problem) and it'd be good to consider, if we're changing the SSL configuration, how
Re: [GENERAL] [HACKERS] Trust intermediate CA for client certificates
On Tue, Mar 19, 2013 at 01:46:32AM -0400, Stephen Frost wrote: > > I guess that suggests we should be calling this something like > > 'ssl_authorized_client_roots'. > > I'm no longer convinced that this really makes sense and I'm a bit > worried about the simple authentication issue which I thought was at the > heart of this concern. Is there anything there that you see as being an > issue with what we're doing currently..? I too am worried that making SSL even more flexible will make simple setups more complex to set up. -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] backward incompatible pg_basebackup and pg_receivexlog
On Tue, Mar 19, 2013 at 11:39 AM, Heikki Linnakangas wrote: > On 19.03.2013 04:42, Peter Eisentraut wrote: >> >> pg_basebackup and pg_receivexlog from 9.3 won't work with earlier >> servers anymore. I wonder if this has been fully thought through. We >> have put in a lot of effort to make client programs compatible with many >> server versions as well as keeping the client/server protocol compatible >> across many versions. Both of these assumptions are now being broken, >> which will result in all kinds of annoyances. >> >> It seems to me that these tools could probably be enhanced to understand >> both old and new formats. > > > Yes, this was discussed, and the consensus was to break > backwards-compatibility in 9.3, so that we can clean up the protocol to be > architecture-independent. That makes it easier to write portable clients, > from 9.3 onwards. See the thread ending at > http://www.postgresql.org/message-id/4fe2279c.2070...@enterprisedb.com. > > >> Also, using the old tools against new server versions either behaves >> funny or silently appears to work, both of which might be a problem. > > > Hmm, it would be good to fix that. I wonder how, though. The most > straightforward way would be to add an explicit version check in the > clients, in backbranches. That would give a nice error message, but that > would only help with new minor versions. Still better to do it in a backbranch, than not at all. At least we are then nicer to the ones that do keep up with upgrades, which we recommend they do... >> I think if we are documenting the replication protocol as part of the >> frontend/backend protocol and are exposing client tools that use it, >> changes need to be done with the same rigor as other protocol changes. > > > Agreed. The plan is that we're going to be more careful with it from now on. 
> > >> As far as I can tell, there is no separate version number for the >> replication part of the protocol, so either there needs to be one or the >> protocol as a whole needs to be updated. > > > Good point. > > I propose that we add a version number, and call the 9.3 version version 2. > Let's add a new field to the result set of the IDENTIFY_SYSTEM command for > the replication protocol version number. The version number should be bumped > if the replication protocol is changed in a non-backwards-compatible way. +1. > That includes changes to the messages sent in the COPY-both mode, after the > START_REPLICATION command. If we just add new commands, there's no need to > bump the version; a client can still check the server version number to > determine if a command exists or not. Sounds good. > We could also try to support old client + new server combination to some > extent by future-proofing the protocol a bit. We could specify that the > client should ignore any message types that it does not understand, and also > add a header length field to the WalData message ('w'), so that we can add > new header fields to it that old clients can just ignore. That way we can > keep the protocol version unchanged if we just add some optional stuff to > it. I'm not sure how useful that is in practice though; it's not > unreasonable that you must upgrade to the latest client, as long as the new > client works with old server versions. I think that's quite reasonable, as long as we detect it, and can give a nice error message telling the user how to deal with it. -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] backward incompatible pg_basebackup and pg_receivexlog
On 19.03.2013 04:42, Peter Eisentraut wrote: pg_basebackup and pg_receivexlog from 9.3 won't work with earlier servers anymore. I wonder if this has been fully thought through. We have put in a lot of effort to make client programs compatible with many server versions as well as keeping the client/server protocol compatible across many versions. Both of these assumptions are now being broken, which will result in all kinds of annoyances. It seems to me that these tools could probably be enhanced to understand both old and new formats. Yes, this was discussed, and the consensus was to break backwards-compatibility in 9.3, so that we can clean up the protocol to be architecture-independent. That makes it easier to write portable clients, from 9.3 onwards. See the thread ending at http://www.postgresql.org/message-id/4fe2279c.2070...@enterprisedb.com. Also, using the old tools against new server versions either behaves funny or silently appears to work, both of which might be a problem. Hmm, it would be good to fix that. I wonder how, though. The most straightforward way would be to add an explicit version check in the clients, in backbranches. That would give a nice error message, but that would only help with new minor versions. I think if we are documenting the replication protocol as part of the frontend/backend protocol and are exposing client tools that use it, changes need to be done with the same rigor as other protocol changes. Agreed. The plan is that we're going to be more careful with it from now on. As far as I can tell, there is no separate version number for the replication part of the protocol, so either there needs to be one or the protocol as a whole needs to be updated. Good point. I propose that we add a version number, and call the 9.3 version version 2. Let's add a new field to the result set of the IDENTIFY_SYSTEM command for the replication protocol version number. 
The version number should be bumped if the replication protocol is changed in a non-backwards-compatible way. That includes changes to the messages sent in the COPY-both mode, after the START_REPLICATION command. If we just add new commands, there's no need to bump the version; a client can still check the server version number to determine if a command exists or not. We could also try to support old client + new server combination to some extent by future-proofing the protocol a bit. We could specify that the client should ignore any message types that it does not understand, and also add a header length field to the WalData message ('w'), so that we can add new header fields to it that old clients can just ignore. That way we can keep the protocol version unchanged if we just add some optional stuff to it. I'm not sure how useful that is in practice though; it's not unreasonable that you must upgrade to the latest client, as long as the new client works with old server versions. - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Writable FDW: returning clauses.
Hello. While implementing the new writable API for FDWs, I wondered whether there are any obvious problems with the following behavior for handling returning clauses (for the delete case). The goal is to fetch all required attributes during the initial scan, and avoid fetching data on the delete operation itself. - in the AddForeignUpdateTargets hook, add resjunk entries for the columns in the returning clause - in the ExecForeignDelete hook, fill the returned slot with values taken from the planSlot. -- Ronan Dunklau signature.asc Description: This is a digitally signed message part.
Re: [HACKERS] Identity projection
Thank you to all involved. > On Friday, March 15, 2013 12:52 AM Tom Lane wrote: > > I wrote: > > > ... So I think this patch is missing a bet by not > > > accepting equal() expressions. > > > > I've committed this with that logic, a comment explaining exactly why > > this is the way to do it, and some other cosmetic improvements. > > Thank you. Thank you for your time and the good work. regards, -- Kyotaro Horiguchi NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: proposal: a width specification for s specifier (format function), fix behave when positional and ordered placeholders are used
Thank you for committing this patch. > Applied with some mostly-cosmetic adjustments. I also took the > liberty of changing some of the error message texts to line up > more closely with the expanded documentation (eg, use "format > specifier" not "conversion specifier" because that's the phrase > used in the docs). I looked over the modifications. Thanks for refining rather large portion of documentation and comments.. and code. regards, -- Kyotaro Horiguchi NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers