Re: [HACKERS] [COMMITTERS] pgsql: Keep track of transaction commit timestamps
On 12/19/2014 11:30 AM, Petr Jelinek wrote:
> as promised I am sending code-comment patch that explains the use of
> commit timestamps + nodeid C api for the conflict resolution, comments
> welcome.

That's a little bit better, but I have to say I'm still not impressed. There are so many implicit assumptions in the system.

The first assumption is that a 32-bit node id is sufficient. I'm sure it is for many replication systems, but might not be for all.

Then there's the assumption that the node id should be sticky, i.e. it's set for the whole session. Again, I'm sure that's useful for many systems, but I could just as easily imagine that you'd want it to reset after every commit.

To be honest, I think this patch should be reverted. Instead, we should design a system where extensions can define their own SLRUs to store additional per-transaction information. That way, each extension can have as much space per transaction as needed, and support functions that make most sense with the information. Commit timestamp tracking would be one such extension, and for this node ID stuff, you could have another one (or include it in the replication extension).

- Heikki

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [COMMITTERS] pgsql: Keep track of transaction commit timestamps
On 2014-12-29 12:06:07 +0200, Heikki Linnakangas wrote:
> That's a little bit better, but I have to say I'm still not impressed.
> There are so many implicit assumptions in the system. The first
> assumption is that a 32-bit node id is sufficient.

Seriously? Are we going to build facilities for replication systems with that many nodes? It seems absolutely unrealistic that a) somebody does this b) that we'll blindly meet the demands of such a super hypothetical scenario.

> Then there's the assumption that the node id should be sticky, i.e.
> it's set for the whole session. Again, I'm sure that's useful for many
> systems, but I could just as easily imagine that you'd want it to
> reset after every commit.

It's trivial to add that/reset it manually. So what?

> To be honest, I think this patch should be reverted. Instead, we
> should design a system where extensions can define their own SLRUs to
> store additional per-transaction information. That way, each extension
> can have as much space per transaction as needed, and support
> functions that make most sense with the information. Commit timestamp
> tracking would be one such extension, and for this node ID stuff, you
> could have another one (or include it in the replication extension).

If somebody wants that they should develop it. But given that we, based on previous discussions, don't want to run user defined code in the relevant phase during transaction commit *and* replay I don't think it'd be all that easy to do it fast and flexible.

Greetings,

Andres Freund

--
Andres Freund                     http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] [COMMITTERS] pgsql: Keep track of transaction commit timestamps
On 29/12/14 11:16, Andres Freund wrote:
> On 2014-12-29 12:06:07 +0200, Heikki Linnakangas wrote:
>> That's a little bit better, but I have to say I'm still not
>> impressed. There are so many implicit assumptions in the system. The
>> first assumption is that a 32-bit node id is sufficient.
>
> Seriously? Are we going to build facilities for replication systems
> with that many nodes? It seems absolutely unrealistic that a) somebody
> does this b) that we'll blindly meet the demands of such a super
> hypothetical scenario.

+1. Honestly, if somebody ever has a setup with more nodes than fit into 32 bits, they will run into bigger problems than the nodeid being too small.

>> Then there's the assumption that the node id should be sticky, i.e.
>> it's set for the whole session. Again, I'm sure that's useful for
>> many systems, but I could just as easily imagine that you'd want it
>> to reset after every commit.
>
> It's trivial to add that/reset it manually. So what?

Yes, you can reset it in the extension after commit, or you can actually override both the commit timestamp and the nodeid after commit if you so wish.

>> To be honest, I think this patch should be reverted. Instead, we
>> should design a system where extensions can define their own SLRUs to
>> store additional per-transaction information. That way, each
>> extension can have as much space per transaction as needed, and
>> support functions that make most sense with the information. Commit
>> timestamp tracking would be one such extension, and for this node ID
>> stuff, you could have another one (or include it in the replication
>> extension).
>
> If somebody wants that they should develop it. But given that we,
> based on previous discussions, don't want to run user defined code in
> the relevant phase during transaction commit *and* replay I don't
> think it'd be all that easy to do it fast and flexible.

Right, I would love to have custom SLRUs, but I don't see it happening given those two restrictions; otherwise I would have written the CommitTs patch that way in the first place...
--
Petr Jelinek                      http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] [COMMITTERS] pgsql: Keep track of transaction commit timestamps
On 12/29/2014 12:39 PM, Petr Jelinek wrote:
> On 29/12/14 11:16, Andres Freund wrote:
>> On 2014-12-29 12:06:07 +0200, Heikki Linnakangas wrote:
>>> To be honest, I think this patch should be reverted. Instead, we
>>> should design a system where extensions can define their own SLRUs
>>> to store additional per-transaction information. That way, each
>>> extension can have as much space per transaction as needed, and
>>> support functions that make most sense with the information. Commit
>>> timestamp tracking would be one such extension, and for this node ID
>>> stuff, you could have another one (or include it in the replication
>>> extension).
>>
>> If somebody wants that they should develop it. But given that we,
>> based on previous discussions, don't want to run user defined code in
>> the relevant phase during transaction commit *and* replay I don't
>> think it'd be all that easy to do it fast and flexible.
>
> Right, I would love to have custom SLRUs but I don't see it happening
> given those two restrictions, otherwise I would write the CommitTs
> patch that way in the first place...

Transaction commit and replay can treat the per-transaction information as an opaque blob. It just needs to be included in the commit record, and replay needs to write it to the SLRU. That way you don't need to run any user-defined code in those phases.

- Heikki
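To make the "opaque blob" idea concrete, here is a minimal sketch of what Heikki is describing: the commit path copies extension-provided bytes into the commit record without interpreting them, and replay copies the same bytes into the extension's SLRU slot, so no user-defined code runs in either phase. Every name here (XactExtraData, attach_to_commit_record, replay_to_slru, MAX_XACT_BLOB) is invented for illustration; this is not a real PostgreSQL API.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MAX_XACT_BLOB 32        /* arbitrary cap for this sketch */

typedef struct XactExtraData
{
    uint32_t    extension_id;   /* which extension's SLRU this belongs to */
    uint32_t    len;            /* payload length in bytes */
    char        payload[MAX_XACT_BLOB];
} XactExtraData;

/* "Commit": copy the blob into the commit record, uninterpreted. */
static void
attach_to_commit_record(XactExtraData *rec, uint32_t ext_id,
                        const void *data, uint32_t len)
{
    assert(len <= MAX_XACT_BLOB);
    rec->extension_id = ext_id;
    rec->len = len;
    memcpy(rec->payload, data, len);
}

/* "Replay": write the same bytes into the extension's SLRU slot. */
static void
replay_to_slru(const XactExtraData *rec, char *slru_slot)
{
    memcpy(slru_slot, rec->payload, rec->len);
}
```

The point of the design is visible in the types: neither function knows whether the payload is a timestamp, a node id, or both, which is exactly why Andres objects that a timestamp-tracking extension would then duplicate data already present in the commit record.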
Re: [HACKERS] The return value of allocate_recordbuf()
On 12/26/2014 09:31 AM, Fujii Masao wrote:
> Hi,
>
> While reviewing the FPW compression patch, I found that
> allocate_recordbuf() always returns TRUE though its source code
> comment says that FALSE is returned if out of memory. Its return value
> is checked in two places, but those checks are clearly useless.
>
> allocate_recordbuf() was introduced by 7fcbf6a, and then changed by
> 2c03216 so that palloc() is used instead of malloc() and FALSE is
> never returned even if out of memory. So this seems to be an oversight
> in 2c03216. Maybe we should change it so that it checks whether we can
> enlarge the memory with the requested size before actually allocating
> the memory?

Hmm. There is no way to check beforehand whether a palloc() will fail because of OOM. We could check for MaxAllocSize, though.

Actually, before 2c03216, when we used malloc() here, the maximum record size was 4GB. Now it's only 1GB, because of MaxAllocSize. Are we OK with that, or should we use palloc_huge?

- Heikki
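The check Heikki mentions might look something like the sketch below: validate the requested record length against MaxAllocSize before attempting the allocation, so the caller can report "record too large" instead of dying inside palloc(). The MaxAllocSize value mirrors PostgreSQL's 0x3fffffff (just under 1GB); the function name and the block-rounding are illustrative assumptions, not the actual allocate_recordbuf() code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MaxAllocSize ((size_t) 0x3fffffff)  /* same value as memutils.h */
#define XLOG_BLCKSZ 8192                    /* default WAL block size */

/*
 * Return true if a record buffer of the requested length could be
 * allocated with plain palloc(), i.e. without exceeding MaxAllocSize.
 */
static bool
recordbuf_size_ok(size_t reclen)
{
    /* round up to a whole number of WAL blocks (illustrative) */
    size_t newSize = reclen;

    newSize += XLOG_BLCKSZ - 1;
    newSize -= newSize % XLOG_BLCKSZ;
    return newSize <= MaxAllocSize;
}
```

A pre-check like this only addresses the size cap; an actual out-of-memory palloc() still errors out, which is Heikki's point.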
Re: [HACKERS] pgaudit - an auditing extension for PostgreSQL
Hi.

I've changed pgaudit to work as you suggested. A quick note on the implementation: pgaudit was already installing an ExecutorCheckPerms_hook anyway; I adapted code from ExecRTECheckPerms to check if the audit role has been granted any of the permissions required for the operation.

This means there are three ways to configure auditing:

1. GRANT … ON … TO audit, which logs any operations that correspond to the granted permissions.
2. Set pgaudit.roles = 'r1, r2, …', which logs everything done by r1, r2, and any of their descendants.
3. Set pgaudit.log = 'read, write, …', which logs any events in any of the listed classes.

(This is a small change from the earlier behaviour where, if a role was listed in .roles, it was still subject to the .log setting. I find that more useful in practice, but since we're discussing Stephen's proposal, I implemented what he said.)

The new pgaudit.c is attached here for review. Nothing else has changed from the earlier submission; and everything is in the github repository (github.com/2ndQuadrant/pgaudit).

-- Abhijit

/*
 * pgaudit/pgaudit.c
 *
 * Copyright © 2014, PostgreSQL Global Development Group
 *
 * Permission to use, copy, modify, and distribute this software and
 * its documentation for any purpose, without fee, and without a
 * written agreement is hereby granted, provided that the above
 * copyright notice and this paragraph and the following two
 * paragraphs appear in all copies.
 *
 * IN NO EVENT SHALL THE AUTHOR BE LIABLE TO ANY PARTY FOR DIRECT,
 * INDIRECT, SPECIAL, INCIDENTAL, OR CONSEQUENTIAL DAMAGES, INCLUDING
 * LOST PROFITS, ARISING OUT OF THE USE OF THIS SOFTWARE AND ITS
 * DOCUMENTATION, EVEN IF THE UNIVERSITY OF CALIFORNIA HAS BEEN ADVISED
 * OF THE POSSIBILITY OF SUCH DAMAGE.
 *
 * THE AUTHOR SPECIFICALLY DISCLAIMS ANY WARRANTIES, INCLUDING, BUT NOT
 * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
 * A PARTICULAR PURPOSE. THE SOFTWARE PROVIDED HEREUNDER IS ON AN "AS
 * IS" BASIS, AND THE AUTHOR HAS NO OBLIGATIONS TO PROVIDE MAINTENANCE,
 * SUPPORT, UPDATES, ENHANCEMENTS, OR MODIFICATIONS.
 */

#include "postgres.h"

#include "access/htup_details.h"
#include "access/sysattr.h"
#include "access/xact.h"
#include "catalog/catalog.h"
#include "catalog/objectaccess.h"
#include "catalog/pg_class.h"
#include "commands/dbcommands.h"
#include "catalog/pg_proc.h"
#include "commands/event_trigger.h"
#include "executor/executor.h"
#include "executor/spi.h"
#include "miscadmin.h"
#include "libpq/auth.h"
#include "nodes/nodes.h"
#include "tcop/utility.h"
#include "utils/acl.h"
#include "utils/builtins.h"
#include "utils/guc.h"
#include "utils/lsyscache.h"
#include "utils/memutils.h"
#include "utils/rel.h"
#include "utils/syscache.h"
#include "utils/timestamp.h"

PG_MODULE_MAGIC;

void _PG_init(void);

Datum pgaudit_func_ddl_command_end(PG_FUNCTION_ARGS);
Datum pgaudit_func_sql_drop(PG_FUNCTION_ARGS);

PG_FUNCTION_INFO_V1(pgaudit_func_ddl_command_end);
PG_FUNCTION_INFO_V1(pgaudit_func_sql_drop);

/*
 * pgaudit_roles_str is the string value of the pgaudit.roles
 * configuration variable, which is a list of role names.
 */
char *pgaudit_roles_str = NULL;

/*
 * pgaudit_log_str is the string value of the pgaudit.log configuration
 * variable, e.g. "read, write, user". Each token corresponds to a flag
 * in enum LogClass below. We convert the list of tokens into a bitmap
 * in pgaudit_log for internal use.
 */
char *pgaudit_log_str = NULL;
static uint64 pgaudit_log = 0;

enum LogClass {
	LOG_NONE = 0,

	/* SELECT */
	LOG_READ = (1 << 0),

	/* INSERT, UPDATE, DELETE, TRUNCATE */
	LOG_WRITE = (1 << 1),

	/* GRANT, REVOKE, ALTER … */
	LOG_PRIVILEGE = (1 << 2),

	/* CREATE/DROP/ALTER ROLE */
	LOG_USER = (1 << 3),

	/* DDL: CREATE/DROP/ALTER */
	LOG_DEFINITION = (1 << 4),

	/* DDL: CREATE OPERATOR etc. */
	LOG_CONFIG = (1 << 5),

	/* VACUUM, REINDEX, ANALYZE */
	LOG_ADMIN = (1 << 6),

	/* Function execution */
	LOG_FUNCTION = (1 << 7),

	/* Absolutely everything; not available via pgaudit.log */
	LOG_ALL = ~(uint64)0
};

/*
 * This module collects AuditEvents from various sources (event
 * triggers, and executor/utility hooks) and passes them to the
 * log_audit_event() function.
 *
 * An AuditEvent represents an operation that potentially affects a
 * single object. If an underlying command affects multiple objects,
 * multiple AuditEvents must be created to represent it.
 */

typedef struct {
	NodeTag type;
	const char *object_id;
	const char *object_type;
	const char *command_tag;
	const char *command_text;
	bool granted;
} AuditEvent;

/*
 * Returns the oid of the hardcoded "audit" role.
 */
static Oid audit_role_oid()
{
	HeapTuple roleTup;
	Oid oid = InvalidOid;

	roleTup = SearchSysCache1(AUTHNAME, PointerGetDatum("audit"));
	if (HeapTupleIsValid(roleTup))
	{
		oid = HeapTupleGetOid(roleTup);
		ReleaseSysCache(roleTup);
	}

	return oid;
}

/* Returns true if either pgaudit.roles or pgaudit.log is set. */
static inline bool pgaudit_configured()
{
	return (pgaudit_roles_str && *pgaudit_roles_str)
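The pgaudit.log GUC described above is a comma-separated token list that gets folded into the pgaudit_log bitmap. A standalone sketch of that conversion is below; the flag values follow the quoted enum (rendered as macros here so LOG_ALL can be a full 64-bit mask in portable C), but parse_log_classes() itself is invented for illustration and is not the attached pgaudit code.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Flag values as in pgaudit's enum LogClass. */
#define LOG_READ        ((uint64_t) 1 << 0)
#define LOG_WRITE       ((uint64_t) 1 << 1)
#define LOG_PRIVILEGE   ((uint64_t) 1 << 2)
#define LOG_USER        ((uint64_t) 1 << 3)
#define LOG_DEFINITION  ((uint64_t) 1 << 4)
#define LOG_CONFIG      ((uint64_t) 1 << 5)
#define LOG_ADMIN       ((uint64_t) 1 << 6)
#define LOG_FUNCTION    ((uint64_t) 1 << 7)
#define LOG_ALL         (~(uint64_t) 0)

/* Convert a token list like "read, write" into a bitmap. */
static uint64_t
parse_log_classes(const char *value)
{
    uint64_t    log = 0;
    char        buf[256];
    char       *tok;

    strncpy(buf, value, sizeof(buf) - 1);
    buf[sizeof(buf) - 1] = '\0';

    for (tok = strtok(buf, ", "); tok != NULL; tok = strtok(NULL, ", "))
    {
        if (strcmp(tok, "read") == 0)
            log |= LOG_READ;
        else if (strcmp(tok, "write") == 0)
            log |= LOG_WRITE;
        else if (strcmp(tok, "privilege") == 0)
            log |= LOG_PRIVILEGE;
        else if (strcmp(tok, "user") == 0)
            log |= LOG_USER;
        else if (strcmp(tok, "definition") == 0)
            log |= LOG_DEFINITION;
        else if (strcmp(tok, "config") == 0)
            log |= LOG_CONFIG;
        else if (strcmp(tok, "admin") == 0)
            log |= LOG_ADMIN;
        else if (strcmp(tok, "function") == 0)
            log |= LOG_FUNCTION;
        else if (strcmp(tok, "all") == 0)
            log |= LOG_ALL;
        /* a real GUC check hook would reject unknown tokens here */
    }
    return log;
}
```

A bitmap keeps the per-event test ("is this class enabled?") down to a single AND, which matters on the executor hot path.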
Re: [HACKERS] [COMMITTERS] pgsql: Keep track of transaction commit timestamps
On 2014-12-29 12:50:23 +0200, Heikki Linnakangas wrote:
> On 12/29/2014 12:39 PM, Petr Jelinek wrote:
>> Right, I would love to have custom SLRUs but I don't see it happening
>> given those two restrictions, otherwise I would write the CommitTs
>> patch that way in the first place...
>
> Transaction commit and replay can treat the per-transaction
> information as an opaque blob. It just needs to be included in the
> commit record, and replay needs to write it to the SLRU. That way you
> don't need to run any user-defined code in those phases.

Meh. Only if you want to duplicate the timestamps from the commits.

Greetings,

Andres Freund

--
Andres Freund                     http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] What exactly is our CRC algorithm?
Hi,

On 2014-12-25 11:57:29 +0530, Abhijit Menon-Sen wrote:
> -extern pg_crc32 pg_comp_crc32c(pg_crc32 crc, const void *data, size_t len);
> +extern void pg_init_comp_crc32c(void);

How about pg_choose_crc_impl() or something?

> +extern pg_crc32 (*pg_comp_crc32c)(pg_crc32 crc, const void *data, size_t len);
>
>  /*
>   * CRC calculation using the CRC-32C (Castagnoli) polynomial.
>
> diff --git a/src/port/pg_crc.c b/src/port/pg_crc.c
> index 2f9857b..6be17b0 100644
> --- a/src/port/pg_crc.c
> +++ b/src/port/pg_crc.c
> @@ -21,6 +21,13 @@
>  #include "utils/pg_crc.h"
>  #include "utils/pg_crc_tables.h"
>
> +#if defined(HAVE_CPUID_H)
> +#include <cpuid.h>
> +#elif defined(_MSC_VER)
> +#include <intrin.h>
> +#include <nmmintrin.h>
> +#endif
> +
>  static inline uint32 bswap32(uint32 x)
>  {
>  #if defined(__GNUC__) || defined(__clang__)
> @@ -39,8 +46,8 @@ static inline uint32 bswap32(uint32 x)
>  #define cpu_to_le32(x) x
>  #endif
>
> -pg_crc32
> -pg_comp_crc32c(pg_crc32 crc, const void *data, size_t len)
> +static pg_crc32
> +pg_comp_crc32c_sb8(pg_crc32 crc, const void *data, size_t len)

_sb8? Unless I miss something it's not slice by 8 but rather bytewise?

>  {
>  	const unsigned char *p = data;
>  	const uint32 *p8;
> @@ -61,7 +68,6 @@ pg_comp_crc32c(pg_crc32 crc, const void *data, size_t len)
>  	 */
>  	p8 = (const uint32 *) p;
> -
>  	while (len >= 8)
>  	{
>  		uint32 a = *p8++ ^ cpu_to_le32(crc);
> @@ -101,8 +107,102 @@ pg_comp_crc32c(pg_crc32 crc, const void *data, size_t len)
>  	 */
>  	p = (const unsigned char *) p8;
>
> -	while (len-- > 0)
> +	while (len > 0)
> +	{
>  		crc = pg_crc32c_table[0][(crc ^ *p++) & 0xFF] ^ (crc >> 8);
> +		len--;
> +	}
> +
> +	return crc;
> +}
>
> +static pg_crc32
> +pg_asm_crc32b(pg_crc32 crc, unsigned char data)
> +{

Should be marked inline.

> +#ifdef __GNUC__
> +	__asm__ ("crc32b %[data], %[crc]\n" : [crc] "+r" (crc) : [data] "rm" (data));

Have you checked which version of gcc introduced named references to input/output parameters?

> +	return crc;
> +#elif defined(_MSC_VER)
> +	return _mm_crc32_u8(crc, data);
> +#else
> +#error "Don't know how to generate crc32b instruction"
> +#endif
>  }
>
> +static pg_crc32
> +pg_asm_crc32q(uint64 crc, unsigned long long data)
> +{

inline.

Greetings,

Andres Freund

--
Andres Freund                     http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
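The overall shape under discussion — pg_comp_crc32c as a function pointer, set once by a chooser at startup — can be sketched as below. The function-pointer layout and the names pg_choose_crc_impl/pg_comp_crc32c follow the patch, but the fallback here is a simple bit-at-a-time CRC-32C (Castagnoli polynomial, reflected form 0x82F63B78) rather than the table-driven or slice-by-8 code being reviewed, and the cpuid probe is stubbed out for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t pg_crc32;

/* Portable software fallback: one bit at a time, CRC-32C polynomial. */
static pg_crc32
pg_comp_crc32c_bitwise(pg_crc32 crc, const void *data, size_t len)
{
    const unsigned char *p = data;

    while (len-- > 0)
    {
        int     bit;

        crc ^= *p++;
        for (bit = 0; bit < 8; bit++)
            crc = (crc >> 1) ^ (0x82F63B78 & -(crc & 1));
    }
    return crc;
}

/* Callers always go through this pointer, as in the patch. */
static pg_crc32 (*pg_comp_crc32c)(pg_crc32, const void *, size_t) =
    pg_comp_crc32c_bitwise;

/* Run once at startup to pick the best implementation. */
static void
pg_choose_crc_impl(void)
{
    /*
     * A real implementation would probe CPUID for SSE4.2 here and
     * switch the pointer to the crc32b/crc32q version; in this sketch
     * the software fallback always stays selected.
     */
    pg_comp_crc32c = pg_comp_crc32c_bitwise;
}
```

A quick sanity check: with the usual init of 0xFFFFFFFF and a final XOR, the CRC-32C of the nine bytes "123456789" is the standard check value 0xE3069283.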
Re: [HACKERS] What exactly is our CRC algorithm?
At 2014-12-29 13:22:28 +0100, and...@2ndquadrant.com wrote:
> How about pg_choose_crc_impl() or something?

Done.

> _sb8? Unless I miss something it's not slice by 8 but rather bytewise?

This is meant to apply on top of the earlier patches I posted to implement slice-by-8. I'll attach both here.

> Should be marked inline.

Done (and the other one too).

>> +#ifdef __GNUC__
>> +	__asm__ ("crc32b %[data], %[crc]\n" : [crc] "+r" (crc) : [data] "rm" (data));
>
> Have you checked which version of gcc introduced named references to
> input/output parameters?

No. The documentation calls them asmSymbolicNames, but I can't find the term (or various likely alternatives, e.g. symbolic_operand may be related) either in the release notes, the changelog, or the source (on Github). Does anyone know, or know how to find out?

Meanwhile, I have attached the two patches with the other modifications.

-- Abhijit

diff --git a/src/include/utils/pg_crc.h b/src/include/utils/pg_crc.h
index f871cba..55934e5 100644
--- a/src/include/utils/pg_crc.h
+++ b/src/include/utils/pg_crc.h
@@ -41,6 +41,8 @@ typedef uint32 pg_crc32;

+extern pg_crc32 pg_comp_crc32c(pg_crc32 crc, const void *data, size_t len);
+
 /*
  * CRC calculation using the CRC-32C (Castagnoli) polynomial.
* @@ -51,7 +53,7 @@ typedef uint32 pg_crc32; #define INIT_CRC32C(crc) ((crc) = 0x) #define FIN_CRC32C(crc) ((crc) ^= 0x) #define COMP_CRC32C(crc, data, len) \ - COMP_CRC32_NORMAL_TABLE(crc, data, len, pg_crc32c_table) + ((crc) = pg_comp_crc32c((crc), (char *) (data), (len))) #define EQ_CRC32C(c1, c2) ((c1) == (c2)) /* @@ -115,7 +117,7 @@ do { \ } while (0) /* Constant tables for CRC-32C and CRC-32 polynomials */ -extern CRCDLLIMPORT const uint32 pg_crc32c_table[]; +extern CRCDLLIMPORT const uint32 pg_crc32c_table[8][256]; extern CRCDLLIMPORT const uint32 pg_crc32_table[]; #endif /* PG_CRC_H */ diff --git a/src/include/utils/pg_crc_tables.h b/src/include/utils/pg_crc_tables.h index cb6b470..f814c91 100644 --- a/src/include/utils/pg_crc_tables.h +++ b/src/include/utils/pg_crc_tables.h @@ -28,71 +28,519 @@ * This table is based on the so-called Castagnoli polynomial (the same * that is used e.g. in iSCSI). */ -const uint32 pg_crc32c_table[256] = { - 0x, 0xF26B8303, 0xE13B70F7, 0x1350F3F4, - 0xC79A971F, 0x35F1141C, 0x26A1E7E8, 0xD4CA64EB, - 0x8AD958CF, 0x78B2DBCC, 0x6BE22838, 0x9989AB3B, - 0x4D43CFD0, 0xBF284CD3, 0xAC78BF27, 0x5E133C24, - 0x105EC76F, 0xE235446C, 0xF165B798, 0x030E349B, - 0xD7C45070, 0x25AFD373, 0x36FF2087, 0xC494A384, - 0x9A879FA0, 0x68EC1CA3, 0x7BBCEF57, 0x89D76C54, - 0x5D1D08BF, 0xAF768BBC, 0xBC267848, 0x4E4DFB4B, - 0x20BD8EDE, 0xD2D60DDD, 0xC186FE29, 0x33ED7D2A, - 0xE72719C1, 0x154C9AC2, 0x061C6936, 0xF477EA35, - 0xAA64D611, 0x580F5512, 0x4B5FA6E6, 0xB93425E5, - 0x6DFE410E, 0x9F95C20D, 0x8CC531F9, 0x7EAEB2FA, - 0x30E349B1, 0xC288CAB2, 0xD1D83946, 0x23B3BA45, - 0xF779DEAE, 0x05125DAD, 0x1642AE59, 0xE4292D5A, - 0xBA3A117E, 0x4851927D, 0x5B016189, 0xA96AE28A, - 0x7DA08661, 0x8FCB0562, 0x9C9BF696, 0x6EF07595, - 0x417B1DBC, 0xB3109EBF, 0xA0406D4B, 0x522BEE48, - 0x86E18AA3, 0x748A09A0, 0x67DAFA54, 0x95B17957, - 0xCBA24573, 0x39C9C670, 0x2A993584, 0xD8F2B687, - 0x0C38D26C, 0xFE53516F, 0xED03A29B, 0x1F682198, - 0x5125DAD3, 0xA34E59D0, 0xB01EAA24, 
0x42752927, - 0x96BF4DCC, 0x64D4CECF, 0x77843D3B, 0x85EFBE38, - 0xDBFC821C, 0x2997011F, 0x3AC7F2EB, 0xC8AC71E8, - 0x1C661503, 0xEE0D9600, 0xFD5D65F4, 0x0F36E6F7, - 0x61C69362, 0x93AD1061, 0x80FDE395, 0x72966096, - 0xA65C047D, 0x5437877E, 0x4767748A, 0xB50CF789, - 0xEB1FCBAD, 0x197448AE, 0x0A24BB5A, 0xF84F3859, - 0x2C855CB2, 0xDEEEDFB1, 0xCDBE2C45, 0x3FD5AF46, - 0x7198540D, 0x83F3D70E, 0x90A324FA, 0x62C8A7F9, - 0xB602C312, 0x44694011, 0x5739B3E5, 0xA55230E6, - 0xFB410CC2, 0x092A8FC1, 0x1A7A7C35, 0xE811FF36, - 0x3CDB9BDD, 0xCEB018DE, 0xDDE0EB2A, 0x2F8B6829, - 0x82F63B78, 0x709DB87B, 0x63CD4B8F, 0x91A6C88C, - 0x456CAC67, 0xB7072F64, 0xA457DC90, 0x563C5F93, - 0x082F63B7, 0xFA44E0B4, 0xE9141340, 0x1B7F9043, - 0xCFB5F4A8, 0x3DDE77AB, 0x2E8E845F, 0xDCE5075C, - 0x92A8FC17, 0x60C37F14, 0x73938CE0, 0x81F80FE3, - 0x55326B08, 0xA759E80B, 0xB4091BFF, 0x466298FC, - 0x1871A4D8, 0xEA1A27DB, 0xF94AD42F, 0x0B21572C, - 0xDFEB33C7, 0x2D80B0C4, 0x3ED04330, 0xCCBBC033, - 0xA24BB5A6, 0x502036A5, 0x4370C551, 0xB11B4652, - 0x65D122B9, 0x97BAA1BA, 0x84EA524E, 0x7681D14D, - 0x2892ED69, 0xDAF96E6A, 0xC9A99D9E, 0x3BC21E9D, - 0xEF087A76, 0x1D63F975, 0x0E330A81, 0xFC588982, - 0xB21572C9, 0x407EF1CA, 0x532E023E, 0xA145813D, - 0x758FE5D6, 0x87E466D5, 0x94B49521, 0x66DF1622, - 0x38CC2A06, 0xCAA7A905, 0xD9F75AF1, 0x2B9CD9F2, - 0xFF56BD19, 0x0D3D3E1A, 0x1E6DCDEE, 0xEC064EED, - 0xC38D26C4, 0x31E6A5C7, 0x22B65633, 0xD0DDD530, - 0x0417B1DB, 0xF67C32D8, 0xE52CC12C, 0x1747422F, - 0x49547E0B, 0xBB3FFD08, 0xA86F0EFC, 0x5A048DFF, - 0x8ECEE914, 0x7CA56A17, 0x6FF599E3, 0x9D9E1AE0, - 0xD3D3E1AB, 0x21B862A8, 0x32E8915C, 0xC083125F, - 0x144976B4, 0xE622F5B7, 0xF5720643, 0x07198540, - 0x590AB964,
Re: [HACKERS] BUG #12330: ACID is broken for unique constraints
Merlin Moncure <mmonc...@gmail.com> wrote:
> On Fri, Dec 26, 2014 at 12:38 PM, Kevin Grittner <kgri...@ymail.com> wrote:
>> Tom Lane <t...@sss.pgh.pa.us> wrote:
>>> Just for starters, a 40XXX error report will fail to provide the
>>> duplicated key's value. This will be a functional regression,
>>
>> Not if, as is normally the case, the transaction is retried from the
>> beginning on a serialization failure. Either the code will check for
>> a duplicate (as in the case of the OP on this thread) and they won't
>> see the error, *or* the transaction which created the duplicate key
>> will have committed before the start of the retry and you will get
>> the duplicate key error.
>
> I'm not buying that; that argument assumes duplicate key errors are
> always 'upsert' driven. Although OP's code may have checked for
> duplicates it's perfectly reasonable (and in many cases preferable) to
> force the transaction to fail and report the error directly back to
> the application. The application will then switch on the error code
> and decide what to do: retry for deadlock/serialization or abort for
> data integrity error. IOW, the error handling semantics are
> fundamentally different and should not be mixed.

I think you might be agreeing with me without realizing it. Right now you get a duplicate key error even if the duplication is caused by a concurrent transaction -- it is not possible to check the error code (well, SQLSTATE, technically) to determine whether this is fundamentally a serialization problem. What we're talking about is returning the serialization failure return code for the cases where it is a concurrent transaction causing the failure, and continuing to return the duplicate key error for all other cases. Either I'm not understanding what you wrote above, or you seem to be arguing for being able to distinguish between errors caused by concurrent transactions and those which aren't.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] BUG #12330: ACID is broken for unique constraints
On Mon, Dec 29, 2014 at 8:03 AM, Kevin Grittner <kgri...@ymail.com> wrote:
> I think you might be agreeing with me without realizing it. Right now
> you get a duplicate key error even if the duplication is caused by a
> concurrent transaction -- it is not possible to check the error code
> (well, SQLSTATE, technically) to determine whether this is
> fundamentally a serialization problem. What we're talking about is
> returning the serialization failure return code for the cases where it
> is a concurrent transaction causing the failure and continuing to
> return the duplicate key error for all other cases. Either I'm not
> understanding what you wrote above, or you seem to be arguing for
> being able to distinguish between errors caused by concurrent
> transactions and those which aren't.

Well, I'm arguing that duplicate key errors are not serialization failures unless it's likely the insertion would succeed upon a retry; a proper insert, not an upsert. If that's the case with what you're proposing, then it makes sense to me. But that's not what it sounds like... your language suggests, AIUI, that having the error simply be caused by another transaction being concurrent would be sufficient to switch to a serialization error (feel free to correct me if I'm wrong!). In other words, the current behavior is:

txn A,B begin
txn A inserts
txn B inserts over A, locks, waits
txn A commits.  B aborts with duplicate key error

Assuming that case is untouched, then we're good! My long-winded point above is that that case must fail with a duplicate key error; a serialization error suggests the transaction should be retried, and it shouldn't be... it would simply fail a second time.

merlin
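The retry logic both sides are assuming the application implements — "switch on the error code and decide what to do" — comes down to classifying SQLSTATE. The codes below are PostgreSQL's real ones (40001 serialization_failure, 40P01 deadlock_detected, 23505 unique_violation); the helper function itself is just an illustration, not a libpq API.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Retry only transient serialization-class failures; anything else
 * (including a duplicate key, 23505) is a hard error for the caller.
 */
static bool
should_retry(const char *sqlstate)
{
    return strcmp(sqlstate, "40001") == 0 ||    /* serialization_failure */
           strcmp(sqlstate, "40P01") == 0;      /* deadlock_detected */
}
```

The dispute is about which bucket txn B's failure lands in: under Kevin's proposal a serializable transaction would see 40001 (retry), while Merlin argues a plain insert of a literal key must keep seeing 23505, since a retry would just fail again.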
Re: [HACKERS] PATCH: decreasing memory needlessly consumed by array_agg
2014-12-29 14:38 GMT+07:00 Jeff Davis <pg...@j-davis.com>:
> Just jumping into this patch now. Do we think this is worth changing
> the signature of functions in array.h, which might be used from a lot
> of third-party code? We might want to provide new functions to avoid a
> breaking change.

The v6 patch from Tomas only changes the initArrayResult* functions. initArrayResult is a new API in 9.5 (commit bac2739), and the old API still works as-is.

--
Ali Akbar
Re: [HACKERS] BUG #12330: ACID is broken for unique constraints
Merlin Moncure <mmonc...@gmail.com> wrote:
> Well, I'm arguing that duplicate key errors are not serialization
> failures unless it's likely the insertion would succeed upon a retry;
> a proper insert, not an upsert. If that's the case with what you're
> proposing, then it makes sense to me. But that's not what it sounds
> like... your language suggests, AIUI, that having the error simply be
> caused by another transaction being concurrent would be sufficient to
> switch to a serialization error (feel free to correct me if I'm
> wrong!). In other words, the current behavior is:
>
> txn A,B begin
> txn A inserts
> txn B inserts over A, locks, waits
> txn A commits.  B aborts with duplicate key error
>
> Assuming that case is untouched, then we're good! My long-winded point
> above is that that case must fail with a duplicate key error; a
> serialization error suggests the transaction should be retried, and it
> shouldn't be... it would simply fail a second time.

What I'm proposing is that for serializable transactions B would get a serialization failure; otherwise B would get a duplicate key error. If the retry of B looks at something in the database to determine what its primary key should be, it will get a new value on the retry, since it will be starting after the commit of A. If it is using a literal key, not based on something changed by A, it will get a duplicate key error on the retry, since it will be starting after the commit of A. It will either succeed on retry or get an error for a different reason.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Serialization exception : Who else was involved?
Craig Ringer <cr...@2ndquadrant.com> wrote:
> I don't see how that'd necessarily correctly identify the
> query/queries in the other tx that're involved, though. Perhaps I'm
> thinking in terms of more complicated serialization failures?

Yeah, it might be possible to provide useful information about specific conflicting queries in some cases, but I'm skeptical that it would be available in the majority of cases. In most cases you can probably identify one or two other transactions that are involved. In at least some cases you won't even have that. For one fun case to think about, see this example where a read-only transaction fails due to a conflict with two already-committed transactions:

https://wiki.postgresql.org/wiki/SSI#Rollover

Also consider when there is a long-running transaction and SSI falls back to SLRU summarization.

If we can find a way to provide some useful information in some cases without harming performance, that's fine, as long as nobody expects that it will be available in all cases.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] BUG #12330: ACID is broken for unique constraints
On Mon, Dec 29, 2014 at 9:09 AM, Kevin Grittner <kgri...@ymail.com> wrote:
> What I'm proposing is that for serializable transactions B would get a
> serialization failure; otherwise B would get a duplicate key error. If
> the retry of B looks at something in the database to determine what
> its primary key should be, it will get a new value on the retry, since
> it will be starting after the commit of A. If it is using a literal
> key, not based on something changed by A, it will get a duplicate key
> error on the retry, since it will be starting after the commit of A.
> It will either succeed on retry or get an error for a different
> reason.

In that case: we don't agree. How come duplicate key errors would be reported as serialization failures but not RI errors (for example, inserting a record pointing to another record which a concurrent transaction deleted)? IMO, serialization errors are an implementation artifact and should not mask well-defined errors in SQL under any circumstances (or if they must, those cases should absolutely be as narrow as possible).

merlin
Re: [HACKERS] BUG #12330: ACID is broken for unique constraints
On Mon, Dec 29, 2014 at 3:31 PM, Merlin Moncure mmonc...@gmail.com wrote: In that case: we don't agree. How come duplicate key errors would be reported as serialization failures but not RI errors (for example, inserting a record pointing to another record which a concurrent transaction deleted)? The key question is not the type of error but whether the transaction saw another state previously. That is, if you select to check for duplicate keys, don't see any, and then try to insert and get a duplicate key error that would be easy to argue is a serialization error. The same could be true for an RI check -- if you select some data and then insert a record that refers to the data you already verified existed and get an RI failure then you could consider that a serialization failure. -- greg
[HACKERS] Publish autovacuum informations
Hey, There are times when I would need more information on the autovacuum processes. I'd love to know what each worker is currently doing. I can get something like this from the pg_stat_activity view but it doesn't give me as much information as the WorkerInfoData struct. I'd also love to have more information on the contents of the tables list (how many tables still to process, which table next, what kind of processing they'll get, etc... kinda what you have in the autovac_table struct). All in all, I want to get information that is typically stored in shared memory, handled by the autovacuum launcher and autovacuum workers. I first thought I could get that by writing some C functions embedded in an extension. But it doesn't seem to me I can access this part of the shared memory from a C function. If I'm wrong, I'd love to get a pointer on how to do this. Otherwise, I wonder what would be more welcome: making the shared memory struct handles available outside of the autovacuum processes (and then build an extension to decode the information I need), or adding functions in core to get access to this information (in that case, no need for an extension)? Thanks. Regards. -- Guillaume. http://blog.guillaume.lelarge.info http://www.dalibo.com
Re: [HACKERS] Publish autovacuum informations
Guillaume Lelarge guilla...@lelarge.info writes: All in all, I want to get informations that are typically stored in shared memory, handled by the autovacuum launcher and autovacuum workers. I first thought I could get that by writing some C functions embedded in an extension. But it doesn't seem to me I can access this part of the shared memory from a C function. If I'm wrong, I'd love to get a pointer on how to do this. Otherwise, I wonder what would be more welcome: making the shared memory structs handles available outside of the autovacuum processes (and then build an extension to decode the informations I need), or adding functions in core to get access to this information (in that case, no need for an extension)? Either one of those approaches would cripple our freedom to change those data structures; which we've done repeatedly in the past and will surely want to do again. So I'm pretty much -1 on exposing them. regards, tom lane
Re: [HACKERS] Publish autovacuum informations
2014-12-29 17:03 GMT+01:00 Tom Lane t...@sss.pgh.pa.us: Guillaume Lelarge guilla...@lelarge.info writes: All in all, I want to get informations that are typically stored in shared memory, handled by the autovacuum launcher and autovacuum workers. I first thought I could get that by writing some C functions embedded in an extension. But it doesn't seem to me I can access this part of the shared memory from a C function. If I'm wrong, I'd love to get a pointer on how to do this. Otherwise, I wonder what would be more welcome: making the shared memory structs handles available outside of the autovacuum processes (and then build an extension to decode the informations I need), or adding functions in core to get access to this information (in that case, no need for an extension)? Either one of those approaches would cripple our freedom to change those data structures; which we've done repeatedly in the past and will surely want to do again. So I'm pretty much -1 on exposing them. I don't see how that's going to deny us the right to change any structs. If they are in-core functions, we'll just have to update them. If they are extension functions, then the developer of those functions would simply need to update his code. -- Guillaume. http://blog.guillaume.lelarge.info http://www.dalibo.com
Re: [HACKERS] BUG #12330: ACID is broken for unique constraints
On Mon, Dec 29, 2014 at 9:44 AM, Greg Stark st...@mit.edu wrote: On Mon, Dec 29, 2014 at 3:31 PM, Merlin Moncure mmonc...@gmail.com wrote: In that case: we don't agree. How come duplicate key errors would be reported as serialization failures but not RI errors (for example, inserting a record pointing to another record which a concurrent transaction deleted)? The key question is not the type of error but whether the transaction saw another state previously. [combining replies -- nikita, better not to top-post (FYI)] How is that relevant? Serialization errors only exist as a concession to concurrency and performance. Again, they should be returned as sparsely as possible because they provide absolutely (as Tom pointed out) zero detail to the application. The precise definition of the error is up to us, but I'd rather keep it to its current, rather specific semantics. On Mon, Dec 29, 2014 at 9:48 AM, Nikita Volkov nikita.y.vol...@mail.ru wrote: I believe, the objections expressed in this thread miss a very important point of all this: the isolation property (the I in ACID) is violated. Here’s a quote from the Wikipedia article on ACID: The isolation property ensures that the concurrent execution of transactions results in a system state that would be obtained if transactions were executed serially, i.e., one after the other. The original example proves that it's violated. Such behaviour can neither be expected by a user, nor is even mentioned anywhere. Instead in the first paragraph of the “About” section of the Postgres site it states: It is fully ACID compliant Which is basically just a lie, until issues like this one get dealt with. That's simply untrue: inconvenience != acid violation Transaction levels provide certain guarantees regarding the state of the data in the presence of concurrent overlapping operations. They do not define the mechanism of failure or how/when those failures should occur. 
To prove your statement, you need to demonstrate how a transaction left the database in a bad state given concurrent activity without counting failures. Postgres can, and does, for example, return concurrency type errors more aggressively than it needs to for the 'repeatable read' level. Point being, this is completely ok as database implementation is free to understand that, just as it's free to define precisely how and when it fails given concurrency as long as those guarantees are provided. merlin
Re: [HACKERS] BUG #12330: ACID is broken for unique constraints
Merlin Moncure mmonc...@gmail.com wrote: Serialization errors only exist as a concession to concurrency and performance. Again, they should be returned as sparsely as possible because they provide absolutely (as Tom pointed out) zero detail to the application. That is false. They provide an *extremely* valuable piece of information which is not currently available when you get a duplicate key error -- whether the error occurred because of a race condition and will not fail for the same cause if retried. The precise definition of the error is up to us, but I'd rather keep it to its current, rather specific semantics. The semantics are so imprecise that Tom argued that we should document that transactions should be retried from the start when you get the duplicate key error, since it *might* have been caused by a race condition. I'm curious how heavily you use serializable transactions, because I have trouble believing that those who rely on them as their primary (or only) strategy for dealing with race conditions under high concurrency would take that position. As for the fact that RI violations also don't return a serialization failure when caused by a race with concurrent transactions, I view that as another weakness in PostgreSQL. I don't think there is a problem curing one without curing the other at the same time. I have known of people writing their own triggers to enforce RI rather than defining FKs precisely so that they can get a serialization failure return code and do automatic retry if it is caused by a race condition. That's less practical to compensate for when it comes to unique indexes or constraints. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] BUG #12330: ACID is broken for unique constraints
[combining replies -- nikita, better not to top-post (FYI)] I'm sorry. I don't know what you mean. I just replied to an email. To prove your statement, you need to demonstrate how a transaction left the database in a bad state given concurrent activity without counting failures.

1. Transaction A looks up a row by ID 1 and gets an empty result.
2. Concurrent transaction B inserts a row with ID 1.
3. Transaction A goes on with the presumption that a row with ID 1 does not exist, because a transaction is supposed to be isolated and because it has made sure that the row does not exist. With this presumption it confidently inserts a row with ID 1, only to have Postgres report a duplicate key.

Wat?
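The select-then-insert interleaving described above can be played out deterministically. The C sketch below is illustrative only -- the tiny in-memory "table" and the key_exists/insert_key helpers are invented for the example and are not Postgres internals -- but it models the weaker-than-serializable behaviour under discussion, where transaction A's earlier existence check gives no protection to its later insert:

```c
#include <stdbool.h>

/* Toy model of the anomaly: a one-column "table" of integer keys.
 * Each step in run_interleaving() stands for one statement in one of
 * two interleaved transactions running below serializable isolation. */

#define TABLE_CAP 16
static int table_keys[TABLE_CAP];
static int table_len = 0;

static bool key_exists(int key)
{
    for (int i = 0; i < table_len; i++)
        if (table_keys[i] == key)
            return true;
    return false;
}

/* Returns false on a duplicate-key "error". */
static bool insert_key(int key)
{
    if (key_exists(key))
        return false;
    table_keys[table_len++] = key;
    return true;
}

/* The interleaving from the example:
 * 1. A checks that key 1 is absent.
 * 2. B inserts key 1 and commits.
 * 3. A inserts key 1 -- and fails, despite its earlier check. */
static bool run_interleaving(void)
{
    bool a_saw_absent = !key_exists(1); /* step 1: txn A */
    bool b_inserted = insert_key(1);    /* step 2: txn B commits */
    bool a_inserted = insert_key(1);    /* step 3: txn A */
    return a_saw_absent && b_inserted && !a_inserted;
}
```

The point of contention in the thread is not whether step 3 fails -- it must -- but whether that failure should be reported as a duplicate key or a serialization failure.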
Re: [HACKERS] BUG #12330: ACID is broken for unique constraints
I believe, the objections expressed in this thread miss a very important point of all this: the isolation property (the I in ACID) is violated. Here’s a quote from the Wikipedia article on ACID (http://en.wikipedia.org/wiki/ACID): "The isolation property ensures that the concurrent execution of transactions results in a system state that would be obtained if transactions were executed serially, i.e., one after the other." The original example proves that it's violated. Such behaviour can neither be expected by a user, nor is it even mentioned anywhere. Instead, the first paragraph of the “About” section of the Postgres site (http://www.postgresql.org/about/) states: It is fully ACID compliant. Which is basically just a lie, until issues like this one get dealt with.
Re: [HACKERS] replicating DROP commands across servers
Alvaro Herrera wrote: Patch 0005 adds getObjectIdentityParts(), which returns the object identity in arrays that can be passed to pg_get_object_address. This part needs slight revisions so I'm not sure I will be able to push tomorrow. Here's a fresh version of this patch. I chose to add a SQL-accessible version, pg_identify_object_as_address, to make it easier to test. In doing so I noticed a couple of bugs, and most interestingly I noticed that it was essentially impossible to cleanly address an array type; doing a roundtrip through the new functions would get me the base type when I used integer[] but the array type when I used _int4. This looked like a problem, so I traced through it and noticed that we're using the type name *list* as a list, rather than as a TypeName, to refer to OBJECT_TYPE and OBJECT_DOMAIN; I hadn't understood the significance of this until I realized that domains would be represented with arrayBounds set to a non-empty list for the integer[] syntax, but the name list would have pg_catalog integer only; when the rest of TypeName was discarded, the fact that we were talking about an array was completely forgotten. Before the dawn of time we had this:

-static void
-CommentType(List *typename, char *comment)
-{
-    TypeName   *tname;
-    Oid         oid;
-
-    /* XXX a bit of a crock; should accept TypeName in COMMENT syntax */
-    tname = makeTypeNameFromNameList(typename);

where the XXX comment was removed by commit c10575ff005c330d047534562 without a corresponding comment in the new function. I'm going to see about changing the grammar to get this fixed; this patch is important because it will enable us to complete^Wcontinue working on the DDL deparse testing framework. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training Services
Re: [HACKERS] replicating DROP commands across servers
(The changes in the regression test are bogus, BTW; I didn't care enough to get them fixed before sorting out the rest of the mess.) -- Álvaro Herrerahttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training Services commit 89c8cbed0072ad4d921128b834fcb4f9e2eb4c33 Author: Alvaro Herrera alvhe...@alvh.no-ip.org Date: Mon Dec 22 18:32:43 2014 -0300 array objname/objargs stuff diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml index 24c64b7..112b6a0 100644 --- a/doc/src/sgml/func.sgml +++ b/doc/src/sgml/func.sgml @@ -17772,6 +17772,23 @@ FOR EACH ROW EXECUTE PROCEDURE suppress_redundant_updates_trigger(); identifier present in the identity is quoted if necessary. /entry /row + row +entryliteraladdress_names/literal/entry +entrytypetext[]/type/entry +entry + An array that, together with literaladdress_args/literal, + can be used by the functionpg_get_object_address()/function to + recreate the object address in a remote server containing an + identically named object of the same kind. +/entry + /row + row +entryliteraladdress_args/literal/entry +entrytypetext[]/type/entry +entry + Complement for literaladdress_names/literal above. 
+/entry + /row /tbody /tgroup /informaltable diff --git a/src/backend/catalog/objectaddress.c b/src/backend/catalog/objectaddress.c index 85079d6..789af5f 100644 --- a/src/backend/catalog/objectaddress.c +++ b/src/backend/catalog/objectaddress.c @@ -74,6 +74,7 @@ #include utils/builtins.h #include utils/fmgroids.h #include utils/lsyscache.h +#include utils/memutils.h #include utils/syscache.h #include utils/tqual.h @@ -556,8 +557,9 @@ static void getRelationTypeDescription(StringInfo buffer, Oid relid, int32 objectSubId); static void getProcedureTypeDescription(StringInfo buffer, Oid procid); static void getConstraintTypeDescription(StringInfo buffer, Oid constroid); -static void getOpFamilyIdentity(StringInfo buffer, Oid opfid); -static void getRelationIdentity(StringInfo buffer, Oid relid); +static void getOpFamilyIdentity(StringInfo buffer, Oid opfid, List **objname, + List **objargs); +static void getRelationIdentity(StringInfo buffer, Oid relid, List **objname); /* * Translate an object name and arguments (as passed by the parser) to an @@ -2960,6 +2962,66 @@ pg_identify_object(PG_FUNCTION_ARGS) } /* + * SQL-level callable function to obtain object type + identity + */ +Datum +pg_identify_object_as_address(PG_FUNCTION_ARGS) +{ + Oid classid = PG_GETARG_OID(0); + Oid objid = PG_GETARG_OID(1); + int32 subobjid = PG_GETARG_INT32(2); + ObjectAddress address; + char *identity; + List *names; + List *args; + Datum values[3]; + bool nulls[3]; + TupleDesc tupdesc; + HeapTuple htup; + + address.classId = classid; + address.objectId = objid; + address.objectSubId = subobjid; + + /* + * Construct a tuple descriptor for the result row. This must match this + * function's pg_proc entry! 
+ */ + tupdesc = CreateTemplateTupleDesc(3, false); + TupleDescInitEntry(tupdesc, (AttrNumber) 1, type, + TEXTOID, -1, 0); + TupleDescInitEntry(tupdesc, (AttrNumber) 2, object_names, + TEXTARRAYOID, -1, 0); + TupleDescInitEntry(tupdesc, (AttrNumber) 3, object_args, + TEXTARRAYOID, -1, 0); + + tupdesc = BlessTupleDesc(tupdesc); + + /* object type */ + values[0] = CStringGetTextDatum(getObjectTypeDescription(address)); + nulls[0] = false; + + /* object identity */ + identity = getObjectIdentityParts(address, names, args); + pfree(identity); + + /* object_names */ + values[1] = PointerGetDatum(strlist_to_textarray(names)); + nulls[1] = false; + + /* object_args */ + if (args) + values[2] = PointerGetDatum(strlist_to_textarray(args)); + else + values[2] = PointerGetDatum(construct_empty_array(TEXTOID)); + nulls[2] = false; + + htup = heap_form_tuple(tupdesc, values, nulls); + + PG_RETURN_DATUM(HeapTupleGetDatum(htup)); +} + +/* * Return a palloc'ed string that describes the type of object that the * passed address is for. * @@ -3215,7 +3277,7 @@ getProcedureTypeDescription(StringInfo buffer, Oid procid) } /* - * Return a palloc'ed string that identifies an object. + * Obtain a given object's identity, as a palloc'ed string. * * This is for machine consumption, so it's not translated. All elements are * schema-qualified when appropriate. @@ -3223,14 +3285,42 @@ getProcedureTypeDescription(StringInfo buffer, Oid procid) char * getObjectIdentity(const ObjectAddress *object) { + return getObjectIdentityParts(object, NULL, NULL); +} + +/* + * As above, but more detailed. + * + * There are two sets of return values: the identity itself as a palloc'd + * string is returned. objname and objargs, if not NULL, are output parameters + * that receive lists of C-strings that are useful to give
Re: [HACKERS] [GENERAL] ON_ERROR_ROLLBACK
Adrian Klaver adrian.kla...@aklaver.com writes: So it seems you can turn ON_ERROR_ROLLBACK on with either 1 or 'on', but you can only turn it off with 'off'. With ON_ERROR_STOP 1/on and 0/off both seem to work. Is this expected? on_error_stop_hook() uses ParseVariableBool, while on_error_rollback_hook() uses some hard-coded logic that checks for "interactive" and "off" and otherwise assumes "on". This is inconsistent first in not accepting as many spellings as ParseVariableBool, and second in not complaining about invalid input --- ParseVariableBool does, so why not here? echo_hook, echo_hidden_hook, histcontrol_hook, and verbosity_hook are equally cavalierly written. For on_error_rollback_hook and echo_hidden_hook, where "on" and "off" are documented values, I think it would make sense to allow all spellings accepted by ParseVariableBool, say like this:

     if (newval == NULL)
         pset.on_error_rollback = PSQL_ERROR_ROLLBACK_OFF;
     else if (pg_strcasecmp(newval, "interactive") == 0)
         pset.on_error_rollback = PSQL_ERROR_ROLLBACK_INTERACTIVE;
-    else if (pg_strcasecmp(newval, "off") == 0)
-        pset.on_error_rollback = PSQL_ERROR_ROLLBACK_OFF;
-    else
-        pset.on_error_rollback = PSQL_ERROR_ROLLBACK_ON;
+    else if (ParseVariableBool(newval))
+        pset.on_error_rollback = PSQL_ERROR_ROLLBACK_ON;
+    else
+        pset.on_error_rollback = PSQL_ERROR_ROLLBACK_OFF;

The other 3 functions don't need to accept on/off I think, but they should print a warning for unrecognized input like ParseVariableBool does. And while we're at it, we should consistently allow case-insensitive input; right now it looks like somebody rolled dice to decide whether to use strcmp or pg_strcasecmp in these functions. Per line, not even per function. Given the lack of previous complaints, this probably isn't backpatching material, but it sure seems like a bit of attention to consistency would be warranted here. 
regards, tom lane
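The shape Tom suggests -- special-case "interactive", then defer every other spelling to the boolean parser -- can be sketched in isolation. Note that parse_variable_bool below is a simplified stand-in written for this example, not psql's actual ParseVariableBool (which accepts more spellings and warns on bad input), and the enum merely mirrors the values visible in the patch fragment:

```c
#include <stdbool.h>
#include <string.h>
#include <strings.h>            /* strcasecmp */

/* Simplified stand-in for psql's ParseVariableBool: accepts common
 * spellings case-insensitively, defaulting to false for anything else. */
static bool parse_variable_bool(const char *val)
{
    if (val == NULL)
        return false;
    if (strcasecmp(val, "on") == 0 || strcasecmp(val, "true") == 0 ||
        strcasecmp(val, "yes") == 0 || strcmp(val, "1") == 0)
        return true;
    return false;
}

typedef enum
{
    PSQL_ERROR_ROLLBACK_OFF,
    PSQL_ERROR_ROLLBACK_INTERACTIVE,
    PSQL_ERROR_ROLLBACK_ON
} PSQL_ERROR_ROLLBACK;

/* The hook logic from the patch fragment: special-case "interactive",
 * then route every other spelling through the boolean parser, so that
 * "1", "on", "true" etc. all behave consistently. */
static PSQL_ERROR_ROLLBACK on_error_rollback_hook(const char *newval)
{
    if (newval == NULL)
        return PSQL_ERROR_ROLLBACK_OFF;
    if (strcasecmp(newval, "interactive") == 0)
        return PSQL_ERROR_ROLLBACK_INTERACTIVE;
    return parse_variable_bool(newval) ? PSQL_ERROR_ROLLBACK_ON
                                       : PSQL_ERROR_ROLLBACK_OFF;
}
```

With this shape, "1", "on", and "On" all land on the same enum value, which is exactly the consistency Adrian was missing.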
Re: [HACKERS] recovery_min_apply_delay with a negative value
On Sun, Dec 28, 2014 at 12:31 PM, Michael Paquier michael.paqu...@gmail.com wrote: Hi all, While reviewing another patch, I have noticed that recovery_min_apply_delay can have a negative value. And the funny part is that we actually attempt to apply a delay even in this case, per this condition in recoveryApplyDelay@xlog.c:

    /* nothing to do if no delay configured */
    if (recovery_min_apply_delay == 0)
        return false;

Shouldn't we simply leave if recovery_min_apply_delay is lower than 0, and not only equal to 0? Seems reasonable. Regards, -- Fabrízio de Royes Mello Consultoria/Coaching PostgreSQL Timbira: http://www.timbira.com.br Blog: http://fabriziomello.github.io Linkedin: http://br.linkedin.com/in/fabriziomello Twitter: http://twitter.com/fabriziomello Github: http://github.com/fabriziomello
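The proposed tightening amounts to one relational operator. In this self-contained sketch the GUC variable and the guard function are simplified stand-ins for the real code in xlog.c (the name recovery_apply_delay_wanted is invented here), assuming only that a non-positive delay should mean "no delay configured":

```c
#include <stdbool.h>

/* Stand-in for the recovery_min_apply_delay GUC, in milliseconds. */
static int recovery_min_apply_delay = 0;

/* Sketch of the suggested fix: treat any non-positive setting as
 * "no delay configured" instead of testing for exactly zero, so a
 * negative value no longer triggers a (nonsensical) delay. */
static bool recovery_apply_delay_wanted(void)
{
    /* nothing to do if no (or a negative) delay configured */
    if (recovery_min_apply_delay <= 0)
        return false;
    return true;                /* the real code then waits until the
                                 * commit timestamp plus the delay */
}
```

With the original `== 0` test, a setting of, say, -1000 would fall through and a delay would still be computed; with `<= 0` it is simply ignored.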
Re: [HACKERS] BUG #12330: ACID is broken for unique constraints
On Mon, Dec 29, 2014 at 10:47 AM, Nikita Volkov nikita.y.vol...@mail.ru wrote: [combining replies -- nikita, better not to top-post (FYI)] [combining replies again] I'm sorry. I don't know what you mean. I just replied to an email. http://www.idallen.com/topposting.html To prove your statement, you need to demonstrate how a transaction left the database in a bad state given concurrent activity without counting failures. 1. Transaction A looks up a row by ID 1 and gets an empty result. 2. Concurrent transaction B inserts a row with ID 1. 3. Transaction A goes on with the presumption that a row with ID 1 does not exist, because a transaction is supposed to be isolated and because it has made sure that the row does not exist. With this presumption it confidently inserts a row with ID 1 only to get Postgres report a duplicate key. Wat? Your understanding of isolation is incorrect. Transaction A does not go on with anything -- it's guaranteed to fail in this case. The only debatable point here is how exactly it fails. Again, isolation's job is to protect the data. On Mon, Dec 29, 2014 at 10:53 AM, Kevin Grittner kgri...@ymail.com wrote: The semantics are so imprecise that Tom argued that we should document that transactions should be retried from the start when you get the duplicate key error, since it *might* have been caused by a race condition. That sounds off to me also. In terms of a classic uniqueness constraint (say, an identifying user name), every violation is technically a race condition -- whether or not the transactions overlap in time is completely irrelevant. Whether the transactions touching off the error happen to overlap or not is an accident of timing and irrelevant; a serialization error suggests that the transaction should be retried when in fact it shouldn't be, particularly just to get the *actual* error. What if the transaction is non-trivial? Why do we want to bother our users about those details at all? 
Consider the 'idiomatic upsert' as it exists in the documentation (!):

LOOP
    -- first try to update the key
    UPDATE db SET b = data WHERE a = key;
    IF found THEN
        RETURN;
    END IF;
    -- XXX merlin's note: if any dependent table throws a UV,
    -- say, via a trigger, this code will loop endlessly
    -- not there, so try to insert the key
    -- if someone else inserts the same key concurrently,
    -- we could get a unique-key failure
    BEGIN
        INSERT INTO db(a,b) VALUES (key, data);
        RETURN;
    EXCEPTION WHEN unique_violation THEN
        -- do nothing, and loop to try the UPDATE again
    END;
END LOOP;

By changing the error code, for decades' worth of dealing with this problem, you've just converted a server side loop to a full round trip, and, if the user does not automatically retry serialization failures, broken his/her code. It's impossible to fix the round trip issue, at least provably, because there is no way to know for sure that the serialization failure is coming from this exact insertion, or, say, a dependent trigger (aside: the idiomatic example ought to be checking the table name!) such that your loop (either here or from application) would execute a bazillion times until some other transaction clears. OK, this is a mostly academic detail, but the picture is not so clear as you're saying, I think; you're travelling at high speed in uncertain waters. The key point here is that the OP issued a SELECT first, and he's chaining DML decisions to the output of that select. He's expecting that SELECT to be protected via ACID, but it isn't and can't be unless you're prepared to predicate lock every row selected. What he wants is for the database to bounce his transaction because the select lied to him, but that can't be done obviously. I'm curious how heavily you use serializable transactions, because I have trouble believing that those who rely on them as their primary (or only) strategy for dealing with race conditions under high concurrency would take that position. I don't use them much, admittedly. 
That said, I don't use them as race condition guards. I use locks or other techniques to manage the problem. I tend to build out applications on top of functions and the inability to set isolation mode inside a function confounds me from using anything but 'read committed'. merlin
Re: [HACKERS] [GENERAL] ON_ERROR_ROLLBACK
On 12/29/2014 09:55 AM, Tom Lane wrote: Adrian Klaver adrian.kla...@aklaver.com writes: So it seems you can turn ON_ERROR_ROLLBACK on with either 1 or 'on', but you can only turn it off with 'off'. With ON_ERROR_STOP 1/on and 0/off both seem to work. Is this expected? Given the lack of previous complaints, this probably isn't backpatching material, but it sure seems like a bit of attention to consistency would be warranted here. I would appreciate it, thanks. regards, tom lane -- Adrian Klaver adrian.kla...@aklaver.com
Re: [HACKERS] Additional role attributes superuser review
Magnus, Thanks for the feedback. BACKUP - allows role to perform pg_dump* backups of whole database. I'd suggest it's called DUMP if that's what it allows, to keep it separate from the backup parts. Makes sense to me. That seems really bad names, IMHO. Why? Because we use WAL and XLOG throughout documentation and parameters and code to mean *the same thing*. And here they'd suddenly mean different things. If we need them as separate privileges, I think we need much better names. (And a better description - what is xlog operations really?) Fair enough, ultimately what I was trying to address is the following concern raised by Alvaro: To me, what this repeated discussion on this particular BACKUP point says, is that the ability to run pg_start/stop_backend and the xlog related functions should be a different privilege, i.e. something other than BACKUP; because later we will want the ability to grant someone the ability to run pg_dump on the whole database without being superuser, and we will want to use the name BACKUP for that. So I'm inclined to propose something more specific for this like WAL_CONTROL or XLOG_OPERATOR, say. Upon re-reading it (and other discussions around it) I believe that I must have misinterpreted. Initially, I read it to mean that we needed the following separate permissions. 1) ability to pg_dump 2) ability to start/stop backups 3) ability to execute xlog related functions When indeed, what it meant was to have the following separate (effectively merging #2 and #3): 1) ability to pg_dump 2) ability to start/stop backups *and* ability to execute xlog related functions. My apologies on that confusion. Given this clarification: I think #1 could certainly be answered by using DUMP. I have no strong opinion in either direction, though I do think that BACKUP does make the most sense for #2. 
Previously, Stephen had mentioned a READONLY capability that could effectively work for pg_dump, though, Jim's suggestion of keeping 'read-all' separate from 'ability to pg_dump' seems logical. In either case, I certainly wouldn't mind having a wider agreement/consensus on this approach. So, here is a revised list of proposed attributes:

* BACKUP - allows role to perform backup operations (as originally proposed)
* LOG - allows role to rotate log files - remains broad enough to consider future log related operations
* DUMP - allows role to perform pg_dump* backups of whole database
* MONITOR - allows role to view pg_stat_* details (as originally proposed)
* PROCSIGNAL - allows role to signal backend processes (as originally proposed)

Thanks, Adam -- Adam Brightwell - adam.brightw...@crunchydatasolutions.com Database Engineer - www.crunchydatasolutions.com
Re: [HACKERS] BUG #12330: ACID is broken for unique constraints
Merlin Moncure mmonc...@gmail.com wrote: On Mon, Dec 29, 2014 at 10:53 AM, Kevin Grittner kgri...@ymail.com wrote: The semantics are so imprecise that Tom argued that we should document that transactions should be retried from the start when you get the duplicate key error, since it *might* have been caused by a race condition. That sounds off to me also. In terms of a classic uniqueness constraint (say, an identifying user name), every violation is technically a race condition -- whether or not the transactions overlap in time is completely irrelevant. That is completely bogus. The point is that if you return a serialization failure the transaction should be immediately retried from the beginning (including, in many cases, acquiring key values). If the error was caused by concurrent insertion of a duplicate key where the transaction does *not* acquire the value within the transaction, *then* you get the duplicate key error on the retry. Whether the transactions touching off the error happen to overlap or not is an accident of timing and irrelevant; a serialization error suggests that the transaction should be retried when in fact it shouldn't be, particularly just to get the *actual* error. What if the transaction is non-trivial? Why do we want to bother our users about those details at all? Where serializable transactions are used to manage race conditions the users typically do not see them. The application code does not see them. There is normally some sort of framework (possibly using dependency injection, although not necessarily) which catches these and retries the transaction from the start without notifying the user or letting the application software know that it happened. There is normally some server-side logging so that high volumes of rollbacks can be identified and fixed.
In a real-world situation where this was used for 100 production databases running millions of transactions per day, I saw about 10 or 20 serialization failures per day across the whole set of database servers. While I have certainly heard of workloads where it didn't work out that well, Dan Ports found that many common benchmarks perform about that well. Quoting from the peer-reviewed paper presented in Istanbul[1]:

| SSI outperforms S2PL for all transaction mixes, and does so by a
| significant margin when the fraction of read-only transactions is
| high. On these workloads, there are more rw-conflicts between
| concurrent transactions, so locking imposes a larger performance
| penalty. (The 100%-read-only workload is a special case; there are
| no lock conflicts under S2PL, and SSI has no overhead because all
| snapshots are safe.) The 150-warehouse configuration (Figure 5b)
| behaves similarly, but the differences are less pronounced: on this
| disk-bound benchmark, CPU overhead is not a factor, and improved
| concurrency has a limited benefit. Here, the performance of SSI
| is indistinguishable from that of SI. Transactions rarely need to
| be retried; in all cases, the serialization failure rate was under
| 0.25%.

Consider the 'idiomatic upsert' as it exists in the documentation (!): That documentation should probably be updated to indicate which isolation levels need that code. If you are relying on serializable transactions that dance is unnecessary and pointless. The rule when relying on serializable transactions is that you write the code to behave correctly if the transaction executing it is running by itself. Period. No special handling for race conditions. Detecting race conditions is the job of the database engine and retrying affected transactions is the job of the framework. Absolutely nobody who understands serializable transactions would use that idiom inside of serializable transactions.
By changing the error code, for decades worth of dealing with this problem, you've just converted a server side loop to a full round trip, and, if the user does not automatically retry serialization failures, broken his/her code. If they are not automatically retrying on serialization failures, they should probably not be using serializable transactions. That is rather the point. No need for table locks. No need for SELECT FOR UPDATE. No need to special-case for concurrent transactions. Just do it. Let the framework retry as needed. I have no problem with there being people who choose not to use this approach. What I'm asking is that they not insist on PostgreSQL being needlessly crippled for those who *do* use it this way. It's impossible to fix the round trip issue, at least provably, because there is no way to know for sure that the serialization failure is coming from this exact insertion, or say, a dependent trigger (aside: the idiomatic example ought to be checking the table name!) such that your loop (either here or from application) would execute a bazillion times until some other transaction clears. Until the inserting transaction commits, it would block the other insertion
Re: [HACKERS] Misaligned BufferDescriptors causing major performance problems on AMD
On Sat, Dec 27, 2014 at 08:05:42PM -0500, Robert Haas wrote: On Wed, Dec 24, 2014 at 11:20 AM, Andres Freund and...@2ndquadrant.com wrote: I just verified that I can still reproduce the problem: # aligned case (max_connections=401) afreund@axle:~$ pgbench -P 1 -h /tmp/ -p5440 postgres -n -M prepared -c 96 -j 96 -T 100 -S progress: 1.0 s, 405170.2 tps, lat 0.195 ms stddev 0.928 progress: 2.0 s, 467011.1 tps, lat 0.204 ms stddev 0.140 progress: 3.0 s, 462832.1 tps, lat 0.205 ms stddev 0.154 progress: 4.0 s, 471035.5 tps, lat 0.202 ms stddev 0.154 progress: 5.0 s, 500329.0 tps, lat 0.190 ms stddev 0.132 BufferDescriptors is at 0x7f63610a6960 (which is 32byte aligned) # unaligned case (max_connections=400) afreund@axle:~$ pgbench -P 1 -h /tmp/ -p5440 postgres -n -M prepared -c 96 -j 96 -T 100 -S progress: 1.0 s, 202271.1 tps, lat 0.448 ms stddev 1.232 progress: 2.0 s, 223823.4 tps, lat 0.427 ms stddev 3.007 progress: 3.0 s, 227584.5 tps, lat 0.414 ms stddev 4.760 progress: 4.0 s, 221095.6 tps, lat 0.410 ms stddev 4.390 progress: 5.0 s, 217430.6 tps, lat 0.454 ms stddev 7.913 progress: 6.0 s, 210275.9 tps, lat 0.411 ms stddev 0.606 BufferDescriptors is at 0x7f1718aeb980 (which is 64byte aligned) So, should we increase ALIGNOF_BUFFER from 32 to 64? Seems like that's what these results are telling us. I am glad someone else considers this important. Andres reported the above 2x pgbench difference in February, but no action was taken as everyone felt there needed to be more performance testing, but it never happened: http://www.postgresql.org/message-id/20140202151319.gd32...@awork2.anarazel.de I have now performance tested this by developing the attached two patches which both increase the Buffer Descriptors allocation by 64 bytes. The first patch causes each 64-byte Buffer Descriptor struct to align on a 32-byte boundary but not a 64-byte boundary, while the second patch aligns it with a 64-byte boundary. 
I tried many tests, including this:

$ pgbench --initialize --scale 1 pgbench
$ pgbench --protocol prepared --client 16 --jobs 16 --transactions 10 --select-only pgbench

I cannot measure any difference on my dual-CPU-socket, 16-vcore server (http://momjian.us/main/blogs/pgblog/2012.html#January_20_2012). I thought this test would cause the most Buffer Descriptor contention between the two CPUs. Can anyone else see a difference when testing these two patches? (The patch reports alignment in the server logs.)

--
Bruce Momjian br...@momjian.us        http://momjian.us
EnterpriseDB                          http://enterprisedb.com

+ Everyone has their own god. +

diff --git a/src/backend/storage/buffer/buf_init.c b/src/backend/storage/buffer/buf_init.c
new file mode 100644
index ff6c713..c4dce5b
*** a/src/backend/storage/buffer/buf_init.c
--- b/src/backend/storage/buffer/buf_init.c
*** InitBufferPool(void)
*** 67,72 ****
--- 67,73 ----
  	bool		foundBufs,
  				foundDescs;

+ 	fprintf(stderr, "Buffer Descriptors size = %ld\n", sizeof(BufferDesc));
  	BufferDescriptors = (BufferDesc *)
  		ShmemInitStruct("Buffer Descriptors",
  						NBuffers * sizeof(BufferDesc), &foundDescs);
diff --git a/src/backend/storage/ipc/shmem.c b/src/backend/storage/ipc/shmem.c
new file mode 100644
index 2ea2216..669c07f
*** a/src/backend/storage/ipc/shmem.c
--- b/src/backend/storage/ipc/shmem.c
*** ShmemInitStruct(const char *name, Size s
*** 327,332 ****
--- 327,335 ----
  	ShmemIndexEnt *result;
  	void	   *structPtr;

+ 	if (strcmp(name, "Buffer Descriptors") == 0)
+ 		size = BUFFERALIGN(size) + 64;
+
  	LWLockAcquire(ShmemIndexLock, LW_EXCLUSIVE);

  	if (!ShmemIndex)
*** ShmemInitStruct(const char *name, Size s
*** 413,418 ****
--- 416,432 ----
  						 "\"%s\" (%zu bytes requested)", name, size)));
  		}
+ 		if (strcmp(name, "Buffer Descriptors") == 0)
+ 		{
+ 			/* align on 32 */
+ 			if ((int64) structPtr % 32 != 0)
+ 				structPtr = (void *) ((int64) structPtr + 32 - (int64) structPtr % 32);
+ 			/* align on 32 but not 64 */
+ 			if ((int64) structPtr % 64 == 0)
+ 				structPtr = (void *) ((int64) structPtr + 32);
+ 		}
+ 		fprintf(stderr, "shared memory alignment of %s: %ld-byte\n", name,
+ 				(int64) structPtr % 64 == 0 ? 64 : (int64) structPtr % 64);
  		result->size = size;
  		result->location = structPtr;
  	}
diff --git a/src/backend/storage/buffer/buf_init.c b/src/backend/storage/buffer/buf_init.c
new file mode 100644
index ff6c713..c4dce5b
*** a/src/backend/storage/buffer/buf_init.c
--- b/src/backend/storage/buffer/buf_init.c
*** InitBufferPool(void)
*** 67,72 ****
--- 67,73 ----
  	bool		foundBufs,
  				foundDescs;

+ 	fprintf(stderr, "Buffer Descriptors size = %ld\n", sizeof(BufferDesc));
  	BufferDescriptors = (BufferDesc *)
  		ShmemInitStruct("Buffer Descriptors",
  						NBuffers * sizeof(BufferDesc), &foundDescs);
diff --git a/src/backend/storage/ipc/shmem.c
Re: [HACKERS] BUG #12330: ACID is broken for unique constraints
[errata] Kevin Grittner kgri...@ymail.com wrote: Quoting from the peer-reviewed paper presented in Istanbul[1]: That should have been [3], not [1]. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Additional role attributes superuser review
Adam, * Adam Brightwell (adam.brightw...@crunchydatasolutions.com) wrote: I'd suggest it's called DUMP if that's what it allows, to keep it separate from the backup parts. Makes sense to me. I'm fine calling it 'DUMP', but for different reasons. We have no (verifiable) idea what client program is being used to connect and therefore we shouldn't try to tie the name of the client program to the permission. That said, a 'DUMP' privilege which allows the user to dump the contents of the entire database is entirely reasonable. We need to be clear in the documentation though- such a 'DUMP' privilege is essentially granting USAGE on all schemas and table-level SELECT on all tables and sequences (anything else..?). To be clear, a user with this privilege can utilize that access without using pg_dump. One other point is that this shouldn't imply any other privileges, imv. I'm specifically thinking of BYPASSRLS- that's independently grantable and therefore should be explicitly set, if it's intended. Things should work 'sanely' with any combination of the two options. Similarly, DUMP shouldn't imply BACKUP or vice-versa. In particular, I'd like to see roles which have only the BACKUP privilege be unable to directly read any data (modulo things granted to PUBLIC). This would allow an unprivileged user to manage backups, kick off ad-hoc ones, etc, without being able to actually access any of the data (this would require the actual backup system to have a similar control, but that's entirely possible with more advanced SANs and enterprise backup solutions). That seems really bad names, IMHO. Why? Because we use WAL and XLOG throughout documentation and parameters and code to mean *the same thing*. And here they'd suddenly mean different things. If we need them as separate privileges, I think we need much better names. (And a better description - what is xlog operations really?)
Fair enough, ultimately what I was trying to address is the following concern raised by Alvaro: To me, what this repeated discussion on this particular BACKUP point says, is that the ability to run pg_start/stop_backend and the xlog related functions should be a different privilege, i.e. something other than BACKUP; because later we will want the ability to grant someone the ability to run pg_dump on the whole database without being superuser, and we will want to use the name BACKUP for that. So I'm inclined to propose something more specific for this like WAL_CONTROL or XLOG_OPERATOR, say. Note that the BACKUP role attribute was never intended to cover the pg_dump use-case. Simply the name of it caused confusion though. I'm not sure if adding a DUMP role attribute is sufficient to address that confusion, but I haven't got a better idea either. When indeed, what it meant was to have the following separate (effectively merging #2 and #3): 1) ability to pg_dump 2) ability to start/stop backups *and* ability to execute xlog related functions. That sounds reasonable to me (and is what was initially proposed, though I've come around to the thinking that this BACKUP role attribute should also allow pg_xlog_replay_pause/resume(), as those can be useful on replicas). Given this clarification: I think #1 could certainly be answered by using DUMP. I have no strong opinion in either direction, though I do think that BACKUP does make the most sense for #2. Previously, Stephen had mentioned a READONLY capability that could effectively work for pg_dump, though Jim's suggestion of keeping 'read-all' separate from 'ability to pg_dump' seems logical. In either case, I certainly wouldn't mind having a wider agreement/consensus on this approach. The read-all vs. ability-to-pg_dump distinction doesn't really exist for role attributes, as I see it (see my comments above).
That said, having DUMP or read-all is different from read-*only*, which would probably be good to have independently. I can imagine a use-case for a read-only account which only has read ability for those tables, schemas, etc, explicitly granted to it. There is one issue that occurs to me, however. We're talking about pg_dump, but what about pg_dumpall? In particular, I don't think the DUMP privilege should provide access to pg_authid, as that would allow the user to bypass the privilege system in some environments by using the hash to log in as a superuser. Now, I don't encourage using password based authentication, especially for superuser accounts, but lots of people do. The idea with these privileges is to allow certain operations to be performed by a non-superuser while preventing trivial access to superuser. Perhaps it's pie-in-the-sky, but my hope is to achieve that. So, here is a revised list of proposed attributes: * BACKUP - allows role to perform backup operations (as originally proposed) * LOG - allows role to rotate log files - remains broad enough to consider future log related operations
Re: [HACKERS] replicating DROP commands across servers
Here's a patch that tweaks the grammar to use TypeName in COMMENT, SECURITY LABEL, and DROP for the type and domain cases. The required changes in the code are pretty minimal, thankfully. Note the slight changes to the new object_address test also. I think I'm pretty much done with this now, so I intend to push this first thing tomorrow and then the rebased getObjectIdentityParts patch sometime later.

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

commit 701a2b7cc3cc6dff865db64ecaeb2fd8bb07b396
Author: Alvaro Herrera alvhe...@alvh.no-ip.org
Date:   Mon Dec 29 18:55:59 2014 -0300

    Use TypeName to represent type names in certain commands

    In COMMENT, DROP and SECURITY LABEL we were representing types as a list
    of names, rather than the TypeName struct that's used everywhere; this
    practice also recently found its way into the new pg_get_object_address.
    Right now it doesn't affect anything (other than some code cleanliness),
    but it is more problematic for future changes that operate with object
    addresses from the SQL interface; type details such as array-ness were
    being forgotten.
diff --git a/src/backend/catalog/objectaddress.c b/src/backend/catalog/objectaddress.c
index 7cb46e1..9ca609d 100644
--- a/src/backend/catalog/objectaddress.c
+++ b/src/backend/catalog/objectaddress.c
@@ -646,13 +646,11 @@ get_object_address(ObjectType objtype, List *objname, List *objargs,
 			break;
 		case OBJECT_DOMCONSTRAINT:
 			{
-				List	   *domname;
 				ObjectAddress domaddr;
 				char	   *constrname;

-				domname = list_truncate(list_copy(objname), list_length(objname) - 1);
-				constrname = strVal(llast(objname));
-				domaddr = get_object_address_type(OBJECT_DOMAIN, domname, missing_ok);
+				domaddr = get_object_address_type(OBJECT_DOMAIN, objname, missing_ok);
+				constrname = strVal(linitial(objargs));

 				address.classId = ConstraintRelationId;
 				address.objectId = get_domain_constraint_oid(domaddr.objectId,
@@ -1291,14 +1289,13 @@ get_object_address_attrdef(ObjectType objtype, List *objname,
  * Find the ObjectAddress for a type or domain
  */
 static ObjectAddress
-get_object_address_type(ObjectType objtype,
-						List *objname, bool missing_ok)
+get_object_address_type(ObjectType objtype, List *objname, bool missing_ok)
 {
 	ObjectAddress address;
 	TypeName   *typename;
 	Type		tup;

-	typename = makeTypeNameFromNameList(objname);
+	typename = (TypeName *) linitial(objname);

 	address.classId = TypeRelationId;
 	address.objectId = InvalidOid;
@@ -1428,27 +1425,8 @@ pg_get_object_address(PG_FUNCTION_ARGS)
 	 * given object type.  Most use a simple string Values list, but there
 	 * are some exceptions.
 	 */
-	if (type == OBJECT_TYPE || type == OBJECT_DOMAIN)
-	{
-		Datum	   *elems;
-		bool	   *nulls;
-		int			nelems;
-		TypeName   *typname;
-
-		deconstruct_array(namearr, TEXTOID, -1, false, 'i',
-						  &elems, &nulls, &nelems);
-		if (nelems != 1)
-			ereport(ERROR,
-					(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
-					 errmsg("name list length must be exactly %d", 1)));
-		if (nulls[0])
-			ereport(ERROR,
-					(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
-					 errmsg("name or argument lists may not contain nulls")));
-		typname = typeStringToTypeName(TextDatumGetCString(elems[0]));
-		name = typname->names;
-	}
-	else if (type == OBJECT_CAST)
+	if (type == OBJECT_TYPE || type == OBJECT_DOMAIN || type == OBJECT_CAST ||
+		type == OBJECT_DOMCONSTRAINT)
 	{
 		Datum	   *elems;
 		bool	   *nulls;
@@ -1533,18 +1511,13 @@ pg_get_object_address(PG_FUNCTION_ARGS)
 	 */
 	switch (type)
 	{
-		case OBJECT_DOMCONSTRAINT:
-			if (list_length(name) < 2)
-				ereport(ERROR,
-						(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
-						 errmsg("name list length must be at least %d", 2)));
-			break;
 		case OBJECT_LARGEOBJECT:
 			if (list_length(name) != 1)
 				ereport(ERROR,
 						(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
-						 errmsg("name list length must be %d", 1)));
+						 errmsg("name list length must be exactly %d", 1)));
 			break;
+		case OBJECT_DOMCONSTRAINT:
 		case OBJECT_OPCLASS:
 		case OBJECT_OPFAMILY:
 		case OBJECT_CAST:
diff --git a/src/backend/commands/dropcmds.c b/src/backend/commands/dropcmds.c
index 8583581..21a2ae4 100644
--- a/src/backend/commands/dropcmds.c
+++ b/src/backend/commands/dropcmds.c
@@ -264,10 +264,14 @@ does_not_exist_skipping(ObjectType objtype, List *objname, List *objargs)
 	{
 		case OBJECT_TYPE:
 		case OBJECT_DOMAIN:
-			if (!schema_does_not_exist_skipping(objname, &msg, &name))
 			{
-				msg = gettext_noop("type \"%s\" does not exist, skipping");
-				name = TypeNameToString(makeTypeNameFromNameList(objname));
+				TypeName   *typ = linitial(objname);
+
+				if (!schema_does_not_exist_skipping(typ->names, &msg, &name))
+				{
+					msg = gettext_noop("type \"%s\" does not exist, skipping");
+					name = TypeNameToString(typ);
+				}
 			}
break; case
Re: [HACKERS] nls and server log
On 12/28/14, 2:56 AM, Craig Ringer wrote: On 12/25/2014 02:35 AM, Euler Taveira wrote: Hi, Currently the same message goes to server log and client app. Sometimes it bothers me since I have to analyze server logs and discovered that lc_messages is set to pt_BR and to worse things that stup^H^H^H application parse some error messages in portuguese. IMO logging is simply broken for platforms where the postmaster and all DBs don't share an encoding. We mix different encodings in log messages and provide no way to separate them out. Nor is there a way to log different messages to different files. It's not just an issue with translations. We mix and mangle encodings of user-supplied text, like RAISE strings in procs, for example. We really need to be treating encoding for logging and for the client much more separately than we currently do. I think any consideration of translations for logging should be done with the underlying encoding issues in mind. Agreed. My personal opinion is that we should require the server log to be capable of representing all chars in the encodings used by any DB. Which in practice means that we always just log in utf-8 if the user wants to permit DBs with different encodings. An alternative would be one file per database, always in the encoding of that database. How much of this issue is caused by trying to machine-parse log files? Is a better option to improve that case, possibly doing something like including a field in each line that tells you the encoding for that entry? -- Jim Nasby, Data Architect, Blue Treble Consulting Data in Trouble? Get it in Treble! http://BlueTreble.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] 9.5: Better memory accounting, towards memory-bounded HashAgg
On 12/28/14, 2:45 PM, Peter Geoghegan wrote: On Sun, Dec 28, 2014 at 12:37 PM, Jeff Davis pg...@j-davis.com wrote: Do others have similar numbers? I'm quite surprised at how little work_mem seems to matter for these plans (HashJoin might be a different story though). I feel like I made a mistake -- can someone please do a sanity check on my numbers? I have seen external sorts that were quicker than internal sorts before. With my abbreviated key patch, under certain circumstances external sorts are faster, while presumably the same thing is true of int4 attribute sorts today. Actually, I saw a 10MB work_mem setting that was marginally faster than a multi-gigabyte one that fit the entire sort in memory. It probably has something to do with caching effects dominating over the expense of more comparisons, since higher work_mem settings that still resulted in an external sort were slower than the 10MB setting. I was surprised by this too, but it has been independently reported by Jeff Janes. I haven't tested for external faster than internal in a while, but I've certainly seen this effect before. Generally, once you get beyond a certain size (maybe 100MB?) you run the risk of a tapesort being faster than an internal sort. -- Jim Nasby, Data Architect, Blue Treble Consulting Data in Trouble? Get it in Treble! http://BlueTreble.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] INSERT ... ON CONFLICT {UPDATE | IGNORE}
Hi Jeff, On Mon, Dec 29, 2014 at 2:29 PM, Jeff Janes jeff.ja...@gmail.com wrote: Using the vallock2 version of V1.8, using the test I previously described, I get some all-null rows, which my code should never create. Also, the index and table don't agree, in this example I find 3 all-null rows in the table, but only 2 in the index. I've attached an example output of querying via index and via full table scan, and also the pageinspect output of the blocks which have the 3 rows, in case that is helpful. Interesting. Thanks a lot for your help! This was just a straight forward issue of firing queries at the database, the crash-inducing part of my test harness was not active during this test. I also ran it with my crashing patch reversed out, in case I introduced the problem myself, and it still occurs. Using V1.7 of the vallock2 patch, I saw the same thing with some all-null rows. I also saw some other issues where two rows with the same key value would be present twice in the table (violating the unique constraint) but only one of them would appear in the index. I suspect it is caused by the same issue as the all-null rows, and maybe I just didn't run v1.8 enough times to find that particular manifestation under v1.8. This is almost certainly a latent bug with approach #2 to value locking, that has probably been there all along. Semantics and syntax have been a recent focus, and so the probability that I introduced a regression of this nature in any recent revision seems low. I am going to investigate the problem, and hope to have a diagnosis soon. Once again, thanks! -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Misaligned BufferDescriptors causing major performance problems on AMD
On 2014-12-29 16:59:05 -0500, Bruce Momjian wrote: I am glad someone else considers this important. I do consider it important. I just considered the lwlock scalability more important... Andres reported the above 2x pgbench difference in February, but no action was taken as everyone felt there needed to be more performance testing, but it never happened: FWIW, I have no idea what exactly should be tested there. Right now I think what we should do is to remove the BUFFERALIGN from shmem.c and instead add explicit alignment code in a couple callers (BufferDescriptors/Blocks, proc.c stuff). $ pgbench --initialize --scale 1 pgbench Scale 1 isn't particularly helpful in benchmarks, not even read only ones. $ pgbench --protocol prepared --client 16 --jobs 16 --transactions 10 --select-only pgbench I'd suspect you're more likely to see differences with a higher client count. Also, I seriously doubt 100k xacts is enough to get stable results - even on my laptop I get 100k+ TPS. I'd suggest using something like -P 1 -T 100 or so. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] CATUPDATE confusion?
All, On Sat, Dec 27, 2014 at 6:31 PM, Noah Misch n...@leadboat.com wrote: On Sat, Dec 27, 2014 at 06:26:02PM -0500, Tom Lane wrote: Stephen Frost sfr...@snowman.net writes: * Tom Lane (t...@sss.pgh.pa.us) wrote: Plan C (remove CATUPDATE altogether) also has some merit. But adding a superuser override there would be entirely pointless. This is my recommendation. I do not see the point of carrying the option around just to confuse new users of pg_authid and reviewers of the code. Yeah ... if no one's found it interesting in the 20 years since the code left Berkeley, it's unlikely that interest will emerge in the future. No objection here. Given this discussion, I have attached a patch that removes CATUPDATE for review/discussion. One of the interesting behaviors (or perhaps not) is how 'pg_class_aclmask' handles an invalid role id when checking permissions against 'rolsuper' instead of 'rolcatupdate'. This is demonstrated by the 'has_table_privilege' regression test expected results. In summary, 'has_rolcatupdate' raises an error when an invalid OID is provided; however, as documented in the source, 'superuser_arg' does not, simply returning false. Therefore, when checking for superuser-ness of a non-existent role, the result will be false and not an error. Perhaps this is OK, but my concern would be on the expected behavior around items like 'has_table_privilege'. My natural thought would be that we would want to preserve that specific functionality, though short of adding a 'has_rolsuper' function that will raise an appropriate error, I'm uncertain of an approach. Thoughts?
Thanks, Adam

--
Adam Brightwell - adam.brightw...@crunchydatasolutions.com
Database Engineer - www.crunchydatasolutions.com

diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
new file mode 100644
index 9ceb96b..635032d
*** a/doc/src/sgml/catalogs.sgml
--- b/doc/src/sgml/catalogs.sgml
*** 1416,1430 ****
      </row>

      <row>
-      <entry><structfield>rolcatupdate</structfield></entry>
-      <entry><type>bool</type></entry>
-      <entry>
-       Role can update system catalogs directly. (Even a superuser cannot do
-       this unless this column is true)
-      </entry>
-     </row>
-
-     <row>
       <entry><structfield>rolcanlogin</structfield></entry>
       <entry><type>bool</type></entry>
       <entry>
--- 1416,1421 ----
*** SELECT * FROM pg_locks pl LEFT JOIN pg_p
*** 8479,8494 ****
      </row>

      <row>
-      <entry><structfield>rolcatupdate</structfield></entry>
-      <entry><type>bool</type></entry>
-      <entry></entry>
-      <entry>
-       Role can update system catalogs directly. (Even a superuser cannot do
-       this unless this column is true)
-      </entry>
-     </row>
-
-     <row>
       <entry><structfield>rolcanlogin</structfield></entry>
       <entry><type>bool</type></entry>
       <entry></entry>
--- 8470,8475 ----
diff --git a/src/backend/catalog/aclchk.c b/src/backend/catalog/aclchk.c
new file mode 100644
index d30612c..18623ef
*** a/src/backend/catalog/aclchk.c
--- b/src/backend/catalog/aclchk.c
*** aclcheck_error_type(AclResult aclerr, Oi
*** 3423,3448 ****
  }

- /* Check if given user has rolcatupdate privilege according to pg_authid */
- static bool
- has_rolcatupdate(Oid roleid)
- {
- 	bool		rolcatupdate;
- 	HeapTuple	tuple;
-
- 	tuple = SearchSysCache1(AUTHOID, ObjectIdGetDatum(roleid));
- 	if (!HeapTupleIsValid(tuple))
- 		ereport(ERROR,
- 				(errcode(ERRCODE_UNDEFINED_OBJECT),
- 				 errmsg("role with OID %u does not exist", roleid)));
-
- 	rolcatupdate = ((Form_pg_authid) GETSTRUCT(tuple))->rolcatupdate;
-
- 	ReleaseSysCache(tuple);
-
- 	return rolcatupdate;
- }
-
  /*
   * Relay for the various pg_*_mask routines depending on object kind
   */
--- 3423,3428 ----
*** pg_class_aclmask(Oid table_oid, Oid role
*** 3620,3627 ****
  	/*
  	 * Deny anyone permission to update a system catalog unless
! 	 * pg_authid.rolcatupdate is set. (This is to let superusers protect
! 	 * themselves from themselves.) Also allow it if allowSystemTableMods.
  	 *
  	 * As of 7.4 we have some updatable system views; those shouldn't be
  	 * protected in this way. Assume the view rules can take care of
--- 3600,3606 ----
  	/*
  	 * Deny anyone permission to update a system catalog unless
! 	 * pg_authid.rolsuper is set. Also allow it if allowSystemTableMods.
  	 *
  	 * As of 7.4 we have some updatable system views; those shouldn't be
  	 * protected in this way. Assume the view rules can take care of
*** pg_class_aclmask(Oid table_oid, Oid role
*** 3630,3636 ****
  	if ((mask & (ACL_INSERT | ACL_UPDATE | ACL_DELETE | ACL_TRUNCATE | ACL_USAGE)) &&
  		IsSystemClass(table_oid, classForm) &&
  		classForm->relkind != RELKIND_VIEW &&
! 		!has_rolcatupdate(roleid) &&
  		!allowSystemTableMods)
  	{
  #ifdef ACLDEBUG
--- 3609,3615 ----
  	if ((mask & (ACL_INSERT | ACL_UPDATE | ACL_DELETE | ACL_TRUNCATE | ACL_USAGE)) &&
Re: [HACKERS] BUG #12330: ACID is broken for unique constraints
On 12/29/14, 10:53 AM, Kevin Grittner wrote: Merlin Moncure mmonc...@gmail.com wrote: Serialization errors only exist as a concession to concurrency and performance. Again, they should be returned as sparsely as possible because they provide absolutely (as Tom pointed out) zero detail to the application. That is false. They provide an *extremely* valuable piece of information which is not currently available when you get a duplicate key error -- whether the error occurred because of a race condition and will not fail for the same cause if retried. As for missing details like the duplicated key value, is there a reasonable way to add that as an errdetail() to a serialization failure error? We do still have to be careful here though; you could still have code using our documented upsert methodology inside a serialized transaction. For example, via a 3rd party extension that can't assume you're using serialized transactions. Would it still be OK to treat that as a serialization failure and bubble that all the way back to the application? As part of this, we should probably modify our upsert example so it takes transaction_isolation into account... As for the fact that RI violations also don't return a serialization failure when caused by a race with concurrent transactions, I view that as another weakness in PostgreSQL. I don't think there is a problem curing one without curing the other at the same time. I have known of people writing their own triggers to enforce RI rather than defining FKs precisely so that they can get a serialization failure return code and do automatic retry if it is caused by a race condition. That's less practical to compensate for when it comes to unique indexes or constraints. Wow, that's horrible. :( -- Jim Nasby, Data Architect, Blue Treble Consulting Data in Trouble? Get it in Treble! 
http://BlueTreble.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Detecting backend failures via libpq / DBD::Pg
I am working on enhancing the ping() method of DBD::Pg. The goal of that is for a user to be able to determine if the connection to the database is still valid. The basic way we do this is to send a simple SELECT via PQexec and then check for a valid return value (and when in doubt, we check PQstatus). This works fine for most transaction statuses, including idle, active, and idle in transaction. It even works for copy in and copy out, although it obviously invalidates the current COPY (caveat emptor!).

The problem comes when ping() is called and we are in a failed transaction. After some experimenting, the best solution I found is to send the PQexec, and then check if PQresultErrorField(result, 'C') is '25P02'. If it is, then all is well, in that the server is still alive. If it is not, then we can assume the backend is bad (for example, PQerrorMessage gives a "could not receive data from server: Bad file descriptor").

Being that we cannot do a rollback before calling the PQexec, is this a decent solution? Can we depend on really serious errors always trumping the expected 25P02?

--
Greg Sabino Mullane g...@turnstep.com
End Point Corporation http://www.endpoint.com/
PGP Key: 0x14964AC8 201412291942
http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8
Re: [HACKERS] Additional role attributes superuser review
On 12/29/14, 4:01 PM, Stephen Frost wrote:

Adam,

* Adam Brightwell (adam.brightw...@crunchydatasolutions.com) wrote:
I'd suggest it's called DUMP if that's what it allows, to keep it separate from the backup parts. Makes sense to me.

I'm fine calling it 'DUMP', but for different reasons. We have no (verifiable) idea what client program is being used to connect and therefore we shouldn't try to tie the name of the client program to the permission. That said, a 'DUMP' privilege which allows the user to dump the contents of the entire database is entirely reasonable. We need to be clear in the documentation though- such a 'DUMP' privilege is essentially granting USAGE on all schemas and table-level SELECT on all tables and sequences (anything else..?). To be clear, a user with this privilege can utilize that access without using pg_dump.

Yeah... it may be better to call this something other than DUMP (see below).

One other point is that this shouldn't imply any other privileges, imv. I'm specifically thinking of BYPASSRLS- that's independently grantable and therefore should be explicitly set, if it's intended. Things should work 'sanely' with any combination of the two options.

That does violate POLA, but it's probably worth doing so...

Similarly, DUMP shouldn't imply BACKUP or vice versa. In particular, I'd like to see roles which have only the BACKUP privilege be unable to directly read any data (modulo things granted to PUBLIC). This would allow an unprivileged user to manage backups, kick off ad-hoc ones, etc, without being able to actually access any of the data (this would require the actual backup system to have a similar control, but that's entirely possible with more advanced SANs and enterprise backup solutions).

Yes, since a binary backup doesn't need to actually read data, it shouldn't implicitly allow a user granted that role to either.

Given this clarification: I think #1 could certainly be answered by using DUMP.
I have no strong opinion in either direction, though I do think that BACKUP does make the most sense for #2.

Previously, Stephen had mentioned a READONLY capability that could effectively work for pg_dump, though, Jim's suggestion of keeping 'read-all' separate from 'ability to pg_dump' seems logical. In either case, I certainly wouldn't mind having a wider agreement/consensus on this approach.

The read-all vs. ability-to-pg_dump distinction doesn't really exist for role attributes, as I see it (see my comments above). That said, having DUMP or read-all is different from read-*only*, which would probably be good to have independently. I can imagine a use-case for a read-only account which only has read ability for those tables, schemas, etc, explicitly granted to it.

My specific concern about DUMP vs read all/only is IIRC dump needs to do more extensive locking than regular selects do, which can cause production problems.

There is one issue that occurs to me, however. We're talking about pg_dump, but what about pg_dumpall? In particular, I don't think the DUMP privilege should provide access to pg_authid, as that would allow the user to bypass the privilege system in some environments by using the hash to log in as a superuser. Now, I don't encourage using password based authentication, especially for superuser accounts, but lots of people do. The idea with these privileges is to allow certain operations to be performed by a non-superuser while preventing trivial access to superuser. Perhaps it's pie-in-the-sky, but my hope is to achieve that.

Ugh, hadn't thought about that.
:(

So, here is a revised list of proposed attributes:

* BACKUP - allows role to perform backup operations (as originally proposed)
* LOG - allows role to rotate log files - remains broad enough to consider future log related operations
* DUMP - allows role to perform pg_dump* backups of whole database
* MONITOR - allows role to view pg_stat_* details (as originally proposed)
* PROCSIGNAL - allows role to signal backend processes (as originally proposed)

Given the confusion that can exist with the data reading stuff, perhaps BINARY BACKUP or XLOG would be a better term, especially since this will probably have some overlap with streaming replication.

Likewise, if we segregate DUMP and BYPASSRLS then I think we need to call DUMP something else. Otherwise, it's a massive foot-gun; you get a successful backup only to find out it contains only a small part of the database.

My how this has become a can of worms...

--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble!
http://BlueTreble.com
Re: [HACKERS] Detecting backend failures via libpq / DBD::Pg
"Greg" == Greg Sabino Mullane g...@turnstep.com writes:

 Greg> I am working on enhancing the ping() method of DBD::Pg. The goal
 Greg> of that is for a user to be able to determine if the connection
 Greg> to the database is still valid. The basic way we do this is to
 Greg> send a simple SELECT via PQexec

Why not PQexec(conn, "") ?

--
Andrew (irc:RhodiumToad)
Re: [HACKERS] Additional role attributes superuser review
Jim,

* Jim Nasby (jim.na...@bluetreble.com) wrote:
On 12/29/14, 4:01 PM, Stephen Frost wrote:
That said, a 'DUMP' privilege which allows the user to dump the contents of the entire database is entirely reasonable. We need to be clear in the documentation though- such a 'DUMP' privilege is essentially granting USAGE on all schemas and table-level SELECT on all tables and sequences (anything else..?). To be clear, a user with this privilege can utilize that access without using pg_dump.

Yeah... it may be better to call this something other than DUMP (see below).

Did you favor something else below..? I don't see it.

One other point is that this shouldn't imply any other privileges, imv. I'm specifically thinking of BYPASSRLS- that's independently grantable and therefore should be explicitly set, if it's intended. Things should work 'sanely' with any combination of the two options.

That does violate POLA, but it's probably worth doing so...

That really depends on your view of the privilege. I'm inclined to view it from the more-tightly-constrained point of view which matches up with what I anticipate it actually providing. That doesn't translate into a short name very well though, unfortunately.

The read-all vs. ability-to-pg_dump distinction doesn't really exist for role attributes, as I see it (see my comments above). That said, having DUMP or read-all is different from read-*only*, which would probably be good to have independently. I can imagine a use-case for a read-only account which only has read ability for those tables, schemas, etc, explicitly granted to it.

My specific concern about DUMP vs read all/only is IIRC dump needs to do more extensive locking than regular selects do, which can cause production problems.

The locks aren't any more extensive than regular selects; it's just that the entire pg_dump runs inside one large transaction which lasts a good long time, and that alone can cause issues with high-rate update systems.
There is one issue that occurs to me, however. We're talking about pg_dump, but what about pg_dumpall? In particular, I don't think the DUMP privilege should provide access to pg_authid, as that would allow the user to bypass the privilege system in some environments by using the hash to log in as a superuser. Now, I don't encourage using password based authentication, especially for superuser accounts, but lots of people do. The idea with these privileges is to allow certain operations to be performed by a non-superuser while preventing trivial access to superuser. Perhaps it's pie-in-the-sky, but my hope is to achieve that.

Ugh, hadn't thought about that. :(

I don't think it kills the whole idea, but we do need to work out how to address it.

* BACKUP - allows role to perform backup operations (as originally proposed)
* LOG - allows role to rotate log files - remains broad enough to consider future log related operations
* DUMP - allows role to perform pg_dump* backups of whole database
* MONITOR - allows role to view pg_stat_* details (as originally proposed)
* PROCSIGNAL - allows role to signal backend processes (as originally proposed)

Given the confusion that can exist with the data reading stuff, perhaps BINARY BACKUP or XLOG would be a better term, especially since this will probably have some overlap with streaming replication.

I don't really like 'xlog' as a permission. BINARY BACKUP is interesting but if we're going to go multi-word, I would think we would do PITR BACKUP or perhaps FILESYSTEM BACKUP or similar. I'm not really a big fan of the two-word options though.

Likewise, if we segregate DUMP and BYPASSRLS then I think we need to call DUMP something else. Otherwise, it's a massive foot-gun; you get a successful backup only to find out it contains only a small part of the database.

That's actually already dealt with.
pg_dump defaults to setting the row_security GUC to 'off', in which case you'll get an error if you hit a table that has RLS enabled and you don't have BYPASSRLS. If you're not checking for errors from pg_dump, well, there's a lot of ways you could run into problems.

My how this has become a can of worms...

I'm not sure it's as bad as all that, but it does require some thinking..

Thanks,

Stephen
Re: [HACKERS] CATUPDATE confusion?
Adam,

* Adam Brightwell (adam.brightw...@crunchydatasolutions.com) wrote:
Given this discussion, I have attached a patch that removes CATUPDATE for review/discussion.

Thanks!

One of the interesting behaviors (or perhaps not) is how 'pg_class_aclmask' handles an invalid role id when checking permissions against 'rolsuper' instead of 'rolcatupdate'. This is demonstrated by the 'has_table_privilege' regression test expected results. In summary, 'has_rolcatupdate' raises an error when an invalid OID is provided; however, as documented in the source, 'superuser_arg' does not, simply returning false. Therefore, when checking for superuser-ness of a non-existent role, the result will be false and not an error. Perhaps this is OK, but my concern would be on the expected behavior around items like 'has_table_privilege'. My natural thought would be that we would want to preserve that specific functionality, though short of adding a 'has_rolsuper' function that will raise an appropriate error, I'm uncertain of an approach. Thoughts?

This role oid check is only getting called in this case because it's pg_authid which is getting checked (because it's a system catalog table) and it's then falling into this one case. I don't think we need to preserve that. If you do the same query with that bogus OID but a normal table, you get the same 'f' result that the regression test gives with this change. We're not back-patching this and I seriously doubt anyone is relying on this special response for system catalog tables.

So, unless anyone wants to object, I'll continue to look over this patch for eventual application against master. If there are objections, it'd be great if they could be voiced sooner rather than later.

Thanks!

Stephen
Re: [HACKERS] nls and server log
On 12/30/2014 06:39 AM, Jim Nasby wrote:
How much of this issue is caused by trying to machine-parse log files? Is a better option to improve that case, possibly doing something like including a field in each line that tells you the encoding for that entry?

That'd be absolutely ghastly. You couldn't just view the logs with 'less' or a text editor if your logs had mixed encodings; you'd need some kind of special PostgreSQL log viewer tool. Why would we possibly do that when we could just emit utf-8 instead?

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Detecting backend failures via libpq / DBD::Pg
On 12/29/14, 6:43 PM, Greg Sabino Mullane wrote:
I am working on enhancing the ping() method of DBD::Pg. The goal of that is for a user to be able to determine if the connection to the database is still valid.

This is actually a VERY common thing for monitoring frameworks to do. Currently, the general method seems to be something akin to psql -c 'SELECT 1' ... perhaps instead of going through all the effort that you currently are we could add a FEBE ping command, and expose that through a special psql option (or maybe a special pg_ping command). This won't be noticeably faster than SELECT 1, but it would prevent a bunch of pointless work on the backend side, and should greatly simplify DBD's ping().

Only thing I'm not sure of is if this could be made to be safe within a COPY... :(

--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble!
http://BlueTreble.com
Re: [HACKERS] Additional role attributes superuser review
On 12/29/14, 7:22 PM, Stephen Frost wrote:

One other point is that this shouldn't imply any other privileges, imv. I'm specifically thinking of BYPASSRLS- that's independently grantable and therefore should be explicitly set, if it's intended. Things should work 'sanely' with any combination of the two options.

That does violate POLA, but it's probably worth doing so...

That really depends on your view of the privilege. I'm inclined to view it from the more-tightly-constrained point of view which matches up with what I anticipate it actually providing. That doesn't translate into a short name very well though, unfortunately.

READ MOST? ;)

The read-all vs. ability-to-pg_dump distinction doesn't really exist for role attributes, as I see it (see my comments above). That said, having DUMP or read-all is different from read-*only*, which would probably be good to have independently. I can imagine a use-case for a read-only account which only has read ability for those tables, schemas, etc, explicitly granted to it.

My specific concern about DUMP vs read all/only is IIRC dump needs to do more extensive locking than regular selects do, which can cause production problems.

The locks aren't any more extensive than regular selects, it's just that the entire pg_dump works inside of one large transaction which lasts a good long time and simply that can cause issues with high-rate update systems.

Good, so really DUMP and READ ALL are the same.

There is one issue that occurs to me, however. We're talking about pg_dump, but what about pg_dumpall? In particular, I don't think the DUMP privilege should provide access to pg_authid, as that would allow the user to bypass the privilege system in some environments by using the hash to log in as a superuser. Now, I don't encourage using password based authentication, especially for superuser accounts, but lots of people do.
The idea with these privileges is to allow certain operations to be performed by a non-superuser while preventing trivial access to superuser. Perhaps it's pie-in-the-sky, but my hope is to achieve that.

Ugh, hadn't thought about that. :(

I don't think it kills the whole idea, but we do need to work out how to address it.

Yeah... it does indicate that we shouldn't call this thing DUMP, but perhaps as long as we put appropriate HINTS in the error paths that pg_dumpall would hit it's OK to call it DUMP. (My specific concern is it's natural to assume that DUMP would include pg_dumpall.)

Given the confusion that can exist with the data reading stuff, perhaps BINARY BACKUP or XLOG would be a better term, especially since this will probably have some overlap with streaming replication.

I don't really like 'xlog' as a permission. BINARY BACKUP is interesting but if we're going to go multi-word, I would think we would do PITR BACKUP or perhaps FILESYSTEM BACKUP or similar. I'm not really a big fan of the two-word options though.

BINARY? Though that's pretty generic. STREAM might be better, even though it's not exactly accurate for a simple backup. Perhaps this is just a documentation issue, but there's enough caveats floating around here that I'm reluctant to rely on that.

Likewise, if we segregate DUMP and BYPASSRLS then I think we need to call DUMP something else. Otherwise, it's a massive foot-gun; you get a successful backup only to find out it contains only a small part of the database.

That's actually already dealt with. pg_dump defaults to setting the row_security GUC to 'off', in which case you'll get an error if you hit a table that has RLS enabled and you don't have BYPASSRLS. If you're not checking for errors from pg_dump, well, there's a lot of ways you could run into problems.

This also indicates DUMP is better than something like READ ALL, if we can handle the pg_dumpall case.
Based on all this, my inclination is DUMP with appropriate hints for pg_dumpall, and BINARY.

--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble!
http://BlueTreble.com
Re: [HACKERS] INSERT ... ON CONFLICT {UPDATE | IGNORE}
On Mon, Dec 29, 2014 at 2:29 PM, Jeff Janes jeff.ja...@gmail.com wrote:
I've attached an example output of querying via index and via full table scan, and also the pageinspect output of the blocks which have the 3 rows, in case that is helpful.

You might have also included output from pageinspect's bt_page_items() function. Take a look at the documentation patch I just posted if the details are unclear.

Thanks
--
Peter Geoghegan
Re: [HACKERS] What exactly is our CRC algorithm?
Hi. I'm re-attaching the two patches as produced by format-patch. I haven't listed any reviewers. It's either just Andres, or maybe a lot of people.

Is anyone in a position to try out the patches on MSVC and see if they build and work sensibly, please? (Otherwise it may be better to remove those bits from the patch for now.)

Thanks.

-- Abhijit

From d5a90d31f9c35fea2636ec6baf6acc66e91fd655 Mon Sep 17 00:00:00 2001
From: Abhijit Menon-Sen a...@2ndquadrant.com
Date: Tue, 30 Dec 2014 12:41:19 +0530
Subject: Implement slice-by-8 CRC calculation

The COMP_CRC32C macro now calls pg_comp_crc32c(), which processes eight data bytes at a time. Its output is identical to the byte-at-a-time CRC code. (This change does not apply to the LEGACY_CRC32 computation.)

Author: Abhijit Menon-Sen
---
 src/include/utils/pg_crc.h        |   6 +-
 src/include/utils/pg_crc_tables.h | 578 +-
 src/port/pg_crc.c                 |  87 ++
 3 files changed, 604 insertions(+), 67 deletions(-)

diff --git a/src/include/utils/pg_crc.h b/src/include/utils/pg_crc.h
index f871cba..55934e5 100644
--- a/src/include/utils/pg_crc.h
+++ b/src/include/utils/pg_crc.h
@@ -41,6 +41,8 @@
 typedef uint32 pg_crc32;
 
+extern pg_crc32 pg_comp_crc32c(pg_crc32 crc, const void *data, size_t len);
+
 /*
  * CRC calculation using the CRC-32C (Castagnoli) polynomial.
 *
@@ -51,7 +53,7 @@ typedef uint32 pg_crc32;
 #define INIT_CRC32C(crc) ((crc) = 0xFFFFFFFF)
 #define FIN_CRC32C(crc) ((crc) ^= 0xFFFFFFFF)
 #define COMP_CRC32C(crc, data, len) \
-	COMP_CRC32_NORMAL_TABLE(crc, data, len, pg_crc32c_table)
+	((crc) = pg_comp_crc32c((crc), (char *) (data), (len)))
 #define EQ_CRC32C(c1, c2) ((c1) == (c2))
 /*
@@ -115,7 +117,7 @@ do { \
 } while (0)
 
 /* Constant tables for CRC-32C and CRC-32 polynomials */
-extern CRCDLLIMPORT const uint32 pg_crc32c_table[];
+extern CRCDLLIMPORT const uint32 pg_crc32c_table[8][256];
 extern CRCDLLIMPORT const uint32 pg_crc32_table[];
 #endif   /* PG_CRC_H */
diff --git a/src/include/utils/pg_crc_tables.h b/src/include/utils/pg_crc_tables.h
index cb6b470..f814c91 100644
--- a/src/include/utils/pg_crc_tables.h
+++ b/src/include/utils/pg_crc_tables.h
@@ -28,71 +28,519 @@
  * This table is based on the so-called Castagnoli polynomial (the same
  * that is used e.g. in iSCSI).
  */
-const uint32 pg_crc32c_table[256] = {
-	0x00000000, 0xF26B8303, 0xE13B70F7, 0x1350F3F4,
-	0xC79A971F, 0x35F1141C, 0x26A1E7E8, 0xD4CA64EB,
-	0x8AD958CF, 0x78B2DBCC, 0x6BE22838, 0x9989AB3B,
-	0x4D43CFD0, 0xBF284CD3, 0xAC78BF27, 0x5E133C24,
-	0x105EC76F, 0xE235446C, 0xF165B798, 0x030E349B,
-	0xD7C45070, 0x25AFD373, 0x36FF2087, 0xC494A384,
-	0x9A879FA0, 0x68EC1CA3, 0x7BBCEF57, 0x89D76C54,
-	0x5D1D08BF, 0xAF768BBC, 0xBC267848, 0x4E4DFB4B,
-	0x20BD8EDE, 0xD2D60DDD, 0xC186FE29, 0x33ED7D2A,
-	0xE72719C1, 0x154C9AC2, 0x061C6936, 0xF477EA35,
-	0xAA64D611, 0x580F5512, 0x4B5FA6E6, 0xB93425E5,
-	0x6DFE410E, 0x9F95C20D, 0x8CC531F9, 0x7EAEB2FA,
-	0x30E349B1, 0xC288CAB2, 0xD1D83946, 0x23B3BA45,
-	0xF779DEAE, 0x05125DAD, 0x1642AE59, 0xE4292D5A,
-	0xBA3A117E, 0x4851927D, 0x5B016189, 0xA96AE28A,
-	0x7DA08661, 0x8FCB0562, 0x9C9BF696, 0x6EF07595,
-	0x417B1DBC, 0xB3109EBF, 0xA0406D4B, 0x522BEE48,
-	0x86E18AA3, 0x748A09A0, 0x67DAFA54, 0x95B17957,
-	0xCBA24573, 0x39C9C670, 0x2A993584, 0xD8F2B687,
-	0x0C38D26C, 0xFE53516F, 0xED03A29B, 0x1F682198,
-	0x5125DAD3, 0xA34E59D0, 0xB01EAA24, 0x42752927,
-	0x96BF4DCC, 0x64D4CECF, 0x77843D3B, 0x85EFBE38,
-	0xDBFC821C, 0x2997011F, 0x3AC7F2EB, 0xC8AC71E8,
-	0x1C661503, 0xEE0D9600, 0xFD5D65F4, 0x0F36E6F7,
-	0x61C69362, 0x93AD1061, 0x80FDE395, 0x72966096,
-	0xA65C047D, 0x5437877E, 0x4767748A, 0xB50CF789,
-	0xEB1FCBAD, 0x197448AE, 0x0A24BB5A, 0xF84F3859,
-	0x2C855CB2, 0xDEEEDFB1, 0xCDBE2C45, 0x3FD5AF46,
-	0x7198540D, 0x83F3D70E, 0x90A324FA, 0x62C8A7F9,
-	0xB602C312, 0x44694011, 0x5739B3E5, 0xA55230E6,
-	0xFB410CC2, 0x092A8FC1, 0x1A7A7C35, 0xE811FF36,
-	0x3CDB9BDD, 0xCEB018DE, 0xDDE0EB2A, 0x2F8B6829,
-	0x82F63B78, 0x709DB87B, 0x63CD4B8F, 0x91A6C88C,
-	0x456CAC67, 0xB7072F64, 0xA457DC90, 0x563C5F93,
-	0x082F63B7, 0xFA44E0B4, 0xE9141340, 0x1B7F9043,
-	0xCFB5F4A8, 0x3DDE77AB, 0x2E8E845F, 0xDCE5075C,
-	0x92A8FC17, 0x60C37F14, 0x73938CE0, 0x81F80FE3,
-	0x55326B08, 0xA759E80B, 0xB4091BFF, 0x466298FC,
-	0x1871A4D8, 0xEA1A27DB, 0xF94AD42F, 0x0B21572C,
-	0xDFEB33C7, 0x2D80B0C4, 0x3ED04330, 0xCCBBC033,
-	0xA24BB5A6, 0x502036A5, 0x4370C551, 0xB11B4652,
-	0x65D122B9, 0x97BAA1BA, 0x84EA524E, 0x7681D14D,
-	0x2892ED69, 0xDAF96E6A, 0xC9A99D9E, 0x3BC21E9D,
-	0xEF087A76, 0x1D63F975, 0x0E330A81, 0xFC588982,
-	0xB21572C9, 0x407EF1CA, 0x532E023E, 0xA145813D,
-	0x758FE5D6, 0x87E466D5, 0x94B49521, 0x66DF1622,
-	0x38CC2A06, 0xCAA7A905, 0xD9F75AF1, 0x2B9CD9F2,
-	0xFF56BD19, 0x0D3D3E1A, 0x1E6DCDEE, 0xEC064EED,
-	0xC38D26C4, 0x31E6A5C7, 0x22B65633, 0xD0DDD530,
-	0x0417B1DB, 0xF67C32D8, 0xE52CC12C, 0x1747422F,
-	0x49547E0B, 0xBB3FFD08, 0xA86F0EFC, 0x5A048DFF,
-	0x8ECEE914, 0x7CA56A17, 0x6FF599E3, 0x9D9E1AE0,
-
Re: [HACKERS] INSERT ... ON CONFLICT {UPDATE | IGNORE}
On Mon, Dec 29, 2014 at 9:12 PM, Peter Geoghegan p...@heroku.com wrote:

On Mon, Dec 29, 2014 at 2:29 PM, Jeff Janes jeff.ja...@gmail.com wrote:
Using the vallock2 version of V1.8, using the test I previously described, I get some all-null rows, which my code should never create. Also, the index and table don't agree; in this example I find 3 all-null rows in the table, but only 2 in the index.

Just to be clear: You haven't found any such issue with approach #1 to value locking, right?

Correct, I haven't seen any problems with approach #1.

I'm curious about how long it took you to see the issue with #2. Were there any special steps? What were the exact steps involved in turning off the hard crash mechanism you mention?

Generally the problem will occur early on in the process, and if not then it will not occur at all. I think that is because the table starts out empty, and so a lot of insertions collide with each other. Once the table is more thoroughly populated, most queries take the CONFLICT branch and therefore two insertion branches are unlikely to collide.

At its simplest, I just use the count_upsert.pl script and your patch and forget all the rest of the stuff from my test platform. So:

pg_ctl stop -D /tmp/data2; rm /tmp/data2 -r;
../torn_bisect/bin/pg_ctl initdb -D /tmp/data2;
../torn_bisect/bin/pg_ctl start -D /tmp/data2 -o --fsync=off -w;
createdb;
perl count_upsert.pl 8 10

A run of count_upsert.pl 8 10 takes about 30 seconds on my machine (8 core), and if it doesn't create a problem then I just destroy the database and start over. The fsync=off is not important; I've seen the problem once without it. I just include it because otherwise the run takes a lot longer.

I've attached another version of the count_upsert.pl script, with some more logging targeted to this particular issue. The problem shows up like this:

init done at count_upsert.pl line 97.
sum is 1036
count is 9720
seq scan doesn't match index scan  1535 == 1535 and 1 == 6
$VAR1 = [ [ 6535, -21 ], .
(Thousands more lines, as it outputs the entire table twice, once gathered by seq scan, once by bitmap index scan.)

The first three lines are normal; the problem starts with the "seq scan doesn't match" line. In this case the first problem it ran into was that key 1535 was present once with a count column of 1 (found by seq scan) and once with a count column of 6 (found by index scan). It was also in the seq scan with a count of 6, but the way the comparison works is that it sorts each representation of the table by the key column value and then stops at the first difference; in this case the count columns 1 == 6 failed the assertion.

If you get some all-NULL rows, then you will also get Perl warnings issued when the RETURNING clause starts returning NULL when none are expected to be.

The overall pattern seems to be pretty streaky. It could go 20 iterations with no problem, and then it will fail several times in a row. I've seen this pattern quite a bit with other race conditions as well; I think that they may be sensitive to how memory gets laid out between CPUs, and that might depend on some longer-term characteristic of the state of the machine that survives an initdb.

By the way, I also got a new error message a few times that I think might be a manifestation of the same thing:

ERROR: duplicate key value violates unique constraint "foo_index_idx"
DETAIL: Key (index)=(6106) already exists.
STATEMENT: insert into foo (index, count) values ($2,$1) on conflict (index) update set count=TARGET.count + EXCLUDED.count returning foo.count

Cheers,

Jeff