Re: [HACKERS] autonomous transactions
On Wed, Aug 31, 2016 at 6:41 AM, Jaime Casanova wrote: > > On 30 August 2016 at 23:10, Joel Jacobson wrote: > > > > There should be a way to within the session and/or txn permanently > > block autonomous transactions. > > > > This will defeat one of the use cases of autonomous transactions: auditing My idea on how to deal with this would be to mark the function as "AUTONOMOUS", similar to how a function is marked "PARALLEL SAFE", and to throw an error if a caller that has blocked autonomous transactions tries to call a function that is marked autonomous. That way none of the code that needs to be audited would ever get executed. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
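To sketch what that might look like (hypothetical syntax - neither the AUTONOMOUS marking nor the blocking setting exists, and both names here are invented purely for illustration):

    -- Hypothetical, modeled on the PARALLEL SAFE marking:
    CREATE FUNCTION audit_event(msg text) RETURNS void
    LANGUAGE plpgsql AUTONOMOUS AS $$
    BEGIN
        INSERT INTO audit_log (logged_at, message) VALUES (now(), msg);
    END;
    $$;

    SET block_autonomous_transactions = on;   -- invented GUC name
    SELECT audit_event('sensitive action');   -- would raise an error, so the
                                              -- audited code never runs unaudited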
[HACKERS] pg_sequence catalog
While I was hacking around sequence stuff, I felt the urge to look into an old peeve: That sequence metadata is not stored in a proper catalog. Right now in order to find out some metadata about sequences (min, max, increment, etc.), you need to look into the sequence. That is like having to query a table in order to find out about its schema. There are also known issues with the current storage such as that we can't safely update the sequence name stored in the sequence when we rename, so we just don't. This patch introduces a new catalog pg_sequence that stores the sequence metadata. The sequences themselves now only store the counter and supporting information. I don't know if this is a net improvement. Maybe this introduces as many new issues as it removes. But I figured I'll post this, so that at least we can discuss it. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services diff --git a/src/backend/catalog/Makefile b/src/backend/catalog/Makefile index 1ce7610..cbf0d79 100644 --- a/src/backend/catalog/Makefile +++ b/src/backend/catalog/Makefile @@ -41,7 +41,7 @@ POSTGRES_BKI_SRCS = $(addprefix $(top_srcdir)/src/include/catalog/,\ pg_foreign_data_wrapper.h pg_foreign_server.h pg_user_mapping.h \ pg_foreign_table.h pg_policy.h pg_replication_origin.h \ pg_default_acl.h pg_init_privs.h pg_seclabel.h pg_shseclabel.h \ - pg_collation.h pg_range.h pg_transform.h \ + pg_collation.h pg_range.h pg_transform.h pg_sequence.h \ toasting.h indexing.h \ ) diff --git a/src/backend/catalog/dependency.c b/src/backend/catalog/dependency.c index 04d7840..8e1e1ac 100644 --- a/src/backend/catalog/dependency.c +++ b/src/backend/catalog/dependency.c @@ -66,6 +66,7 @@ #include "commands/proclang.h" #include "commands/schemacmds.h" #include "commands/seclabel.h" +#include "commands/sequence.h" #include "commands/trigger.h" #include "commands/typecmds.h" #include "nodes/nodeFuncs.h" @@ -1155,6 +1156,8 @@ doDeletion(const ObjectAddress *object, int flags) else heap_drop_with_catalog(object->objectId); } +if (relKind == RELKIND_SEQUENCE) + DeleteSequenceTuple(object->objectId); break; } diff --git a/src/backend/catalog/information_schema.sql b/src/backend/catalog/information_schema.sql index 00550eb..182d2d0 100644 --- a/src/backend/catalog/information_schema.sql +++ b/src/backend/catalog/information_schema.sql @@ -1535,15 +1535,16 @@ CREATE VIEW sequences AS CAST(64 AS cardinal_number) AS numeric_precision, CAST(2 AS cardinal_number) AS numeric_precision_radix, CAST(0 AS cardinal_number) AS numeric_scale, - CAST(p.start_value AS character_data) AS start_value, - CAST(p.minimum_value AS character_data) AS minimum_value, - CAST(p.maximum_value AS character_data) AS maximum_value, - CAST(p.increment AS character_data) AS increment, - CAST(CASE WHEN p.cycle_option THEN 'YES' ELSE 'NO' END AS yes_or_no) AS cycle_option -FROM pg_namespace nc, pg_class c, LATERAL pg_sequence_parameters(c.oid) p + CAST(s.seqstart AS character_data) AS start_value, + CAST(s.seqmin AS character_data) AS minimum_value, + CAST(s.seqmax AS character_data) AS maximum_value, + CAST(s.seqincrement AS character_data) AS increment, + CAST(CASE WHEN s.seqcycle THEN 'YES' ELSE 'NO' END AS yes_or_no) AS cycle_option +FROM pg_namespace nc, pg_class c, pg_sequence s WHERE c.relnamespace = nc.oid AND c.relkind = 'S' AND (NOT pg_is_other_temp_schema(nc.oid)) + AND c.oid = s.seqrelid AND (pg_has_role(c.relowner, 'USAGE') OR has_sequence_privilege(c.oid, 'SELECT, UPDATE, USAGE') ); diff --git 
a/src/backend/commands/sequence.c b/src/backend/commands/sequence.c index c98f981..95bd172 100644 --- a/src/backend/commands/sequence.c +++ b/src/backend/commands/sequence.c @@ -22,8 +22,10 @@ #include "access/xloginsert.h" #include "access/xlogutils.h" #include "catalog/dependency.h" +#include "catalog/indexing.h" #include "catalog/namespace.h" #include "catalog/objectaccess.h" +#include "catalog/pg_sequence.h" #include "catalog/pg_type.h" #include "commands/defrem.h" #include "commands/sequence.h" @@ -74,7 +76,7 @@ typedef struct SeqTableData int64 cached; /* last value already cached for nextval */ /* if last != cached, we have not used up all the cached values */ int64 increment; /* copy of sequence's increment field */ - /* note that increment is zero until we first do read_seq_tuple() */ + /* note that increment is zero until we first do nextval_internal() */ } SeqTableData; typedef SeqTableData *SeqTable; @@ -92,10 +94,11 @@ static int64 nextval_internal(Oid relid); static Relation open_share_lock(SeqTable seq); static void create_seq_hashtable(void); static void init_sequence(Oid relid, SeqTable *p_elm,
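For illustration, with the new catalog in place the metadata becomes queryable with an ordinary join instead of selecting from every sequence relation one by one. A sketch using the column names visible in the information_schema change above:

    SELECT c.relname, s.seqstart, s.seqmin, s.seqmax, s.seqincrement, s.seqcycle
    FROM pg_class c
         JOIN pg_sequence s ON s.seqrelid = c.oid
    WHERE c.relkind = 'S';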
Re: [HACKERS] [COMMITTERS] pgsql: Fix pg_receivexlog --synchronous
On 29 August 2016 at 12:34, Tom Lane wrote: > Simon Riggs writes: >> Fix pg_receivexlog --synchronous > > The buildfarm says you broke the 9.5 branch. > > In general, pushing inessential patches just a few hours before a wrap > deadline is a dangerous business. Pushing them without any testing > is close to irresponsible. Sorry about that everybody. Thanks to Alvaro for doing that in my absence. I pushed to 9.5 because of a misunderstanding that the author was saying to me they had also tested it for 9.5. It was not knowingly untested, but the responsibility and the mistake were mine in not confirming that with my own eyes before pushing. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] autonomous transactions
On Wed, Aug 31, 2016 at 2:50 AM, Peter Eisentraut wrote: > - An API interface to open a "connection" to a background worker, run > queries, get results: AutonomousSessionStart(), AutonomousSessionEnd(), > AutonomousSessionExecute(), etc. The communication happens using the > client/server protocol. > I'm surprised by the background worker. I had envisioned autonomous transactions being implemented by allowing a process to reserve a second entry in PGPROC with the same pid. Or perhaps save its existing information in a separate pgproc slot (or stack of slots) and restore it after the autonomous transaction commits. Using a background worker means that the autonomous transaction can't access any state from the process memory. Parameters in plpgsql are a symptom of this, but I suspect there will be others. What happens if a statement timeout occurs during an autonomous transaction? What happens if you use a PL language in the autonomous transaction and it tries to use non-transactional information such as prepared statements? -- greg -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
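To make the prepared-statement question concrete (a sketch; the object names are invented, and the actual behavior is exactly what is being asked about):

    PREPARE get_balance AS SELECT balance FROM accounts WHERE id = $1;
    EXECUTE get_balance(42);   -- fine in the main session
    -- If the autonomous transaction runs in a separate background worker,
    -- that worker has its own session state, so the statement prepared
    -- above would presumably not be visible from code running inside it.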
Re: [HACKERS] pg_sequence catalog
I wrote: > Personally, my big beef with the current approach to sequences is that > we eat a whole relation (including a whole relfilenode) per sequence. > I wish that we could reduce a sequence to just a single row in a > catalog, including the nontransactional state. Not sure how feasible > that is either, but accomplishing it would move the benefits of making > a change out of the "debatable whether it's worth it" category, IMO. BTW, another thing to keep in mind here is the ideas that have been kicked around in the past about alternative sequence implementations managed through a "sequence AM API". I dunno whether now is the time to start creating that API abstraction, but let's at least consider it if we're whacking the catalog representation around. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_sequence catalog
Craig Ringer writes: > On 31 August 2016 at 22:01, Tom Lane wrote: >> Personally, my big beef with the current approach to sequences is that >> we eat a whole relation (including a whole relfilenode) per sequence. >> I wish that we could reduce a sequence to just a single row in a >> catalog, including the nontransactional state. > It sounds like you're thinking of something like a normal(ish) heap > tuple where we just overwrite some fields in-place without fiddling > xmin/xmax and making a new row version. Right? Like we currently > overwrite the lone Form_pg_sequence on the 1-page sequence relations. That would be what to do with the nontransactional state. If I recall previous discussions correctly, there's a stumbling block if you want to treat ALTER SEQUENCE changes as transactional --- but maybe that doesn't make sense anyway. If we did want to try that, maybe we need two auxiliary catalogs, one for the transactionally-updatable sequence fields and one for the nontransactional fields. > It feels intuitively pretty gross to effectively dirty-read and write > a few fields of a tuple. But that's what we do all the time with > xmin/xmax etc, it's not really that different. True. I think two rows would work around that, but maybe we don't have to. Another issue is what is the low-level interlock between nextvals in different processes. Right now it's the buffer lock on the sequence's page. With a scheme like this, if we just kept doing that, we'd have a single lock covering probably O(100) different sequences which might lead to contention problems. We could probably improve on that with some thought. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_sequence catalog
On 31 August 2016 at 22:01, Tom Lane wrote: > Personally, my big beef with the current approach to sequences is that > we eat a whole relation (including a whole relfilenode) per sequence. > I wish that we could reduce a sequence to just a single row in a > catalog, including the nontransactional state. I'd be happy to see incremental improvement in this space as Peter has suggested, though I certainly see the value of something like seqam too. It sounds like you're thinking of something like a normal(ish) heap tuple where we just overwrite some fields in-place without fiddling xmin/xmax and making a new row version. Right? Like we currently overwrite the lone Form_pg_sequence on the 1-page sequence relations. I initially thought that TRUNCATE ... RESTART IDENTITY would be somewhat of a problem with this. We effectively have a temporary "timeline" fork in the sequence value where it's provisionally restarted and we start using values from the restarted sequence within the xact that restarted it. But actually, it'd fit pretty well. TRUNCATE ... RESTART IDENTITY would write a new row version with a new xmin, and set xmax on the old sequence row. nextval(...) within the truncating xact would update the new row's non-transactional fields when it allocated new sequence chunks. On commit, everyone starts using the new row due to normal transactional visibility rules. On rollback everyone ignores it like they would any other dead tuple from an aborted xact and uses the old tuple's nontransactional fields. It Just Works(TM). nextval(...) takes AccessShareLock on a sequence relation. TRUNCATE ... RESTART IDENTITY takes AccessExclusiveLock. So we can never have nextval(...) advancing the "old" timeline in other xacts at the same time as we consume values on the restarted sequence inside the xact that did the restarting. We still need the new "timeline" though, because we have to retain the old value for rollback. It feels intuitively pretty gross to effectively dirty-read and write a few fields of a tuple. But that's what we do all the time with xmin/xmax etc, it's not really that different. It'd certainly make sequence decoding easier too. A LOT easier. -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
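For reference, the transactional behavior Craig describes can already be observed with today's sequences; a minimal example:

    CREATE TABLE t (id serial);
    SELECT nextval('t_id_seq');   -- 1
    SELECT nextval('t_id_seq');   -- 2
    BEGIN;
    TRUNCATE t RESTART IDENTITY;  -- provisionally restarts the sequence
    SELECT nextval('t_id_seq');   -- 1 again, on the restarted "timeline"
    ROLLBACK;
    SELECT nextval('t_id_seq');   -- 3: the old "timeline" was never disturbed

The point above is that a pg_sequence row with ordinary xmin/xmax, plus in-place updates of the nontransactional fields, would provide these semantics more or less for free.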
[HACKERS] less expensive pg_buffercache on big shmem
Hi hackers, I have recently finished work on a patch for the pg_buffercache contrib module, and I think it's time to share my results.

Introduction
============

I want to offer an implementation that decreases system workload by partially sacrificing data consistency (full snapshot consistency). Sometimes we do not need full data consistency, for example for quantitative rather than qualitative analysis of memory contents, or when we want to detect insufficient memory resources or see how often a relation is used.

Problem description
===================

Currently, pg_buffercache v1.1 and prior take an exclusive lock on all shared buffers, which greatly affects system performance. Usually we use pg_buffercache to find out why the DB is working slower than expected, or to examine what occupies the entire memory. So when we run pg_buffercache on such a system, we make it even slower.

Implementation
==============

The vanilla implementation contains a loop that collects statistics from the whole of shared memory, acquiring, reading and releasing spinlocks one by one, page by page, all while holding an LWLock. The v1.2 implementation contains a flexible loop that can collect shared memory statistics using three different methods: 1) holding an LWLock on only one partition of shared memory at a time (semiconsistent method); 2) without LWLocks (nonconsistent method); 3) the vanilla way (consistent method). This allows us to launch pg_buffercache in three different ways, each with advantages and disadvantages:

Consistent:
+ 100% consistency of the shared memory snapshot
- Slows down the system by taking an exclusive lock on all of shared memory

Semiconsistent:
+ Faster than the consistent method
+ Mostly does not affect system load
- Taking the snapshot is slow

Nonconsistent:
+ The fastest
+ Does not noticeably affect the system
- <3% loss of snapshot consistency

What works
==========

Actually, it works well even under heavy load, but of course there might be things I've overlooked.

VIEW pg_buffercache_cons
VIEW pg_buffercache_noncons
VIEW pg_buffercache_semicons

Examples from the docs, adapted to the new implementation:

SELECT c.relname, count(*) AS buffers FROM pg_buffercache_noncons b INNER JOIN pg_class c ON b.relfilenode = pg_relation_filenode(c.oid) AND b.reldatabase IN (0, (SELECT oid FROM pg_database WHERE datname = current_database())) GROUP BY c.relname ORDER BY 2 DESC LIMIT 10;

SELECT c.relname, count(*) AS buffers FROM pg_buffercache_semicons b INNER JOIN pg_class c ON b.relfilenode = pg_relation_filenode(c.oid) AND b.reldatabase IN (0, (SELECT oid FROM pg_database WHERE datname = current_database())) GROUP BY c.relname ORDER BY 2 DESC LIMIT 10;

Testing the implementation
==========================

How the implementation was tested:
1) Start the server
2) Generate load with pgbench: pgbench -c 250 -s 1000 -T 200 -P1
3) Compare how tps sags under load while running:
SELECT count(*) FROM pg_buffercache_cons;
SELECT count(*) FROM pg_buffercache_semicons;
SELECT count(*) FROM pg_buffercache_noncons;

The test was made on a server with these parameters:
Model name: Intel(R) Xeon(R) CPU E7-8890 v3 @ 2.50GHz
CPU(s): 144
Socket(s): 4
shared_buffers: 200GB

Results of testing
==================

Our DBA team obtained the following results:

Nonconsistent:
* 10% faster than the consistent method
* does not noticeably affect the system
* the maximum loss of accuracy was less than 3%* (in most situations this is a permissible accuracy loss)

Semiconsistent:
* 5 times slower than nonconsistent
* affects the system less than the consistent method

Overall results: Our clients were pleased with this implementation. 
The implementation is backward compatible, so old pg_buffercache v1.1 queries will keep working. Semiconsistent shows results approaching nonconsistent on SELECT-only queries. * these values were obtained from our DBA tests.

What can be better
==================

It is unclear how to optimize the semiconsistent method to make it faster, and how to reduce the transient effect that appears from time to time. I will be glad to see your feedback! --- Ivan Kartyshov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company diff --git a/contrib/pg_buffercache/Makefile b/contrib/pg_buffercache/Makefile index 065d3d6..8813c50 100644 --- a/contrib/pg_buffercache/Makefile +++ b/contrib/pg_buffercache/Makefile @@ -4,7 +4,7 @@ MODULE_big = pg_buffercache OBJS = pg_buffercache_pages.o $(WIN32RES) EXTENSION = pg_buffercache -DATA = pg_buffercache--1.1.sql pg_buffercache--1.0--1.1.sql pg_buffercache--unpackaged--1.0.sql +DATA = pg_buffercache--1.2.sql pg_buffercache--1.1--1.2.sql pg_buffercache--1.0--1.1.sql pg_buffercache--unpackaged--1.0.sql PGFILEDESC = "pg_buffercache - monitoring of shared buffer cache in real-time" ifdef USE_PGXS diff --git a/contrib/pg_buffercache/README
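As a quick side-by-side check of the three views (the view names and count(*) probes come from the testing section above; note the cons subquery still takes the heavy lock):

    SELECT (SELECT count(*) FROM pg_buffercache_cons)     AS cons,
           (SELECT count(*) FROM pg_buffercache_noncons)  AS noncons,
           (SELECT count(*) FROM pg_buffercache_semicons) AS semicons;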
[HACKERS] proposal: psql \setfileref
Hi

I propose a new type of psql variable - file references. The content of a file reference is specified by the referenced file. It allows simple insertion of large data without the necessity of manual escaping or using the LO API.

When I wrote the patch, I used parametrized queries for these data instead of escaped strings - the code is not much bigger, and the error messages are much more friendly when the query is not bloated by large content. Text escaping is used only on request - when escaping is not required, the content is implicitly passed as bytea. By default the content of a file is bytea. When the user requires escaping, text escaping is enforced - because it makes sense only for the text type.

postgres=# \setfileref a ~/test2.xml
postgres=# \setfileref b ~/avatar.gif
postgres=# insert into test values(convert_from(:a, 'latin2')::xml, :b); -- xml is passed as bytea
postgres=# insert into test values(:'a', :b); -- xml is passed via unknown text value

The content of file reference variables is not kept persistently in memory.

Comments, notes?

Regards

Pavel

commit 077c71b1f8ae24ccf2f3723e1e4ca5bf05bca0d3 Author: Pavel Stehule Date: Wed Aug 31 17:15:33 2016 +0200 initial diff --git a/src/bin/psql/command.c b/src/bin/psql/command.c index 4aaf657..3150510 100644 --- a/src/bin/psql/command.c +++ b/src/bin/psql/command.c @@ -1355,6 +1355,32 @@ exec_command(const char *cmd, free(envval); } + /* \setfileref - set variable by reference on file */ + else if (strcmp(cmd, "setfileref") == 0) + { + char *name = psql_scan_slash_option(scan_state, + OT_NORMAL, NULL, false); + + char *ref = psql_scan_slash_option(scan_state, + OT_NORMAL, NULL, false); + + success = false; + + if (!name || !ref) + { + psql_error("\\%s: missing required argument\n", cmd); + success = false; + } + else + { + if (!SetFileRef(pset.vars, name, ref)) + { +psql_error("\\%s: error while setting variable\n", cmd); +success = false; + } + } + } + /* \sf -- show a function's source code */ else if (strcmp(cmd, "sf") == 0 || strcmp(cmd, "sf+") == 0) { diff --git a/src/bin/psql/common.c b/src/bin/psql/common.c index 7399950..3c1db17 100644 --- a/src/bin/psql/common.c +++ b/src/bin/psql/common.c @@ -32,7 +32,6 @@ static bool ExecQueryUsingCursor(const char *query, double *elapsed_msec); static bool command_no_begin(const char *query); static bool is_select_command(const char *query); - /* * openQueryOutputFile --- attempt to open a query output file * @@ -108,6 +107,123 @@ setQFout(const char *fname) return true; } +void +psql_reset_query_params(void) +{ + int i; + + for (i = 0; i < pset.nparams; i++) + if (pset.params[i] != NULL) + { + PQfreemem(pset.params[i]); + pset.params[i] = NULL; + } + + pset.nparams = 0; +} + +/* + * Load the content of the file referenced by a file_ref into the query + * params buffer. When escaping is requested, text content is expected. + * Without escaping, bytea content is expected and bytea + * escaping is applied. 
+ */ +#define BYTEAOID 17 +#define UNKNOWNOID 0 + +static char * +get_file_ref_content(const char *value, bool escape, bool as_ident) +{ + PQExpBufferData buffer; + FILE *fd = NULL; + char *fname; + char *escaped_value; + char line[1024]; + size_t size; + + fname = pstrdup(value); + + expand_tilde(&fname); + if (!fname) + { + psql_error("missing valid path to file\n"); + return NULL; + } + + canonicalize_path(fname); + + fd = fopen(fname, PG_BINARY_R); + if (!fd) + { + psql_error("%s: %s\n", fname, strerror(errno)); + PQfreemem(fname); + return NULL; + } + + /* can append another parameter */ + if (pset.nparams >= MAX_BINARY_PARAMS) + { + psql_error("too many binary parameters"); + PQfreemem(fname); + return NULL; + } + + if (!pset.db) + { + psql_error("cannot escape without active connection\n"); + PQfreemem(fname); + return NULL; + } + + initPQExpBuffer(&buffer); + + while ((size = fread(line, 1, sizeof(line), fd)) > 0) + appendBinaryPQExpBuffer(&buffer, line, size); + + if (ferror(fd)) + { + psql_error("%s: %s\n", fname, strerror(errno)); + PQfreemem(fname); + termPQExpBuffer(&buffer); + return NULL; + } + + if (escape) + { + if (as_ident) + escaped_value = +PQescapeIdentifier(pset.db, buffer.data, buffer.len); + else + escaped_value = +PQescapeLiteral(pset.db, buffer.data, buffer.len); + pset.paramTypes[pset.nparams] = UNKNOWNOID; + } + else + { + escaped_value = (char *) +PQescapeByteaConn(pset.db, + (const unsigned char *) buffer.data, buffer.len, &size); + pset.paramTypes[pset.nparams] = BYTEAOID; + } + + /* fname, buffer are no longer necessary */ + PQfreemem(fname); + termPQExpBuffer(&buffer); + + if (escaped_value == NULL) + { + const char *error = PQerrorMessage(pset.db); + + psql_error("%s", error); + return NULL; + } + + pset.params[pset.nparams] = escaped_value; + + snprintf(line, sizeof(line) - 1, "$%d", ++pset.nparams); + + return pstrdup(line); +}
Re: [HACKERS] Missing checks when malloc returns NULL...
Michael Paquier writes: > Which means that processes have an escape window when initializing > shared memory by cleaning up the index if an entry cannot be found and > then cannot be created properly. I don't think that it is a good idea > to change that by forcing ShmemAlloc to fail. So I would tend to just > have the patch attached and add those missing NULL-checks on all the > existing ShmemAlloc() calls. > Opinions? I still think it'd be better to fix that as attached, because it represents a net reduction, not a net addition, of code, and it provides a defense against future repetitions of the same omission. If only 4 out of 11 existing calls were properly checked --- some of them adjacent to calls with checks --- that should tell us that we *will* have more instances of the same bug if we don't fix it centrally. I also note that your patch missed checks for two ShmemAlloc calls in InitShmemAllocation and ShmemInitStruct. Admittedly, since those are the very first such calls, it's highly unlikely they'd fail; but I think this exercise is not about dismissing failures as improbable. Almost all of these failures are improbable, given that we precalculate the shmem space requirement. regards, tom lane diff --git a/src/backend/storage/ipc/shmem.c b/src/backend/storage/ipc/shmem.c index 1efe020..cc3af2d 100644 *** a/src/backend/storage/ipc/shmem.c --- b/src/backend/storage/ipc/shmem.c *** InitShmemAllocation(void) *** 163,177 /* * ShmemAlloc -- allocate max-aligned chunk from shared memory * ! * Assumes ShmemLock and ShmemSegHdr are initialized. * ! * Returns: real pointer to memory or NULL if we are out ! * of space. Has to return a real pointer in order ! * to be compatible with malloc(). */ void * ShmemAlloc(Size size) { Size newStart; Size newFree; void *newSpace; --- 163,194 /* * ShmemAlloc -- allocate max-aligned chunk from shared memory * ! * Throws error if request cannot be satisfied. * ! * Assumes ShmemLock and ShmemSegHdr are initialized. */ void * ShmemAlloc(Size size) { + void *newSpace; + + newSpace = ShmemAllocNoError(size); + if (!newSpace) + ereport(ERROR, + (errcode(ERRCODE_OUT_OF_MEMORY), + errmsg("out of shared memory (%zu bytes requested)", + size))); + return newSpace; + } + + /* + * ShmemAllocNoError -- allocate max-aligned chunk from shared memory + * + * As ShmemAlloc, but returns NULL if out of space, rather than erroring. + */ + void * + ShmemAllocNoError(Size size) + { Size newStart; Size newFree; void *newSpace; *** ShmemAlloc(Size size) *** 206,216 SpinLockRelease(ShmemLock); ! if (!newSpace) ! ereport(WARNING, ! (errcode(ERRCODE_OUT_OF_MEMORY), ! errmsg("out of shared memory"))); ! Assert(newSpace == (void *) CACHELINEALIGN(newSpace)); return newSpace; --- 223,229 SpinLockRelease(ShmemLock); ! /* note this assert is okay with newSpace == NULL */ Assert(newSpace == (void *) CACHELINEALIGN(newSpace)); return newSpace; *** ShmemInitHash(const char *name, /* table *** 293,299 * The shared memory allocator must be specified too. */ infoP->dsize = infoP->max_dsize = hash_select_dirsize(max_size); ! infoP->alloc = ShmemAlloc; hash_flags |= HASH_SHARED_MEM | HASH_ALLOC | HASH_DIRSIZE; /* look it up in the shmem index */ --- 306,312 * The shared memory allocator must be specified too. */ infoP->dsize = infoP->max_dsize = hash_select_dirsize(max_size); ! 
infoP->alloc = ShmemAllocNoError; hash_flags |= HASH_SHARED_MEM | HASH_ALLOC | HASH_DIRSIZE; /* look it up in the shmem index */ *** ShmemInitStruct(const char *name, Size s *** 364,375 */ Assert(shmemseghdr->index == NULL); structPtr = ShmemAlloc(size); - if (structPtr == NULL) - ereport(ERROR, - (errcode(ERRCODE_OUT_OF_MEMORY), - errmsg("not enough shared memory for data structure" - " \"%s\" (%zu bytes requested)", - name, size))); shmemseghdr->index = structPtr; *foundPtr = FALSE; } --- 377,382 *** ShmemInitStruct(const char *name, Size s *** 410,416 else { /* It isn't in the table yet. allocate and initialize it */ ! structPtr = ShmemAlloc(size); if (structPtr == NULL) { /* out of memory; remove the failed ShmemIndex entry */ --- 417,423 else { /* It isn't in the table yet. allocate and initialize it */ ! structPtr = ShmemAllocNoError(size); if (structPtr == NULL) { /* out of memory; remove the failed ShmemIndex entry */ diff --git a/src/backend/storage/lmgr/predicate.c b/src/backend/storage/lmgr/predicate.c index 7cdb355..4064b20 100644 *** a/src/backend/storage/lmgr/predicate.c --- b/src/backend/storage/lmgr/predicate.c ***
Re: [HACKERS] some requests on auditing
2016-08-31 16:00 GMT+02:00 David Steele: > On 8/31/16 9:39 AM, David Steele wrote: > >> On 8/30/16 10:12 AM, Pavel Stehule wrote: >> > > #3 is not likely without changes to logging in Postgres. However, there >> are plenty of tools for log analysis (e.g. ELK) that might help and a >> Postgres extension that allows log messages to be directed elsewhere >> (can't remember the name but Gabrielle or Simon would know). >> > > Here's the extension I was thinking of: > > https://github.com/2ndquadrant-it/redislog > > This one is more general purpose: > > https://github.com/mpihlak/pg_logforward > Many thanks for this information - I'll check it. Regards Pavel > -- > -David > da...@pgmasters.net >
[HACKERS] Optimizing aggregates
I've been profiling simple aggregate queries, looking for any low-hanging fruit. For this query:

-- setup
create table floats as select g::float8 as a, g::float8 as b, g::float8 as c from generate_series(1, 1000) g;
vacuum freeze floats;
-- query
select sum(a), sum(b+c) from floats;

perf report says:

# Children    Self  Command     Shared Object      Symbol
# ........  ......  ..........  .................  ...........................
#
   25.70%    0.00%  postmaster  [unknown]          [k]
   14.23%   13.75%  postmaster  postgres           [.] ExecProject
   11.18%   10.57%  postmaster  postgres           [.] slot_deform_tuple
    9.58%    9.04%  postmaster  postgres           [.] advance_aggregates
    8.96%    0.00%  postmaster  [unknown]          [.] 0x000298d4
    8.77%    8.42%  postmaster  postgres           [.] ExecMakeFunctionResultNoSets
    7.78%    0.00%  postmaster  [unknown]          [.] 0x01d38260
    6.63%    6.15%  postmaster  postgres           [.] advance_transition_function
    6.61%    0.00%  postmaster  [unknown]          [.] 0x01e99e40
    6.47%    0.00%  postmaster  libc-2.23.so       [.] __GI___libc_read
    6.24%    5.88%  postmaster  postgres           [.] heap_getnext
    4.62%    4.62%  postmaster  [kernel.kallsyms]  [k] copy_user_enhanced_fast_string
    3.91%    3.82%  postmaster  postgres           [.] slot_getsomeattrs
    3.29%    3.18%  postmaster  postgres           [.] slot_getattr
    3.06%    3.00%  postmaster  postgres           [.] ExecClearTuple
    2.59%    0.00%  postmaster  [unknown]          [.] 0x01e9a370
    2.57%    2.45%  postmaster  postgres           [.] ExecScan
    2.56%    2.37%  postmaster  postgres           [.] float8pl
    2.54%    2.43%  postmaster  postgres           [.] heapgetpage
    2.25%    2.17%  postmaster  postgres           [.] ExecAgg
    2.10%    1.96%  postmaster  postgres           [.] ExecStoreTuple
    2.00%    1.91%  postmaster  postgres           [.] ExecProcNode

ExecProject stands out. I find that pretty surprising. We're using ExecProject to extract the arguments from the input tuples, to pass to the aggregate transition functions. It looks like that's a pretty expensive way of doing it, for a typical aggregate that takes only one argument. We actually used to call ExecEvalExpr() directly for each argument, but that was changed by the patch that added support for ordered set aggregates. It looks like that was a bad idea, from a performance point of view. I propose that we go back to calling ExecEvalExpr() directly, for non-ordered aggregates, per the attached patch. That makes that example query about 10% faster on my laptop, which is in line with the fact that ExecProject() accounted for about 13% of the CPU time. Another idea is that maybe we should add a fast-path to ExecProject(), for these trivial cases. - Heikki From 106c5742fde2ec83576323db74a7249d7d85101f Mon Sep 17 00:00:00 2001 From: Heikki Linnakangas Date: Wed, 31 Aug 2016 17:27:52 +0300 Subject: [PATCH] Skip ExecProject for non-ordered aggregates. When support for ordered aggregates was added, we started using ExecProject to compute the arguments for the transition function. However, it turns out that the old way of just calling ExecEvalExpr() directly for each argument is somewhat faster. At least for typical aggregates that only take one or two arguments. So go back to using ExecEvalExpr() for non-ordered aggregates. diff --git a/src/backend/executor/nodeAgg.c b/src/backend/executor/nodeAgg.c index ce2fc28..e32b140 100644 --- a/src/backend/executor/nodeAgg.c +++ b/src/backend/executor/nodeAgg.c @@ -229,9 +229,8 @@ typedef struct AggStatePerTransData /* Oid of state value's datatype */ Oid aggtranstype; - /* ExprStates of the FILTER and argument expressions. */ + /* ExprStates of the FILTER and direct-argument expressions. 
*/ ExprState *aggfilter; /* state of FILTER expression, if any */ - List *args; /* states of aggregated-argument expressions */ List *aggdirectargs; /* states of direct-argument expressions */ /* @@ -288,17 +287,21 @@ typedef struct AggStatePerTransData transtypeByVal; /* - * Stuff for evaluation of inputs. We used to just use ExecEvalExpr, but - * with the addition of ORDER BY we now need at least a slot for passing - * data to the sort object, which requires a tupledesc, so we might as - * well go whole hog and use ExecProject too. + * Stuff for evaluation of inputs. + * + * For non-ordered aggregates, we call ExecEvalExpr for each argument, + * represented by the expression trees in transInputs. For ordered + * aggregates, we need at least a slot for passing data to the sort + * object, which requires a tupledesc, so we might as well go whole hog + * and use ExecProject to evaluate the arguments, too. */ +
Re: [HACKERS] pg_sequence catalog
On 2016-08-31 11:23:27 -0400, Tom Lane wrote: > Another issue is what is the low-level interlock between nextvals > in different processes. Right now it's the buffer lock on the > sequence's page. With a scheme like this, if we just kept doing > that, we'd have a single lock covering probably O(100) different > sequences which might lead to contention problems. We could probably > improve on that with some thought. I was thinking of forcing the rows to be spread to exactly one page per sequence... Andres -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Optimizing aggregates
Hi, On 2016-08-31 17:47:18 +0300, Heikki Linnakangas wrote: > # .. . > > # > 25.70% 0.00% postmaster [unknown] [k] > 14.23% 13.75% postmaster postgres [.] ExecProject > ExecProject stands out. I find that pretty surprising. > > We're using ExecProject to extract the arguments from the input tuples, to > pass to the aggregate transition functions. It looks like that's a pretty > expensive way of doing it, for a typical aggregate that takes only one > argument. > > We actually used to call ExecEvalExpr() directly for each argument, but that > was changed by the patch that added support for ordered set aggregates. It > looks like that was a bad idea, from a performance point of view. I complained about that as well http://archives.postgresql.org/message-id/20160519175727.ymv2y5tye4qgcmqx%40alap3.anarazel.de > I propose that we go back to calling ExecEvalExpr() directly, for > non-ordered aggregates, per the attached patch. That makes that example > query about 10% faster on my laptop, which is in line with the fact that > ExecProject() accounted for about 13% of the CPU time. My approach is a bit different. I've first combined the projection for all the aggregates, ordered set, or not, into one projection. That got rid of a fair amount of overhead when you have multiple aggregates. I attached an, probably out of date, WIP version of that patch. Secondly, I'm working on overhauling expression evaluation to be faster. Even without the ExecProject overhead, the computations very quickly become the bottleneck. During that I pretty much merged ExecProject and ExecEvalExpr into one - they're really not that different, and the distinction serves no purpose, except to increase the number of function calls. The reason I'm working on getting rid of targetlist SRFs is precisely that. A proof of concept of that is attached to http://archives.postgresql.org/message-id/20160714011850.bd5zhu35szle3n3c%40alap3.anarazel.de Greetings, Andres Freund >From 384845bea72d28952d88e58e55f81aaa5addd930 Mon Sep 17 00:00:00 2001 From: Andres Freund Date: Tue, 12 Jul 2016 01:01:28 -0700 Subject: [PATCH] WIP: Only perform one projection in aggregation. --- src/backend/executor/nodeAgg.c | 112 - 1 file changed, 88 insertions(+), 24 deletions(-) diff --git a/src/backend/executor/nodeAgg.c b/src/backend/executor/nodeAgg.c index f655aec..4499d5f 100644 --- a/src/backend/executor/nodeAgg.c +++ b/src/backend/executor/nodeAgg.c @@ -210,6 +210,9 @@ typedef struct AggStatePerTransData */ int numInputs; + /* offset of input columns in Aggstate->evalslot */ + int inputoff; + /* * Number of aggregated input columns to pass to the transfn. 
This * includes the ORDER BY columns for ordered-set aggs, but not for plain @@ -836,14 +839,20 @@ advance_aggregates(AggState *aggstate, AggStatePerGroup pergroup) int setno = 0; int numGroupingSets = Max(aggstate->phase->numsets, 1); int numTrans = aggstate->numtrans; + TupleTableSlot *slot = aggstate->evalslot; + AggStatePerTrans pertrans; - for (transno = 0; transno < numTrans; transno++) + /* compute input for all aggregates */ + if (aggstate->evalproj) + ExecProjectIntoSlot(aggstate->evalproj, aggstate->evalslot); + + for (transno = 0, pertrans = aggstate->pertrans; transno < numTrans; + transno++, pertrans++) { - AggStatePerTrans pertrans = &aggstate->pertrans[transno]; ExprState *filter = pertrans->aggfilter; int numTransInputs = pertrans->numTransInputs; int i; - TupleTableSlot *slot; + int inputoff = pertrans->inputoff; /* Skip anything FILTERed out */ if (filter) @@ -857,13 +866,10 @@ advance_aggregates(AggState *aggstate, AggStatePerGroup pergroup) continue; } - /* Evaluate the current input expressions for this aggregate */ - slot = ExecProject(pertrans->evalproj, NULL); - if (pertrans->numSortCols > 0) { /* DISTINCT and/or ORDER BY case */ - Assert(slot->tts_nvalid == pertrans->numInputs); + Assert(slot->tts_nvalid >= pertrans->numInputs); /* * If the transfn is strict, we want to check for nullity before @@ -876,7 +882,7 @@ advance_aggregates(AggState *aggstate, AggStatePerGroup pergroup) { for (i = 0; i < numTransInputs; i++) { - if (slot->tts_isnull[i]) + if (slot->tts_isnull[i + inputoff]) break; } if (i < numTransInputs) @@ -888,10 +894,22 @@ advance_aggregates(AggState *aggstate, AggStatePerGroup pergroup) /* OK, put the tuple into the tuplesort object */ if (pertrans->numInputs == 1) tuplesort_putdatum(pertrans->sortstates[setno], - slot->tts_values[0], - slot->tts_isnull[0]); + slot->tts_values[inputoff], + slot->tts_isnull[inputoff]); else -
Re: [HACKERS] pg_sequence catalog
On 31/08/16 16:10, Tom Lane wrote: I wrote: Personally, my big beef with the current approach to sequences is that we eat a whole relation (including a whole relfilenode) per sequence. I wish that we could reduce a sequence to just a single row in a catalog, including the nontransactional state. Not sure how feasible that is either, but accomplishing it would move the benefits of making a change out of the "debatable whether it's worth it" category, IMO. BTW, another thing to keep in mind here is the ideas that have been kicked around in the past about alternative sequence implementations managed through a "sequence AM API". I dunno whether now is the time to start creating that API abstraction, but let's at least consider it if we're whacking the catalog representation around. FWIW if I were going to continue with the sequence AM API, the next patch would have included a split of sequence metadata and sequence state into separate catalogs, so from that point of view this actually seems like an improvement (I didn't look at the code though). As a side note, I don't plan to resurrect the seqam patch at least until we have reasonable built-in logical replication functionality. -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] autonomous transactions
On 31/08/16 16:11, Craig Ringer wrote: On 31 August 2016 at 21:46, Greg Stark wrote: On Wed, Aug 31, 2016 at 2:50 AM, Peter Eisentraut wrote: - An API interface to open a "connection" to a background worker, run queries, get results: AutonomousSessionStart(), AutonomousSessionEnd(), AutonomousSessionExecute(), etc. The communication happens using the client/server protocol. Peter, you mention "Oracle-style autonomous transaction blocks". What are the semantics to be expected of those with regards to: - Accessing objects exclusively locked by the outer xact or where the requested lockmode conflicts with a lock held by the outer xact - Visibility of data written by the outer xact That would be my question as well. Also, is it intended (outside the plpgsql interface) that the autonomous xact can proceed concurrently/interleaved with a local backend xact? i.e. the local backend xact isn't suspended and you're allowed to do things on the local backend as well? If so, what handling do you have in mind for deadlocks between the local backend xact and the bgworker with the autonomous xact? I'd expect the local backend to always win, killing the autonomous xact every time. I would expect that in PLs it's handled by them; if you misuse this at the C level, that's your problem? I'm surprised by the background worker. I had envisioned autonomous transactions being implemented by allowing a process to reserve a second entry in PGPROC with the same pid. Or perhaps save its existing information in a separate pgproc slot (or stack of slots) and restoring it after the autonomous transaction commits. I suspect that there'll be way too much code that relies on stashing xact-scoped stuff in globals for that to be viable. Caches alone. Peter will be able to explain more, I'm sure. I can also see some advantages in the bgworker approach. For example it could be used for a "fire and forget" type of interface in the future, where you return as soon as you send the exec and don't care about waiting for the result. -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] make async slave to wait for lsn to be replayed
On 31 August 2016 at 22:16, Ivan Kartyshov wrote: > Our clients who deal with 9.5 and use asynchronous master-slave replication, > asked to make the wait-mechanism on the slave side to prevent the situation > when slave handles query which needs data (LSN) that was received, flushed, > but still not replayed. I like the broad idea - I've wanted something like it for a while. BDR has pg_xlog_wait_remote_receive() and pg_xlog_wait_remote_apply() for use in tests for this reason, but they act on the *upstream* side, waiting until the downstream has acked the data. Not as useful for ensuring that apps connected to both master and one or more replicas get a consistent view of data. How do you get the commit LSN to watch for? Grab pg_current_xlog_insert_location() just after the commit and figure that replaying to that point guarantees you get the commit? Some time ago[1] I raised the idea of reporting commit LSN on the wire to clients. That didn't go anywhere due to compatibility and security concerns. I think those were resolvable, but it wasn't enough of a priority to push hard on at the time. A truly "right" solution has to wait for a protocol bump, but I think good-enough solutions are possible now. So you might want to read that thread. It also mentions hesitations about exposing LSN to clients even more. I think we're *way* past that now - we have replication origins and replication slots relying on it, it's exposed in a pg_lsn datatype, a bunch of views expose it, etc. But it might be reasonable to ask "should the client instead be expected to wait for the confirmed commit of a 64-bit epoch-extended xid, like that returned by txid_current()?". One advantage of using an xid is that you can get it while you're still in the xact, so there's no race between commit and checking the lsn after commit. Are you specifically trying to ensure "this commit has replayed on the replica before we run queries on it"? Or something else? (Also, on a side note, Kevin mentioned that it may be possible to use SSI data to achieve SERIALIZABLE read-only queries on replicas, where they get the same protection from commit-order related anomalies as queries on the master. You might want to look more deeply into that too at some stage, if you're trying to ensure the app can do read-only queries on the replica and expect fully consistent results). [1] https://www.postgresql.org/message-id/flat/53E41EC1.5050603%402ndquadrant.com#53e41ec1.5050...@2ndquadrant.com -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
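For context, the workflow under discussion looks roughly like this (pg_current_xlog_insert_location() and txid_current() exist today; the wait call on the standby is the proposed piece, shown here with an invented name):

    -- on the master, just after the commit:
    SELECT pg_current_xlog_insert_location();   -- e.g. '0/3000060'

    -- on the standby, before the dependent read
    -- (hypothetical interface, name invented for illustration):
    SELECT pg_wait_for_lsn_replay('0/3000060');
    SELECT ...;   -- now guaranteed to see the committed data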
Re: [HACKERS] [GENERAL] C++ port of Postgres
[trimmed cc list because of big attachments] On 8/16/16 4:22 PM, Jim Nasby wrote: > Joy, do you have an idea what a *minimally invasive* patch for C++ > support would look like? That's certainly the first step here. I developed a minimally invasive patch for C++ support a few years ago shortly after I wrote that blog post. Since there appears to have been some interest here now, I have updated that and split it up into logical chunks. So here you go. To build this, you need to configure with g++ <= version 5. (4.x works, too.) g++ version 6 does not work yet because of the issues described in patch 0013. Then you also need to edit src/Makefile.custom and set COPT = -fpermissive -Wno-sign-compare -Wno-write-strings The -W options are optional just to reduce some noise. Cleaning up those warnings can be a separate project that might also have some benefit under C. The -fpermissive option is a g++ specific option that reduces some errors to warnings. (So this won't work with clang or other compilers at all at this point.) In particular, C++ does not allow converting from or to void pointers without a cast, but -fpermissive allows that. The step from this to "real" C++ would be adding a bunch of casts around things like malloc and palloc and other places. That would be mostly busy work, so I have excluded that here. The patches are numbered approximately in increasing order of dubiosity. So 0001 is probably a straight bug fix, 0002 and 0003 are arguably minor bug fixes as well. The patches through 0012 can probably be considered for committing in some form. After that it gets a bit hackish. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services From 86dbd4a5a0ab3212cd340e1fa56f03e864aa8e1a Mon Sep 17 00:00:00 2001 From: Peter Eisentraut Date: Tue, 30 Aug 2016 12:00:00 -0400 Subject: [PATCH 01/27] Fix use of offsetof() Using offsetof() with a run-time computed argument is not allowed in either C or C++. Apparently, gcc allows it, but g++ doesn't. --- contrib/bloom/blutils.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/contrib/bloom/blutils.c b/contrib/bloom/blutils.c index debf4f4..b68a0d1 100644 --- a/contrib/bloom/blutils.c +++ b/contrib/bloom/blutils.c @@ -75,7 +75,7 @@ _PG_init(void) bl_relopt_tab[i + 1].optname = MemoryContextStrdup(TopMemoryContext, buf); bl_relopt_tab[i + 1].opttype = RELOPT_TYPE_INT; - bl_relopt_tab[i + 1].offset = offsetof(BloomOptions, bitSize[i]); + bl_relopt_tab[i + 1].offset = offsetof(BloomOptions, bitSize[0]) + sizeof(int) * i; } } -- 2.9.3 From d3c7c1fbb346fd5c040a7a379d971db7b5129581 Mon Sep 17 00:00:00 2001 From: Peter Eisentraut Date: Tue, 30 Aug 2016 12:00:00 -0400 Subject: [PATCH 02/27] Use return instead of exit() in configure Using exit() requires stdlib.h, which is not included. Use return instead. Also add return type for main(). --- config/c-compiler.m4 | 4 +++- config/c-library.m4 | 4 +++- configure | 12 +--- 3 files changed, 15 insertions(+), 5 deletions(-) diff --git a/config/c-compiler.m4 b/config/c-compiler.m4 index a7f6773..7d901e1 100644 --- a/config/c-compiler.m4 +++ b/config/c-compiler.m4 @@ -71,8 +71,10 @@ int does_int64_work() return 0; return 1; } + +int main() { - exit(! does_int64_work()); + return (! 
does_int64_work()); }])], [Ac_cachevar=yes], [Ac_cachevar=no], diff --git a/config/c-library.m4 b/config/c-library.m4 index 50d068d..56658b5 100644 --- a/config/c-library.m4 +++ b/config/c-library.m4 @@ -204,8 +204,10 @@ int does_int64_snprintf_work() return 0; /* either multiply or snprintf is busted */ return 1; } + +int main() { - exit(! does_int64_snprintf_work()); + return (! does_int64_snprintf_work()); }]])], [pgac_cv_snprintf_long_long_int_modifier=$pgac_modifier; break], [], diff --git a/configure b/configure index 45c8eef..36d9a54 100755 --- a/configure +++ b/configure @@ -13563,8 +13563,10 @@ int does_int64_work() return 0; return 1; } + +int main() { - exit(! does_int64_work()); + return (! does_int64_work()); } _ACEOF if ac_fn_c_try_run "$LINENO"; then : @@ -13645,8 +13647,10 @@ int does_int64_work() return 0; return 1; } + +int main() { - exit(! does_int64_work()); + return (! does_int64_work()); } _ACEOF if ac_fn_c_try_run "$LINENO"; then : @@ -13739,8 +13743,10 @@ int does_int64_snprintf_work() return 0; /* either multiply or snprintf is busted */ return 1; } + +int main() { - exit(! does_int64_snprintf_work()); + return (! does_int64_snprintf_work()); } _ACEOF if ac_fn_c_try_run "$LINENO"; then : -- 2.9.3 From 54ed07be5a29d4955a9485316e68da5c6896797b Mon Sep 17 00:00:00 2001 From: Peter Eisentraut Date: Tue, 30 Aug 2016 12:00:00 -0400 Subject: [PATCH 03/27] Add missing include files to configure tests atoi() needs
Re: [HACKERS] Optimizing aggregates
On 08/31/2016 06:51 PM, Andres Freund wrote: On 2016-08-31 17:47:18 +0300, Heikki Linnakangas wrote: We actually used to call ExecEvalExpr() directly for each argument, but that was changed by the patch that added support for ordered set aggregates. It looks like that was a bad idea, from a performance point of view. I complained about that as well http://archives.postgresql.org/message-id/20160519175727.ymv2y5tye4qgcmqx%40alap3.anarazel.de Ah, missed that! I propose that we go back to calling ExecEvalExpr() directly, for non-ordered aggregates, per the attached patch. That makes that example query about 10% faster on my laptop, which is in line with the fact that ExecProject() accounted for about 13% of the CPU time. My approach is a bit different. I've first combined the projection for all the aggregates, ordered set, or not, into one projetion. That got rid of a fair amount of overhead when you have multiple aggregates. I attached an, probably out of date, WIP version of that patch. A-ha, I also considered doing just that! I also considered a variant where we call ExecProject once for all non-ordered aggregates, and a separate ExecProject() for each ordered one. But just switching back to straight ExecEvalExprs seemed easier. Secondly, I'm working on overhauling expression evaluation to be faster. Even without the ExecProject overhead, the computations very quickly become the bottleneck. During that I pretty much merged ExecProject and ExecEvalExpr into one - they're really not that different, and the distinction serves no purpose, except to increase the number of function calls. The reason I'm working on getting rid of targetlist SRFs is precisely that. A proof of concept of that is attached to http://archives.postgresql.org/message-id/20160714011850.bd5zhu35szle3n3c%40alap3.anarazel.de Cool, yes, all that should help. - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_sequence catalog
Hi, On 2016-08-31 12:53:30 -0400, Tom Lane wrote: > Improving on the space wastage is exactly the point IMO. If it's still > going to be 8k per sequence on disk (*and* in shared buffers, remember), > I'm not sure it's worth all the work to change things at all. A separate file is a heck of a lot more heavyweight than another 8 kb in an existing file. > Another idea would be to have nominally per-sequence LWLocks (or > spinlocks?) controlling nextval's nontransactional accesses to the catalog > rows, but to map those down to some fixed number of locks in a way similar > to the current fallback implementation for spinlocks, which maps them onto > a fixed number of semaphores. You'd trade off shared memory against > contention while choosing the underlying number of locks. If we could rely on spinlocks to actually be spinlocks, we could just put the spinlock beside the actual state data... Not entirely pretty, but probably pretty simple. - Andres -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_sequence catalog
Andres Freund wrote: > On 2016-08-31 11:23:27 -0400, Tom Lane wrote: > > Another issue is what is the low-level interlock between nextvals > > in different processes. Right now it's the buffer lock on the > > sequence's page. With a scheme like this, if we just kept doing > > that, we'd have a single lock covering probably O(100) different > > sequences which might lead to contention problems. We could probably > > improve on that with some thought. > > I was thinking of forcing the rows to be spread to exactly one page per > sequence... I was thinking that nextval could grab a shared buffer lock and release it immediately, just to ensure no one holds an exclusive buffer lock concurrently (which would be used for things like dropping one seq tuple from the page, when a sequence is dropped); then control access to each sequence tuple using LockDatabaseObject. This is a HW lock, heavier than a buffer's LWLock, but it seems better than wasting a full 8kb for each sequence. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Optimizing aggregates
On 2016-08-31 19:07:00 +0300, Heikki Linnakangas wrote: > On 08/31/2016 06:51 PM, Andres Freund wrote: > > I've first combined the projection for all the aggregates, ordered set, > > or not, into one projection. That got rid of a fair amount of overhead > > when you have multiple aggregates. I attached an, probably out of date, > > WIP version of that patch. > > A-ha, I also considered doing just that! I also considered a variant where > we call ExecProject once for all non-ordered aggregates, and a separate > ExecProject() for each ordered one. But just switching back to straight > ExecEvalExprs seemed easier. The issue is that this might, I think, end up iteratively deforming the underlying tuple. The projection machinery takes care of that, if we do it in one go. Greetings, Andres Freund -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
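To make the deforming point concrete (a sketch reusing the floats table from Heikki's original mail, whose a, b, c are float8 columns): with several aggregates over the same input, one combined projection can deform each input tuple once per row, whereas evaluating each aggregate's arguments separately may walk the tuple once per aggregate:

    SELECT sum(a), sum(b + c), avg(c) FROM floats;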
[HACKERS] Proposal for changes to recovery.conf API
This is a summary of proposed changes to the recovery.conf API for v10. These are based in part on earlier discussions, and represent a minimal modification to current usage.

Proposed changes (with reference to patches from Abhijit Menon-Sen and myself):

* pg_ctl start -M (normal|recover|continue)

pg_ctl can now start the server directly as a standby, similar to the way pg_ctl promote works. The internal implementation is also similar, with pg_ctl writing a "recovery.trigger" file that initiates recovery in the same way recovery.conf used to do. It is still possible to manually add a file called "recovery.trigger" and have that work, if users desire that mechanism. Different startup methods can be selected with the -M option. Normal mode starts the server for read/write, overriding any previous use in Recover mode. Recover mode starts the server as a standby, which expects to receive changes from a primary/master server using physical streaming replication, or is used for performing a recovery from backup. Continue mode is the default and will start the server in whatever mode it was in at the last proper shutdown, or as modified by any trigger files present. (Patch: recovery_startup_r10_api.v1b.patch)

* $DATADIR/recovery.conf no longer triggers recovery (Patch: recovery_startup_r10_api.v1b.patch)

* Recovery parameters would now be part of the main postgresql.conf infrastructure. Any parameters set in $DATADIR/recovery.conf will be read after the main parameter file, similar to the way that postgresql.conf.auto is read. (Abhijit)

* pg_basebackup -R will continue to generate a parameter file called recovery.conf as it does now, but will also create a file named recovery.trigger. (This part is WIP; the patch doesn't include an implementation for tar format yet)

* Parameters: all of the parameters formerly set in recovery.conf can be set in postgresql.conf using RELOAD. These parameters will have no defaults in postgresql.conf.sample. Setting them has no effect during normal running, or once recovery ends. https://www.postgresql.org/docs/devel/static/archive-recovery-settings.html https://www.postgresql.org/docs/devel/static/recovery-target-settings.html https://www.postgresql.org/docs/devel/static/standby-settings.html (Abhijit)

Related cleanup:

* Promotion signal file is now called "promote.trigger" rather than just "promote"
* Remove user-configurable "trigger_file" mechanism - use "promote.trigger" for all cases
* Remove fallback promote mechanism, so all promotion is now "fast" in xlog.c
* Rename CheckForStandbyTrigger() to CheckForPromoteTrigger() for clarity (Patch: recovery_startup_r10_api.v1b.patch)

-- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services recovery_startup_r10_api.v1b.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
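If the former recovery.conf settings become ordinary reloadable GUCs as proposed, they could presumably also be managed through the existing configuration machinery. A sketch (ALTER SYSTEM and pg_reload_conf() already exist, and primary_conninfo is one of the parameters in question; whether recovery settings may be changed this way at runtime is exactly what the proposal would define):

    ALTER SYSTEM SET primary_conninfo = 'host=master port=5432 user=repl';
    SELECT pg_reload_conf();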
Re: [HACKERS] proposal: psql \setfileref
2016-08-31 18:24 GMT+02:00 Corey Huinker:
> On Wed, Aug 31, 2016 at 11:32 AM, Pavel Stehule wrote:
>
>> Hi
>>
>> I propose a new type of psql variables - file references. The content
>> of a file reference is specified by the referenced file. It allows
>> simple insertion of large data without the necessity of manual escaping
>> or using the LO API.
>>
>> When I wrote the patch, I used parametrized queries for these data
>> instead of escaped strings - the code is not much bigger, and the error
>> messages are much more friendly if the query is not bloated by bigger
>> content. Text mode is used only when escaping is requested - when
>> escaping is not required, the content is implicitly transformed to
>> bytea. By default the content of the file is bytea. When the user
>> requires escaping, he enforces text escaping - because it makes sense
>> only for the text type.
>>
>> postgres=# \setfileref a ~/test2.xml
>> postgres=# \setfileref b ~/avatar.gif
>> postgres=# insert into test values(convert_from(:a, 'latin2')::xml, :b);
>> -- xml is passed as bytea
>> postgres=# insert into test values(:'a', :b); -- xml is passed via
>> unknown text value
>>
>> The content of file reference variables is not persistent in memory.
>>
>> Comments, notes?
>>
>> Regards
>>
>> Pavel
>>
> Clearly jumping ahead on this one, but if the fileref is essentially a
> pipe to "cat /path/to/file.name", is there anything stopping us from
> setting pipes?
>
> My interest is primarily in ways that COPY could use this.

I don't see a reason why it should not be possible - the current code
can't do it, but with 20 more lines it should be possible.

There is one disadvantage compared to COPY - the content has to be fully
loaded into memory, but any other functionality should be the same.

Regards

Pavel
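P.S. A sketch of what the pipe variant might look like (hypothetical
syntax - nothing below is implemented; \setfileref currently takes only a
file path):

postgres=# \setfileref c |zcat ~/data.xml.gz  -- hypothetical pipe form
postgres=# insert into test values(convert_from(:c, 'latin2')::xml);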
Re: [HACKERS] pg_sequence catalog
Andres Freund writes:
> On 2016-08-31 12:56:45 -0300, Alvaro Herrera wrote:
>> I was thinking that nextval could grab a shared buffer lock and release
>> immediately, just to ensure no one holds the exclusive buffer lock
>> concurrently (which would be used for things like dropping one seq
>> tuple from the page, when a sequence is dropped); then control access
>> to each sequence tuple using LockDatabaseObject. This is a HW lock,
>> heavier than a buffer's LWLock, but it seems better than wasting a
>> full 8kb for each sequence.

> That's going to be a *lot* slower, I don't think that's ok. I've a
> hard time worrying about the space waste here; especially considering
> where we're coming from.

Improving on the space wastage is exactly the point IMO. If it's still
going to be 8k per sequence on disk (*and* in shared buffers, remember),
I'm not sure it's worth all the work to change things at all.

We're already dealing with taking a heavyweight lock for each sequence
(the relation AccessShareLock). I wonder whether it'd be possible to
repurpose that effort somehow.

Another idea would be to have nominally per-sequence LWLocks (or
spinlocks?) controlling nextval's nontransactional accesses to the
catalog rows, but to map those down to some fixed number of locks, in a
way similar to the current fallback implementation for spinlocks, which
maps them onto a fixed number of semaphores. You'd trade off shared
memory against contention while choosing the underlying number of locks.

regards, tom lane
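To illustrate the mapping idea, a minimal standalone C sketch (names are
invented, and pthread mutexes stand in for LWLocks/spinlocks in shared
memory; this is not proposed code, just the shape of the technique):

#include <pthread.h>
#include <stdint.h>

typedef uint32_t Oid;            /* stand-in for PostgreSQL's Oid type */

#define NUM_SEQ_LOCKS 128        /* shared memory vs. contention trade-off */

static pthread_mutex_t seq_locks[NUM_SEQ_LOCKS];

void
seq_locks_init(void)
{
    for (int i = 0; i < NUM_SEQ_LOCKS; i++)
        pthread_mutex_init(&seq_locks[i], NULL);
}

/* Map a sequence's OID onto one of the fixed locks. Two sequences may
 * share a lock, which only costs occasional extra contention - it is
 * never incorrect, exactly as with the spinlock-on-semaphore fallback. */
static pthread_mutex_t *
lock_for_sequence(Oid seqrelid)
{
    return &seq_locks[(seqrelid * 2654435761u) % NUM_SEQ_LOCKS];
}

void
nextval_critical_section(Oid seqrelid)
{
    pthread_mutex_t *lk = lock_for_sequence(seqrelid);

    pthread_mutex_lock(lk);
    /* ... nontransactional read/update of this sequence's catalog row ... */
    pthread_mutex_unlock(lk);
}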
Re: [HACKERS] autonomous transactions
On Wed, Aug 31, 2016 at 3:11 PM, Craig Ringer wrote:
>
> I suspect that there'll be way too much code that relies on stashing
> xact-scoped stuff in globals for that to be viable. Caches alone.
> Peter will be able to explain more, I'm sure.
>
> We'd probably need a new transaction data object that everything
> xact-scoped hangs off, so we can pass it everywhere or swap it out of
> some global. The mechanical refactoring alone would be pretty scary,
> not to mention the complexity of actually identifying all the less
> obvious places that need changing.

Well, this is the converse of the same problem. Today process state and
transaction state are tied together. One way or another you're trying to
split that - either by having two processes share state or by having one
process manage two transactions.

I suppose we already have the infrastructure for parallel query, so
there's at least some shared problem space there.

-- 
greg
Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)
On Wed, Aug 31, 2016 at 1:45 PM, Pavan Deolasee wrote:
> We discussed a few ideas to address the "Duplicate Scan" problem. For
> example, we can teach index AMs to discard any duplicate (key, CTID)
> insert requests. Or we could guarantee uniqueness by only allowing
> updates in one lexical order. While the former is a more complete
> solution to avoid duplicate entries, searching through a large number
> of keys for non-unique indexes could be a drag on performance. The
> latter approach may not be sufficient for many workloads. Also,
> tracking increment/decrement for many indexes will be non-trivial.
>
> There is another problem with allowing many index entries pointing to
> the same WARM chain. It will be non-trivial to know how many index
> entries are currently pointing to the WARM chain, and index/heap
> vacuum will throw up more challenges.
>
> Instead, what I would like to propose, and what the patch currently
> implements, is to restrict WARM updates to once per chain. So the
> first non-HOT update to a tuple or a HOT chain can be a WARM update.
> The chain can further be HOT updated any number of times, but it
> cannot be WARM updated again. This might look too restrictive, but it
> can still bring down the number of regular updates by almost 50%.
> Further, if we devise a strategy to convert a WARM chain back to a HOT
> chain, it can again be WARM updated. (This part is currently not
> implemented.) A good side effect of this simple strategy is that we
> know there can be at most two index entries pointing to any given WARM
> chain.

We should probably think about coordinating with my btree patch.

From the description above, the strategy is quite readily "upgradable" to
one in which the index AM discards duplicate (key, ctid) pairs, and that
would remove the limitation of only one WARM update... right?
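For clarity, a toy C sketch of the once-per-chain rule described above
(the struct and names are invented purely for illustration; the real
patch tracks this state at the heap-tuple level, not in a side struct):

#include <stdbool.h>
#include <stdio.h>

/* Toy model of one HOT/WARM update chain. */
typedef struct UpdateChain
{
    bool warm_updated;   /* has this chain used its one WARM update? */
} UpdateChain;

typedef enum { UPDATE_HOT, UPDATE_WARM, UPDATE_REGULAR } UpdateKind;

/* no_index_cols_changed: true when no indexed columns were modified. */
UpdateKind
choose_update_kind(UpdateChain *chain, bool no_index_cols_changed)
{
    if (no_index_cols_changed)
        return UPDATE_HOT;          /* HOT updates stay unlimited */

    if (!chain->warm_updated)
    {
        chain->warm_updated = true; /* first non-HOT update may be WARM,
                                     * leaving at most two index entries
                                     * pointing at this chain */
        return UPDATE_WARM;
    }

    return UPDATE_REGULAR;          /* already WARM once: fall back to a
                                     * regular (cold) update */
}

int main(void)
{
    UpdateChain chain = { false };

    printf("%d\n", choose_update_kind(&chain, true));   /* UPDATE_HOT */
    printf("%d\n", choose_update_kind(&chain, false));  /* UPDATE_WARM */
    printf("%d\n", choose_update_kind(&chain, false));  /* UPDATE_REGULAR */
    return 0;
}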
Re: [HACKERS] pg_sequence catalog
On 2016-08-31 12:56:45 -0300, Alvaro Herrera wrote:
> I was thinking that nextval could grab a shared buffer lock and release
> immediately, just to ensure no one holds the exclusive buffer lock
> concurrently (which would be used for things like dropping one seq
> tuple from the page, when a sequence is dropped); then control access
> to each sequence tuple using LockDatabaseObject. This is a HW lock,
> heavier than a buffer's LWLock, but it seems better than wasting a full
> 8kb for each sequence.

That's going to be a *lot* slower, I don't think that's ok. I've a hard
time worrying about the space waste here; especially considering where
we're coming from.

Andres
Re: [HACKERS] proposal: psql \setfileref
On Wed, Aug 31, 2016 at 11:32 AM, Pavel Stehule wrote:
> Hi
>
> I propose a new type of psql variables - file references. The content
> of a file reference is specified by the referenced file. It allows
> simple insertion of large data without the necessity of manual escaping
> or using the LO API.
>
> When I wrote the patch, I used parametrized queries for these data
> instead of escaped strings - the code is not much bigger, and the error
> messages are much more friendly if the query is not bloated by bigger
> content. Text mode is used only when escaping is requested - when
> escaping is not required, the content is implicitly transformed to
> bytea. By default the content of the file is bytea. When the user
> requires escaping, he enforces text escaping - because it makes sense
> only for the text type.
>
> postgres=# \setfileref a ~/test2.xml
> postgres=# \setfileref b ~/avatar.gif
> postgres=# insert into test values(convert_from(:a, 'latin2')::xml, :b);
> -- xml is passed as bytea
> postgres=# insert into test values(:'a', :b); -- xml is passed via
> unknown text value
>
> The content of file reference variables is not persistent in memory.
>
> Comments, notes?
>
> Regards
>
> Pavel

Clearly jumping ahead on this one, but if the fileref is essentially a
pipe to "cat /path/to/file.name", is there anything stopping us from
setting pipes?

My interest is primarily in ways that COPY could use this.
Re: [HACKERS] autonomous transactions
On 31 August 2016 at 14:09, Joel Jacobson wrote:
> On Wed, Aug 31, 2016 at 6:41 AM, Jaime Casanova wrote:
>>
>> On 30 August 2016 at 23:10, Joel Jacobson wrote:
>> >
>> > There should be a way to within the session and/or txn permanently
>> > block autonomous transactions.
>> >
>>
>> This will defeat one of the use cases of autonomous transactions:
>> auditing
>
> My idea on how to deal with this would be to mark the function to be
> "AUTONOMOUS", similar to how a function is marked to be "PARALLEL
> SAFE", and to throw an error if a caller that has blocked autonomous
> transactions tries to call a function that is marked to be autonomous.
>
> That way none of the code that needs to be audited would ever get
> executed.

Not sure I see why you would want to turn off execution for only some
functions. What happens if your function calls some other function with
side effects? How would you roll that back? How would you mark functions
for the general case?

Functions with side effects can't be tested with simple unit tests; that
has nothing to do with autonomous transactions.

-- 
Simon Riggs    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
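For concreteness, the kind of marking under discussion might look
something like this (entirely hypothetical syntax, GUC name, and table -
none of this exists today):

CREATE FUNCTION audit_event(msg text) RETURNS void
LANGUAGE plpgsql
AUTONOMOUS  -- hypothetical marker, analogous to PARALLEL SAFE
AS $$
BEGIN
  INSERT INTO audit_log(event) VALUES (msg);
END;
$$;

SET block_autonomous_transactions = on;  -- hypothetical blocking switch
SELECT audit_event('sensitive action'); -- would raise an error rather
                                        -- than silently skip auditing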