Re: [HACKERS] autonomous transactions

2016-08-31 Thread Joel Jacobson
On Wed, Aug 31, 2016 at 6:41 AM, Jaime Casanova wrote:
>
> On 30 August 2016 at 23:10, Joel Jacobson  wrote:
> >
> > There should be a way to within the session and/or txn permanently
> > block autonomous transactions.
> >
>
> This will defeat one of the use cases of autonomous transactions: auditing

My idea on how to deal with this would be to mark the function
"AUTONOMOUS", similar to how a function is marked "PARALLEL SAFE",
and to throw an error if a caller that has blocked autonomous
transactions tries to call a function marked autonomous.

That way none of the code that needs to be audited would ever get executed.
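
For illustration, with entirely hypothetical syntax for both the function
marking and the blocking setting, that could look like:

CREATE FUNCTION audit_insert() RETURNS void
LANGUAGE plpgsql AUTONOMOUS        -- hypothetical marking
AS $$ BEGIN INSERT INTO audit_log VALUES (now()); END $$;

SET block_autonomous_transactions = on;   -- hypothetical GUC

SELECT audit_insert();   -- would now raise an error up front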




[HACKERS] pg_sequence catalog

2016-08-31 Thread Peter Eisentraut
While I was hacking around sequence stuff, I felt the urge to look into
an old peeve: that sequence metadata is not stored in a proper catalog.
Right now, in order to find out some metadata about a sequence (min, max,
increment, etc.), you need to look into the sequence itself.  That is like
having to query a table in order to find out about its schema.

There are also known issues with the current storage, such as that we
can't safely update the sequence name stored in the sequence when we
rename it, so we just don't.

This patch introduces a new catalog pg_sequence that stores the sequence
metadata.  The sequences themselves now only store the counter and
supporting information.

I don't know if this is a net improvement.  Maybe this introduces as
many new issues as it removes.  But I figured I'll post this, so that at
least we can discuss it.
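
For illustration, here is the rough shape of the proposed catalog,
reconstructed from the information_schema changes in the patch below
(this definition is my sketch, not a quote from the patch):

CREATE TABLE pg_sequence (
    seqrelid     oid,    -- OID of the sequence's pg_class entry
    seqstart     int8,   -- start value
    seqincrement int8,   -- increment
    seqmax       int8,   -- maximum value
    seqmin       int8,   -- minimum value
    seqcycle     bool    -- whether the sequence cycles
);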

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
diff --git a/src/backend/catalog/Makefile b/src/backend/catalog/Makefile
index 1ce7610..cbf0d79 100644
--- a/src/backend/catalog/Makefile
+++ b/src/backend/catalog/Makefile
@@ -41,7 +41,7 @@ POSTGRES_BKI_SRCS = $(addprefix $(top_srcdir)/src/include/catalog/,\
 	pg_foreign_data_wrapper.h pg_foreign_server.h pg_user_mapping.h \
 	pg_foreign_table.h pg_policy.h pg_replication_origin.h \
 	pg_default_acl.h pg_init_privs.h pg_seclabel.h pg_shseclabel.h \
-	pg_collation.h pg_range.h pg_transform.h \
+	pg_collation.h pg_range.h pg_transform.h pg_sequence.h \
 	toasting.h indexing.h \
 )
 
diff --git a/src/backend/catalog/dependency.c b/src/backend/catalog/dependency.c
index 04d7840..8e1e1ac 100644
--- a/src/backend/catalog/dependency.c
+++ b/src/backend/catalog/dependency.c
@@ -66,6 +66,7 @@
 #include "commands/proclang.h"
 #include "commands/schemacmds.h"
 #include "commands/seclabel.h"
+#include "commands/sequence.h"
 #include "commands/trigger.h"
 #include "commands/typecmds.h"
 #include "nodes/nodeFuncs.h"
@@ -1155,6 +1156,8 @@ doDeletion(const ObjectAddress *object, int flags)
 				else
 					heap_drop_with_catalog(object->objectId);
 			}
+			if (relKind == RELKIND_SEQUENCE)
+				DeleteSequenceTuple(object->objectId);
 			break;
 			}
 
diff --git a/src/backend/catalog/information_schema.sql b/src/backend/catalog/information_schema.sql
index 00550eb..182d2d0 100644
--- a/src/backend/catalog/information_schema.sql
+++ b/src/backend/catalog/information_schema.sql
@@ -1535,15 +1535,16 @@ CREATE VIEW sequences AS
CAST(64 AS cardinal_number) AS numeric_precision,
CAST(2 AS cardinal_number) AS numeric_precision_radix,
CAST(0 AS cardinal_number) AS numeric_scale,
-   CAST(p.start_value AS character_data) AS start_value,
-   CAST(p.minimum_value AS character_data) AS minimum_value,
-   CAST(p.maximum_value AS character_data) AS maximum_value,
-   CAST(p.increment AS character_data) AS increment,
-   CAST(CASE WHEN p.cycle_option THEN 'YES' ELSE 'NO' END AS yes_or_no) AS cycle_option
-FROM pg_namespace nc, pg_class c, LATERAL pg_sequence_parameters(c.oid) p
+   CAST(s.seqstart AS character_data) AS start_value,
+   CAST(s.seqmin AS character_data) AS minimum_value,
+   CAST(s.seqmax AS character_data) AS maximum_value,
+   CAST(s.seqincrement AS character_data) AS increment,
+   CAST(CASE WHEN s.seqcycle THEN 'YES' ELSE 'NO' END AS yes_or_no) AS cycle_option
+FROM pg_namespace nc, pg_class c, pg_sequence s
 WHERE c.relnamespace = nc.oid
   AND c.relkind = 'S'
   AND (NOT pg_is_other_temp_schema(nc.oid))
+  AND c.oid = s.seqrelid
   AND (pg_has_role(c.relowner, 'USAGE')
OR has_sequence_privilege(c.oid, 'SELECT, UPDATE, USAGE') );
 
diff --git a/src/backend/commands/sequence.c b/src/backend/commands/sequence.c
index c98f981..95bd172 100644
--- a/src/backend/commands/sequence.c
+++ b/src/backend/commands/sequence.c
@@ -22,8 +22,10 @@
 #include "access/xloginsert.h"
 #include "access/xlogutils.h"
 #include "catalog/dependency.h"
+#include "catalog/indexing.h"
 #include "catalog/namespace.h"
 #include "catalog/objectaccess.h"
+#include "catalog/pg_sequence.h"
 #include "catalog/pg_type.h"
 #include "commands/defrem.h"
 #include "commands/sequence.h"
@@ -74,7 +76,7 @@ typedef struct SeqTableData
 	int64		cached;			/* last value already cached for nextval */
 	/* if last != cached, we have not used up all the cached values */
 	int64		increment;		/* copy of sequence's increment field */
-	/* note that increment is zero until we first do read_seq_tuple() */
+	/* note that increment is zero until we first do nextval_internal() */
 } SeqTableData;
 
 typedef SeqTableData *SeqTable;
@@ -92,10 +94,11 @@ static int64 nextval_internal(Oid relid);
 static Relation open_share_lock(SeqTable seq);
 static void create_seq_hashtable(void);
 static void init_sequence(Oid relid, SeqTable *p_elm, 

Re: [HACKERS] [COMMITTERS] pgsql: Fix pg_receivexlog --synchronous

2016-08-31 Thread Simon Riggs
On 29 August 2016 at 12:34, Tom Lane  wrote:
> Simon Riggs  writes:
>> Fix pg_receivexlog --synchronous
>
> The buildfarm says you broke the 9.5 branch.
>
> In general, pushing inessential patches just a few hours before a wrap
> deadline is a dangerous business.  Pushing them without any testing
> is close to irresponsible.

Sorry about that everybody. Thanks to Alvaro for doing that in my absence.

I pushed to 9.5 because of a misunderstanding: I thought the author had
told me it was also tested on 9.5. It was not knowingly untested, but the
responsibility and the mistake were mine for not confirming that with my
own eyes before pushing.

-- 
Simon Riggs   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] autonomous transactions

2016-08-31 Thread Greg Stark
On Wed, Aug 31, 2016 at 2:50 AM, Peter Eisentraut wrote:
> - A API interface to open a "connection" to a background worker, run
> queries, get results: AutonomousSessionStart(), AutonomousSessionEnd(),
> AutonomousSessionExecute(), etc.  The communication happens using the
> client/server protocol.
>
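
To make the quoted interface concrete, a usage sketch -- only the function
names come from the proposal above; the signatures and types here are my
guesses:

/* sketch only: the session type and signatures are assumed */
AutonomousSession *session = AutonomousSessionStart();
AutonomousSessionExecute(session, "INSERT INTO audit_log VALUES (now())");
AutonomousSessionEnd(session);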


I'm surprised by the background worker. I had envisioned autonomous
transactions being implemented by allowing a process to reserve a
second entry in PGPROC with the same pid. Or perhaps save its existing
information in a separate pgproc slot (or stack of slots) and
restoring it after the autonomous transaction commits.

Using a background worker means that the autonomous transaction can't
access any state from the process's memory. Parameters in plpgsql are one
symptom of this, but I suspect there will be others. What happens if a
statement timeout occurs during an autonomous transaction? What happens
if you use a PL language in the autonomous transaction and it tries to
use non-transactional information such as prepared statements?


-- 
greg




Re: [HACKERS] pg_sequence catalog

2016-08-31 Thread Tom Lane
I wrote:
> Personally, my big beef with the current approach to sequences is that
> we eat a whole relation (including a whole relfilenode) per sequence.
> I wish that we could reduce a sequence to just a single row in a
> catalog, including the nontransactional state.  Not sure how feasible
> that is either, but accomplishing it would move the benefits of making
> a change out of the "debatable whether it's worth it" category, IMO.

BTW, another thing to keep in mind here is the ideas that have been
kicked around in the past about alternative sequence implementations
managed through a "sequence AM API".  I dunno whether now is the time
to start creating that API abstraction, but let's at least consider
it if we're whacking the catalog representation around.

regards, tom lane




Re: [HACKERS] pg_sequence catalog

2016-08-31 Thread Tom Lane
Craig Ringer  writes:
> On 31 August 2016 at 22:01, Tom Lane  wrote:
>> Personally, my big beef with the current approach to sequences is that
>> we eat a whole relation (including a whole relfilenode) per sequence.
>> I wish that we could reduce a sequence to just a single row in a
>> catalog, including the nontransactional state.

> It sounds like you're thinking of something like a normal(ish) heap
> tuple where we just overwrite some fields in-place without fiddling
> xmin/xmax and making a new row version. Right? Like we currently
> overwrite the lone Form_pg_sequence  on the 1-page sequence relations.

That would be what to do with the nontransactional state.  If I recall
previous discussions correctly, there's a stumbling block if you want
to treat ALTER SEQUENCE changes as transactional --- but maybe that
doesn't make sense anyway.  If we did want to try that, maybe we need
two auxiliary catalogs, one for the transactionally-updatable sequence
fields and one for the nontransactional fields.

> It feels intuitively pretty gross to effectively dirty-read and write
> a few fields of a tuple. But that's what we do all the time with
> xmin/xmax etc, it's not really that different.

True.  I think two rows would work around that, but maybe we don't
have to.

Another issue is what is the low-level interlock between nextvals
in different processes.  Right now it's the buffer lock on the
sequence's page.  With a scheme like this, if we just kept doing
that, we'd have a single lock covering probably O(100) different
sequences which might lead to contention problems.  We could probably
improve on that with some thought.

regards, tom lane




Re: [HACKERS] pg_sequence catalog

2016-08-31 Thread Craig Ringer
On 31 August 2016 at 22:01, Tom Lane  wrote:

> Personally, my big beef with the current approach to sequences is that
> we eat a whole relation (including a whole relfilenode) per sequence.
> I wish that we could reduce a sequence to just a single row in a
> catalog, including the nontransactional state.

I'd be happy to see incremental improvement in this space as Peter has
suggested, though I certainly see the value of something like seqam
too.

It sounds like you're thinking of something like a normal(ish) heap
tuple where we just overwrite some fields in-place without fiddling
xmin/xmax and making a new row version. Right? Like we currently
overwrite the lone Form_pg_sequence  on the 1-page sequence relations.

I initially thought that TRUNCATE ... RESTART IDENTITY would be
somewhat of a problem with this. We effectively have a temporary
"timeline" fork in the sequence value where it's provisionally
restarted and we start using values from the restarted sequence within
the xact that restarted it. But actually, it'd fit pretty well.
TRUNCATE ... RESTART IDENTITY would write a new row version with a new
xmin, and set xmax on the old sequence row. nextval(...) within the
truncating xact would update the new row's non-transactional fields
when it allocated new sequence chunks. On commit, everyone starts
using the new row due to normal transactional visibility rules. On
rollback everyone ignores it like they would any other dead tuple from
an aborted xact and uses the old tuple's nontransactional fields. It
Just Works(TM).
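
A sketch of that scenario in SQL, per the description above:

BEGIN;
TRUNCATE t RESTART IDENTITY;   -- writes a new version of the sequence row
INSERT INTO t DEFAULT VALUES;  -- nextval() advances the new row's
                               -- nontransactional state
ROLLBACK;                      -- the new row is a dead tuple; everyone
                               -- falls back to the old row's fields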

nextval(...) takes AccessShareLock on a sequence relation. TRUNCATE
... RESTART IDENTITY takes AccessExclusiveLock. So we can never have
nextval(...) advancing the "old" timeline in other xacts at the same
time as we consume values on the restarted sequence inside the xact
that did the restarting. We still need the new "timeline" though,
because we have to retain the old value for rollback.

It feels intuitively pretty gross to effectively dirty-read and write
a few fields of a tuple. But that's what we do all the time with
xmin/xmax etc, it's not really that different.

It'd certainly make sequence decoding easier too. A LOT easier.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




[HACKERS] less expensive pg_buffercache on big shmem

2016-08-31 Thread Ivan Kartyshov

Hi hackers,

Recently I finished my work on a patch for the pg_buffercache contrib
module, and I think it's time to share my results.



Introduction
============

I want to offer an implementation that decreases the system workload by
partially sacrificing data consistency (full snapshot consistency).
Sometimes we do not need full consistency, for example for quantitative
rather than qualitative analysis of memory contents, or when we want to
detect insufficient memory resources or see how often a relation is used.



Problem description
===

Currently, pg_buffercache v1.1 and prior takes an exclusive lock on
all shared buffers, which greatly affects system performance.
Usually we use pg_buffercache to find out why the database is working
slower than expected, or to examine what occupies the memory. So when we
run pg_buffercache on such a system, we make it even slower.



Implementation
==

The vanilla implementation contains a loop that collects statistics from
the whole of shared memory, acquiring, reading, and releasing spinlocks
one by one, page by page, while holding an LWLock.

The v1.2 implementation contains a flexible loop that can collect shared
memory statistics using one of three methods:
1) holding an LWLock on only one partition of shared memory at a time
(semiconsistent method),
2) without LWLocks (nonconsistent method),
3) or in the vanilla way (consistent method).

These allow us to launch pg_buffercache in three different ways.
Each of them has some advantages and some disadvantages:

Consistent:
+ 100% consistent snapshot of shared memory
- Slows down the system by holding an exclusive lock on all of shared memory

Semiconsistent:
+ Faster than the consistent method
+ Mostly does not affect the system load
- Taking the snapshot is slow

Nonconsistent:
+ The fastest
+ Does not noticeably affect the system
- <3% loss of snapshot consistency

What works
==

It works well even under heavy load, but of course there might be things
I've overlooked.

VIEW pg_buffercache_cons
VIEW pg_buffercache_noncons
VIEW pg_buffercache_semicons

Examples from the docs, using the new implementation:
SELECT c.relname, count(*) AS buffers FROM pg_buffercache_noncons b 
INNER JOIN pg_class c ON b.relfilenode = pg_relation_filenode(c.oid) AND 
b.reldatabase IN (0, (SELECT oid FROM pg_database WHERE datname = 
current_database())) GROUP BY c.relname ORDER BY 2 DESC LIMIT 10;


SELECT c.relname, count(*) AS buffers FROM pg_buffercache_semicons b 
INNER JOIN pg_class c ON b.relfilenode = pg_relation_filenode(c.oid) AND 
b.reldatabase IN (0, (SELECT oid FROM pg_database WHERE datname = 
current_database())) GROUP BY c.relname ORDER BY 2 DESC LIMIT 10;



Testing the implementation
==

How the implementation was tested:
1) Start the server
2) Generate load with pgbench:
pgbench -c 250 -s 1000 -T 200 -P1
3) Compare how tps sags under load while running:
SELECT count(*) FROM pg_buffercache_cons;
SELECT count(*) FROM pg_buffercache_semicons;
SELECT count(*) FROM pg_buffercache_noncons;

This test was run on a server with the following parameters:
Model name: Intel(R) Xeon(R) CPU E7-8890 v3 @ 2.50GHz
CPU(s): 144
Socket(s): 4
shared_buffers: 200GB


Results of testing
==

Our DBA team obtained the following results:

Nonconsistent:
* 10% faster than the consistent method
* does not noticeably affect the system
* the maximum loss of accuracy was less than 3%* (in most situations an
acceptable accuracy loss)

Semiconsistent:
* five times slower than nonconsistent
* affects the system less than the consistent method

Overall results:
Our clients were pleased with this implementation.
The implementation is backward compatible, so old pg_buffercache v1.1
queries will keep working.
Semiconsistent shows results approaching nonconsistent on SELECT-only
queries.

* these values were obtained from our DBA tests.

What can be better
===

It is unclear how to optimize the semiconsistent method to make it
faster and to reduce the transient effects that appear from time to time.



I will be glad to see your feedback!

---
Ivan Kartyshov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
diff --git a/contrib/pg_buffercache/Makefile b/contrib/pg_buffercache/Makefile
index 065d3d6..8813c50 100644
--- a/contrib/pg_buffercache/Makefile
+++ b/contrib/pg_buffercache/Makefile
@@ -4,7 +4,7 @@ MODULE_big = pg_buffercache
 OBJS = pg_buffercache_pages.o $(WIN32RES)
 
 EXTENSION = pg_buffercache
-DATA = pg_buffercache--1.1.sql pg_buffercache--1.0--1.1.sql pg_buffercache--unpackaged--1.0.sql
+DATA = pg_buffercache--1.2.sql pg_buffercache--1.1--1.2.sql pg_buffercache--1.0--1.1.sql pg_buffercache--unpackaged--1.0.sql
 PGFILEDESC = "pg_buffercache - monitoring of shared buffer cache in real-time"
 
 ifdef USE_PGXS
diff --git a/contrib/pg_buffercache/README 

[HACKERS] proposal: psql \setfileref

2016-08-31 Thread Pavel Stehule
Hi

I propose a new type of psql variable - file references. The content of
a file reference is taken from the referenced file. It allows simple
insertion of large data without the need for manual escaping or for using
the LO API.

When I wrote the patch, I used parametrized queries for these data instead
of escaped strings - the code is not much bigger, and the error messages
are much friendlier when the query is not bloated by large content. Only
text mode is used - when escaping is not required, the content is
implicitly transformed to bytea. By default the content of a file is
bytea. When the user requires escaping, he enforces text escaping -
because it makes sense only for the text type.

postgres=# \setfileref a ~/test2.xml
postgres=# \setfileref b ~/avatar.gif
postgres=# insert into test values(convert_from(:a, 'latin2')::xml, :b); -- xml is passed as bytea
postgres=# insert into test values(:'a', :b); -- xml is passed via unknown text value

The content of a file reference variable is not kept persistently in memory.

Comments, notes?

Regards

Pavel
commit 077c71b1f8ae24ccf2f3723e1e4ca5bf05bca0d3
Author: Pavel Stehule 
Date:   Wed Aug 31 17:15:33 2016 +0200

initial

diff --git a/src/bin/psql/command.c b/src/bin/psql/command.c
index 4aaf657..3150510 100644
--- a/src/bin/psql/command.c
+++ b/src/bin/psql/command.c
@@ -1355,6 +1355,32 @@ exec_command(const char *cmd,
 		free(envval);
 	}
 
+	/* \setfileref - set variable by reference on file */
+	else if (strcmp(cmd, "setfileref") == 0)
+	{
+		char	   *name = psql_scan_slash_option(scan_state,
+  OT_NORMAL, NULL, false);
+
+		char	   *ref = psql_scan_slash_option(scan_state,
+		 OT_NORMAL, NULL, false);
+
+		success = false;
+
+		if (!name || !ref)
+		{
+			psql_error("\\%s: missing required argument\n", cmd);
+			success = false;
+		}
+		else
+		{
+			if (!SetFileRef(pset.vars, name, ref))
+			{
+psql_error("\\%s: error while setting variable\n", cmd);
+success = false;
+			}
+		}
+	}
+
 	/* \sf -- show a function's source code */
 	else if (strcmp(cmd, "sf") == 0 || strcmp(cmd, "sf+") == 0)
 	{
diff --git a/src/bin/psql/common.c b/src/bin/psql/common.c
index 7399950..3c1db17 100644
--- a/src/bin/psql/common.c
+++ b/src/bin/psql/common.c
@@ -32,7 +32,6 @@ static bool ExecQueryUsingCursor(const char *query, double *elapsed_msec);
 static bool command_no_begin(const char *query);
 static bool is_select_command(const char *query);
 
-
 /*
  * openQueryOutputFile --- attempt to open a query output file
  *
@@ -108,6 +107,123 @@ setQFout(const char *fname)
 	return true;
 }
 
+void
+psql_reset_query_params(void)
+{
+	int			i;
+
+	for (i = 0; i < pset.nparams; i++)
+		if (pset.params[i] != NULL)
+		{
+			PQfreemem(pset.params[i]);
+			pset.params[i] = NULL;
+		}
+
+	pset.nparams = 0;
+}
+
+/*
+ * Load the content of the file referenced by a fileref variable into the
+ * query params buffer.  When escaping is requested, text content is
+ * expected.  Without escaping, bytea content is expected and the related
+ * bytea escaping is applied.
+ */
+#define BYTEAOID		17
+#define UNKNOWNOID		0
+
+static char *
+get_file_ref_content(const char *value, bool escape, bool as_ident)
+{
+	PQExpBufferData		buffer;
+	FILE			   *fd = NULL;
+	char			   *fname;
+	char			   *escaped_value;
+	char		line[1024];
+	size_t		size;
+
+	fname = pstrdup(value);
+
+	expand_tilde(&fname);
+	if (!fname)
+	{
+		psql_error("missing valid path to file\n");
+		return NULL;
+	}
+
+	canonicalize_path(fname);
+
+	fd = fopen(fname, PG_BINARY_R);
+	if (!fd)
+	{
+		psql_error("%s: %s\n", fname, strerror(errno));
+		PQfreemem(fname);
+		return NULL;
+	}
+
+	/* can append another parameter */
+	if (pset.nparams >= MAX_BINARY_PARAMS)
+	{
+		psql_error("too much binary parameters");
+		PQfreemem(fname);
+		return NULL;
+	}
+
+	if (!pset.db)
+	{
+		psql_error("cannot escape without active connection\n");
+		PQfreemem(fname);
+		return NULL;
+	}
+
+	initPQExpBuffer(&buffer);
+
+	while ((size = fread(line, 1, sizeof(line), fd)) > 0)
+		appendBinaryPQExpBuffer(&buffer, line, size);
+
+	if (ferror(fd))
+	{
+		psql_error("%s: %s\n", fname, strerror(errno));
+		PQfreemem(fname);
+		termPQExpBuffer(&buffer);
+		return NULL;
+	}
+
+	if (escape)
+	{
+		if (as_ident)
+			escaped_value =
+PQescapeIdentifier(pset.db, buffer.data, buffer.len);
+		else
+			escaped_value =
+PQescapeLiteral(pset.db, buffer.data, buffer.len);
+		pset.paramTypes[pset.nparams] = UNKNOWNOID;
+	}
+	else
+	{
+		escaped_value = (char *)
+PQescapeByteaConn(pset.db,
+	(const unsigned char *) buffer.data, buffer.len, &size);
+		pset.paramTypes[pset.nparams] = BYTEAOID;
+	}
+
+	/* fname and buffer are no longer needed */
+	PQfreemem(fname);
+	termPQExpBuffer(&buffer);
+
+	if (escaped_value == NULL)
+	{
+		const char *error = PQerrorMessage(pset.db);
+
+		psql_error("%s", error);
+		return NULL;
+	}
+
+	pset.params[pset.nparams] = escaped_value;
+
+	snprintf(line, sizeof(line) - 1, "$%d", ++pset.nparams);
+
+	return pstrdup(line);
+}
 
 

Re: [HACKERS] Missing checks when malloc returns NULL...

2016-08-31 Thread Tom Lane
Michael Paquier  writes:
> Which means that processes have an escape window when initializing
> shared memory by cleaning up the index if an entry cannot be found and
> then cannot be created properly. I don't think that it is a good idea
> to change that by forcing ShmemAlloc to fail. So I would tend to just
> have the patch attached and add those missing NULL-checks on all the
> existing ShmemAlloc() calls.

> Opinions?

I still think it'd be better to fix that as attached, because it
represents a net reduction not net addition of code, and it provides
a defense against future repetitions of the same omission.  If only
4 out of 11 existing calls were properly checked --- some of them
adjacent to calls with checks --- that should tell us that we *will*
have more instances of the same bug if we don't fix it centrally.

I also note that your patch missed checks for two ShmemAlloc calls in
InitShmemAllocation and ShmemInitStruct.  Admittedly, since those are
the very first such calls, it's highly unlikely they'd fail; but I think
this exercise is not about dismissing failures as improbable.  Almost
all of these failures are improbable, given that we precalculate the
shmem space requirement.

regards, tom lane

diff --git a/src/backend/storage/ipc/shmem.c b/src/backend/storage/ipc/shmem.c
index 1efe020..cc3af2d 100644
*** a/src/backend/storage/ipc/shmem.c
--- b/src/backend/storage/ipc/shmem.c
*** InitShmemAllocation(void)
*** 163,177 
  /*
   * ShmemAlloc -- allocate max-aligned chunk from shared memory
   *
!  * Assumes ShmemLock and ShmemSegHdr are initialized.
   *
!  * Returns: real pointer to memory or NULL if we are out
!  *		of space.  Has to return a real pointer in order
!  *		to be compatible with malloc().
   */
  void *
  ShmemAlloc(Size size)
  {
  	Size		newStart;
  	Size		newFree;
  	void	   *newSpace;
--- 163,194 
  /*
   * ShmemAlloc -- allocate max-aligned chunk from shared memory
   *
!  * Throws error if request cannot be satisfied.
   *
!  * Assumes ShmemLock and ShmemSegHdr are initialized.
   */
  void *
  ShmemAlloc(Size size)
  {
+ 	void	   *newSpace;
+ 
+ 	newSpace = ShmemAllocNoError(size);
+ 	if (!newSpace)
+ 		ereport(ERROR,
+ (errcode(ERRCODE_OUT_OF_MEMORY),
+  errmsg("out of shared memory (%zu bytes requested)",
+ 		size)));
+ 	return newSpace;
+ }
+ 
+ /*
+  * ShmemAllocNoError -- allocate max-aligned chunk from shared memory
+  *
+  * As ShmemAlloc, but returns NULL if out of space, rather than erroring.
+  */
+ void *
+ ShmemAllocNoError(Size size)
+ {
  	Size		newStart;
  	Size		newFree;
  	void	   *newSpace;
*** ShmemAlloc(Size size)
*** 206,216 
  
  	SpinLockRelease(ShmemLock);
  
! 	if (!newSpace)
! 		ereport(WARNING,
! (errcode(ERRCODE_OUT_OF_MEMORY),
!  errmsg("out of shared memory")));
! 
  	Assert(newSpace == (void *) CACHELINEALIGN(newSpace));
  
  	return newSpace;
--- 223,229 
  
  	SpinLockRelease(ShmemLock);
  
! 	/* note this assert is okay with newSpace == NULL */
  	Assert(newSpace == (void *) CACHELINEALIGN(newSpace));
  
  	return newSpace;
*** ShmemInitHash(const char *name, /* table
*** 293,299 
  	 * The shared memory allocator must be specified too.
  	 */
  	infoP->dsize = infoP->max_dsize = hash_select_dirsize(max_size);
! 	infoP->alloc = ShmemAlloc;
  	hash_flags |= HASH_SHARED_MEM | HASH_ALLOC | HASH_DIRSIZE;
  
  	/* look it up in the shmem index */
--- 306,312 
  	 * The shared memory allocator must be specified too.
  	 */
  	infoP->dsize = infoP->max_dsize = hash_select_dirsize(max_size);
! 	infoP->alloc = ShmemAllocNoError;
  	hash_flags |= HASH_SHARED_MEM | HASH_ALLOC | HASH_DIRSIZE;
  
  	/* look it up in the shmem index */
*** ShmemInitStruct(const char *name, Size s
*** 364,375 
  			 */
  			Assert(shmemseghdr->index == NULL);
  			structPtr = ShmemAlloc(size);
- 			if (structPtr == NULL)
- ereport(ERROR,
- 		(errcode(ERRCODE_OUT_OF_MEMORY),
- 		 errmsg("not enough shared memory for data structure"
- " \"%s\" (%zu bytes requested)",
- name, size)));
  			shmemseghdr->index = structPtr;
  			*foundPtr = FALSE;
  		}
--- 377,382 
*** ShmemInitStruct(const char *name, Size s
*** 410,416 
  	else
  	{
  		/* It isn't in the table yet. allocate and initialize it */
! 		structPtr = ShmemAlloc(size);
  		if (structPtr == NULL)
  		{
  			/* out of memory; remove the failed ShmemIndex entry */
--- 417,423 
  	else
  	{
  		/* It isn't in the table yet. allocate and initialize it */
! 		structPtr = ShmemAllocNoError(size);
  		if (structPtr == NULL)
  		{
  			/* out of memory; remove the failed ShmemIndex entry */
diff --git a/src/backend/storage/lmgr/predicate.c b/src/backend/storage/lmgr/predicate.c
index 7cdb355..4064b20 100644
*** a/src/backend/storage/lmgr/predicate.c
--- b/src/backend/storage/lmgr/predicate.c
*** 

Re: [HACKERS] some requests on auditing

2016-08-31 Thread Pavel Stehule
2016-08-31 16:00 GMT+02:00 David Steele :

> On 8/31/16 9:39 AM, David Steele wrote:
>
>> On 8/30/16 10:12 AM, Pavel Stehule wrote:
>>
>
> #3 is not likely without changes to logging in Postgres.  However, there
>> are plenty of tools for log analysis (e.g. ELK) that might help and a
>> Postgres extension that allows log messages to be directed elsewhere
>> (can't remember the name but Gabrielle or Simon would know).
>>
>
> Here's the extension I was thinking of:
>
> https://github.com/2ndquadrant-it/redislog
>
> This one is more general purpose:
>
> https://github.com/mpihlak/pg_logforward
>

Many thanks for this information - I'll check it out.

Regards

Pavel


> --
> -David
> da...@pgmasters.net
>


[HACKERS] Optimizing aggregates

2016-08-31 Thread Heikki Linnakangas
I've been profiling simple aggregate queries, looking for any 
low-hanging fruit. For this query:



-- setup
create table floats as select g::float8 as a, g::float8 as b, g::float8 
as c from generate_series(1, 1000) g;

vacuum freeze floats;

-- query
select sum(a), sum(b+c) from floats;


perf report says:

# Children      Self  Command     Shared Object      Symbol
# ........  ........  ..........  .................  ......
#
25.70% 0.00%  postmaster  [unknown]  [k] 
14.23%13.75%  postmaster  postgres   [.] ExecProject
11.18%10.57%  postmaster  postgres   [.] slot_deform_tuple
 9.58% 9.04%  postmaster  postgres   [.] advance_aggregates
 8.96% 0.00%  postmaster  [unknown]  [.] 0x000298d4
 8.77% 8.42%  postmaster  postgres   [.] ExecMakeFunctionResultNoSets
 7.78% 0.00%  postmaster  [unknown]  [.] 0x01d38260
 6.63% 6.15%  postmaster  postgres   [.] advance_transition_function
 6.61% 0.00%  postmaster  [unknown]  [.] 0x01e99e40
 6.47% 0.00%  postmaster  libc-2.23.so   [.] __GI___libc_read
 6.24% 5.88%  postmaster  postgres   [.] heap_getnext
 4.62% 4.62%  postmaster  [kernel.kallsyms]  [k] copy_user_enhanced_fast_string
 3.91% 3.82%  postmaster  postgres   [.] slot_getsomeattrs
 3.29% 3.18%  postmaster  postgres   [.] slot_getattr
 3.06% 3.00%  postmaster  postgres   [.] ExecClearTuple
 2.59% 0.00%  postmaster  [unknown]  [.] 0x01e9a370
 2.57% 2.45%  postmaster  postgres   [.] ExecScan
 2.56% 2.37%  postmaster  postgres   [.] float8pl
 2.54% 2.43%  postmaster  postgres   [.] heapgetpage
 2.25% 2.17%  postmaster  postgres   [.] ExecAgg
 2.10% 1.96%  postmaster  postgres   [.] ExecStoreTuple
 2.00% 1.91%  postmaster  postgres   [.] ExecProcNode

ExecProject stands out. I find that pretty surprising.

We're using ExecProject to extract the arguments from the input tuples, 
to pass to the aggregate transition functions. It looks like that's a 
pretty expensive way of doing it, for a typical aggregate that takes 
only one argument.


We actually used to call ExecEvalExpr() directly for each argument, but 
that was changed by the patch that added support for ordered set 
aggregates. It looks like that was a bad idea, from a performance point 
of view.


I propose that we go back to calling ExecEvalExpr() directly, for 
non-ordered aggregates, per the attached patch. That makes that example 
query about 10% faster on my laptop, which is in line with the fact that 
ExecProject() accounted for about 13% of the CPU time.
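
In code terms the proposal is roughly the following shape (a simplified
sketch of the idea, not a quote from the attached patch; names follow
nodeAgg.c conventions):

/* evaluate each aggregated argument directly, skipping ExecProject */
ListCell   *arg;
int         i = 1;      /* argument 0 is the transition value itself */

foreach(arg, pertrans->args)
{
	ExprState  *argstate = (ExprState *) lfirst(arg);

	fcinfo->arg[i] = ExecEvalExpr(argstate, aggstate->tmpcontext,
								  &fcinfo->argnull[i], NULL);
	i++;
}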


Another idea is that maybe we should add a fast-path to ExecProject(), 
for these trivial cases.


- Heikki
From 106c5742fde2ec83576323db74a7249d7d85101f Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas 
Date: Wed, 31 Aug 2016 17:27:52 +0300
Subject: [PATCH] Skip ExecProject for non-ordered aggregates.

When support for ordered aggregates was added, we started using
ExecProject to compute the arguments for the transition function. However,
it turns out that the old way of just calling ExecEvalExpr() directly for
each argument is somewhat faster. At least for typical aggregates that only
take one or two arguments. So go back to using ExecEvalExpr() for non-ordered
aggregates.

diff --git a/src/backend/executor/nodeAgg.c b/src/backend/executor/nodeAgg.c
index ce2fc28..e32b140 100644
--- a/src/backend/executor/nodeAgg.c
+++ b/src/backend/executor/nodeAgg.c
@@ -229,9 +229,8 @@ typedef struct AggStatePerTransData
 	/* Oid of state value's datatype */
 	Oid			aggtranstype;
 
-	/* ExprStates of the FILTER and argument expressions. */
+	/* ExprStates of the FILTER and direct-argument expressions. */
 	ExprState  *aggfilter;		/* state of FILTER expression, if any */
-	List	   *args;			/* states of aggregated-argument expressions */
 	List	   *aggdirectargs;	/* states of direct-argument expressions */
 
 	/*
@@ -288,17 +287,21 @@ typedef struct AggStatePerTransData
 transtypeByVal;
 
 	/*
-	 * Stuff for evaluation of inputs.  We used to just use ExecEvalExpr, but
-	 * with the addition of ORDER BY we now need at least a slot for passing
-	 * data to the sort object, which requires a tupledesc, so we might as
-	 * well go whole hog and use ExecProject too.
+	 * Stuff for evaluation of inputs.
+	 *
+	 * For non-ordered aggregates, we call ExecEvalExpr for each argument,
+	 * represented by the expression trees in transInputs. For ordered
+	 * aggregates, we need at least a slot for passing data to the sort
+	 * object, which requires a tupledesc, so we might as well go whole hog
+	 * and use ExecProject to evaluate the arguments, too.
 	 */
+	

Re: [HACKERS] pg_sequence catalog

2016-08-31 Thread Andres Freund
On 2016-08-31 11:23:27 -0400, Tom Lane wrote:
> Another issue is what is the low-level interlock between nextvals
> in different processes.  Right now it's the buffer lock on the
> sequence's page.  With a scheme like this, if we just kept doing
> that, we'd have a single lock covering probably O(100) different
> sequences which might lead to contention problems.  We could probably
> improve on that with some thought.

I was thinking of forcing the rows to be spread to exactly one page per
sequence...

Andres




Re: [HACKERS] Optimizing aggregates

2016-08-31 Thread Andres Freund
Hi,

On 2016-08-31 17:47:18 +0300, Heikki Linnakangas wrote:
> # ........  ........  ..........  .................  ......
> #
> 25.70% 0.00%  postmaster  [unknown]  [k] 
> 14.23%13.75%  postmaster  postgres   [.] ExecProject

> ExecProject stands out. I find that pretty surprising.
> 
> We're using ExecProject to extract the arguments from the input tuples, to
> pass to the aggregate transition functions. It looks like that's a pretty
> expensive way of doing it, for a typical aggregate that takes only one
> argument.
> 
> We actually used to call ExecEvalExpr() directly for each argument, but that
> was changed by the patch that added support for ordered set aggregates. It
> looks like that was a bad idea, from a performance point of view.

I complained about that as well
http://archives.postgresql.org/message-id/20160519175727.ymv2y5tye4qgcmqx%40alap3.anarazel.de


> I propose that we go back to calling ExecEvalExpr() directly, for
> non-ordered aggregates, per the attached patch. That makes that example
> query about 10% faster on my laptop, which is in line with the fact that
> ExecProject() accounted for about 13% of the CPU time.

My approach is a bit different.

I've first combined the projection for all the aggregates, ordered set
or not, into one projection. That got rid of a fair amount of overhead
when you have multiple aggregates.  I've attached a probably-out-of-date
WIP version of that patch.

Secondly, I'm working on overhauling expression evaluation to be
faster. Even without the ExecProject overhead, the computations very
quickly become the bottleneck. During that work I pretty much merged
ExecProject and ExecEvalExpr into one - they're really not that
different, and the distinction serves no purpose except to increase the
number of function calls. The reason I'm working on getting rid of
targetlist SRFs is precisely that. A proof of concept of that is
attached to
http://archives.postgresql.org/message-id/20160714011850.bd5zhu35szle3n3c%40alap3.anarazel.de

Greetings,

Andres Freund
>From 384845bea72d28952d88e58e55f81aaa5addd930 Mon Sep 17 00:00:00 2001
From: Andres Freund 
Date: Tue, 12 Jul 2016 01:01:28 -0700
Subject: [PATCH] WIP: Only perform one projection in aggregation.

---
 src/backend/executor/nodeAgg.c | 112 -
 1 file changed, 88 insertions(+), 24 deletions(-)

diff --git a/src/backend/executor/nodeAgg.c b/src/backend/executor/nodeAgg.c
index f655aec..4499d5f 100644
--- a/src/backend/executor/nodeAgg.c
+++ b/src/backend/executor/nodeAgg.c
@@ -210,6 +210,9 @@ typedef struct AggStatePerTransData
 	 */
 	int			numInputs;
 
+	/* offset of input columns in Aggstate->evalslot */
+	int			inputoff;
+
 	/*
 	 * Number of aggregated input columns to pass to the transfn.  This
 	 * includes the ORDER BY columns for ordered-set aggs, but not for plain
@@ -836,14 +839,20 @@ advance_aggregates(AggState *aggstate, AggStatePerGroup pergroup)
 	int			setno = 0;
 	int			numGroupingSets = Max(aggstate->phase->numsets, 1);
 	int			numTrans = aggstate->numtrans;
+	TupleTableSlot *slot = aggstate->evalslot;
+	AggStatePerTrans pertrans;
 
-	for (transno = 0; transno < numTrans; transno++)
+	/* compute input for all aggregates */
+	if (aggstate->evalproj)
+		ExecProjectIntoSlot(aggstate->evalproj, aggstate->evalslot);
+
+	for (transno = 0, pertrans = aggstate->pertrans; transno < numTrans;
+		 transno++, pertrans++)
 	{
-		AggStatePerTrans pertrans = &aggstate->pertrans[transno];
 		ExprState  *filter = pertrans->aggfilter;
 		int			numTransInputs = pertrans->numTransInputs;
 		int			i;
-		TupleTableSlot *slot;
+		int			inputoff = pertrans->inputoff;
 
 		/* Skip anything FILTERed out */
 		if (filter)
@@ -857,13 +866,10 @@ advance_aggregates(AggState *aggstate, AggStatePerGroup pergroup)
 continue;
 		}
 
-		/* Evaluate the current input expressions for this aggregate */
-		slot = ExecProject(pertrans->evalproj, NULL);
-
 		if (pertrans->numSortCols > 0)
 		{
 			/* DISTINCT and/or ORDER BY case */
-			Assert(slot->tts_nvalid == pertrans->numInputs);
+			Assert(slot->tts_nvalid >= pertrans->numInputs);
 
 			/*
 			 * If the transfn is strict, we want to check for nullity before
@@ -876,7 +882,7 @@ advance_aggregates(AggState *aggstate, AggStatePerGroup pergroup)
 			{
 for (i = 0; i < numTransInputs; i++)
 {
-	if (slot->tts_isnull[i])
+	if (slot->tts_isnull[i + inputoff])
 		break;
 }
 if (i < numTransInputs)
@@ -888,10 +894,22 @@ advance_aggregates(AggState *aggstate, AggStatePerGroup pergroup)
 /* OK, put the tuple into the tuplesort object */
 if (pertrans->numInputs == 1)
 	tuplesort_putdatum(pertrans->sortstates[setno],
-	   slot->tts_values[0],
-	   slot->tts_isnull[0]);
+	   slot->tts_values[inputoff],
+	   slot->tts_isnull[inputoff]);
 else
-	

Re: [HACKERS] pg_sequence catalog

2016-08-31 Thread Petr Jelinek

On 31/08/16 16:10, Tom Lane wrote:

I wrote:

Personally, my big beef with the current approach to sequences is that
we eat a whole relation (including a whole relfilenode) per sequence.
I wish that we could reduce a sequence to just a single row in a
catalog, including the nontransactional state.  Not sure how feasible
that is either, but accomplishing it would move the benefits of making
a change out of the "debatable whether it's worth it" category, IMO.


BTW, another thing to keep in mind here is the ideas that have been
kicked around in the past about alternative sequence implementations
managed through a "sequence AM API".  I dunno whether now is the time
to start creating that API abstraction, but let's at least consider
it if we're whacking the catalog representation around.



FWIW, if I were going to continue with the sequence AM API, the next patch
would have included a split of sequence metadata and sequence state into
separate catalogs, so from that point of view this actually seems like an
improvement (I didn't look at the code though).


As a side note, I don't plan to resurrect the seqam patch at least until 
we have reasonable built-in logical replication functionality.


--
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] autonomous transactions

2016-08-31 Thread Petr Jelinek

On 31/08/16 16:11, Craig Ringer wrote:

On 31 August 2016 at 21:46, Greg Stark  wrote:

On Wed, Aug 31, 2016 at 2:50 AM, Peter Eisentraut wrote:

- A API interface to open a "connection" to a background worker, run
queries, get results: AutonomousSessionStart(), AutonomousSessionEnd(),
AutonomousSessionExecute(), etc.  The communication happens using the
client/server protocol.


Peter, you mention "Oracle-style autonomous transaction blocks".

What are the semantics to be expected of those with regards to:

- Accessing objects exclusively locked by the outer xact or where the
requested lockmode conflicts with a lock held by the outer xact

- Visibility of data written by the outer xact



That would be my question as well.



Also, is it intended (outside the plpgsql interface) that the
autonomous xact can proceed concurrently/interleaved with a local
backend xact? i.e. the local backend xact isn't suspended and you're
allowed to do things on the local backend as well? If so, what
handling do you have in mind for deadlocks between the local backend
xact and the bgworker with the autonomous xact? I'd expect the local
backend to always win, killing the autonomous xact every time.



I would expect that in PLs it's handled by them; if you misuse this at
the C level, that's your problem?



I'm surprised by the background worker. I had envisioned autonomous
transactions being implemented by allowing a process to reserve a
second entry in PGPROC with the same pid. Or perhaps save its existing
information in a separate pgproc slot (or stack of slots) and
restoring it after the autonomous transaction commits.


I suspect that there'll be way too much code that relies on stashing
xact-scoped stuff in globals for that to be viable. Caches alone.
Peter will be able to explain more, I'm sure.



I can also see some advantages in the bgworker approach. For example, it
could be used for a "fire and forget" type of interface in the future,
where you return as soon as you send the exec call and don't wait for
the result.


--
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] make async slave to wait for lsn to be replayed

2016-08-31 Thread Craig Ringer
On 31 August 2016 at 22:16, Ivan Kartyshov  wrote:

> Our clients who deal with 9.5 and use asynchronous master-slave replication
> asked to have a wait mechanism on the slave side, to prevent the situation
> where the slave handles a query that needs data (an LSN) that was received,
> flushed, but not yet replayed.

I like the broad idea - I've wanted something like it for a while. BDR
has pg_xlog_wait_remote_receive() and pg_xlog_wait_remote_apply() for
use in tests for this reason, but they act on the *upstream* side,
waiting until the downstream has acked the data. Not as useful for
ensuring that apps connected to both master and one or more replicas
get a consistent view of data.

How do you get the commit LSN to watch for? Grab
pg_current_xlog_insert_location() just after the commit and figure
that replaying to that point guarantees you get the commit?
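
In other words, something like this (pg_wait_for_lsn here is a
hypothetical name for the proposed wait primitive, for illustration only):

-- on the master, immediately after COMMIT
SELECT pg_current_xlog_insert_location();   -- say it returns '0/3000148'

-- on the standby
SELECT pg_wait_for_lsn('0/3000148');         -- hypothetical
SELECT * FROM my_table;                      -- sees the replayed commit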

Some time ago[1] I raised the idea of reporting commit LSN on the wire
to clients. That didn't go anywhere due to compatibility and security
concerns. I think those were resolvable, but it wasn't enough of a
priority to push hard on at the time. A truly "right" solution has to
wait for a protocol bump, but I think good-enough solutions are
possible now. So you might want to read that thread.

It also mentions hesitations about exposing LSN to clients even more.
I think we're *way* past that now - we have replication origins and
replication slots relying on it, it's exposed in a pg_lsn datatype, a
bunch of views expose it, etc. But it might be reasonable to ask
"should the client instead be expected to wait for the confirmed
commit of a 64-bit epoch-extended xid, like that returned by
txid_current()?" . One advantage of using xid is that you can get it
while you're still in the xact, so there's no race between commit and
checking the lsn after commit.

Are you specifically trying to ensure "this commit has replayed on the
replica before we run queries on it" ? Or something else?

(Also, on a side note, Kevin mentioned that it may be possible to use
SSI data to achieve SERIALIZABLE read-only queries on replicas, where
they get the same protection from commit-order related anomalies as
queries on the master. You might want to look more deeply into that
too at some stage, if you're trying to ensure the app can do read only
queries on the master and expect fully consistent results).

[1] 
https://www.postgresql.org/message-id/flat/53E41EC1.5050603%402ndquadrant.com#53e41ec1.5050...@2ndquadrant.com

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] [GENERAL] C++ port of Postgres

2016-08-31 Thread Peter Eisentraut
[trimmed cc list because of big attachments]

On 8/16/16 4:22 PM, Jim Nasby wrote:
> Joy, do you have an idea what a *minimally invasive* patch for C++ 
> support would look like? That's certainly the first step here.

I developed a minimally invasive patch for C++ support a few years ago
shortly after I wrote that blog post.  Since there appears to have been
some interest here now, I have updated that and split it up into logical
chunks.

So here you go.

To build this, you need to configure with g++ <= version 5.  (4.x works,
too.)  g++ version 6 does not work yet because of the issues described
in patch 0013.

Then you also need to edit src/Makefile.custom and set

COPT = -fpermissive -Wno-sign-compare -Wno-write-strings

The -W options are optional just to reduce some noise.  Cleaning up
those warnings can be a separate project that might also have some
benefit under C.

The -fpermissive option is a g++ specific option that reduces some
errors to warnings.  (So this won't work with clang or other compilers
at all at this point.)  In particular, C++ does not allow casting from
or to void pointers without a cast, but -fpermissive allows that.  The
step from this to "real" C++ would be adding a bunch of casts around
things like malloc and palloc and other places.  That would be mostly
busy work, so I have excluded that here.
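
Concretely, the build would go roughly like this (assuming the compiler is
selected via CC; this recipe is my reading of the instructions above, not
taken from the patches):

./configure CC=g++-5
echo 'COPT = -fpermissive -Wno-sign-compare -Wno-write-strings' \
    >> src/Makefile.custom
make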

The patches are numbered approximately in increasing order of dubiosity.
 So 0001 is probably a straight bug fix, 0002 and 0003 are arguably
minor bug fixes as well.  The patches through 0012 can probably be
considered for committing in some form.  After that it gets a bit hackish.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From 86dbd4a5a0ab3212cd340e1fa56f03e864aa8e1a Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Tue, 30 Aug 2016 12:00:00 -0400
Subject: [PATCH 01/27] Fix use of offsetof()

Using offsetof() with a run-time computed argument is not allowed in
either C or C++.  Apparently, gcc allows it, but g++ doesn't.
---
 contrib/bloom/blutils.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/contrib/bloom/blutils.c b/contrib/bloom/blutils.c
index debf4f4..b68a0d1 100644
--- a/contrib/bloom/blutils.c
+++ b/contrib/bloom/blutils.c
@@ -75,7 +75,7 @@ _PG_init(void)
 		bl_relopt_tab[i + 1].optname = MemoryContextStrdup(TopMemoryContext,
 		   buf);
 		bl_relopt_tab[i + 1].opttype = RELOPT_TYPE_INT;
-		bl_relopt_tab[i + 1].offset = offsetof(BloomOptions, bitSize[i]);
+		bl_relopt_tab[i + 1].offset = offsetof(BloomOptions, bitSize[0]) + sizeof(int) * i;
 	}
 }
 
-- 
2.9.3

From d3c7c1fbb346fd5c040a7a379d971db7b5129581 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Tue, 30 Aug 2016 12:00:00 -0400
Subject: [PATCH 02/27] Use return instead of exit() in configure

Using exit() requires stdlib.h, which is not included.  Use return
instead.  Also add return type for main().
---
 config/c-compiler.m4 |  4 +++-
 config/c-library.m4  |  4 +++-
 configure| 12 +---
 3 files changed, 15 insertions(+), 5 deletions(-)

diff --git a/config/c-compiler.m4 b/config/c-compiler.m4
index a7f6773..7d901e1 100644
--- a/config/c-compiler.m4
+++ b/config/c-compiler.m4
@@ -71,8 +71,10 @@ int does_int64_work()
 return 0;
   return 1;
 }
+
+int
 main() {
-  exit(! does_int64_work());
+  return (! does_int64_work());
 }])],
 [Ac_cachevar=yes],
 [Ac_cachevar=no],
diff --git a/config/c-library.m4 b/config/c-library.m4
index 50d068d..56658b5 100644
--- a/config/c-library.m4
+++ b/config/c-library.m4
@@ -204,8 +204,10 @@ int does_int64_snprintf_work()
 return 0;			/* either multiply or snprintf is busted */
   return 1;
 }
+
+int
 main() {
-  exit(! does_int64_snprintf_work());
+  return (! does_int64_snprintf_work());
 }]])],
 [pgac_cv_snprintf_long_long_int_modifier=$pgac_modifier; break],
 [],
diff --git a/configure b/configure
index 45c8eef..36d9a54 100755
--- a/configure
+++ b/configure
@@ -13563,8 +13563,10 @@ int does_int64_work()
 return 0;
   return 1;
 }
+
+int
 main() {
-  exit(! does_int64_work());
+  return (! does_int64_work());
 }
 _ACEOF
 if ac_fn_c_try_run "$LINENO"; then :
@@ -13645,8 +13647,10 @@ int does_int64_work()
 return 0;
   return 1;
 }
+
+int
 main() {
-  exit(! does_int64_work());
+  return (! does_int64_work());
 }
 _ACEOF
 if ac_fn_c_try_run "$LINENO"; then :
@@ -13739,8 +13743,10 @@ int does_int64_snprintf_work()
 return 0;			/* either multiply or snprintf is busted */
   return 1;
 }
+
+int
 main() {
-  exit(! does_int64_snprintf_work());
+  return (! does_int64_snprintf_work());
 }
 _ACEOF
 if ac_fn_c_try_run "$LINENO"; then :
-- 
2.9.3

From 54ed07be5a29d4955a9485316e68da5c6896797b Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Tue, 30 Aug 2016 12:00:00 -0400
Subject: [PATCH 03/27] Add missing include files to configure tests

atoi() needs <stdlib.h>

Re: [HACKERS] Optimizing aggregates

2016-08-31 Thread Heikki Linnakangas

On 08/31/2016 06:51 PM, Andres Freund wrote:

On 2016-08-31 17:47:18 +0300, Heikki Linnakangas wrote:

We actually used to call ExecEvalExpr() directly for each argument, but that
was changed by the patch that added support for ordered set aggregates. It
looks like that was a bad idea, from a performance point of view.


I complained about that as well
http://archives.postgresql.org/message-id/20160519175727.ymv2y5tye4qgcmqx%40alap3.anarazel.de


Ah, missed that!


I propose that we go back to calling ExecEvalExpr() directly, for
non-ordered aggregates, per the attached patch. That makes that example
query about 10% faster on my laptop, which is in line with the fact that
ExecProject() accounted for about 13% of the CPU time.


My approach is a bit different.

I've first combined the projection for all the aggregates, ordered set,
or not, into one projetion. That got rid of a fair amount of overhead
when you have multiple aggregates.  I attached an, probably out of date,
WIP version of that patch.


A-ha, I also considered doing just that! I also considered a variant 
where we call ExecProject once for all non-ordered aggregates, and a 
separate ExecProject() for each ordered one. But just switching back to 
straight ExecEvalExprs seemed easier.



Secondly, I'm working on overhauling expression evaluation to be
faster. Even without the ExecProject overhead, the computations very
quickly become the bottleneck. During that I pretty much merged
ExecProject and ExecEvalExpr into one - they're really not that
different, and the distinction serves no purpose, except to increase the
number of function calls. The reason I'm working on getting rid of
targetlist SRFs is precisely that. A proof of concept of that is
attached to
http://archives.postgresql.org/message-id/20160714011850.bd5zhu35szle3n3c%40alap3.anarazel.de


Cool, yes, all that should help.

- Heikki





Re: [HACKERS] pg_sequence catalog

2016-08-31 Thread Andres Freund
Hi,

On 2016-08-31 12:53:30 -0400, Tom Lane wrote:
> Improving on the space wastage is exactly the point IMO.  If it's still
> going to be 8k per sequence on disk (*and* in shared buffers, remember),
> I'm not sure it's worth all the work to change things at all.

A separate file is a heck of a lot more heavyweight than another 8 kB in
an existing file.


> Another idea would be to have nominally per-sequence LWLocks (or
> spinlocks?) controlling nextval's nontransactional accesses to the catalog
> rows, but to map those down to some fixed number of locks in a way similar
> to the current fallback implementation for spinlocks, which maps them onto
> a fixed number of semaphores.  You'd trade off shared memory against
> contention while choosing the underlying number of locks.

If we could rely on spinlocks to actually be spinlocks, we could just
put the spinlock beside the actual state data... Not entirely pretty,
but probably pretty simple.
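
For illustration, the fixed-pool mapping Tom describes might look like
this (a sketch with assumed names, not from any patch):

#define NUM_SEQUENCE_LOCKS 128

static LWLock *SequenceLocks[NUM_SEQUENCE_LOCKS];

static inline LWLock *
LockForSequence(Oid seqrelid)
{
	/* many sequences share one lock, like the spinlock fallback */
	return SequenceLocks[seqrelid % NUM_SEQUENCE_LOCKS];
}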


- Andres




Re: [HACKERS] pg_sequence catalog

2016-08-31 Thread Alvaro Herrera
Andres Freund wrote:
> On 2016-08-31 11:23:27 -0400, Tom Lane wrote:
> > Another issue is what is the low-level interlock between nextvals
> > in different processes.  Right now it's the buffer lock on the
> > sequence's page.  With a scheme like this, if we just kept doing
> > that, we'd have a single lock covering probably O(100) different
> > sequences which might lead to contention problems.  We could probably
> > improve on that with some thought.
> 
> I was thinking of forcing the rows to be spread to exactly one page per
> sequence...

I was thinking that nextval could grab a shared buffer lock and release it
immediately, just to ensure no one holds an exclusive buffer lock
concurrently (which would be used for things like dropping one seq tuple
from the page when a sequence is dropped); then control access to each
sequence tuple using LockDatabaseObject.  This is a HW lock, heavier
than a buffer's LWLock, but it seems better than wasting a full 8kb for
each sequence.
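
Roughly, in pseudo-C (a sketch of the scheme just described; the classid
constant is assumed, not an existing symbol):

/* nextval() under this scheme */
LockBuffer(buf, BUFFER_LOCK_SHARE);   /* fences out exclusive lockers */
LockBuffer(buf, BUFFER_LOCK_UNLOCK);

LockDatabaseObject(SequenceRelationId /* assumed */, seqoid, 0,
				   ExclusiveLock);
/* ... read and advance the sequence tuple's nontransactional state ... */
UnlockDatabaseObject(SequenceRelationId /* assumed */, seqoid, 0,
					 ExclusiveLock);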

-- 
Álvaro Herrera   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Optimizing aggregates

2016-08-31 Thread Andres Freund
On 2016-08-31 19:07:00 +0300, Heikki Linnakangas wrote:
> On 08/31/2016 06:51 PM, Andres Freund wrote:
> > I've first combined the projection for all the aggregates, ordered set,
> > or not, into one projetion. That got rid of a fair amount of overhead
> > when you have multiple aggregates.  I attached an, probably out of date,
> > WIP version of that patch.
> 
> A-ha, I also considered doing just that! I also considered a variant where
> we call ExecProject once for all non-ordered aggregates, and a separate
> ExecProject() for each ordered one. But just switching back to straight
> ExecEvalExprs seemed easier.

The issue is that this might, I think, end up iteratively deforming the
underlying tuple. The projection machinery takes care of that if we do
it in one go.

Greetings,

Andres Freund




[HACKERS] Proposal for changes to recovery.conf API

2016-08-31 Thread Simon Riggs
This is a summary of proposed changes to the recovery.conf API for
v10. These are based in part on earlier discussions, and represent a
minimal modification to current usage.

Proposed changes (with reference to patches from Abhijit Menon-Sen and myself)

* pg_ctl start -M (normal|recover|continue)
pg_ctl can now start the server directly as a standby, similar to the
way pg_ctl promote works. The internal implementation is also similar,
with pg_ctl writing a "recovery.trigger" file that initiates recovery
in the same way recovery.conf used to do. It is still possible to
manually add a file called "recovery.trigger" and have that work if
users desire that mechanism.
Different startup methods can be selected with the -M option. Normal
mode starts the server for read/write, overriding any previous use of
Recover mode. Recover mode starts the server as a standby that expects
to receive changes from a primary/master server via physical streaming
replication, or is used for performing a recovery from backup. Continue
mode is the default and will start up the server in whatever mode it
was in at the last proper shutdown, or as modified by any trigger files
present.
(Patch: recovery_startup_r10_api.v1b.patch)
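In concrete terms, a sketch of the proposed invocation, per the
description above (the final option spelling is whatever the patch
settles on):

pg_ctl start -D $PGDATA -M recover    # start as standby / begin recovery
pg_ctl start -D $PGDATA -M normal     # force a read/write start
pg_ctl start -D $PGDATA -M continue   # default: resume the previous mode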

* $DATADIR/recovery.conf no longer triggers recovery
(Patch: recovery_startup_r10_api.v1b.patch)

* Recovery parameters would now be part of the main postgresql.conf
infrastructure
Any parameters set in $DATADIR/recovery.conf will be read after the
main parameter file, similar to the way that postgresql.conf.auto is
read.
(Abhijit)

* pg_basebackup -R will continue to generate a parameter file called
recovery.conf as it does now, but will also create a file named
recovery.trigger.
(This part is WIP; patch doesn't include implementation for tar format yet)

* Parameters
All of the parameters formerly set in recovery.conf can now be set in
postgresql.conf and changed with a reload.
These parameters will have no defaults in postgresql.conf.sample.
Setting them has no effect during normal running, or once recovery ends.
 https://www.postgresql.org/docs/devel/static/archive-recovery-settings.html
 https://www.postgresql.org/docs/devel/static/recovery-target-settings.html
 https://www.postgresql.org/docs/devel/static/standby-settings.html
(Abhijit)
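As an illustration, a standby's postgresql.conf might then carry the
former recovery.conf entries directly. The parameter names below are
the existing recovery parameters; the values are examples only:

# sketch: recovery settings living in postgresql.conf
restore_command = 'cp /mnt/server/archivedir/%f "%p"'
primary_conninfo = 'host=primary port=5432 user=replicator'
recovery_target_time = '2016-08-31 12:00:00'
# standby vs. normal startup is selected via pg_ctl -M or
# recovery.trigger under this proposal, not via a parameter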

Related cleanup
* Promotion signal file is now called "promote.trigger" rather than
just "promote"
* Remove user configurable "trigger_file" mechanism - use
"promote.trigger" for all cases
* Remove Fallback promote mechanism, so all promotion is now "fast" in xlog.c
* Rename CheckForStandbyTrigger() to CheckForPromoteTrigger() for clarity
(Patch: recovery_startup_r10_api.v1b.patch)

-- 
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


recovery_startup_r10_api.v1b.patch
Description: Binary data



Re: [HACKERS] proposal: psql \setfileref

2016-08-31 Thread Pavel Stehule
2016-08-31 18:24 GMT+02:00 Corey Huinker :

> On Wed, Aug 31, 2016 at 11:32 AM, Pavel Stehule 
> wrote:
>
>> Hi
>>
>> I propose a new type of psql variable: file references. The content of
>> a file reference is taken from the referenced file. It allows simple
>> insertion of large data without manual escaping or use of the LO API.
>>
>> When I wrote the patch, I used parameterized queries for these data
>> instead of escaped strings; the code is not much bigger, and the error
>> messages are much friendlier when the query is not bloated by large
>> content. Only the text transfer mode is used. When escaping is not
>> required, the content is implicitly passed as bytea; by default the
>> content of a file is bytea. When the user requires escaping, text
>> escaping is enforced, because it makes sense only for the text type.
>>
>> postgres=# \setfileref a ~/test2.xml
>> postgres=# \setfileref b ~/avatar.gif
>> postgres=# insert into test values(convert_from(:a, 'latin2')::xml, :b);
>> -- xml is passed as bytea
>> postgres=# insert into test values(:'a', :b); -- xml is passed via
>> unknown text value
>>
>> The content of file reference variables is not persistent in memory.
>>
>> Comments, notes?
>>
>> Regards
>>
>> Pavel
>>
> Clearly jumping ahead on this one, but if the fileref is essentially a
> pipe to "cat /path/to/file.name", is there anything stopping us from
> setting pipes?
>
> My interest is primarily in ways that COPY could use this.
>

I don't see a reason why it should not be possible. The current code
can't do it, but with about 20 more lines it should work.

There is one disadvantage compared to COPY: the content has to be fully
loaded into memory. Any other functionality should be the same.
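Purely as a sketch of what a pipe variant could look like: no such
\setpiperef command exists in the patch, the syntax below is invented
for illustration, and the variable would be read from the command's
stdout at expansion time:

postgres=# \setpiperef c 'zcat ~/data.xml.gz'
postgres=# insert into test values (convert_from(:c, 'utf8')::xml);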

Regards

Pavel


Re: [HACKERS] pg_sequence catalog

2016-08-31 Thread Tom Lane
Andres Freund  writes:
> On 2016-08-31 12:56:45 -0300, Alvaro Herrera wrote:
>> I was thinking that nextval could grab a shared buffer lock and release
>> it immediately, just to ensure no one holds the exclusive buffer lock
>> concurrently (which would be used for things like dropping one seq tuple
>> from the page, when a sequence is dropped); then control access to each
>> sequence tuple using LockDatabaseObject.  This is a HW lock, heavier
>> than a buffer's LWLock, but it seems better than wasting a full 8kb for
>> each sequence.

> That's going to be a *lot* slower; I don't think that's OK.  I have a
> hard time worrying about the space waste here, especially considering
> where we're coming from.

Improving on the space wastage is exactly the point IMO.  If it's still
going to be 8k per sequence on disk (*and* in shared buffers, remember),
I'm not sure it's worth all the work to change things at all.

We're already dealing with taking a heavyweight lock for each sequence
(the relation AccessShareLock).  I wonder whether it'd be possible to
repurpose that effort somehow.

Another idea would be to have nominally per-sequence LWLocks (or
spinlocks?) controlling nextval's nontransactional accesses to the catalog
rows, but to map those down to some fixed number of locks in a way similar
to the current fallback implementation for spinlocks, which maps them onto
a fixed number of semaphores.  You'd trade off shared memory against
contention while choosing the underlying number of locks.
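To sketch that mapping (the names are invented; hash_uint32() is the
existing hash support function):

#include "postgres.h"
#include "access/hash.h"
#include "storage/lwlock.h"

#define NUM_SEQ_LOCKS 128		/* the shared-memory vs. contention knob */

static LWLock *SeqLockArray;	/* allocated once in shared memory */

static LWLock *
seq_lock_for(Oid seqoid)
{
	uint32		h = DatumGetUInt32(hash_uint32((uint32) seqoid));

	return &SeqLockArray[h % NUM_SEQ_LOCKS];
}

nextval's nontransactional access to a sequence's catalog row would
then run between LWLockAcquire(seq_lock_for(oid), LW_EXCLUSIVE) and the
matching LWLockRelease(); two sequences mapping to the same slot
contend, which is exactly the trade-off being described.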

regards, tom lane




Re: [HACKERS] autonomous transactions

2016-08-31 Thread Greg Stark
On Wed, Aug 31, 2016 at 3:11 PM, Craig Ringer  wrote:
>
> I suspect that there'll be way too much code that relies on stashing
> xact-scoped stuff in globals for that to be viable. Caches alone.
> Peter will be able to explain more, I'm sure.
>
> We'd probably need a new transaction data object that everything
> xact-scoped hangs off, so we can pass it everywhere or swap it out of
> some global. The mechanical refactoring alone would be pretty scary,
> not to mention the complexity of actually identifying all the less
> obvious places that need changing.

Well, this is the converse of the same problem. Today, process state
and transaction state are tied together. One way or another you're
trying to split that -- either by having two processes share state or
by having one process manage two transactions.

I suppose we already have the infrastructure for parallel query so
there's at least some shared problem space there.

-- 
greg




Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2016-08-31 Thread Claudio Freire
On Wed, Aug 31, 2016 at 1:45 PM, Pavan Deolasee 
wrote:

> We discussed a few ideas to address the "Duplicate Scan" problem. For
> example, we can teach index AMs to discard any duplicate (key, CTID)
> insert requests. Or we could guarantee uniqueness by only allowing
> updates in one lexical order. While the former is a more complete
> solution for avoiding duplicate entries, searching through a large
> number of keys for non-unique indexes could be a drag on performance.
> The latter approach may not be sufficient for many workloads. Also,
> tracking increment/decrement for many indexes will be non-trivial.
>
> There is another problem with allowing many index entries to point to
> the same WARM chain. It will be non-trivial to know how many index
> entries are currently pointing to the WARM chain, and index/heap
> vacuum will pose further challenges.
>
> Instead, what I would like to propose, and what the patch currently
> implements, is to restrict WARM updates to once per chain. So the first
> non-HOT update to a tuple or a HOT chain can be a WARM update. The
> chain can further be HOT-updated any number of times, but it cannot be
> WARM-updated again. This might look too restrictive, but it can still
> bring down the number of regular updates by almost 50%. Further, if we
> devise a strategy to convert a WARM chain back to a HOT chain, it can
> again be WARM-updated. (This part is currently not implemented.) A good
> side effect of this simple strategy is that we know at most two index
> entries can point to any given WARM chain.
>

We should probably think about coordinating with my btree patch.

From the description above, the strategy is quite readily "upgradable" to
one in which the index AM discards duplicate (key, ctid) pairs, and that
would remove the limitation of only one WARM update... right?
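If so, the index AM side might look roughly like this.
index_entry_exists() is a hypothetical helper that does not exist
today, and the whole thing is a sketch of the idea rather than code
from either patch:

#include "postgres.h"
#include "access/genam.h"
#include "utils/rel.h"

static bool
index_insert_unless_duplicate(Relation indexRel,
							  Datum *values, bool *isnull,
							  ItemPointer heap_tid,
							  Relation heapRel)
{
	/* hypothetical probe for an entry with the same key and heap TID */
	if (index_entry_exists(indexRel, values, isnull, heap_tid))
		return false;			/* already points at this WARM chain */

	return index_insert(indexRel, values, isnull, heap_tid,
						heapRel, UNIQUE_CHECK_NO);
}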


Re: [HACKERS] pg_sequence catalog

2016-08-31 Thread Andres Freund
On 2016-08-31 12:56:45 -0300, Alvaro Herrera wrote:
> I was thinking that nextval could grab a shared buffer lock and release
> it immediately, just to ensure no one holds the exclusive buffer lock
> concurrently (which would be used for things like dropping one seq tuple
> from the page, when a sequence is dropped); then control access to each
> sequence tuple using LockDatabaseObject.  This is a HW lock, heavier
> than a buffer's LWLock, but it seems better than wasting a full 8kb for
> each sequence.

That's going to be a *lot* slower; I don't think that's OK.  I have a
hard time worrying about the space waste here, especially considering
where we're coming from.

Andres




Re: [HACKERS] proposal: psql \setfileref

2016-08-31 Thread Corey Huinker
On Wed, Aug 31, 2016 at 11:32 AM, Pavel Stehule 
wrote:

> Hi
>
> I propose a new type of psql variable: file references. The content of
> a file reference is taken from the referenced file. It allows simple
> insertion of large data without manual escaping or use of the LO API.
>
> When I wrote the patch, I used parameterized queries for these data
> instead of escaped strings; the code is not much bigger, and the error
> messages are much friendlier when the query is not bloated by large
> content. Only the text transfer mode is used. When escaping is not
> required, the content is implicitly passed as bytea; by default the
> content of a file is bytea. When the user requires escaping, text
> escaping is enforced, because it makes sense only for the text type.
>
> postgres=# \setfileref a ~/test2.xml
> postgres=# \setfileref b ~/avatar.gif
> postgres=# insert into test values(convert_from(:a, 'latin2')::xml, :b);
> -- xml is passed as bytea
> postgres=# insert into test values(:'a', :b); -- xml is passed via unknown
> text value
>
> The content of file reference variables is not persistent in memory.
>
> Comments, notes?
>
> Regards
>
> Pavel
>
Clearly jumping ahead on this one, but if the fileref is essentially a pipe
to "cat /path/to/file.name", is there anything stopping us from setting
pipes?
My interest is primarily in ways that COPY could use this.


Re: [HACKERS] autonomous transactions

2016-08-31 Thread Simon Riggs
On 31 August 2016 at 14:09, Joel Jacobson  wrote:
> On Wed, Aug 31, 2016 at 6:41 AM, Jaime Casanova
>  wrote:
>>
>> On 30 August 2016 at 23:10, Joel Jacobson  wrote:
>> >
>> > There should be a way to within the session and/or txn permanently
>> > block autonomous transactions.
>> >
>>
>> This will defeat one of the use cases of autonomous transactions: auditing
>
> My idea on how to deal with this would be to mark the function to be
> "AUTONOMOUS" similar to how a function is marked to be "PARALLEL
> SAFE",
> and to throw an error if a caller that has blocked autonomous
> transactions tries to call a function that is marked to be autonomous.
>
> That way none of the code that needs to be audited would ever get executed.

Not sure I see why you would want to turn off execution for only some functions.

What happens if your function calls some other function with
side-effects? How would you roll that back? How would you mark
functions for the general case?

Functions with side effects can't be tested with simple unit tests;
that has nothing to do with autonomous transactions.

-- 
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



