Re: [HACKERS] WAL format and API changes (9.5)

2014-06-27 Thread Michael Paquier
On Sun, Jun 15, 2014 at 6:11 AM, Heikki Linnakangas hlinnakan...@vmware.com
 wrote:

 On 06/14/2014 09:43 PM, Alvaro Herrera wrote:

 Heikki Linnakangas wrote:

  Here's a new version, rebased against master. No other changes.


 This is missing xlogrecord.h ...


 Oops, here you are.

Looking at this patch, it does not compile when assertions are enabled
because of this assertion in heapam.c:
Assert(recdata == data + datalen);
It seems like a thinko as removing it makes the code compile.

Here is a review of this patch:
- Compilation without warnings, regression tests pass
- That's not a small patch, but the changes mainly done are xlog record
insertion in accordance to the new API format.
$ git diff master --stat | tail -n1
52 files changed, 3601 insertions(+), 4371 deletions(-)

Some comments about the code:
1) In src/backend/access/transam/README:
's/no further action is no required/no further action is required/g'
2) Description of XLogBeginInsert, XLogHasBlockRef, XLogGetBlockRef and
XLogGetBlockRefIds are missing in src/backend/access/transam/README.
3) There are a couple of code blocks marked as FIXME and BROKEN. There are
as well some NOT_USED blocks.
 4) Current patch crashes when running make check in contrib/test_decoding.
Looking closer, wal_level = logical is broken:
=# show wal_level;
 wal_level
---
 logical
(1 row)
=# create database foo;
The connection to the server was lost. Attempting reset: Failed.

Looking even closer, log_heap_new_cid is broken:
(lldb) bt
* thread #1: tid = 0x, 0x7fff8a3c2866
libsystem_kernel.dylib`__pthread_kill + 10, stop reason = signal SIGSTOP
  * frame #0: 0x7fff8a3c2866 libsystem_kernel.dylib`__pthread_kill + 10
frame #1: 0x7fff8d37235c libsystem_pthread.dylib`pthread_kill + 92
frame #2: 0x7fff9369db1a libsystem_c.dylib`abort + 125
frame #3: 0x000104428819
postgres`ExceptionalCondition(conditionName=0x0001044a711c,
errorType=0x00010448b19f, fileName=0x0001044a2cfd, lineNumber=6879)
+ 137 at assert.c:54
frame #4: 0x000103f08dbe
postgres`log_heap_new_cid(relation=0x7fae4482afb8,
tup=0x7fae438062e8) + 126 at heapam.c:6879
frame #5: 0x000103f080cf
postgres`heap_insert(relation=0x7fae4482afb8, tup=0x7fae438062e8,
cid=0, options=0, bistate=0x) + 383 at heapam.c:2095
frame #6: 0x000103f09b6a
postgres`simple_heap_insert(relation=0x7fae4482afb8,
tup=0x7fae438062e8) + 74 at heapam.c:2533
frame #7: 0x00010406aacf postgres`createdb(stmt=0x7fae44843690)
+ 6335 at dbcommands.c:511
frame #8: 0x00010429def8
postgres`standard_ProcessUtility(parsetree=0x7fae44843690,
queryString=0x7fae44842c38, context=PROCESS_UTILITY_TOPLEVEL,
params=0x, dest=0x7fae44843a18,
completionTag=0x7fff5bd4ee30) + 1720 at utility.c:56

5) Yes,there are some WAL records that have only data related to buffers,
XLOG_GIN_VACUUM_DATA_LEAF_PAGE being one, with nothing registered with
buffer_id = -d, but using a dummy entry seems counter-productive:
+   rdata = rdata_head;
+   if (rdata == NULL)
+   {
+   /*
+* the rest of this function assumes that there is at least
some
+* rdata entries, so fake an empty rdata entry
+*/
+   dummy_rdata.data = NULL;
+   dummy_rdata.len = 0;
+   dummy_rdata.next = NULL;
+   rdata = dummy_rdata;
+   }
Could this be removed and replaced by a couple of checks on rdata being
NULL in some cases? XLogInsert should be updated in consequence.
6) XLogRegisterData creates a XLogRecData entry and appends it in one of
the static XLogRecData entries if buffer_id = 0 or to
rdata_head/rdata_tail if buffer_id = -1 (for permanent data related to the
WAL record). XLogRegisterXLogRecData does something similar, but with an
entry of XLogRecData already built. What do you think about removing
XLogRegisterXLogRecData and keeping only XLogRegisterData. This makes the
interface simpler, and XLogRecData could be kept private in xlog.c. This
would need more work especially on gin side where I am seeing for example
constructLeafRecompressWALData directly building a XLogRecData entry.
By doing so, we could as remove the while loop at the bottom of
XLogRegisterXLogRecData as we would insert in the tail only the latest
record created, aka replacing that:
while ((*tail)-next)
*tail = (*tail)-next;
By simply that:
*tail = (*tail)-next;
7) New structures at the top of xlog.c should have more comments about
their role, particularly rdata_head and rdata_tail that store main WAL data
(id of -1)
8) What do you think about adding in the README a description about how to
retrieve the block list modified by a WAL record, for a flow similar to
that:
record = XLogReadRecord(...);
nblocks = XLogGetBlockRefIds(record, blockids);
for (i = 0; i  nblocks; i++)
{
bkpb = XLogGetBlockRef(record, id, NULL);
// stuff
}

Re: [HACKERS] review: Built-in binning functions

2014-06-27 Thread Pavel Stehule
Hello

I recheck this patch

1. applied cleanly and compilation was without warnings and errors
2. all tests was passed ok
3. documentation was rebuild without issues
4. I don't see any issue in code quality - it is well commented, well
formatted, with regress tests

It is ready for commit

Regards

Pavel


2014-06-26 22:43 GMT+02:00 Petr Jelinek p...@2ndquadrant.com:

 Here is v2,

 with fixed documentation and numeric version of the implementation.


 --
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training  Services



Re: [HACKERS] ALTER SYSTEM RESET?

2014-06-27 Thread Vik Fearing
On 06/27/2014 06:22 AM, Amit Kapila wrote:
 On Thu, Jun 26, 2014 at 8:17 PM, Vik Fearing vik.fear...@dalibo.com
 mailto:vik.fear...@dalibo.com wrote:
 On 06/26/2014 05:07 AM, Amit Kapila wrote:
  I think it will make sense if we support RESET ALL as well similar
  to Alter Database .. RESET ALL syntax.  Do you see any reason
  why we shouldn't support RESET ALL syntax for Alter System?

 Yes, that makes sense.  I've added that in the attached version 2 of the
 patch.
 
 I think the idea used in patch to implement RESET ALL is sane.  In passing
 by, I noticed that modified lines in .sgml have crossed 80 char
 boundary which
 is generally preferred to be maintained..
 
 Refer below modification.
 
 !values to the filenamepostgresql.auto.conf/filename file.
 Setting the parameter to
 !literalDEFAULT/literal, or using the commandRESET/command
 variant, removes the configuration entry from

I did that on purpose so that it's easy for a reviewer/committer to see
what changed.

This third patch reformats the documentation in the way I expected it to
be committed.
-- 
Vik
*** a/doc/src/sgml/ref/alter_system.sgml
--- b/doc/src/sgml/ref/alter_system.sgml
***
*** 22,27  PostgreSQL documentation
--- 22,30 
   refsynopsisdiv
  synopsis
  ALTER SYSTEM SET replaceable class=PARAMETERconfiguration_parameter/replaceable { TO | = } { replaceable class=PARAMETERvalue/replaceable | 'replaceable class=PARAMETERvalue/replaceable' | DEFAULT }
+ 
+ ALTER SYSTEM RESET replaceable class=PARAMETERconfiguration_parameter/replaceable
+ ALTER SYSTEM RESET ALL
  /synopsis
   /refsynopsisdiv
  
***
*** 30,37  ALTER SYSTEM SET replaceable class=PARAMETERconfiguration_parameter/replace
  
para
 commandALTER SYSTEM/command writes the configuration parameter
!values to the filenamepostgresql.auto.conf/filename file. With
!literalDEFAULT/literal, it removes a configuration entry from
 filenamepostgresql.auto.conf/filename file. The values will be
 effective after reload of server configuration (SIGHUP) or in next 
 server start based on the type of configuration parameter modified.
--- 33,41 
  
para
 commandALTER SYSTEM/command writes the configuration parameter
!values to the filenamepostgresql.auto.conf/filename file.
!Setting the parameter to literalDEFAULT/literal, or using the
!commandRESET/command variant, removes the configuration entry from
 filenamepostgresql.auto.conf/filename file. The values will be
 effective after reload of server configuration (SIGHUP) or in next 
 server start based on the type of configuration parameter modified.
*** a/src/backend/parser/gram.y
--- b/src/backend/parser/gram.y
***
*** 401,407  static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
  
  %type istmt	insert_rest
  
! %type vsetstmt generic_set set_rest set_rest_more SetResetClause FunctionSetResetClause
  
  %type node	TableElement TypedTableElement ConstraintElem TableFuncElement
  %type node	columnDef columnOptions
--- 401,407 
  
  %type istmt	insert_rest
  
! %type vsetstmt generic_set set_rest set_rest_more generic_reset reset_rest SetResetClause FunctionSetResetClause
  
  %type node	TableElement TypedTableElement ConstraintElem TableFuncElement
  %type node	columnDef columnOptions
***
*** 1567,1605  NonReservedWord_or_Sconst:
  		;
  
  VariableResetStmt:
! 			RESET var_name
  {
  	VariableSetStmt *n = makeNode(VariableSetStmt);
  	n-kind = VAR_RESET;
! 	n-name = $2;
! 	$$ = (Node *) n;
  }
! 			| RESET TIME ZONE
  {
  	VariableSetStmt *n = makeNode(VariableSetStmt);
  	n-kind = VAR_RESET;
! 	n-name = timezone;
! 	$$ = (Node *) n;
  }
! 			| RESET TRANSACTION ISOLATION LEVEL
  {
  	VariableSetStmt *n = makeNode(VariableSetStmt);
  	n-kind = VAR_RESET;
! 	n-name = transaction_isolation;
! 	$$ = (Node *) n;
  }
! 			| RESET SESSION AUTHORIZATION
  {
  	VariableSetStmt *n = makeNode(VariableSetStmt);
  	n-kind = VAR_RESET;
! 	n-name = session_authorization;
! 	$$ = (Node *) n;
  }
! 			| RESET ALL
  {
  	VariableSetStmt *n = makeNode(VariableSetStmt);
  	n-kind = VAR_RESET_ALL;
! 	$$ = (Node *) n;
  }
  		;
  
--- 1567,1613 
  		;
  
  VariableResetStmt:
! 			RESET reset_rest		{ $$ = (Node *) $2; }
! 		;
! 
! reset_rest:
! 			generic_reset			{ $$ = $1; }
! 			| TIME ZONE
  {
  	VariableSetStmt *n = makeNode(VariableSetStmt);
  	n-kind = VAR_RESET;
! 	n-name = timezone;
! 	$$ = n;
  }
! 			| TRANSACTION ISOLATION LEVEL
  {
  	VariableSetStmt *n = makeNode(VariableSetStmt);
  	n-kind = VAR_RESET;
! 	n-name = transaction_isolation;
! 	$$ = n;
  }
! 			| SESSION AUTHORIZATION
  {
  	VariableSetStmt *n = makeNode(VariableSetStmt);
  	n-kind = VAR_RESET;
! 	n-name = session_authorization;
! 	

Re: [Fwd: Re: [HACKERS] proposal: new long psql parameter --on-error-stop]

2014-06-27 Thread Pavel Stehule
Hello

thank you Peter, so now only setting for MS Windows is missing?

Regards

Pavel


2014-06-26 21:57 GMT+02:00 Petr Jelinek p...@2ndquadrant.com:

 Hello,

 I went through the patch, seems mostly fine, I adjusted some wording,
 removed the default .pgpass file info since it's not accurate, and replaced
 couple of phrases with (hopefully) more informative ones.


 --
  Petr Jelinek  http://www.2ndQuadrant.com/

  PostgreSQL Development, 24x7 Support, Training  Services



Re: [HACKERS] ALTER SYSTEM RESET?

2014-06-27 Thread Vik Fearing
On 06/27/2014 08:49 AM, Vik Fearing wrote:
 This third patch reformats the documentation in the way I expected it to
 be committed.

Amit,

I added this to the next commitfest with your name as reviewer.

https://commitfest.postgresql.org/action/patch_view?id=1495

Please update the status as you see fit.
-- 
Vik


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] RLS Design

2014-06-27 Thread Dean Rasheed
On 26 June 2014 18:04, Robert Haas robertmh...@gmail.com wrote:
 ALTER TABLE t1 SET POLICY p1 ON SELECT TO t1_p1_sel_quals;
 GRANT SELECT ON TABLE t1 TO role1 USING p1;

 As I see it, the downside of this is that it gets a lot more complex.
 We have to revise the ACL representation, which is already pretty darn
 complicated, to keep track not only of the grantee, grantor, and
 permissions, but also the policies qualifying those permissions.  The
 changes to GRANT will need to propagate into GRANT ON ALL TABLES IN
 SCHEMA and AFTER DEFAULT PRIVILEGES.

No, it can be done without any changes to the permissions code by
storing the ACLs on the catalog entries where the RLS quals are held,
rather than modifying the ACL items on the table. I.e., instead of
thinking of USING polname as a modifier to the grant, think of it as
as an additional qualifier on the thing being granted.

That means the syntax I proposed earlier is wrong/misleading. Instead of

GRANT SELECT ON TABLE tbl TO role USING polname;

it should really be

GRANT SELECT USING polname ON TABLE tbl TO role;


 There is administrative
 complexity as well, because if you want to policy-protect an
 additional table, you've got to add the table to the policy and then
 update all the grants as well.  I think what will happen in practice
 is that people will grant to PUBLIC all rights on the policy, and then
 do all the access control through the GRANT statements.


If you assume that most users will only have one policy through which
they can access any given table, then there is no more administrative
overhead than we have right now. Right now you have to grant each user
permissions on each table you define. The only difference is that now
you throw in a USING polname. We could also simplify administration
by supporting

GRANT SELECT USING polname ON ALL TABLES IN SCHEMA sch TO role;

The important distinction is that this is only granting permissions on
tables that exist now, not on tables that might be created later.


 An interesting question we haven't much considered is: who can set up
 policies and add then to users?  Maybe we should flip this around, and
 instead of adding users to policies, we should exempt users from
 policies.

 CREATE POLICY p1;

 And then, if they own p1 and t1, they can do:

 ALTER TABLE t1 SET POLICY p1 TO t1_p1_quals;
 (or maybe we should associate it to the policy instead of the table:
 ALTER POLICY p1 SET TABLE t1 TO t1_p1_quals)

 And then the policy applies to everyone who doesn't have the grantable
 EXEMPT privilege on the policy.  The policy owner and superuser have
 that privilege by default and it can be handed out to others like
 this:

 GRANT EXEMPT ON POLICY p1 TO snowden;

 Then users who have row_level_security=on will bypass RLS if possible,
 and otherwise it will be applied.  Users who have
 row_level_security=off will bypass RLS if possible, and otherwise
 error.  And users who have row_level_security=force will apply RLS
 even if they are entitled to bypass it.


That's interesting. I need to think some more about what that means.

Regards,
Dean


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Index-only scans and non-MVCC snapshots

2014-06-27 Thread Andres Freund
On 2014-06-26 22:47:47 -0600, Ryan Johnson wrote:
 Hi,
 
 As part of a research project, I'm trying to change Read Committed isolation
 to use HeapTupleSatisfiesNow rather than acquiring a new snapshot at every
 command [1]. Things appear to have gone reasonably well so far, except
 certain queries fail with ERROR:  non-MVCC snapshots are not supported in
 index-only scans.

You're aware that unless you employ additional locking you can simply
miss individual rows or see them several times because of concurrent
updates?
The reason it has worked ( 9.4) for system catalogs is that updates of
rows were only performed while objects were locked access exclusively -
that's the reason why some places in the code use inplace updates btw...

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_resetxlog to clear backup start/end locations.

2014-06-27 Thread Fujii Masao
On Fri, Jun 27, 2014 at 12:29 PM, Kyotaro HORIGUCHI
horiguchi.kyot...@lab.ntt.co.jp wrote:
 Hello, I have finished the patches for all of 9.x.

 I dont' touch what '-n' option shows and rewrite documents for
 the option a bit. And '-n' won't show the changes of backup
 location.

 -8.4: does not have backup locations in ControlFileData.

 9.0-9.1: resetxlog_backuploc_9_0-9.1.patch: Add clearance of backupStartPoint.

 9.2: resetxlog_backuploc_9_2.patch: Add clearance of
backupStart/EndPoint and backupEndRequired

 9.3: resetxlog_backuploc_9_3.patch: Ditto. (format of XLogRecPtr changed)

 9.4-master: resetxlog_backuploc_9_4-master.patch: Add clearance of
backupPoints. help message and html doc changed.

Thanks for the patch! But when I read the source code of pg_resetxlog,
I found that it has already reset the backup locations. Please see
RewriteControlFile() which does that. So I wonder if we need nothing
at least in HEAD for the purpose which you'd like to achieve. Thought?

Regards,

-- 
Fujii Masao


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_receivexlog add synchronous mode

2014-06-27 Thread Fujii Masao
On Thu, Jun 26, 2014 at 7:01 PM,  furu...@pm.nttdata.co.jp wrote:
 The patch looks somewhat complicated and bugs can be easily introduced
 because it tries to not only add new feature but also reorganize the main
 loop in HandleCopyStream at the same time. To keep the patch simple, I'm
 thinking to firstly apply the attached patch which just refactors the
 main loop. Then we can apply the main patch, i.e., add new feature.
 Thought?
 Thank you for the refactoring patch.
 I did a review of the patch.
 As a result, I found the calculation of sleeptime when the --status-intarvall 
 is set to 1 was incorrect.

 --status-intarvall 1 --  sleeptime 1. !?
 --status-intarvall 2 --  sleeptime 1. OK
 --status-intarvall 3 --  sleeptime 2. OK

Thanks for the review!

+if (secs = 0)
+secs = 1;/* Always sleep at least 1 sec */
+
+sleeptime = secs * 1000 + usecs / 1000;

The above is the code which caused that problem. 'usecs' should have been
reset to zero when 'secs' are rounded up to 1 second. But not. Attached is
the updated version of the patch.

Regards,

-- 
Fujii Masao
diff --git a/src/bin/pg_basebackup/receivelog.c b/src/bin/pg_basebackup/receivelog.c
index d76e605..4aa35da 100644
--- a/src/bin/pg_basebackup/receivelog.c
+++ b/src/bin/pg_basebackup/receivelog.c
@@ -35,6 +35,8 @@ static PGresult *HandleCopyStream(PGconn *conn, XLogRecPtr startpos,
  uint32 timeline, char *basedir,
 			   stream_stop_callback stream_stop, int standby_message_timeout,
  char *partial_suffix, XLogRecPtr *stoppos);
+static int CopyStreamPoll(PGconn *conn, long timeout_ms);
+static int CopyStreamReceive(PGconn *conn, long timeout, char **buffer);
 
 static bool ReadEndOfStreamingResult(PGresult *res, XLogRecPtr *startpos,
 		 uint32 *timeline);
@@ -744,12 +746,7 @@ HandleCopyStream(PGconn *conn, XLogRecPtr startpos, uint32 timeline,
 		int			bytes_written;
 		int64		now;
 		int			hdr_len;
-
-		if (copybuf != NULL)
-		{
-			PQfreemem(copybuf);
-			copybuf = NULL;
-		}
+		long		sleeptime;
 
 		/*
 		 * Check if we should continue streaming, or abort at this point.
@@ -784,67 +781,38 @@ HandleCopyStream(PGconn *conn, XLogRecPtr startpos, uint32 timeline,
 			last_status = now;
 		}
 
-		r = PQgetCopyData(conn, copybuf, 1);
-		if (r == 0)
+		/*
+		 * Compute how long send/receive loops should sleep
+		 */
+		if (standby_message_timeout  still_sending)
 		{
-			/*
-			 * No data available. Wait for some to appear, but not longer than
-			 * the specified timeout, so that we can ping the server.
-			 */
-			fd_set		input_mask;
-			struct timeval timeout;
-			struct timeval *timeoutptr;
-
-			FD_ZERO(input_mask);
-			FD_SET(PQsocket(conn), input_mask);
-			if (standby_message_timeout  still_sending)
+			int64		targettime;
+			long		secs;
+			int			usecs;
+
+			targettime = last_status + (standby_message_timeout - 1) * ((int64) 1000);
+			feTimestampDifference(now,
+  targettime,
+  secs,
+  usecs);
+			/* Always sleep at least 1 sec */
+			if (secs = 0)
 			{
-int64		targettime;
-long		secs;
-int			usecs;
-
-targettime = last_status + (standby_message_timeout - 1) * ((int64) 1000);
-feTimestampDifference(now,
-	  targettime,
-	  secs,
-	  usecs);
-if (secs = 0)
-	timeout.tv_sec = 1; /* Always sleep at least 1 sec */
-else
-	timeout.tv_sec = secs;
-timeout.tv_usec = usecs;
-timeoutptr = timeout;
+secs = 1;
+usecs = 0;
 			}
-			else
-timeoutptr = NULL;
 
-			r = select(PQsocket(conn) + 1, input_mask, NULL, NULL, timeoutptr);
-			if (r == 0 || (r  0  errno == EINTR))
-			{
-/*
- * Got a timeout or signal. Continue the loop and either
- * deliver a status packet to the server or just go back into
- * blocking.
- */
-continue;
-			}
-			else if (r  0)
-			{
-fprintf(stderr, _(%s: select() failed: %s\n),
-		progname, strerror(errno));
-goto error;
-			}
-			/* Else there is actually data on the socket */
-			if (PQconsumeInput(conn) == 0)
-			{
-fprintf(stderr,
-		_(%s: could not receive data from WAL stream: %s),
-		progname, PQerrorMessage(conn));
-goto error;
-			}
-			continue;
+			sleeptime = secs * 1000 + usecs / 1000;
 		}
+		else
+			sleeptime = -1;
+
+		r = CopyStreamReceive(conn, sleeptime, copybuf);
+		if (r == 0)
+			continue;
 		if (r == -1)
+			goto error;
+		if (r == -2)
 		{
 			PGresult   *res = PQgetResult(conn);
 
@@ -877,15 +845,10 @@ HandleCopyStream(PGconn *conn, XLogRecPtr startpos, uint32 timeline,
 			}
 			if (copybuf != NULL)
 PQfreemem(copybuf);
+			copybuf = NULL;
 			*stoppos = blockpos;
 			return res;
 		}
-		if (r == -2)
-		{
-			fprintf(stderr, _(%s: could not read COPY data: %s),
-	progname, PQerrorMessage(conn));
-			goto error;
-		}
 
 		/* Check the message type. */
 		if (copybuf[0] == 'k')
@@ -1056,3 +1019,115 @@ error:
 		PQfreemem(copybuf);
 	return NULL;
 }
+
+/*
+ * Wait 

Re: [HACKERS] pg_receivexlog add synchronous mode

2014-06-27 Thread Andres Freund
On 2014-06-05 19:13:34 +0900, furu...@pm.nttdata.co.jp wrote:
  -Original Message-
  Hi,
  
  On 2014-06-05 17:09:44 +0900, furu...@pm.nttdata.co.jp wrote:
   Synchronous(synchronous_commit = on) mode offers the ability to
  confirm WAL have been streamed in the same way as synchronous
  replication.
   If an output is used as a different disk from the directory where the
  transaction log should be stored.
   Prevent the loss of data due to disk failure.
  
   the additional parameter(-m) and replicationslot specify, that its
  synchronous mode.
   All received WAL write after, flush is executed and reply flush
   position.
  
  What's the usecase for this? I can see some benefit in easier testing
  of syncrep, but that's basically it?
 
 When used with syncrep, standby server crashes, multiplexing of WAL can be 
 collateral.
 Data loss can be to nearly zero.

I don't see how this gets pg_receivexlog much closer to a solution for
multiplexing WAL. Since there's no support for streaming data, removing
old WAL and such it seems to me you'd need to have something entirely
different anyway?
I'm not really averse to adding this feature (on the basis of easier
syncrep testing alone), but I don't like the arguments for it so far...

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] idle_in_transaction_timeout

2014-06-27 Thread Robert Haas
On Wed, Jun 25, 2014 at 11:40 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Fujii Masao masao.fu...@gmail.com writes:
 Why is IIT timeout turned on only when send_ready_for_query is true?
 I was thinking it should be turned on every time a message is received.
 Imagine the case where the session is in idle-in-transaction state and
 a client gets stuck after sending Parse message and before sending Bind
 message. This case would also cause long transaction problem and should
 be resolved by IIT timeout. But IIT timeout that this patch adds cannot
 handle this case because it's enabled only when send_ready_for_query is
 true. Thought?

 I think you just moved the goalposts to the next county.

 The point of this feature, AFAICS, is to detect clients that are failing
 to issue another query or close their transaction as a result of bad
 client logic.  It's not to protect against network glitches.

Hmm, so there's no reason a client, after sending one protocol
message, might not pause before sending the next protocol message?
That seems like a surprising argument.  Someone couldn't Parse and
then wait before sending Bind and Execute, or Parse and Bind and then
wait before sending Execute?

 Moreover, there would be no way to implement a timeout like that without
 adding a gettimeofday() call after every packet receipt, which is overhead
 we do not need and cannot afford.  I don't think this feature should add
 *any* gettimeofday's beyond the ones that are already there.

That's an especially good point if we think that this feature will be
enabled commonly or by default - but as Fujii Masao says, it might be
tricky to avoid.  :-(

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Atomics hardware support table supported architectures

2014-06-27 Thread Andres Freund
On 2014-06-24 10:22:08 -0700, Tom Lane wrote:
 Andres Freund and...@2ndquadrant.com writes:
  On 2014-06-24 13:03:37 -0400, Noah Misch wrote:
  If a change has the potential to make some architectures give wrong
  answers only at odd times, that's a different kind of problem.  For
  that reason, actively breaking Alpha is a good thing.
 
  Not sure what you mean with the 'actively breaking Alpha' statement?
  That we should drop Alpha?
 
 +1.  Especially with no buildfarm critter.  Would anyone here care
 to bet even the price of a burger that Alpha isn't broken already?

Here's a patch removing alpha/true64/osf/1 support. I think I got most
relevant references, not sure if I missed something.

Since there seems to be (unanimous?) support for dropping alpha and some
patches coming up that need to deal with platform dependent stuff it
seems sensible to do this first.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services
From aba43d095344337fb76fb15b53a6ac4c90900d80 Mon Sep 17 00:00:00 2001
From: Andres Freund and...@anarazel.de
Date: Fri, 27 Jun 2014 15:43:46 +0200
Subject: [PATCH] Remove Alpha and Tru64 support.

Support for running postgres on Alpha hasn't been tested for a long
while. Due to Alpha's lax memory order guarantees it's a hard to
develop for platform and thought to be unlikely to currently work
correctly.
As Alpha is the only supported architecture for Tru64 drop support for
it. Tru64's support has ended 2012 and it was in maintenance-only mode
for much longer.

Also remove stray references to  __ksr__ and ultrix defines.
---
 configure  |  1 -
 configure.in   |  1 -
 doc/src/sgml/dfunc.sgml| 21 
 doc/src/sgml/installation.sgml |  7 ++--
 src/Makefile.shlib |  4 ---
 src/backend/main/main.c| 29 -
 src/backend/port/dynloader/osf.c   |  7 
 src/backend/port/dynloader/osf.h   | 47 ---
 src/backend/utils/misc/ps_status.c |  2 +-
 src/include/port/osf.h |  4 ---
 src/include/storage/barrier.h  |  9 --
 src/include/storage/s_lock.h   | 66 --
 src/makefiles/Makefile.osf | 10 --
 src/template/osf   |  6 
 14 files changed, 4 insertions(+), 210 deletions(-)
 delete mode 100644 src/backend/port/dynloader/osf.c
 delete mode 100644 src/backend/port/dynloader/osf.h
 delete mode 100644 src/include/port/osf.h
 delete mode 100644 src/makefiles/Makefile.osf
 delete mode 100644 src/template/osf

diff --git a/configure b/configure
index da89c69..c9388f2 100755
--- a/configure
+++ b/configure
@@ -2855,7 +2855,6 @@ dragonfly*) template=netbsd ;;
mingw*) template=win32 ;;
   netbsd*) template=netbsd ;;
  openbsd*) template=openbsd ;;
- osf*) template=osf ;;
  sco*) template=sco ;;
  solaris*) template=solaris ;;
sysv5*) template=unixware ;;
diff --git a/configure.in b/configure.in
index 7dfbe9f..f8bf466 100644
--- a/configure.in
+++ b/configure.in
@@ -69,7 +69,6 @@ dragonfly*) template=netbsd ;;
mingw*) template=win32 ;;
   netbsd*) template=netbsd ;;
  openbsd*) template=openbsd ;;
- osf*) template=osf ;;
  sco*) template=sco ;;
  solaris*) template=solaris ;;
sysv5*) template=unixware ;;
diff --git a/doc/src/sgml/dfunc.sgml b/doc/src/sgml/dfunc.sgml
index 3d90a36..b78537c 100644
--- a/doc/src/sgml/dfunc.sgml
+++ b/doc/src/sgml/dfunc.sgml
@@ -207,27 +207,6 @@ gcc -G -o foo.so foo.o
 
varlistentry
 term
- systemitem class=osnameTru64 UNIX/
- indextermprimaryTru64 UNIX/secondaryshared library//
- indextermprimaryDigital UNIX/seeTru64 UNIX//
-/term
-listitem
- para
-  acronymPIC/acronym is the default, so the compilation command
-  is the usual one.  commandld/command with special options is
-  used to do the linking.
-programlisting
-cc -c foo.c
-ld -shared -expect_unresolved '*' -o foo.so foo.o
-/programlisting
-  The same procedure is used with GCC instead of the system
-  compiler; no special options are required.
- /para
-/listitem
-   /varlistentry
-
-   varlistentry
-term
  systemitem class=osnameUnixWare/
  indextermprimaryUnixWare/secondaryshared library//
 /term
diff --git a/doc/src/sgml/installation.sgml b/doc/src/sgml/installation.sgml
index 7353c61..2e81bbb 100644
--- a/doc/src/sgml/installation.sgml
+++ b/doc/src/sgml/installation.sgml
@@ -1669,8 +1669,7 @@ PostgreSQL, contrib and HTML documentation successfully made. Ready to install.
 systemitem class=osnameHP-UX/,
 systemitem class=osnameLinux/,
 systemitem class=osnameNetBSD/, systemitem
-class=osnameOpenBSD/, systemitem class=osnameTru64
-UNIX/ (formerly systemitem class=osnameDigital UNIX/), and
+class=osnameOpenBSD/, and
 systemitem class=osnameSolaris/.
/para
 
@@ -1981,7 

Re: [HACKERS] change alter user to be a true alias for alter role

2014-06-27 Thread Vik Fearing
On 06/19/2014 07:18 PM, Tom Lane wrote:
 Jov am...@amutu.com writes:
 the doc say:
 ALTER USER is now an alias for ALTER 
 ROLEhttp://www.postgresql.org/docs/devel/static/sql-alterrole.html
 
 but alter user lack the following format:
 ...
 
 If we're going to have a policy that these commands be exactly equivalent,
 it seems like this patch is just sticking a finger into the dike.  It does
 nothing to prevent anyone from making the same mistake again in future.
 
 What about collapsing both sets of productions into one, along the lines
 of
 
 role_or_user: ROLE | USER;
 
 AlterRoleSetStmt:
   ALTER role_or_user RoleId opt_in_database SetResetClause
 
 (and similarly to the latter for every existing ALTER ROLE variant).
 
 Because MAPPING is an unreserved keyword, I think that this approach
 might force us to also change ALTER USER MAPPING to ALTER role_or_user
 MAPPING, which is not contemplated by the SQL standard.  But hey,
 it would satisfy the principle of least surprise no?  Anyway we don't
 have to document that that would work.

After a week of silence from Jov, I decided to do this myself since it
didn't seem very hard.

Many frustrating hours of trying to understand why I'm getting
shift/reduce conflicts by the hundreds later, I've decided to give up
for now.
-- 
Vik


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Index-only scans and non-MVCC snapshots

2014-06-27 Thread Alvaro Herrera
Ryan Johnson wrote:
 On 26/06/2014 11:04 PM, Alvaro Herrera wrote:
 Ryan Johnson wrote:
 As part of a research project, I'm trying to change Read Committed
 isolation to use HeapTupleSatisfiesNow rather than acquiring a new
 snapshot at every command [1].
 Are you aware of this?
 
 commit 813fb0315587d32e3b77af1051a0ef517d187763
 Author: Robert Haas rh...@postgresql.org
 Date:   Thu Aug 1 10:46:19 2013 -0400
 
  Remove SnapshotNow and HeapTupleSatisfiesNow.

 That would be wonderful news... if snapshots weren't so darned
 expensive to create.

I take it you aren't aware of this other effort, either:
http://archives.postgresql.org/message-id/539ad153.9000...@vmware.com

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WAL replay bugs

2014-06-27 Thread Michael Paquier
On Fri, Jun 27, 2014 at 2:57 PM, Michael Paquier michael.paqu...@gmail.com
wrote:

 Here are rebased patches, their was a conflict with a recent commit in
 contrib/pg_upgrade.

I am resending patch 2 as it contained a rebase conflict not correctly
resolved (Thanks Alvaro).

Regards,
-- 
Michael
From d4f0289ffcece54a78e51e8b707c41e994d549ee Mon Sep 17 00:00:00 2001
From: Michael Paquier mich...@otacoo.com
Date: Fri, 27 Jun 2014 23:35:29 +0900
Subject: [PATCH 2/3] Extract generic bash initialization process from
 pg_upgrade

Such initialization is useful as well for some other utilities and makes
test settings consistent.
---
 contrib/pg_upgrade/test.sh | 47 
 src/test/shell/init_env.sh | 60 ++
 2 files changed, 64 insertions(+), 43 deletions(-)
 create mode 100644 src/test/shell/init_env.sh

diff --git a/contrib/pg_upgrade/test.sh b/contrib/pg_upgrade/test.sh
index 7bbd2c7..2e1c61a 100644
--- a/contrib/pg_upgrade/test.sh
+++ b/contrib/pg_upgrade/test.sh
@@ -9,24 +9,14 @@
 # Portions Copyright (c) 1996-2014, PostgreSQL Global Development Group
 # Portions Copyright (c) 1994, Regents of the University of California
 
-set -e
-
-: ${MAKE=make}
-
-# Guard against parallel make issues (see comments in pg_regress.c)
-unset MAKEFLAGS
-unset MAKELEVEL
-
-# Establish how the server will listen for connections
-testhost=`uname -s`
+# Initialize test
+. ../../src/test/shell/init_env.sh
 
 case $testhost in
 	MINGW*)
-		LISTEN_ADDRESSES=localhost
 		PGHOST=; unset PGHOST
 		;;
 	*)
-		LISTEN_ADDRESSES=
 		# Select a socket directory.  The algorithm is from the configure
 		# script; the outcome mimics pg_regress.c:make_temp_sockdir().
 		PGHOST=$PG_REGRESS_SOCK_DIR
@@ -102,37 +92,8 @@ logdir=$PWD/log
 rm -rf $logdir
 mkdir $logdir
 
-# Clear out any environment vars that might cause libpq to connect to
-# the wrong postmaster (cf pg_regress.c)
-#
-# Some shells, such as NetBSD's, return non-zero from unset if the variable
-# is already unset. Since we are operating under 'set -e', this causes the
-# script to fail. To guard against this, set them all to an empty string first.
-PGDATABASE=;unset PGDATABASE
-PGUSER=;unset PGUSER
-PGSERVICE=; unset PGSERVICE
-PGSSLMODE=; unset PGSSLMODE
-PGREQUIRESSL=;  unset PGREQUIRESSL
-PGCONNECT_TIMEOUT=; unset PGCONNECT_TIMEOUT
-PGHOSTADDR=;unset PGHOSTADDR
-
-# Select a non-conflicting port number, similarly to pg_regress.c
-PG_VERSION_NUM=`grep '#define PG_VERSION_NUM' $newsrc/src/include/pg_config.h | awk '{print $3}'`
-PGPORT=`expr $PG_VERSION_NUM % 16384 + 49152`
-export PGPORT
-
-i=0
-while psql -X postgres /dev/null 2/dev/null
-do
-	i=`expr $i + 1`
-	if [ $i -eq 16 ]
-	then
-		echo port $PGPORT apparently in use
-		exit 1
-	fi
-	PGPORT=`expr $PGPORT + 1`
-	export PGPORT
-done
+# Get a port to run the tests
+pg_get_test_port $newsrc
 
 # buildfarm may try to override port via EXTRA_REGRESS_OPTS ...
 EXTRA_REGRESS_OPTS=$EXTRA_REGRESS_OPTS --port=$PGPORT
diff --git a/src/test/shell/init_env.sh b/src/test/shell/init_env.sh
new file mode 100644
index 000..d37eb69
--- /dev/null
+++ b/src/test/shell/init_env.sh
@@ -0,0 +1,60 @@
+#!/bin/sh
+
+# src/test/shell/init.sh
+#
+# Utility initializing environment for tests to be conducted in shell.
+# The initialization done here is made to be platform-proof.
+
+set -e
+
+: ${MAKE=make}
+
+# Guard against parallel make issues (see comments in pg_regress.c)
+unset MAKEFLAGS
+unset MAKELEVEL
+
+# Set listen_addresses desirably
+testhost=`uname -s`
+case $testhost in
+	MINGW*)	LISTEN_ADDRESSES=localhost ;;
+	*)		LISTEN_ADDRESSES= ;;
+esac
+
+# Clear out any environment vars that might cause libpq to connect to
+# the wrong postmaster (cf pg_regress.c)
+#
+# Some shells, such as NetBSD's, return nonzero from unset if the variable
+# is already unset. Since we are operating under 'set e', this causes the
+# script to fail. To guard against this, set them all to an empty string first.
+PGDATABASE=;unset PGDATABASE
+PGUSER=;unset PGUSER
+PGSERVICE=; unset PGSERVICE
+PGSSLMODE=; unset PGSSLMODE
+PGREQUIRESSL=;  unset PGREQUIRESSL
+PGCONNECT_TIMEOUT=; unset PGCONNECT_TIMEOUT
+PGHOST=;unset PGHOST
+PGHOSTADDR=;unset PGHOSTADDR
+
+# Select a non-conflicting port number, similarly to pg_regress.c, and
+# save its value in PGPORT. Caller should either save or use this value
+# for the tests.
+pg_get_test_port()
+{
+	PG_ROOT_DIR=$1
+	PG_VERSION_NUM=`grep '#define PG_VERSION_NUM' $PG_ROOT_DIR/src/include/pg_config.h | awk '{print $3}'`
+	PGPORT=`expr $PG_VERSION_NUM % 16384 + 49152`
+	export PGPORT
+
+	i=0
+	while psql -X postgres /dev/null 2/dev/null
+	do
+		i=`expr $i + 1`
+		if [ $i -eq 16 ]
+		then
+			echo port $PGPORT apparently in use
+			exit 1
+		fi
+		PGPORT=`expr $PGPORT + 1`
+		export PGPORT
+	done
+}
-- 
2.0.0


-- 
Sent via pgsql-hackers mailing list 

Re: [HACKERS] Index-only scans and non-MVCC snapshots

2014-06-27 Thread Ryan Johnson

On 27/06/2014 3:14 AM, Andres Freund wrote:

On 2014-06-26 22:47:47 -0600, Ryan Johnson wrote:

Hi,

As part of a research project, I'm trying to change Read Committed isolation
to use HeapTupleSatisfiesNow rather than acquiring a new snapshot at every
command [1]. Things appear to have gone reasonably well so far, except
certain queries fail with ERROR:  non-MVCC snapshots are not supported in
index-only scans.

You're aware that unless you employ additional locking you can simply
miss individual rows or see them several times because of concurrent
updates?
The reason it has worked ( 9.4) for system catalogs is that updates of
rows were only performed while objects were locked access exclusively -
that's the reason why some places in the code use inplace updates btw...
Yes, I was aware of the need for locking. The documentation just made it 
sound that locking was already in place for non-MVCC index scans. I was 
hoping I'd missed some easy way to activate it.


Regards,
Ryan



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Index-only scans and non-MVCC snapshots

2014-06-27 Thread Andres Freund
On 2014-06-27 08:39:13 -0600, Ryan Johnson wrote:
 On 27/06/2014 3:14 AM, Andres Freund wrote:
 On 2014-06-26 22:47:47 -0600, Ryan Johnson wrote:
 Hi,
 
 As part of a research project, I'm trying to change Read Committed isolation
 to use HeapTupleSatisfiesNow rather than acquiring a new snapshot at every
 command [1]. Things appear to have gone reasonably well so far, except
 certain queries fail with ERROR:  non-MVCC snapshots are not supported in
 index-only scans.
 You're aware that unless you employ additional locking you can simply
 miss individual rows or see them several times because of concurrent
 updates?
 The reason it has worked ( 9.4) for system catalogs is that updates of
 rows were only performed while objects were locked access exclusively -
 that's the reason why some places in the code use inplace updates btw...

 Yes, I was aware of the need for locking. The documentation just made it
 sound that locking was already in place for non-MVCC index scans. I was
 hoping I'd missed some easy way to activate it.

Well, it is/was for the places (i.e. DDL) that actually use non-MVCC
scans.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] pgstat_heap() consults freed memory

2014-06-27 Thread Noah Misch
pgstat_heap() creates a BufferAccessStrategy and attaches it to a
HeapScanDesc.  It continues to use that strategy after calling heap_endscan(),
which frees the strategy.  This is only a risk when the table contains empty
pages at the end.  I get a crash in an assert-enabled build with this test
procedure, after disabling autovacuum:

-- session 1
create table t (c) as select * from generate_series(1,2);
delete from t where c  1;
-- session 2
begin; lock table t in access share mode;
-- session 1
vacuum t;
-- restart PostgreSQL to clear shared buffers
-- session 3
select * from pgstattuple('t');

The simplest fix is to move the heap_endscan() call past the last use of the
strategy.  However, I don't think this function ought to be creating a
strategy explicitly.  It should use the one that initscan() creates, if any.

-- 
Noah Misch
EnterpriseDB http://www.enterprisedb.com
diff --git a/contrib/pgstattuple/pgstattuple.c 
b/contrib/pgstattuple/pgstattuple.c
index edc603f..1007748 100644
--- a/contrib/pgstattuple/pgstattuple.c
+++ b/contrib/pgstattuple/pgstattuple.c
@@ -274,7 +274,6 @@ pgstat_heap(Relation rel, FunctionCallInfo fcinfo)
BlockNumber tupblock;
Buffer  buffer;
pgstattuple_type stat = {0};
-   BufferAccessStrategy bstrategy;
SnapshotData SnapshotDirty;
 
/* Disable syncscan because we assume we scan from block zero upwards */
@@ -283,10 +282,6 @@ pgstat_heap(Relation rel, FunctionCallInfo fcinfo)
 
nblocks = scan-rs_nblocks; /* # blocks to be scanned */
 
-   /* prepare access strategy for this table */
-   bstrategy = GetAccessStrategy(BAS_BULKREAD);
-   scan-rs_strategy = bstrategy;
-
/* scan the relation */
while ((tuple = heap_getnext(scan, ForwardScanDirection)) != NULL)
{
@@ -320,26 +315,28 @@ pgstat_heap(Relation rel, FunctionCallInfo fcinfo)
{
CHECK_FOR_INTERRUPTS();
 
-   buffer = ReadBufferExtended(rel, MAIN_FORKNUM, block, 
RBM_NORMAL, bstrategy);
+   buffer = ReadBufferExtended(rel, MAIN_FORKNUM, block,
+   
RBM_NORMAL, scan-rs_strategy);
LockBuffer(buffer, BUFFER_LOCK_SHARE);
stat.free_space += PageGetHeapFreeSpace((Page) 
BufferGetPage(buffer));
UnlockReleaseBuffer(buffer);
block++;
}
}
-   heap_endscan(scan);
 
while (block  nblocks)
{
CHECK_FOR_INTERRUPTS();
 
-   buffer = ReadBufferExtended(rel, MAIN_FORKNUM, block, 
RBM_NORMAL, bstrategy);
+   buffer = ReadBufferExtended(rel, MAIN_FORKNUM, block,
+   
RBM_NORMAL, scan-rs_strategy);
LockBuffer(buffer, BUFFER_LOCK_SHARE);
stat.free_space += PageGetHeapFreeSpace((Page) 
BufferGetPage(buffer));
UnlockReleaseBuffer(buffer);
block++;
}
 
+   heap_endscan(scan);
relation_close(rel, AccessShareLock);
 
stat.table_len = (uint64) nblocks *BLCKSZ;

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Index-only scans and non-MVCC snapshots

2014-06-27 Thread Ryan Johnson

On 27/06/2014 8:20 AM, Alvaro Herrera wrote:

Ryan Johnson wrote:

On 26/06/2014 11:04 PM, Alvaro Herrera wrote:

Ryan Johnson wrote:

As part of a research project, I'm trying to change Read Committed
isolation to use HeapTupleSatisfiesNow rather than acquiring a new
snapshot at every command [1].

Are you aware of this?

commit 813fb0315587d32e3b77af1051a0ef517d187763
Author: Robert Haas rh...@postgresql.org
Date:   Thu Aug 1 10:46:19 2013 -0400

 Remove SnapshotNow and HeapTupleSatisfiesNow.

That would be wonderful news... if snapshots weren't so darned
expensive to create.

I take it you aren't aware of this other effort, either:
http://archives.postgresql.org/message-id/539ad153.9000...@vmware.com
That is good news, though from reading the thread it sounds like proc 
array accesses are being exchanged for accesses to an SLRU, so a lot of 
lwlock calls will remain. It will definitely help, though. SLRU will get 
ex-locked a lot less often, so the main source of contention will be for 
the actual lwlock acquire/release operations.


Regards,
Ryan


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Spinlocks and compiler/memory barriers

2014-06-27 Thread Robert Haas
On Thu, Jun 26, 2014 at 5:01 PM, Andres Freund and...@2ndquadrant.com wrote:
 But a) isn't really avoidable because it'll otherwise generate compiler
 warnings and b) is done that way all over the tree. E.g. lwlock.c has
 several copies of (note the nonvolatility of proc):
 volatile LWLock *lock = l;
 PGPROC *proc = MyProc;
 ...
 proc-lwWaiting = true;
 proc-lwWaitMode = LW_WAIT_UNTIL_FREE;
 proc-lwWaitLink = NULL;

 /* waiters are added to the front of the queue */
 proc-lwWaitLink = lock-head;
 if (lock-head == NULL)
 lock-tail = proc;
 lock-head = proc;

 /* Can release the mutex now */
 SpinLockRelease(lock-mutex);
 There's nothing forcing the compiler to not move any of the proc-*
 assignments past the SpinLockRelease(). And indeed in my case it was
 actually the store to lwWaitLink that was moved across the lock.

 I think it's just pure luck that there's no active bug (that we know of)
 caused by this. I wouldn't be surprised if some dubious behaviour we've
 seen would be caused by similar issues.

 Now, we can fix this and similar cases by more gratuitous use of
 volatile. But for one we're never going to find all cases. For another
 it won't help *at all* for architectures with looser CPU level memory
 ordering guarantees.
 I think we finally need to bite the bullet and make all S_UNLOCKs
 compiler/write barriers.

There are two separate issues here:

1. SpinLockAcquire and SpinLockRelease are not guaranteed to be
compiler barriers, so all relevant memory accesses in the critical
section need to be done using volatile pointers.  Failing to do this
is an easy mistake to make, and we've fixed numerous bugs of this type
over the years (most recently, to my knowledge, in
4bc15a8bfbc7856bc3426dc9ab99567eebbb64d3).  Forcing SpinLockAcquire()
and SpinLockRelease() to serve as compiler barriers would let us
dispense with a whole lot of volatile calls and make writing future
code correctly a lot easier.

2. Some of our implementations of SpinLockAcquire and/or
SpinLockRelease, but in particular SpinLockRelease, may not actually
provide the memory-ordering semantics which they are required to
provide.  In particular, ...

 I'd previously, in
 http://www.postgresql.org/message-id/20130920151110.ga8...@awork2.anarazel.de,
 gone through the list of S_UNLOCKs and found several that were
 lacking. Most prominently the default S_UNLOCK is just
 #define S_UNLOCK(lock) (*((volatile slock_t *) (lock)) = 0)
 which allows the compiler to move non volatile access across and does
 nothing for CPU level cache coherency.

...this default implementation of S_UNLOCK() is pretty sketchy.  Even
on a platform that enforces reads in program order and writes in
program order, this is still unsafe because a read within the critical
section might get postponed until after this write.  Now, x86 happens
to have an additional constraint, which is that it can reorder loads
before stores but not stores before loads; so that coding happens to
provide release semantics.  But that need not be true on every
architecture, and the trend seems to be toward weaker memory ordering.
As you pointed out to me on chat, the non-intrinsics based ARM
implementation brokenly relies on the default S_UNLOCK(), which
clearly isn't adequate.

Now, in terms of solving these problems:

I tend to think that we should try to think about these two problems
somewhat separately.  As to #1, in the back-branches, I think further
volatile-izing the LWLock* routines is probably the only realistic
solution.  In master, I fully support moving the goalposts such that
we require SpinLockAcquire() and SpinLockRelease() are compiler
barriers.  Once we do this, I think we should go back and rip out all
the places where we've used volatile-ized pointers to provide compiler
ordering.  That way, if we haven't actually managed to provide
compiler ordering everywhere, it's more likely that something will
fall over and warn us about the problem; plus, that avoids keeping
around a coding pattern which isn't actually the one we want people to
copy.  However, I think your proposed S_UNLOCKED_UNLOCK() hack is
plain ugly, probably cripplingly slow, and there's no guarantee that's
even correct; see for example the comments about Itanium's tas
implementation possibly being only an acquire barrier (blech).

On gcc and icc, which account for lines 99 through 700 of spin.h, it
should be simple and mechanical to use compiler intrinsics to make
sure that every S_UNLOCK implementation includes a compiler barrier.
However, lines 710 through 895 support non-gcc, non-icc compilers, and
some of those we may not know how to implement a compiler barrier - in
particular, Univel CC, the Tru64 Alpha compiler, HPPA, AIX, or Sun's
compilers.  Except for Sun, we have no buildfarm support for those
platforms, so we could consider just dropping support entirely, but
I'd be inclined to do 

Re: [HACKERS] Spinlocks and compiler/memory barriers

2014-06-27 Thread Robert Haas
On Thu, Jun 26, 2014 at 6:40 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Andres Freund and...@2ndquadrant.com writes:
 On 2014-06-26 14:13:07 -0700, Tom Lane wrote:
 Surely it had better be a read barrier as well?

 I don't immediately see why it has to be read barrier? Hoisting a load
 from after the release into the locked area of code should be safe?

 No doubt, but delaying a read till after the unlocking write would
 certainly not be safe.

 AFAICT, README.barrier completely fails to define what we think the
 semantics of pg_read_barrier and pg_write_barrier actually are, so if
 you believe that a write barrier prevents reordering of reads relative to
 writes, you'd better propose some new text for that file.  It certainly
 doesn't say that today.

The relevant text is in barrier.h

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Atomics hardware support table supported architectures

2014-06-27 Thread Robert Haas
On Fri, Jun 27, 2014 at 9:59 AM, Andres Freund and...@2ndquadrant.com wrote:
 On 2014-06-24 10:22:08 -0700, Tom Lane wrote:
 Andres Freund and...@2ndquadrant.com writes:
  On 2014-06-24 13:03:37 -0400, Noah Misch wrote:
  If a change has the potential to make some architectures give wrong
  answers only at odd times, that's a different kind of problem.  For
  that reason, actively breaking Alpha is a good thing.

  Not sure what you mean with the 'actively breaking Alpha' statement?
  That we should drop Alpha?

 +1.  Especially with no buildfarm critter.  Would anyone here care
 to bet even the price of a burger that Alpha isn't broken already?

 Here's a patch removing alpha/true64/osf/1 support. I think I got most
 relevant references, not sure if I missed something.

 Since there seems to be (unanimous?) support for dropping alpha and some
 patches coming up that need to deal with platform dependent stuff it
 seems sensible to do this first.

I have noticed that most PostgreSQL committers seem for format their
commit messages so that paragraphs are separated by a blank line, but
you seem not to do that.  I find that less readable.

I don't personally object to dropping Alpha, but when this was
discussed back in October, Stefan did:

http://www.postgresql.org/message-id/52616373.10...@kaltenbrunner.cc

But I think he's rather in the minority anyway.  Also, if we added a
fallback implementation for spinlocks that uses GCC intrinsics, it
would probably work again, as much as it does now.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] better atomics - v0.5

2014-06-27 Thread Robert Haas
On Thu, Jun 26, 2014 at 3:04 PM, Andres Freund and...@2ndquadrant.com wrote:
 I don't really see usecases where it's not related data that's being
 touched, but: The point is that the fastpath (not using a spinlock) might
 touch the atomic variable, even while the slowpath has acquired the
 spinlock. So the slowpath can't just use non-atomic operations on the
 atomic variable.
 Imagine something like releasing a buffer pin while somebody else is
 doing something that requires holding the buffer header spinlock.

If the atomic variable can be manipulated without the spinlock under
*any* circumstances, then how is it a good idea to ever manipulate it
with the spinlock?  That seems hard to reason about, and unnecessary.
Critical sections for spinlocks should be short and contain only the
instructions that need protection, and clearly the atomic op is not
one of those.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pgaudit - an auditing extension for PostgreSQL

2014-06-27 Thread Robert Haas
On Thu, Jun 26, 2014 at 3:50 AM, Abhijit Menon-Sen a...@2ndquadrant.com wrote:
 At 2014-06-25 10:36:07 -0400, sfr...@snowman.net wrote:

 To me, that's ridiculous on the face of it.  Other databases have had
 this kind of capability as a matter of course for decades- we are far
 behind in this area.

 OK, I've done a bit more thinking, but I'm not convinced that it's so
 cut-and-dried (as ridiculous on the face of it).

 Ian looked into the auditing capabilities of various (SQL and otherwise)
 systems some time ago. Informix handles its auditing configuration
 entirely outside the database. DB2 uses a mixture of in-database and
 external configuration. Oracle stores everything in the database.

 We're certainly not going to invent a mechanism other than GUC settings
 or catalog tables for auditing information (à la Informix). What I think
 you're saying is that without storing stuff in the catalog tables, we're
 not going to be able to offer auditing capabilities worth having. I can
 appreciate that, but I'm not sure I agree.

 You're right, it isn't possible to sanely do something like AUDIT
 SELECT ON TABLE t1 FOR role1 without storing configuration in the
 catalog tables. You're right, it would be nice to have that level
 of control.

 The part that I don't agree about is that using a GUC setting and
 a reloption won't give us anything useful. We can use that to do
 global, per-database, per-user, and per-object auditing with just
 mechanisms that are familiar to Postgres users already (ALTER … SET).
 Some other messages in the thread have suggested a similar mechanism,
 which implies that at least some people wouldn't be unhappy with it.

 No, we couldn't do combinations of TABLE/ROLE, but there's also no
 terrible conflict with doing that via some new AUDIT foo ON bar
 syntax later.

 As you point out, we're decades behind, and I seriously doubt if
 we're going to jump forward a few decades in time for 9.5.

 What do others think of the tradeoff?

I agree with Stephen's statement that the audit configuration
shouldn't be managed in flat files.  If somebody wants to do that in
an extension, fine, but I don't think we should commit anything into
our source tree that works that way.  The right way to manage that
kind of configuration is with GUCs, or reloptions, or (security)
labels, or some other kind of catalog structure.  Renaming an object
shouldn't be able to break auditing, but if the audit configuration is
a list of table names in a file somewhere, it will.

I disagree with Stephen's proposal that this should be in core, or
that it needs its own dedicated syntax.  I think contrib is a great
place for extensions like this.  That makes it a whole lot easier for
people to develop customized vesions that meet particular needs they
may have, and it avoids bloating core with a bunch of stuff that is
only of interest to a subset of users.  We spent years getting sepgsql
into a form that could be shipped in contrib without substantially
burdening core, and I don't see why this feature should be handed any
differently.

In fact, considering how much variation there is likely to be between
auditing requirements at different sites, I'm not sure this should
even be in contrib at this point.  It's clearly useful, but there is
no requirement that every useful extension must be bundled with the
core server, and in fact doing so severely limits our ability to make
further changes.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Spinlocks and compiler/memory barriers

2014-06-27 Thread Andres Freund
On 2014-06-27 13:00:34 -0400, Robert Haas wrote:
 There are two separate issues here:
 
 1. SpinLockAcquire and SpinLockRelease are not guaranteed to be
 compiler barriers, so all relevant memory accesses in the critical
 section need to be done using volatile pointers.  Failing to do this
 is an easy mistake to make, and we've fixed numerous bugs of this type
 over the years (most recently, to my knowledge, in
 4bc15a8bfbc7856bc3426dc9ab99567eebbb64d3).  Forcing SpinLockAcquire()
 and SpinLockRelease() to serve as compiler barriers would let us
 dispense with a whole lot of volatile calls and make writing future
 code correctly a lot easier.

And actually faster in some cases. I'm just playing around with some
bigger POWER machine and removing the volatiles in GetSnapshotData() +
some access forcing macro for accessing xmin is worth 1% or so.

 2. Some of our implementations of SpinLockAcquire and/or
 SpinLockRelease, but in particular SpinLockRelease, may not actually
 provide the memory-ordering semantics which they are required to
 provide.

 I tend to think that we should try to think about these two problems
 somewhat separately.  As to #1, in the back-branches, I think further
 volatile-izing the LWLock* routines is probably the only realistic
 solution.

We could also decide not to do anything :/.

  In master, I fully support moving the goalposts such that
 we require SpinLockAcquire() and SpinLockRelease() are compiler
 barriers.  Once we do this, I think we should go back and rip out all
 the places where we've used volatile-ized pointers to provide compiler
 ordering.  That way, if we haven't actually managed to provide
 compiler ordering everywhere, it's more likely that something will
 fall over and warn us about the problem; plus, that avoids keeping
 around a coding pattern which isn't actually the one we want people to
 copy.

+1

 However, I think your proposed S_UNLOCKED_UNLOCK() hack is
 plain ugly, probably cripplingly slow, and there's no guarantee that's
 even correct; see for example the comments about Itanium's tas
 implementation possibly being only an acquire barrier (blech).

Heh. I don't think it's worse than the current fallback barrier
implementation. The S_UNLOCKED_UNLOCK() thing was just to avoid
recursion when using a barrier in the spinlock used to implement
barriers...

 On gcc and icc, which account for lines 99 through 700 of spin.h, it
 should be simple and mechanical to use compiler intrinsics to make
 sure that every S_UNLOCK implementation includes a compiler barrier.
 However, lines 710 through 895 support non-gcc, non-icc compilers, and
 some of those we may not know how to implement a compiler barrier - in
 particular, Univel CC, the Tru64 Alpha compiler, HPPA, AIX, or Sun's
 compilers.  Except for Sun, we have no buildfarm support for those
 platforms, so we could consider just dropping support entirely

Both sun's and AIX's compilers can relatively easily be handled:
* Solaris has atomic.h with membar_enter() et al. Apparently since at least
solaris 9.
* XLC has __fence and __isync intrinsics.

There's been recent talk about AIX, including about AIX animal, so I'd
be hesitant to drop it. It's also still developed.

I'm obviously in favor of dropping Alpha. And I'm, unsurprisingly, all
for removing unixware support (which is what univel CC seems to be used
for after you dropped the univel port proper).

I think the only person that has used postgres on hppa in the last 5
years is Tom, so I guess he'll have to speak up about it. Tom?

 void fake_compiler_barrier(void) { }
 void (*fake_compiler_barrier_hook) = fake_compiler_barrier;
 #define pg_compiler_barrier() ((*fake_compiler_barrier_hook)())

But we can do that as a fallback. It's what HPPA's example spinlock
implementation does after all.

 Now, this doesn't remove the circular dependency between s_lock.h and
 barrier.h, because we still don't have a fallback method, other than
 acquiring and releasing a spinlock, of implementing a barrier that
 blocks both compiler reordering and CPU reordering.  But it is enough
 to solve problem #1, and doesn't require that we drop support for
 anything that works now.

I think we can move the fallback into a C function. Compared to the cost
of a tas/unlock that shouldn't be significant.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Atomics hardware support table supported architectures

2014-06-27 Thread Andres Freund
On 2014-06-27 13:12:31 -0400, Robert Haas wrote:
 I have noticed that most PostgreSQL committers seem for format their
 commit messages so that paragraphs are separated by a blank line, but
 you seem not to do that.  I find that less readable.

I'll change that.

  Since there seems to be (unanimous?) support for dropping alpha and some
  patches coming up that need to deal with platform dependent stuff it
  seems sensible to do this first.

 I don't personally object to dropping Alpha, but when this was
 discussed back in October, Stefan did:
 
 http://www.postgresql.org/message-id/52616373.10...@kaltenbrunner.cc

Ah, right. I still am in favor of dropping it because I don't it is
likely to work, but, as a compromise, we could remove only the Tru64
variant? Openbsd + gcc is much less of a hassle.

 But I think he's rather in the minority anyway.

Looks like it.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Spinlocks and compiler/memory barriers

2014-06-27 Thread Andres Freund
On 2014-06-27 13:04:02 -0400, Robert Haas wrote:
 On Thu, Jun 26, 2014 at 6:40 PM, Tom Lane t...@sss.pgh.pa.us wrote:
  Andres Freund and...@2ndquadrant.com writes:
  On 2014-06-26 14:13:07 -0700, Tom Lane wrote:
  Surely it had better be a read barrier as well?
 
  I don't immediately see why it has to be read barrier? Hoisting a load
  from after the release into the locked area of code should be safe?
 
  No doubt, but delaying a read till after the unlocking write would
  certainly not be safe.
 
  AFAICT, README.barrier completely fails to define what we think the
  semantics of pg_read_barrier and pg_write_barrier actually are, so if
  you believe that a write barrier prevents reordering of reads relative to
  writes, you'd better propose some new text for that file.  It certainly
  doesn't say that today.
 
 The relevant text is in barrier.h

Note that that definition of a write barrier is *not* sufficient for the
release of a lock... As I said elsewhere I think all the barrier
definitions, except maybe alpha, luckily seem to be strong enough for
that anyway.

Do we want to introduce acquire/release barriers? Or do we want to
redefine the current barriers to be strong enough for that?

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pgaudit - an auditing extension for PostgreSQL

2014-06-27 Thread Stephen Frost
Robert,

* Robert Haas (robertmh...@gmail.com) wrote:
 I agree with Stephen's statement that the audit configuration
 shouldn't be managed in flat files.  If somebody wants to do that in
 an extension, fine, but I don't think we should commit anything into
 our source tree that works that way.  The right way to manage that
 kind of configuration is with GUCs, or reloptions, or (security)
 labels, or some other kind of catalog structure.  Renaming an object
 shouldn't be able to break auditing, but if the audit configuration is
 a list of table names in a file somewhere, it will.

Thanks for your thoughts on this.

 I disagree with Stephen's proposal that this should be in core, or
 that it needs its own dedicated syntax.  I think contrib is a great
 place for extensions like this.  That makes it a whole lot easier for
 people to develop customized vesions that meet particular needs they
 may have, and it avoids bloating core with a bunch of stuff that is
 only of interest to a subset of users.  We spent years getting sepgsql
 into a form that could be shipped in contrib without substantially
 burdening core, and I don't see why this feature should be handed any
 differently.

I don't exactly see how an extension or contrib module is going to be
able to have a reasonable catalog structure which avoids the risk that
renaming an object (role, schema, table, column, policy) will break the
configuration without being part of core.  Add on to that the desire for
audit control to be a grant'able right along the lines of:

GRANT AUDIT ON TABLE x TO audit_role;

which allows a user who is not the table owner or a superuser to manage
the auditing.  I'm not against implementing capabilities through contrib
modules, provided the goals of the feature can be met that way.  In a
way, sepgsql is actually a good example for this- the module provides
the implementation, but all of the syntax and catalog information is
actually in core.  An implementation similar to that for auditing might
be workable, but I don't see pgaudit as being that.

 In fact, considering how much variation there is likely to be between
 auditing requirements at different sites, I'm not sure this should
 even be in contrib at this point.  It's clearly useful, but there is
 no requirement that every useful extension must be bundled with the
 core server, and in fact doing so severely limits our ability to make
 further changes.

For my part, I do view auditing as being a requirement we should be
addressing through core- and not just for the reasons mentioned above.
We might be able to manage all of the above through an extension,
however, as we continue to add capabilities and features, new types and
methods, ways to extend the system, and more, auditing needs to keep
pace and be considered a core feature as much as the GRANT system is.

Thanks,

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] Atomics hardware support table supported architectures

2014-06-27 Thread Tom Lane
Andres Freund and...@2ndquadrant.com writes:
 On 2014-06-27 13:12:31 -0400, Robert Haas wrote:
 I don't personally object to dropping Alpha, but when this was
 discussed back in October, Stefan did:
 
 http://www.postgresql.org/message-id/52616373.10...@kaltenbrunner.cc

As an ex-packager I do not believe the argument that it will matter
to packagers if we desupport one of their secondary architectures.
There are many, many packages that have never claimed to work on
oddball architectures at all.  Packagers would be better served
by honesty about what we can support.

 Ah, right. I still am in favor of dropping it because I don't it is
 likely to work, but, as a compromise, we could remove only the Tru64
 variant? Openbsd + gcc is much less of a hassle.

 But I think he's rather in the minority anyway.

 Looks like it.

There would be value in continuing to support Alpha if we had one
in the buildfarm.  We don't, and have not had in any recent memory,
and I haven't noticed anyone offering to provide one in future.

The actual situation is that we're shipping a port that most
likely doesn't work, and we have no way to fix it.  That's of
no benefit to anyone.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] better atomics - v0.5

2014-06-27 Thread Andres Freund
On 2014-06-27 13:15:25 -0400, Robert Haas wrote:
 On Thu, Jun 26, 2014 at 3:04 PM, Andres Freund and...@2ndquadrant.com wrote:
  I don't really see usecases where it's not related data that's being
  touched, but: The point is that the fastpath (not using a spinlock) might
  touch the atomic variable, even while the slowpath has acquired the
  spinlock. So the slowpath can't just use non-atomic operations on the
  atomic variable.
  Imagine something like releasing a buffer pin while somebody else is
  doing something that requires holding the buffer header spinlock.
 
 If the atomic variable can be manipulated without the spinlock under
 *any* circumstances, then how is it a good idea to ever manipulate it
 with the spinlock?  That seems hard to reason about, and unnecessary.
 Critical sections for spinlocks should be short and contain only the
 instructions that need protection, and clearly the atomic op is not
 one of those.

Imagine the situation for the buffer header spinlock which is one of the
bigger performance issues atm. We could aim to replace all usages of
that with clever and complicated logic, but it's hard.

The IMO reasonable (and prototyped) way to do it is to make the common
paths lockless, but fall back to the spinlock for the more complicated
situations. For the buffer header that means that pin/unpin and buffer
lookup are lockless, but IO and changing the identity of a buffer still
require the spinlock. My attempts to avoid the latter basically required
a buffer header specific reimplementation of spinlocks.

To make pin/unpin et al safe without acquiring the spinlock the pincount
and the flags had to be moved into one variable so that when
incrementing the pincount you automatically get the flagbits back. If
it's currently valid all is good, otherwise the spinlock needs to be
acquired. But for that to work you need to set flags while the spinlock
is held.

Possibly you can come up with ways to avoid that, but I am pretty damn
sure that the code will be less robust by forbidding atomics inside a
critical section, not the contrary. It's a good idea to avoid it, but
not at all costs.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Re: [BUGS] BUG #8673: Could not open file pg_multixact/members/xxxx on slave during hot_standby

2014-06-27 Thread Alvaro Herrera
Jeff Janes wrote:

 This problem was initially fairly easy to reproduce, but since I
 started adding instrumentation specifically to catch it, it has become
 devilishly hard to reproduce.
 
 I think my next step will be to also log each of the values which goes
 into the complex if (...) expression that decides on the deletion.

Could you please to reproduce it after updating to latest?  I pushed
fixes that should close these issues.  Maybe you want to remove the
instrumentation you added, to make failures more likely.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Atomics hardware support table supported architectures

2014-06-27 Thread Stefan Kaltenbrunner
On 06/27/2014 08:26 PM, Tom Lane wrote:
 Andres Freund and...@2ndquadrant.com writes:
 On 2014-06-27 13:12:31 -0400, Robert Haas wrote:
 I don't personally object to dropping Alpha, but when this was
 discussed back in October, Stefan did:

 http://www.postgresql.org/message-id/52616373.10...@kaltenbrunner.cc
 
 As an ex-packager I do not believe the argument that it will matter
 to packagers if we desupport one of their secondary architectures.
 There are many, many packages that have never claimed to work on
 oddball architectures at all.  Packagers would be better served
 by honesty about what we can support.

yeah I guess so - I was mostly pointing out that alpha looked like to be
a way more active platform than most of what was discussed in that
thread. I personally dont think that continuing to support alpha will
buy us anything...

 
 Ah, right. I still am in favor of dropping it because I don't it is
 likely to work, but, as a compromise, we could remove only the Tru64
 variant? Openbsd + gcc is much less of a hassle.
 
 But I think he's rather in the minority anyway.
 
 Looks like it.
 
 There would be value in continuing to support Alpha if we had one
 in the buildfarm.  We don't, and have not had in any recent memory,
 and I haven't noticed anyone offering to provide one in future.
 
 The actual situation is that we're shipping a port that most
 likely doesn't work, and we have no way to fix it.  That's of
 no benefit to anyone.

yeah I dont have access to any alpha hardware to provide a buildfarm box
so I cant help there :(



Stefan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Atomics hardware support table supported architectures

2014-06-27 Thread Noah Misch
On Fri, Jun 27, 2014 at 07:51:32PM +0200, Andres Freund wrote:
 On 2014-06-27 13:12:31 -0400, Robert Haas wrote:
  I don't personally object to dropping Alpha, but when this was
  discussed back in October, Stefan did:
  
  http://www.postgresql.org/message-id/52616373.10...@kaltenbrunner.cc
 
 Ah, right. I still am in favor of dropping it because I don't it is
 likely to work, but, as a compromise, we could remove only the Tru64
 variant? Openbsd + gcc is much less of a hassle.

Retaining Alpha support means placing data dependency barriers (Linux kernel
term) where the Alpha memory model requires them.  I doubt enough users will
stress Alpha builds for us to distinguish a missing barrier from hardware
flakiness, so we'd never find out whether we did it right.  That's why I favor
removing Alpha-specific code completely, and it is an OS-independent and
compiler-independent motive.

-- 
Noah Misch
EnterpriseDB http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] delta relations in AFTER triggers

2014-06-27 Thread David Fetter
On Sat, Jun 21, 2014 at 11:06:26AM -0700, Kevin Grittner wrote:
 Kevin Grittner kgri...@ymail.com wrote:
  Kevin Grittner kgri...@ymail.com wrote:
 
  I've already said that I now think we should use the standard
  CREATE TRIGGER syntax to specify the transition tables, and that
  if we do that we don't need the reloption that's in the patch.
  If you ignore the 19 lines of new code to add that reloption,
  absolutely 100% of the code changes in the patch so far are in
  trigger.c and trigger.h.
 
  Although nobody has actually framed their feedback as a review, I
  feel that I have enough to work with to throw the patch into
  Waiting on Author status.  Since I started with the assumption
  that I was going to be using standard syntax and got a ways into
  that before convincing myself it was a bad idea, I should have a
  new version of the patch working that way in a couple days.
 
 Here is v2.

Thanks!

I've taken the liberty of making an extension that uses this.
Preliminary tests indicate a 10x performance improvement over the
user-space hack I did that's similar in functionality.

Please find attached the extension, etc., which I've published to
https://github.com/davidfetter/postgresql_projects/tree/test_delta_v2

Cheers,
David.
-- 
David Fetter da...@fetter.org http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
diff --git a/contrib/statement_trigger_row/Makefile 
b/contrib/statement_trigger_row/Makefile
new file mode 100644
index 000..e0cf006
--- /dev/null
+++ b/contrib/statement_trigger_row/Makefile
@@ -0,0 +1,17 @@
+# contrib/statement_trigger_row/Makefile
+
+MODULES = statement_trigger_row
+
+EXTENSION = statement_trigger_row
+DATA = statement_trigger_row--1.0.sql
+
+ifdef USE_PGXS
+PG_CONFIG = pg_config
+PGXS := $(shell $(PG_CONFIG) --pgxs)
+include $(PGXS)
+else
+subdir = contrib/statement_trigger_row
+top_builddir = ../..
+include $(top_builddir)/src/Makefile.global
+include $(top_srcdir)/contrib/contrib-global.mk
+endif
diff --git a/contrib/statement_trigger_row/sql/easy_way.sql 
b/contrib/statement_trigger_row/sql/easy_way.sql
new file mode 100644
index 000..019ae7f
--- /dev/null
+++ b/contrib/statement_trigger_row/sql/easy_way.sql
@@ -0,0 +1,85 @@
+/*
+ * If these were surfaced to PL/pgsql, this is what it might look like.
+ */
+
+BEGIN;
+
+CREATE TABLE a(
+id SERIAL PRIMARY KEY,
+i INT
+);
+
+CREATE FUNCTION summarize_a_inserts()
+RETURNS TRIGGER LANGUAGE plpgsql
+AS $$
+DECLARE
+the_sum BIGINT;
+BEGIN
+SELECT INTO the_sum sum(NEW.i)
+FROM
+new_a;
+RAISE NOTICE 'Total change: %.', the_sum;
+RETURN NULL;
+END;
+$$;
+
+CREATE FUNCTION summarize_a_updates()
+RETURNS TRIGGER LANGUAGE plpgsql
+AS $$
+DECLARE
+the_sum BIGINT;
+BEGIN
+SELECT INTO the_sum sum(COALESCE(NEW.i,0) - COALESCE(OLD.i, 0))
+FROM
+old_a
+JOIN
+new_a
+ON(old_a.id = new_a.id);
+RAISE NOTICE 'Total change: %.', the_sum;
+RETURN NULL;
+END;
+$$;
+
+CREATE FUNCTION summarize_a_deletes()
+RETURNS TRIGGER LANGUAGE plpgsql
+AS $$
+DECLARE
+the_sum BIGINT;
+BEGIN
+SELECT INTO the_sum -1 * sum(OLD.i)
+FROM
+old_a;
+RAISE NOTICE 'Total change: %.', the_sum;
+RETURN NULL;
+END;
+$$;
+
+CREATE TRIGGER statement_after_insert_a
+AFTER INSERT ON a
+REFERENCING
+NEW TABLE AS new_a
+FOR EACH STATEMENT
+EXECUTE PROCEDURE summarize_a_inserts();
+
+CREATE TRIGGER statement_after_update_a
+AFTER UPDATE ON a
+REFERENCING
+OLD TABLE AS old_a
+NEW TABLE AS new_a
+FOR EACH STATEMENT
+EXECUTE PROCEDURE summarize_a_updates();
+
+CREATE TRIGGER statement_after_delete_a
+AFTER DELETE ON a
+REFERENCING
+OLD TABLE AS old_a
+FOR EACH STATEMENT
+EXECUTE PROCEDURE summarize_a_deletes();
+
+INSERT INTO a(i)
+SELECT * FROM generate_series(1,1);
+
+UPDATE a SET i=i+1;
+
+ROLLBACK;
+
diff --git a/contrib/statement_trigger_row/sql/hard_way.sql 
b/contrib/statement_trigger_row/sql/hard_way.sql
new file mode 100644
index 000..c6f7c1d
--- /dev/null
+++ b/contrib/statement_trigger_row/sql/hard_way.sql
@@ -0,0 +1,68 @@
+CREATE TABLE IF NOT EXISTS h(
+i INTEGER
+);
+
+CREATE FUNCTION set_up_h_rows()
+RETURNS TRIGGER LANGUAGE plpgsql
+AS $$
+BEGIN
+CREATE TEMPORARY TABLE IF NOT EXISTS h_rows(LIKE a) ON COMMIT DROP;
+RETURN NULL;
+END;
+$$;
+
+CREATE TRIGGER statement_before_writing_h
+BEFORE INSERT OR UPDATE OR DELETE ON a
+FOR EACH STATEMENT
+EXECUTE PROCEDURE set_up_h_rows();
+
+CREATE OR REPLACE FUNCTION stash_h_row_deltas()
+RETURNS TRIGGER
+LANGUAGE plpgsql
+AS $$
+BEGIN
+INSERT INTO h_rows(i)
+VALUES(
+CASE TG_OP
+WHEN 'INSERT' THEN COALESCE(NEW.i,0)
+   

[HACKERS] [PATCH 0/3] Tau support

2014-06-27 Thread Asbjørn Sloth Tønnesen
Hi,

Please see the following patches for implementing support
for and migrating to Tau in PostgreSQL.

Happy Tau Day Hacking
http://tauday.com/

Make sure to check out this shory story:
  http://tauday.com/a-parable


Asbjørn Sloth Tønnesen (3):
  Implement tau() function
  backend: use M_TAU instead of M_PI
  earthdistance: TWO_PI = M_TAU

 contrib/earthdistance/earthdistance.c | 12 +++-
 doc/src/sgml/func.sgml| 13 +
 src/backend/utils/adt/float.c | 21 +
 src/backend/utils/adt/geo_ops.c   |  8 ++--
 src/include/catalog/pg_proc.h |  2 ++
 src/include/utils/builtins.h  |  1 +
 6 files changed, 46 insertions(+), 11 deletions(-)

-- 
2.0.0



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] [PATCH 3/3] earthdistance: TWO_PI = M_TAU

2014-06-27 Thread Asbjørn Sloth Tønnesen
Signed-off-by: Asbjørn Sloth Tønnesen asbj...@asbjorn.it
---
 contrib/earthdistance/earthdistance.c | 12 +++-
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/contrib/earthdistance/earthdistance.c 
b/contrib/earthdistance/earthdistance.c
index 6bbebdf..432309c 100644
--- a/contrib/earthdistance/earthdistance.c
+++ b/contrib/earthdistance/earthdistance.c
@@ -6,16 +6,18 @@
 
 #include utils/geo_decls.h   /* for Point */
 
-#ifndef M_PI
-#define M_PI 3.14159265358979323846
+#ifndef M_TAU
+#define M_TAU 6.28318530717958647693
 #endif
 
+#ifndef M_PI
+#define M_PI (M_TAU / 2.0)
+#endif
 
 PG_MODULE_MAGIC;
 
 /* Earth's radius is in statute miles. */
 static const double EARTH_RADIUS = 3958.747716;
-static const double TWO_PI = 2.0 * M_PI;
 
 
 /**
@@ -30,7 +32,7 @@ static const double TWO_PI = 2.0 * M_PI;
 static double
 degtorad(double degrees)
 {
-   return (degrees / 360.0) * TWO_PI;
+   return (degrees / 360.0) * M_TAU;
 }
 
 /**
@@ -67,7 +69,7 @@ geo_distance_internal(Point *pt1, Point *pt2)
/* compute difference in longitudes - want  180 degrees */
longdiff = fabs(long1 - long2);
if (longdiff  M_PI)
-   longdiff = TWO_PI - longdiff;
+   longdiff = M_TAU - longdiff;
 
sino = sqrt(sin(fabs(lat1 - lat2) / 2.) * sin(fabs(lat1 - lat2) / 2.) +
cos(lat1) * cos(lat2) * sin(longdiff / 2.) * 
sin(longdiff / 2.));
-- 
2.0.0



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] [PATCH 2/3] backend: use M_TAU instead of M_PI

2014-06-27 Thread Asbjørn Sloth Tønnesen
Signed-off-by: Asbjørn Sloth Tønnesen asbj...@asbjorn.it
---
 src/backend/utils/adt/float.c   | 4 ++--
 src/backend/utils/adt/geo_ops.c | 8 ++--
 2 files changed, 8 insertions(+), 4 deletions(-)

diff --git a/src/backend/utils/adt/float.c b/src/backend/utils/adt/float.c
index 1278053..b0211f8 100644
--- a/src/backend/utils/adt/float.c
+++ b/src/backend/utils/adt/float.c
@@ -1701,7 +1701,7 @@ degrees(PG_FUNCTION_ARGS)
float8  arg1 = PG_GETARG_FLOAT8(0);
float8  result;
 
-   result = arg1 * (180.0 / M_PI);
+   result = arg1 * (360.0 / M_TAU);
 
CHECKFLOATVAL(result, isinf(arg1), arg1 == 0);
PG_RETURN_FLOAT8(result);
@@ -1737,7 +1737,7 @@ radians(PG_FUNCTION_ARGS)
float8  arg1 = PG_GETARG_FLOAT8(0);
float8  result;
 
-   result = arg1 * (M_PI / 180.0);
+   result = arg1 * (M_TAU / 360.0);
 
CHECKFLOATVAL(result, isinf(arg1), arg1 == 0);
PG_RETURN_FLOAT8(result);
diff --git a/src/backend/utils/adt/geo_ops.c b/src/backend/utils/adt/geo_ops.c
index 54391fd..429de6d 100644
--- a/src/backend/utils/adt/geo_ops.c
+++ b/src/backend/utils/adt/geo_ops.c
@@ -23,8 +23,12 @@
 #include utils/builtins.h
 #include utils/geo_decls.h
 
+#ifndef M_TAU
+#define M_TAU 6.28318530717958647693
+#endif
+
 #ifndef M_PI
-#define M_PI 3.14159265358979323846
+#define M_PI (M_TAU / 2.0)
 #endif
 
 
@@ -5168,7 +5172,7 @@ circle_poly(PG_FUNCTION_ARGS)
SET_VARSIZE(poly, size);
poly-npts = npts;
 
-   anglestep = (2.0 * M_PI) / npts;
+   anglestep = M_TAU / npts;
 
for (i = 0; i  npts; i++)
{
-- 
2.0.0



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] [PATCH 1/3] Implement tau() function

2014-06-27 Thread Asbjørn Sloth Tønnesen
Signed-off-by: Asbjørn Sloth Tønnesen asbj...@asbjorn.it
---
 doc/src/sgml/func.sgml| 13 +
 src/backend/utils/adt/float.c | 17 +++--
 src/include/catalog/pg_proc.h |  2 ++
 src/include/utils/builtins.h  |  1 +
 4 files changed, 31 insertions(+), 2 deletions(-)

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 551576a..2b48123 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -878,6 +878,19 @@
   row
entry
 indexterm
+ primarytau/primary
+/indexterm
+literalfunctiontau()/function/literal
+   /entry
+   entrytypedp/type/entry
+   entryquotetau;/quote constant/entry
+   entryliteraltau()/literal/entry
+   entryliteral6.28318530717959/literal/entry
+  /row
+
+  row
+   entry
+indexterm
  primarytrunc/primary
 /indexterm
 literalfunctiontrunc(typedp/type or 
typenumeric/type)/function/literal
diff --git a/src/backend/utils/adt/float.c b/src/backend/utils/adt/float.c
index 41b3eaa..1278053 100644
--- a/src/backend/utils/adt/float.c
+++ b/src/backend/utils/adt/float.c
@@ -26,9 +26,12 @@
 #include utils/sortsupport.h
 
 
+#ifndef M_TAU
+#define M_TAU 6.28318530717958647693
+#endif
+
 #ifndef M_PI
-/* from my RH5.2 gcc math.h file - thomas 2000-04-03 */
-#define M_PI 3.14159265358979323846
+#define M_PI (M_TAU / 2.0)
 #endif
 
 /* Visual C++ etc lacks NAN, and won't accept 0.0/0.0.  NAN definition from
@@ -1716,6 +1719,16 @@ dpi(PG_FUNCTION_ARGS)
 
 
 /*
+ * dtau- returns the constant Tau
+ */
+Datum
+dtau(PG_FUNCTION_ARGS)
+{
+   PG_RETURN_FLOAT8(M_TAU);
+}
+
+
+/*
  * radians - returns radians converted from degrees
  */
 Datum
diff --git a/src/include/catalog/pg_proc.h b/src/include/catalog/pg_proc.h
index 0b6105b..4148bed 100644
--- a/src/include/catalog/pg_proc.h
+++ b/src/include/catalog/pg_proc.h
@@ -1825,6 +1825,8 @@ DATA(insert OID = 1609 (  radians PGNSP 
PGUID 12 1 0 0 0 f f f f t f i 1 0 701
 DESCR(degrees to radians);
 DATA(insert OID = 1610 (  pi   PGNSP PGUID 12 1 0 0 0 
f f f f t f i 0 0 701  _null_ _null_ _null_ _null_ dpi _null_ _null_ _null_ 
));
 DESCR(PI);
+DATA(insert OID = 1611 (  tau  PGNSP PGUID 12 1 0 0 0 
f f f f t f i 0 0 701  _null_ _null_ _null_ _null_ dtau _null_ _null_ _null_ 
));
+DESCR(Tau);
 
 DATA(insert OID = 1618 (  interval_mul PGNSP PGUID 12 1 0 0 0 f f f f 
t f i 2 0 1186 1186 701 _null_ _null_ _null_ _null_ interval_mul _null_ 
_null_ _null_ ));
 
diff --git a/src/include/utils/builtins.h b/src/include/utils/builtins.h
index bbb5d39..0258a64 100644
--- a/src/include/utils/builtins.h
+++ b/src/include/utils/builtins.h
@@ -408,6 +408,7 @@ extern Datum dsin(PG_FUNCTION_ARGS);
 extern Datum dtan(PG_FUNCTION_ARGS);
 extern Datum degrees(PG_FUNCTION_ARGS);
 extern Datum dpi(PG_FUNCTION_ARGS);
+extern Datum dtau(PG_FUNCTION_ARGS);
 extern Datum radians(PG_FUNCTION_ARGS);
 extern Datum drandom(PG_FUNCTION_ARGS);
 extern Datum setseed(PG_FUNCTION_ARGS);
-- 
2.0.0



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH 0/3] Tau support

2014-06-27 Thread Tom Lane
=?UTF-8?q?Asbj=C3=B8rn=20Sloth=20T=C3=B8nnesen?= asbj...@asbjorn.it writes:
 Please see the following patches for implementing support
 for and migrating to Tau in PostgreSQL.

While I don't particularly object to adding a tau() function, the rest of
this seems like change for the sake of change.  What's the point?  And
it had better be a darn good point, because cross-version code differences
are a constant source of maintenance pain for us.  We don't need ones
that have no concrete value.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH 0/3] Tau support

2014-06-27 Thread David Fetter
On Fri, Jun 27, 2014 at 06:03:33PM -0700, Tom Lane wrote:
 =?UTF-8?q?Asbj=C3=B8rn=20Sloth=20T=C3=B8nnesen?= asbj...@asbjorn.it writes:
  Please see the following patches for implementing support for and
  migrating to Tau in PostgreSQL.
 
 While I don't particularly object to adding a tau() function, the
 rest of this seems like change for the sake of change.  What's the
 point?  And it had better be a darn good point, because
 cross-version code differences are a constant source of maintenance
 pain for us.  We don't need ones that have no concrete value.

It's Tau day (6.28) in some parts of the world already.  Might that be
the cause?

Cheers,
David.
-- 
David Fetter da...@fetter.org http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH 0/3] Tau support

2014-06-27 Thread Robert Haas
On Fri, Jun 27, 2014 at 9:03 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 =?UTF-8?q?Asbj=C3=B8rn=20Sloth=20T=C3=B8nnesen?= asbj...@asbjorn.it writes:
 Please see the following patches for implementing support
 for and migrating to Tau in PostgreSQL.

 While I don't particularly object to adding a tau() function, the rest of
 this seems like change for the sake of change.  What's the point?  And
 it had better be a darn good point, because cross-version code differences
 are a constant source of maintenance pain for us.  We don't need ones
 that have no concrete value.

Perhaps we should also have a pau() function, as proposed here:

http://xkcd.com/1292/

On a related note, I think we should declare vi the officially
favored editor of the PostgreSQL project.

(Note: No, I'm not serious.)

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Spinlocks and compiler/memory barriers

2014-06-27 Thread Robert Haas
On Fri, Jun 27, 2014 at 2:04 PM, Andres Freund and...@2ndquadrant.com wrote:
 On 2014-06-27 13:04:02 -0400, Robert Haas wrote:
 On Thu, Jun 26, 2014 at 6:40 PM, Tom Lane t...@sss.pgh.pa.us wrote:
  Andres Freund and...@2ndquadrant.com writes:
  On 2014-06-26 14:13:07 -0700, Tom Lane wrote:
  Surely it had better be a read barrier as well?
 
  I don't immediately see why it has to be read barrier? Hoisting a load
  from after the release into the locked area of code should be safe?
 
  No doubt, but delaying a read till after the unlocking write would
  certainly not be safe.
 
  AFAICT, README.barrier completely fails to define what we think the
  semantics of pg_read_barrier and pg_write_barrier actually are, so if
  you believe that a write barrier prevents reordering of reads relative to
  writes, you'd better propose some new text for that file.  It certainly
  doesn't say that today.

 The relevant text is in barrier.h

 Note that that definition of a write barrier is *not* sufficient for the
 release of a lock... As I said elsewhere I think all the barrier
 definitions, except maybe alpha, luckily seem to be strong enough for
 that anyway.

 Do we want to introduce acquire/release barriers? Or do we want to
 redefine the current barriers to be strong enough for that?

Well, unless we're prepared to dump support for an awful lot of
platfomrs, trying to support acquire and release barriers on every
platform we support is a doomed effort.  The definitions of the
barriers implemented by barrier.h are the same as the ones that Linux
has (minus read-barrier-depends), which I think is probably good
evidence that they are generally useful definitions.  If we were going
to use any of those in s_lock.h, it'd have to be pg_memory_barrier(),
but I don't think making s_lock.h dependent on barrier.h is the way to
go.  I think we should just adjust s_lock.h in a minimal way, using
inline assembler or tweaking the existing assembler or whatever.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] better atomics - v0.5

2014-06-27 Thread Robert Haas
On Fri, Jun 27, 2014 at 2:00 PM, Andres Freund and...@2ndquadrant.com wrote:
 On 2014-06-27 13:15:25 -0400, Robert Haas wrote:
 On Thu, Jun 26, 2014 at 3:04 PM, Andres Freund and...@2ndquadrant.com 
 wrote:
  I don't really see usecases where it's not related data that's being
  touched, but: The point is that the fastpath (not using a spinlock) might
  touch the atomic variable, even while the slowpath has acquired the
  spinlock. So the slowpath can't just use non-atomic operations on the
  atomic variable.
  Imagine something like releasing a buffer pin while somebody else is
  doing something that requires holding the buffer header spinlock.

 If the atomic variable can be manipulated without the spinlock under
 *any* circumstances, then how is it a good idea to ever manipulate it
 with the spinlock?  That seems hard to reason about, and unnecessary.
 Critical sections for spinlocks should be short and contain only the
 instructions that need protection, and clearly the atomic op is not
 one of those.

 Imagine the situation for the buffer header spinlock which is one of the
 bigger performance issues atm. We could aim to replace all usages of
 that with clever and complicated logic, but it's hard.

 The IMO reasonable (and prototyped) way to do it is to make the common
 paths lockless, but fall back to the spinlock for the more complicated
 situations. For the buffer header that means that pin/unpin and buffer
 lookup are lockless, but IO and changing the identity of a buffer still
 require the spinlock. My attempts to avoid the latter basically required
 a buffer header specific reimplementation of spinlocks.

 To make pin/unpin et al safe without acquiring the spinlock the pincount
 and the flags had to be moved into one variable so that when
 incrementing the pincount you automatically get the flagbits back. If
 it's currently valid all is good, otherwise the spinlock needs to be
 acquired. But for that to work you need to set flags while the spinlock
 is held.

 Possibly you can come up with ways to avoid that, but I am pretty damn
 sure that the code will be less robust by forbidding atomics inside a
 critical section, not the contrary. It's a good idea to avoid it, but
 not at all costs.

You might be right, but I'm not convinced.  Does the lwlock
scalability patch work this way, too?  I was hoping to look at that
RSN, so if there are roughly the same issues there it might shed some
light on this point also.

What I'm basically afraid of is that this will work fine in many cases
but have really ugly failure modes.  That's pretty much what happens
with spinlocks already - the overhead is insignificant at low levels
of contention but as the spinlock starts to become contended the CPUs
all fight over the cache line and performance goes to pot.  ISTM that
making the spinlock critical section significantly longer by putting
atomic ops in there could appear to win in some cases while being very
bad in others.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] ALTER SYSTEM RESET?

2014-06-27 Thread Amit Kapila
On Fri, Jun 27, 2014 at 12:27 PM, Vik Fearing vik.fear...@dalibo.com
wrote:
 On 06/27/2014 08:49 AM, Vik Fearing wrote:
  This third patch reformats the documentation in the way I expected it to
  be committed.

 Amit,

 I added this to the next commitfest with your name as reviewer.

No issues, I will review it in next CF.


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com