[HACKERS] Transactions involving multiple postgres foreign servers

2015-01-01 Thread Ashutosh Bapat
Hi All,
While looking at the patch for supporting inheritance on foreign tables, I
noticed that if a transaction makes changes on more than one foreign
server, the current implementation in postgres_fdw doesn't ensure that
either all of them roll back or all of them commit their changes; IOW there
is a possibility that some of them commit their changes while others
roll back theirs.

PFA a patch which uses 2PC to solve this problem. In pgfdw_xact_callback(),
at the XACT_EVENT_PRE_COMMIT event it prepares the transaction on all the
foreign PostgreSQL servers, and at the XACT_EVENT_COMMIT or XACT_EVENT_ABORT
event it commits or aborts those transactions respectively.

The logic to craft the prepared transaction ids is rudimentary and I am
open to suggestions for improving it. I have the following goals in mind
while crafting the transaction ids:
1. Minimize the chances of crafting a transaction id which would conflict
with a concurrent transaction id on that foreign server.
2. Because of a limitation described later, DBA/user should be able to
identify the server which originated a remote transaction.
More can be found in comments above function pgfdw_get_prep_xact_id() in
the patch.

Limitations
---
1. After a transaction has been prepared on a foreign server, if the
connection to that server is lost before the transaction is rolled back or
committed there, the transaction remains in the prepared state forever.
Manual intervention would be needed to clean up such a transaction (hence
goal 2 above). Automating this process would require significant changes to
the transaction manager, so it has been left out of this patch, which I
thought would be better right now. If required, I can work on that part in
this patch itself.

2. 2PC is needed only when more than one foreign server is involved in a
transaction. Transactions on a single foreign server are handled well
right now. So, ideally, the code should detect whether more than one
foreign server is involved in the transaction and only then use 2PC. But I
couldn't find a way to detect that without changing the transaction manager.
-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
diff --git a/contrib/postgres_fdw/connection.c b/contrib/postgres_fdw/connection.c
index 116be7d..9492f14 100644
--- a/contrib/postgres_fdw/connection.c
+++ b/contrib/postgres_fdw/connection.c
@@ -42,20 +42,21 @@ typedef struct ConnCacheKey
 } ConnCacheKey;
 
 typedef struct ConnCacheEntry
 {
 	ConnCacheKey key;			/* hash key (must be first) */
 	PGconn	   *conn;			/* connection to foreign server, or NULL */
 	int			xact_depth;		/* 0 = no xact open, 1 = main xact open, 2 =
 								 * one level of subxact open, etc */
 	bool		have_prep_stmt; /* have we prepared any stmts in this xact? */
 	bool		have_error;		/* have any subxacts aborted in this xact? */
+	char		*prep_xact_name;	/* Name of prepared transaction on this connection */
 } ConnCacheEntry;
 
 /*
  * Connection cache (initialized on first use)
  */
 static HTAB *ConnectionHash = NULL;
 
 /* for assigning cursor numbers and prepared statement numbers */
 static unsigned int cursor_number = 0;
 static unsigned int prep_stmt_number = 0;
@@ -135,20 +136,21 @@ GetConnection(ForeignServer *server, UserMapping *user,
 	 * Find or create cached entry for requested connection.
 	 */
 	entry = hash_search(ConnectionHash, key, HASH_ENTER, found);
 	if (!found)
 	{
 		/* initialize new hashtable entry (key is already filled in) */
 		entry->conn = NULL;
 		entry->xact_depth = 0;
 		entry->have_prep_stmt = false;
 		entry->have_error = false;
+		entry->prep_xact_name = NULL;
 	}
 
 	/*
 	 * We don't check the health of cached connection here, because it would
 	 * require some overhead.  Broken connection will be detected when the
 	 * connection is actually used.
 	 */
 
 	/*
 	 * If cache entry doesn't have a connection, we have to establish a new
@@ -156,20 +158,21 @@ GetConnection(ForeignServer *server, UserMapping *user,
 	 * will be left in a valid empty state.)
 	 */
 	if (entry->conn == NULL)
 	{
 		entry->xact_depth = 0;	/* just to be sure */
 		entry->have_prep_stmt = false;
 		entry->have_error = false;
 		entry->conn = connect_pg_server(server, user);
 		elog(DEBUG3, "new postgres_fdw connection %p for server \"%s\"",
 			 entry->conn, server->servername);
+		entry->prep_xact_name = NULL;
 	}
 
 	/*
 	 * Start a new transaction or subtransaction if needed.
 	 */
 	begin_remote_xact(entry);
 
 	/* Remember if caller will prepare statements */
 	entry-have_prep_stmt |= will_prep_stmt;
 
@@ -507,20 +510,55 @@ pgfdw_report_error(int elevel, PGresult *res, PGconn *conn,
 		if (clear)
 			PQclear(res);
 		PG_RE_THROW();
 	}
 	PG_END_TRY();
 	if (clear)
 		PQclear(res);
 }
 
 /*
+ * pgfdw_get_prep_xact_id
+ * This function crafts the prepared transaction identifier. The PostgreSQL
+ * documentation mentions two restrictions on the name:
+ * 1. It must be a string literal, less than 200 bytes long.
+ * 2. It should not be the same as any 

Re: [HACKERS] add modulo (%) operator to pgbench

2015-01-01 Thread Fabien COELHO



I was about to go and look at this, but I had a problem when attempting to
compile with MSVC.


Thanks! Here is a v4 which includes your fix.

--
Fabien.
diff --git a/contrib/pgbench/.gitignore b/contrib/pgbench/.gitignore
index 489a2d6..aae819e 100644
--- a/contrib/pgbench/.gitignore
+++ b/contrib/pgbench/.gitignore
@@ -1 +1,3 @@
+/exprparse.c
+/exprscan.c
 /pgbench
diff --git a/contrib/pgbench/Makefile b/contrib/pgbench/Makefile
index b8e2fc8..2d4033b 100644
--- a/contrib/pgbench/Makefile
+++ b/contrib/pgbench/Makefile
@@ -4,7 +4,9 @@ PGFILEDESC = pgbench - a simple program for running benchmark tests
 PGAPPICON = win32
 
 PROGRAM = pgbench
-OBJS	= pgbench.o $(WIN32RES)
+OBJS	= pgbench.o exprparse.o $(WIN32RES)
+
+EXTRA_CLEAN	= exprparse.c exprscan.c
 
 PG_CPPFLAGS = -I$(libpq_srcdir)
 PG_LIBS = $(libpq_pgport) $(PTHREAD_LIBS)
@@ -18,8 +20,22 @@ subdir = contrib/pgbench
 top_builddir = ../..
 include $(top_builddir)/src/Makefile.global
 include $(top_srcdir)/contrib/contrib-global.mk
+
+distprep: exprparse.c exprscan.c
 endif
 
 ifneq ($(PORTNAME), win32)
 override CFLAGS += $(PTHREAD_CFLAGS)
 endif
+
+# There is no correct way to write a rule that generates two files.
+# Rules with two targets don't have that meaning, they are merely
+# shorthand for two otherwise separate rules.  To be safe for parallel
+# make, we must chain the dependencies like this.  The semicolon is
+# important, otherwise make will choose the built-in rule for
+# gram.y=gram.c.
+
+exprparse.h: exprparse.c ;
+
+# exprscan is compiled as part of exprparse
+exprparse.o: exprscan.c
diff --git a/contrib/pgbench/exprparse.y b/contrib/pgbench/exprparse.y
new file mode 100644
index 000..243c6b9
--- /dev/null
+++ b/contrib/pgbench/exprparse.y
@@ -0,0 +1,96 @@
+%{
+/*-
+ *
+ * exprparse.y
+ *	  bison grammar for a simple expression syntax
+ *
+ * Portions Copyright (c) 1996-2014, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *-
+ */
+
+#include "postgres_fe.h"
+
+#include "pgbench.h"
+
+PgBenchExpr *expr_parse_result;
+
+static PgBenchExpr *make_integer_constant(int64 ival);
+static PgBenchExpr *make_variable(char *varname);
+static PgBenchExpr *make_op(char operator, PgBenchExpr *lexpr,
+		PgBenchExpr *rexpr);
+
+%}
+
+%expect 0
+%name-prefix=expr_yy
+
+%union
+{
+	int64		ival;
+	char	   *str;
+	PgBenchExpr *expr;
+}
+
+%type <expr> expr
+%type <ival> INTEGER
+%type <str> VARIABLE
+%token INTEGER VARIABLE
+%token CHAR_ERROR /* never used, will raise a syntax error */
+
+%left	'+' '-'
+%left	'*' '/' '%'
+%right	UMINUS
+
+%%
+
+result: expr{ expr_parse_result = $1; }
+
+expr: '(' expr ')'			{ $$ = $2; }
+	| '+' expr %prec UMINUS	{ $$ = $2; }
+	| '-' expr %prec UMINUS	{ $$ = make_op('-', make_integer_constant(0), $2); }
+	| expr '+' expr			{ $$ = make_op('+', $1, $3); }
+	| expr '-' expr			{ $$ = make_op('-', $1, $3); }
+	| expr '*' expr			{ $$ = make_op('*', $1, $3); }
+	| expr '/' expr			{ $$ = make_op('/', $1, $3); }
+	| expr '%' expr			{ $$ = make_op('%', $1, $3); }
+	| INTEGER{ $$ = make_integer_constant($1); }
+	| VARIABLE { $$ = make_variable($1); }
+	;
+
+%%
+
+static PgBenchExpr *
+make_integer_constant(int64 ival)
+{
+	PgBenchExpr *expr = pg_malloc(sizeof(PgBenchExpr));
+
+	expr->etype = ENODE_INTEGER_CONSTANT;
+	expr->u.integer_constant.ival = ival;
+	return expr;
+}
+
+static PgBenchExpr *
+make_variable(char *varname)
+{
+	PgBenchExpr *expr = pg_malloc(sizeof(PgBenchExpr));
+
+	expr->etype = ENODE_VARIABLE;
+	expr->u.variable.varname = varname;
+	return expr;
+}
+
+static PgBenchExpr *
+make_op(char operator, PgBenchExpr *lexpr, PgBenchExpr *rexpr)
+{
+	PgBenchExpr *expr = pg_malloc(sizeof(PgBenchExpr));
+
+	expr->etype = ENODE_OPERATOR;
+	expr->u.operator.operator = operator;
+	expr->u.operator.lexpr = lexpr;
+	expr->u.operator.rexpr = rexpr;
+	return expr;
+}
+
+#include "exprscan.c"
diff --git a/contrib/pgbench/exprscan.l b/contrib/pgbench/exprscan.l
new file mode 100644
index 000..4c9229c
--- /dev/null
+++ b/contrib/pgbench/exprscan.l
@@ -0,0 +1,105 @@
+%{
+/*-
+ *
+ * exprscan.l
+ *	  a lexical scanner for a simple expression syntax
+ *
+ * Portions Copyright (c) 1996-2014, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *-
+ */
+
+/* line and column number for error reporting */
+static int	yyline = 0, yycol = 0;
+
+/* Handles to the buffer that the lexer uses internally */
+static YY_BUFFER_STATE scanbufhandle;
+static char *scanbuf;
+static int	scanbuflen;
+
+/* flex 2.5.4 doesn't bother with a decl for this */
+int expr_yylex(void);
+
+%}
+
+%option 8bit
+%option never-interactive
+%option nodefault

Re: [HACKERS] Parallel Seq Scan

2015-01-01 Thread Amit Kapila
On Wed, Dec 31, 2014 at 9:46 PM, Thom Brown t...@linux.com wrote:

 Another issue (FYI, pgbench2 initialised with: pgbench -i -s 100 -F 10
pgbench2):


  ➤ psql://thom@[local]:5488/pgbench2

 # explain (analyse, buffers, verbose) select distinct bid from
pgbench_accounts;
 server closed the connection unexpectedly
 This probably means the server terminated abnormally
 before or while processing the request.
 The connection to the server was lost. Attempting reset: Failed.
 Time: 14897.991 ms

 2014-12-31 15:21:57 GMT [9164]: [260-1] user=,db=,client= LOG:  server
process (PID 10869) was terminated by signal 9: Killed
 2014-12-31 15:21:57 GMT [9164]: [261-1] user=,db=,client= DETAIL:  Failed
process was running: explain (analyse, buffers, verbose) select distinct
bid from pgbench_accounts;
 2014-12-31 15:21:57 GMT [9164]: [262-1] user=,db=,client= LOG:
 terminating any other active server processes

 Running it again, I get the same issue.  This is with
parallel_seqscan_degree set to 8, and the crash occurs with 4 and 2 too.

 This doesn't happen if I set the pgbench scale to 50.  I suspect this is
an OOM issue.  My laptop has 16GB RAM, the table is around 13GB at scale
100, and I don't have swap enabled.  But I'm concerned it crashes the whole
instance.


Isn't this a backend crash due to OOM?
After that, the server will restart automatically.

 I also notice that requesting BUFFERS in a parallel EXPLAIN output yields
no such information.
 --

Yeah, and the reason for that is that all the work related to BUFFERS
is done by the backend workers; the master backend doesn't read any
pages, so it is not able to accumulate this information.

 Is that not possible to report?

It is not impossible to report such information; we can develop some
way to share it between the master backend and the workers. I think we
can do this if required once the patch is more stabilized.


Thanks for looking into patch and reporting the issues.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Misaligned BufferDescriptors causing major performance problems on AMD

2015-01-01 Thread Andres Freund
On 2015-01-01 11:55:03 -0500, Robert Haas wrote:
 On Mon, Dec 29, 2014 at 6:48 PM, Andres Freund and...@2ndquadrant.com wrote:
  Andres reported the above 2x pgbench difference in February, but no
  action was taken as everyone felt there needed to be more performance
  testing, but it never happened:
 
  FWIW, I have no idea what exactly should be tested there. Right now I
  think what we should do is to remove the BUFFERALIGN from shmem.c and
  instead add explicit alignment code in a couple callers
  (BufferDescriptors/Blocks, proc.c stuff).
 
 That seems like a strange approach.  I think it's pretty sensible to
 try to ensure that allocated blocks of shared memory have decent
 alignment, and we don't have enough of them for aligning on 64-byte
 boundaries (or even 128-byte boundaries, perhaps) to eat up any
 meaningful amount of memory.  The BUFFERALIGN() stuff, like much else
 about the way we manage shared memory, has also made its way into the
 dynamic-shared-memory code.  So if we do adjust the alignment that we
 guarantee for the main shared memory segment, we should perhaps adjust
 DSM to match.  But I guess I don't understand why you'd want to do it
 that way.

The problem is that just aligning the main allocation to some boundary
doesn't mean the hot part of the allocation is properly aligned. shmem.c
in fact can't really do much about that - so fully moving the
responsibility seems more likely to ensure that future code thinks about
alignment.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Misaligned BufferDescriptors causing major performance problems on AMD

2015-01-01 Thread Robert Haas
On Mon, Dec 29, 2014 at 6:48 PM, Andres Freund and...@2ndquadrant.com wrote:
 Andres reported the above 2x pgbench difference in February, but no
 action was taken as everyone felt there needed to be more performance
 testing, but it never happened:

 FWIW, I have no idea what exactly should be tested there. Right now I
 think what we should do is to remove the BUFFERALIGN from shmem.c and
 instead add explicit alignment code in a couple callers
 (BufferDescriptors/Blocks, proc.c stuff).

That seems like a strange approach.  I think it's pretty sensible to
try to ensure that allocated blocks of shared memory have decent
alignment, and we don't have enough of them for aligning on 64-byte
boundaries (or even 128-byte boundaries, perhaps) to eat up any
meaningful amount of memory.  The BUFFERALIGN() stuff, like much else
about the way we manage shared memory, has also made its way into the
dynamic-shared-memory code.  So if we do adjust the alignment that we
guarantee for the main shared memory segment, we should perhaps adjust
DSM to match.  But I guess I don't understand why you'd want to do it
that way.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] add modulo (%) operator to pgbench

2015-01-01 Thread Fabien COELHO


Hello David,


At the moment I feel the patch is a bit half done. I really think that
since the gaussian and exponential stuff was added in commit ed802e7d, that
this should now be changed so that we have functions like random(),
erandom() and grandom() and the way to use this becomes:



\set name random(1,10)
\set name erandom(1,10)

And we completely pull out the new \setrandom additions added in that
commit. We'd likely keep \setrandom name 1 10 for backwards compatibility.



Does anyone else feel strongly about fixing this now, while we have the
chance?


I thought about adding functions, possibly random, very probably abs &
some hash, but I felt it would be more for a second round.


The other point is that although uniform random is fine, the gaussian and 
exponential ones require an additional floating point argument which means 
handling some typing.


The current patch is just about handling operators as before, although 
in a much nicer and extensible way, thus I would suggest to leave Robert's 
patch more or less as it is, and people will be able to propose new 
extensions later on.


--
Fabien.




Re: [HACKERS] add modulo (%) operator to pgbench

2015-01-01 Thread David Rowley
On 1 January 2015 at 21:23, Fabien COELHO coe...@cri.ensmp.fr wrote:


  I was about to go and look at this, but I had a problem when attempting to
 compile with MSVC.


 Thanks! Here is a v4 which includes your fix.


Thank you.

I've had a quick look at the patch as I'm quite interested in seeing some
improvements in this area.

At the moment I feel the patch is a bit half done. I really think that
since the gaussian and exponential stuff was added in commit ed802e7d, that
this should now be changed so that we have functions like random(),
erandom() and grandom() and the way to use this becomes:

\set name random(1,10)
\set name erandom(1,10)

And we completely pull out the new \setrandom additions added in that
commit. We'd likely keep \setrandom name 1 10 for backwards compatibility.

Does anyone else feel strongly about fixing this now, while we have the
chance?

Regards

David Rowley


Re: [HACKERS] Parallel Seq Scan

2015-01-01 Thread Robert Haas
On Thu, Jan 1, 2015 at 12:00 PM, Fabrízio de Royes Mello
fabriziome...@gmail.com wrote:
 Can we check the number of free bgworkers slots to set the max workers?

The real solution here is that this patch can't throw an error if it's
unable to obtain the desired number of background workers.  It needs
to be able to smoothly degrade to a smaller number of background
workers, or none at all.  I think a lot of this work will fall out
quite naturally if this patch is reworked to use the parallel
mode/parallel context stuff, the latest version of which includes an
example of how to set up a parallel scan in such a manner that it can
run with any number of workers.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Parallel Seq Scan

2015-01-01 Thread Fabrízio de Royes Mello
 I think one thing we could do to minimize the chance of such an
 error is set the value of parallel workers to be used for the plan equal
 to max_worker_processes if parallel_seqscan_degree is greater
 than max_worker_processes.  Even if we do this, still such an
 error can come if a user has registered a bgworker before we could
 start parallel plan execution.


Can we check the number of free bgworkers slots to set the max workers?

Regards,

Fabrízio Mello




-- 
Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL
 Timbira: http://www.timbira.com.br
 Blog: http://fabriziomello.github.io
 Linkedin: http://br.linkedin.com/in/fabriziomello
 Twitter: http://twitter.com/fabriziomello
 Github: http://github.com/fabriziomello


Re: [HACKERS] Additional role attributes superuser review

2015-01-01 Thread Magnus Hagander
On Wed, Dec 31, 2014 at 4:23 PM, Stephen Frost sfr...@snowman.net wrote:

 * Magnus Hagander (mag...@hagander.net) wrote:
  On Wed, Dec 31, 2014 at 3:08 PM, Stephen Frost sfr...@snowman.net
 wrote:
   * Magnus Hagander (mag...@hagander.net) wrote:
   I think having it do exactly what pg_dump needs, and not things like
execute functions etc, would be the thing people want for a 'DUMP'
privilege.
  
   What if we want pg_dump in 9.6 to have an option to execute xlog_pause
   and xlog_resume for you?  You wouldn't be able to run that against a
 9.5
   database (or at least, that option wouldn't work).
 
  It would if you added an explicit grant for it, which would have to be
  documented.

 Huh?  An explicit grant for xlog_pause/xlog_resume won't work as we
 check role attributes rights inside the function..


Correct, of course. I was confusing myself.


  We've discussed having a role attribute for COPY-from-filesystem, but
   pg_dump doesn't use that ever, it only uses COPY TO STDOUT.  I'm not
   a fan of making a COPY_TO_STDOUT-vs-SELECT distinction just for this..
 
  Yeah, it's probably going overboard with it, since AFAICT the only thing
  that would actually be affected is RULEs on SELECT, which I bet most
 people
  don't use on their tables.

 Well, we could make SELECT not work, but if you've got COPY then you can
 still get all the data, so, yeah, not much different.  I seriously doubt
 many people are using rules..


Yeah.


   We could/should also throw a WARNING if DUMP is granted to a role
 without
BYPASSRLS in case row level security is enabled in the system, I
 think.
   But
that's more of an implementation detail.
  
   That's a bit ugly and RLS could be added to a relation after the DUMP
   privilege is granted.
 
  Yes, it's not going to be all-covering, but it can still be a useful
  hint/warning in the cases where it *does* that. We obviously still need
  pg_dump to give the error in both scenarios.

 I'm not against doing it, personally, but I suspect others won't like it
 (or at least, that's been the case in the past with other things..).


Heh, let's defer to a third party then :)


  Ok, I see the point you're making that we could make this into a
   capability which isn't something which can be expressed through our
   existing GRANT system.  That strikes me as a solution trying to find a
   problem though.  There's no need to invent such an oddity for this
   particular use-case, I don't think.
 
  Maybe not, but we should make sure we don't paint ourselves into a corner
  where we cannot do it in the future either.

 Agreed.  Do you see a risk here of that?


Not really, anymore, i think :)


  For regular permissions, we could just pre-populate the system with
predefined roles and use regular GRANTs to those roles, instead of
   relying
on role attributes, which might in that case make it even more clear?
  
   The reason for this approach is to address exactly the nightmare that
 is
   trying to maintain those regular permissions across all the objects in
   the system.  Today, people simply give the role trying to do the
 pg_dump
   superuser, which is the best option we have.  Saying 'grant SELECT on
   all the tables and USAGE on all the schemas' isn't suggested because
   it's a huge pain.  This role attribute provides a middle ground where
   the pg_dump'ing role isn't a superuser, but you don't have to ensure
   usage and select are granted to it for every relation.
 
  No, what I'm saying is we could have a *predefined role* that allows
 select
  on all the tables and usage on all the schemas. And you are unable to
  actually remove that. It's not stored on the object, so you cannot REVOKE
  the permission on the *object*. Since it's not stored on the object it
 will
  also automatically apply to all new objects, regardless of what you've
 done
  with DEFAULT PRIVILEGES etc.
 
  But you can grant users and other roles access to this role, and it's
 dealt
  with like other roles from the "who has it" perspective, instead of being
  special-cased.
 
  Instead of ALTER USER foo WITH DUMP (or whatever), you'd just do a
 GRANT
  pgdump TO foo (which would then also work through things like group
  membership as well)

 There's a definite backwards-compatibility concern with this, of course,
 but I see where you're coming from.  This only really applies with this
 particular pg_dump-related role-attribute discussion though, right?


Yes, I believe so. Because it's so similar to the regular permissions,
where as something like being able to take a base backup is more of a
boolean - it doesn't apply to actual objects in the database cluster, it's
more global.


This role wouldn't be able to be logged in with, I presume?


Definitely not.


 Would it
 show up when you run \du?  What about in pg_authid?  I feel like it
 would have to and I worry that users might not care for it- and what
 happens if they want to remove it?


Well, going by experience from other 

Re: [HACKERS] Misaligned BufferDescriptors causing major performance problems on AMD

2015-01-01 Thread Bruce Momjian
On Thu, Jan  1, 2015 at 05:59:25PM +0100, Andres Freund wrote:
  That seems like a strange approach.  I think it's pretty sensible to
  try to ensure that allocated blocks of shared memory have decent
  alignment, and we don't have enough of them for aligning on 64-byte
  boundaries (or even 128-byte boundaries, perhaps) to eat up any
  meaningful amount of memory.  The BUFFERALIGN() stuff, like much else
  about the way we manage shared memory, has also made its way into the
  dynamic-shared-memory code.  So if we do adjust the alignment that we
  guarantee for the main shared memory segment, we should perhaps adjust
  DSM to match.  But I guess I don't understand why you'd want to do it
  that way.
 
 The problem is that just aligning the main allocation to some boundary
 doesn't mean the hot part of the allocation is properly aligned. shmem.c
 in fact can't really do much about that - so fully moving the
 responsibility seems more likely to ensure that future code thinks about
 alignment.

Yes, there is shared memory allocation alignment and object alignment. 
Since there are only about 50 cases of these, a worst-case change to
force 64-byte alignment would only cost 3.2k of shared memory.

It might make sense to make them all 64-byte aligned to reduce CPU cache
contention, but we have to have actual performance numbers to prove
that.  My two patches allow individual object alignment to be tested.  I
have not been able to see any performance difference (<1%) with:

$ pgbench --initialize --scale 100 pgbench
$ pgbench --protocol prepared --client 32 --jobs 16 --time=100 
--select-only pgbench

on my dual-socket 16 vcore server.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +




Re: [HACKERS] Misaligned BufferDescriptors causing major performance problems on AMD

2015-01-01 Thread Bruce Momjian
On Thu, Jan  1, 2015 at 09:04:48PM +0100, Andres Freund wrote:
 On January 1, 2015 8:49:06 PM CET, Robert Haas robertmh...@gmail.com wrote:
 On Thu, Jan 1, 2015 at 11:59 AM, Andres Freund and...@2ndquadrant.com
 wrote:
  The problem is that just aligning the main allocation to some
 boundary
  doesn't mean the hot part of the allocation is properly aligned.
 shmem.c
  in fact can't really do much about that - so fully moving the
  responsibility seems more likely to ensure that future code thinks
 about
  alignment.
 
 That's true, but if you don't align the beginnings of the allocations,
 then it's a lot more complicated for the code to properly align stuff
 within the allocation.  It's got to insert a variable amount of
 padding based on the alignment it happens to get.
 
 Hm? Allocate +PG_CACHELINE_SIZE and do var = CACHELINEALIGN(var).

Yeah, I am afraid we have to do that anyway --- you can see it in my
patch too.  I guess if you had the shmem allocation aligned on 64-byte
boundries and required all allocations to be a multiple of 64, that
would work too.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +




Re: [HACKERS] Misaligned BufferDescriptors causing major performance problems on AMD

2015-01-01 Thread Andres Freund
On January 1, 2015 8:49:06 PM CET, Robert Haas robertmh...@gmail.com wrote:
On Thu, Jan 1, 2015 at 11:59 AM, Andres Freund and...@2ndquadrant.com
wrote:
 The problem is that just aligning the main allocation to some
boundary
 doesn't mean the hot part of the allocation is properly aligned.
shmem.c
 in fact can't really do much about that - so fully moving the
 responsibility seems more likely to ensure that future code thinks
about
 alignment.

That's true, but if you don't align the beginnings of the allocations,
then it's a lot more complicated for the code to properly align stuff
within the allocation.  It's got to insert a variable amount of
padding based on the alignment it happens to get.

Hm? Allocate +PG_CACHELINE_SIZE and do var = CACHELINEALIGN(var).


-- 
Please excuse brevity and formatting - I am writing this on my mobile phone.

Andres Freund  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Misaligned BufferDescriptors causing major performance problems on AMD

2015-01-01 Thread Robert Haas
On Thu, Jan 1, 2015 at 11:59 AM, Andres Freund and...@2ndquadrant.com wrote:
 The problem is that just aligning the main allocation to some boundary
 doesn't mean the hot part of the allocation is properly aligned. shmem.c
 in fact can't really do much about that - so fully moving the
 responsibility seems more likely to ensure that future code thinks about
 alignment.

That's true, but if you don't align the beginnings of the allocations,
then it's a lot more complicated for the code to properly align stuff
within the allocation.  It's got to insert a variable amount of
padding based on the alignment it happens to get.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] add modulo (%) operator to pgbench

2015-01-01 Thread Alvaro Herrera
David Rowley wrote:

 At the moment I feel the patch is a bit half done. I really think that
 since the gaussian and exponential stuff was added in commit ed802e7d, that
 this should now be changed so that we have functions like random(),
 erandom() and grandom() and the way to use this becomes:
 
 \set name random(1,10)
 \set name erandom(1,10)
 
 And we completely pull out the new \setrandom additions added in that
 commit.

Sounds good to me.  The current syntax is rather messy IMV, and
presumably a function-based syntax might be better.

 We'd likely keep \setrandom name 1 10 for backwards compatibility.

Yep.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




[HACKERS] Problems with approach #2 to value locking (INSERT ... ON CONFLICT UPDATE/IGNORE patch)

2015-01-01 Thread Peter Geoghegan
I've been working on fixing the bugs that Jeff Janes found [1] with
approach #2 to value locking [2]. Approach #1 was unaffected.

I'm starting this new thread, to discuss these issues. Let's try and
confine discussion of semantics and syntax to the main thread, since
that has been what has mostly been discussed there.

Jeff produced a stress testing tool which tests concurrent deletions
(and not just concurrent updates); his test case deletes tuples when a
counter value goes under zero, which happens occasionally. It also
independently does book keeping for all values in a range of values
that are randomly incremented or decremented, and verifies that
everything is as it should be. This revealed problems with #2 around
concurrent super deletion of unkept promise heap tuples. As Jeff
pointed out, all-NULL tuples were appearing as visible, or spurious
duplicates appeared. For example:

postgres=# select xmin, xmax, ctid, * from upsert_race_test where index = 1287;
  xmin  |  xmax  |   ctid    | index  | count
--------+--------+-----------+--------+--------
      0 | 205438 | (260,27)  | [null] | [null]
 176066 |      0 | (258,158) |   1287 |      1
(2 rows)

I should point out that one of the problems I found here was a garden
variety bug, which has been fixed (the conflict flag in ExecInsert()
was not being reset correctly, I believe). However, the other
remaining problems could only be fixed with further changes to the
routines in tqual.c, which I'm not happy about, and require further
discussion here. I attach a differential patch which applies on top of
the most recent revision of #2, which is v1.8.vallock2.tar.gz [3].

Note that I've been extending Jeff's stress testing tool to make it
highlight problems earlier, and to make it provide more and better
instrumentation (e.g. printing of XIDs to correlate with pg_xlogdump
output). A good trick I added was to have an INSERT...ON CONFLICT
UPDATE predicate of WHERE TARGET.index = EXCLUDED.index, which
serves as a kind of assertion when used within the Perl script (the
Perl script checks RETURNING-projected tuples, and expects to always
insert or update - that WHERE clause should always pass, obviously).
Jeff's tool is available here (with my revisions):
https://github.com/petergeoghegan/jjanes_upsert

I've been running with fsync off, which Jeff felt increased the
likelihood of revealing the bug. It isn't necessary, though, and FWIW
it was easy for me to recreate this problem once I understood it,
using only my 4 core laptop. I found it important to build without
assertions and at optimization level -O2, though.

The attached patch fixes Jeff's test case, as far as it goes. But, as
I mentioned, it's not intended as a real fix. For one thing, I have
made multiple changes to HeapTupleSatisfies*() routines, but haven't
fully considered the consequences for, say, ri_triggers.c.
HeapTupleSatisfiesSelf() and HeapTupleSatisfiesHistoricMVCC() were not
modified. I've modified HeapTupleSatisfiesVacuum() to see
InvalidTransactionId (not invalid xid infomask bits) as visible for
garbage collection (alongside HeapTupleSatisfiesDirty() and
HeapTupleSatisfiesMVCC(), which I changed first), and I wouldn't be
surprised if that created new problems in other areas. In short, this
patch is just for illustrative purposes.

To see the difference this patch makes, I suggest commenting out the
new code, and running the following test after a fresh initdb:

$ perl count_upsert.pl 4 10

I think that the actual nature of the problem I fixed was that
concurrent upserters felt entitled to update a promise tuple that they
could at least initially see, but which was then super deleted; it
appeared okay to update once super deleted (the tuple wasn't actually
duplicated, insofar as HeapTupleSatisfiesDirty() was then able to
ascertain, due to other concurrent activity). It's pretty complicated,
in any case. This is just a band-aid.

Thoughts? Does anyone have any thoughts on my diagnosis of the problem?

[1] 
http://www.postgresql.org/message-id/CAMkU=1w8e9Q6ZX3U85RtsyCMpdYWFrvVAO3=unevtqiruzn...@mail.gmail.com
[2] 
https://wiki.postgresql.org/wiki/Value_locking#.232._.22Promise.22_heap_tuples_.28Heikki_Linnakangas.29
[3] 
http://www.postgresql.org/message-id/cam3swzrg_htrol-6_wfe6_d_ucuyc28jfapsfh_tra76gkk...@mail.gmail.com
-- 
Peter Geoghegan
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 4e5be1a..9bb13df 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -313,8 +313,8 @@ ExecInsert(TupleTableSlot *slot,
 		 * deleting the already-inserted tuple and retrying, but that's fairly
 		 * expensive, so we try to avoid it.
 		 */
-		conflict = false;
 vlock:
+		conflict = false;
 		ItemPointerSetInvalid(conflictTid);
 
 		/*
@@ -354,7 +354,8 @@ vlock:
 			 * which is appropriate only for non-promise tuples) to wait for us
 			 * to decide if we're going to go ahead with the 

Re: [HACKERS] orangutan seizes up during isolation-check

2015-01-01 Thread Noah Misch
On Wed, Dec 31, 2014 at 01:55:23PM -0500, Dave Cramer wrote:
 So at this point removing the  --enable-nls from my config will solve the
 build problem.
 
 Everyone knows there is an issue so there is no point in continuing to have
 it fail.

We hope all packagers will build with --enable-nls, so OS X buildfarm coverage
thereof will be valuable.  Let me see what I can pull together over the next
several weeks toward getting it green.

